CANN/GE External Weight Feature


张建站

2026/7/16 23:49:03


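GE External Weight (FileConstant / External Weight) Feature

[Free download] ge: GE (Graph Engine) is a graph compiler and executor for Ascend. It applies techniques such as computational-graph optimization, multi-stream parallelism, memory reuse, and model sinking to speed up model execution and reduce model memory footprint. GE offers friendly integration with PyTorch and TensorFlow front ends and supports parsing and compiling mainstream model formats such as ONNX and pb. Project: https://gitcode.com/cann/ge

This feature separates model weights from the OM file and stores them in separate files on disk. Typical scenarios: OM file size limits, model encryption, weight sharing across models (deduplicated by hash), and faster online inference in Hybrid mode.

User interfaces

| Interface | File | Description |
| --- | --- | --- |
| ATC --external_weight | api/atc/main_impl.cc | 0: embedded (default); 1: one file per weight; 2: one combined file |
| ge.externalWeight option | inc/graph_metadef/external/ge_common/ge_api_types.h | Set during online compilation; defaults to 1 in Hybrid mode |
| ge.externalWeightDir option | Same as above | Directory the weights are written to |
| aclmdlSetExternalWeightAddress | inc/external/acl/acl_mdl.h | Set at load time; user Device memory takes precedence over ACL_MDL_WEIGHT_PATH_PTR |
| CompiledGraphSummary::GetExternalWeightPaths | inc/external/ge/ge_graph_compile_summary.h | After compilation, returns the list of ExternalWeightDesc (path/size/offset/ID) |

Weight path priority: ge.externalWeightDir > $ASCEND_WORK_PATH/tmp_weight_{pid}_{sid} > ./tmp_weight_{pid}_{sid}

Compile time: Const → FileConstant conversion

Entry point: compiler/graph/manager/graph_manager.cc. During the Build stage, GE reads the ge.externalWeight option; a value of 1 or 2 triggers the conversion.

File storage mode 1: each weight is stored in its own file, whose name is weight_ followed by the weight's SHA-256 hash. Files are written by multiple threads (8), and flock(LOCK_EX) protects meta.json when several compilations run concurrently.

File storage mode 2: all weights are merged into a single file, each aligned to 512 bytes (a DMA requirement) and located by offset; meta.json records the hash → file/offset mapping.

Path management: at compile time, weights are written to tmp_weight_{pid}_{sid}/ → when the OM is written, ChangeFilePath moves them to weight/ in the same directory as the OM → RefreshRelativePath rewrites location so that it contains only the file name.

Reverse conversion: ConvertFileConstToConst in compiler/graph/preprocess/graph_prepare.cc reads the file → creates a GeTensor → changes the node type back to Const. It is used in scenarios such as ONNX import.

Runtime: Runtime V2 (online inference)

Lowering stage: LoweringFileConstantNode in runtime/v2/engine/gelocal/file_constant_converter.cc, registered via REGISTER_NODE_CONVERTER(FileConstant).

Path resolution priority: location (private attribute) > file_path (IR attribute) > file_id together with ge.exec.value_bins.

Model load flow (aclmdlLoadWithConfigImpl in api/acl/acl_model/model/model.cpp): aclmdlSetExternalWeightAddress stores {fileName, devPtr, size} in handle->fileConstantMem → at load time this is passed through LoadExecutorArgs → LoweringGlobalData::SetFileConstantMem → in the Lowering stage, GetUserDeviceAddress finds the user Device memory by matching the file name.

Runtime: Runtime V1 (DavinciModel, offline inference)

In runtime/v1/graph/load/model_manager/davinci_model.cc, PreProcessFileConstants pre-allocates memory for all FileConstant nodes when the model is loaded.

Combined mode (HandleCombinedWeights):

1. AllocateCombinedWeightMemory first checks for user memory: GetFileConstantUserDeviceMem(file_name) matches by file name in file_constant_user_device_mems_. If there is no user memory, MallocFileConstantMem allocates HBM, and CopyOneWeightFromFileWithFilehandler copies the file host-to-device (H2D) in a single pass.
2. external_weight_combined_mem_addr_ (a unique_ptr with a custom deleter) manages the lifetime: user memory is never freed, while GE-allocated memory is freed on destruction.
3. MapNodeAddressesToCombinedWeight sets fileconstant_addr_mapping[logic_output_offset] = base_addr + weight_offset and checks that the offset is within bounds.
4. VarManager::SetVarIsReady marks the weights as ready.

Individual mode (HandleIndividualWeights), node by node:

1. GetUserDeviceMemForFileConstant extracts the file name → looks it up in file_constant_user_device_mems_ → checks mem_size - offset >= weights_size → returns device_mem + offset.
2. If there is no user memory, MallocFileConstantMem allocates HBM; the weight data is loaded later at runtime by FileConstantKernel.
3. The address is written into fileconstant_addr_mapping.
4. VarManager::SetVarIsReady marks the weight as ready.

Memory release: FreeFileConstantMem is called when DavinciModel is destroyed. Combined mode relies on the destruction of the external_weight_combined_mem_addr_ unique_ptr; individual mode iterates over fileconstant_addr_mapping, skips user memory (IsUserDeviceMemForFileConstant), and frees only GE-allocated HBM.

Runtime address lookup (runtime/v2/kernel/known_subgraph/davinci_model_kernel.cc): using the kMemoryBaseTypeFileConstant type tag, the kernel looks up the logic_offset → device_addr mapping in fileconstant_addr_mapping.

Key data structures (davinci_model.h):

```cpp
std::string file_constant_weight_dir_;  // weight file directory
std::map<std::string, FileConstantMem> file_constant_user_device_mems_;  // file name -> user Device memory
std::unique_ptr<void, std::function<void(void*)>> external_weight_combined_mem_addr_;  // combined weights (smart pointer)
// runtime_param_.fileconstant_addr_mapping: map<int64_t, uintptr_t>, logical offset -> physical address
```

ExternalWeightManager: global weight management (base/graph/manager/graph_external_weight_manager.cc)

- Session level: each session has one ExternalWeightManager, and ExternalWeightManagerPool (a global singleton) manages them.
- Deduplication: CheckAndSetWeightLoaded records loaded weights per device + file so the same weight is not loaded twice.
- Sharding: SaveSlicedFileConstantInfo / TryGetSlicedFileConstantInfo support sharded large models.
- Lifecycle: when a Session is destroyed, RemoveManager → Finalize automatically cleans up the temporary weight directory.

FileConstantMeta is persisted as meta.json:

```json
{
  "hash_to_weight_file": {"sha256...": "/path/weight_sha256..."},
  "hash_to_weight_offset": {"sha256...": 0}
}
```

Key file index

| Layer | File | Responsibility |
| --- | --- | --- |
| API | api/atc/main_impl.cc | Defines the ATC --external_weight parameter |
| API | api/acl/acl_model/model/model_config.cpp | Implements aclmdlSetExternalWeightAddress |
| API | api/acl/acl_model/model/model.cpp | aclmdlLoadWithConfig loading; dispatches file_constant_mems |
| API | api/session/session/user_hybrid_graph_manager.cc | Enables externalWeight=1 by default in Hybrid mode |
| Compiler | compiler/graph/manager/graph_manager.cc | Build-stage Const→FileConstant entry point |
| Compiler | compiler/graph/preprocess/graph_prepare.cc | Prepare-stage FileConstant→Const reverse conversion |
| Compiler | compiler/graph/build/graph_compile_summary_impl.cc | SetExternalWeightPaths (compile summary) |
| Compiler | compiler/api/generator/ge_generator.cc | Migrates weight file paths when the OM is written |
| Base | base/common/file_constant_utils/file_constant_utils.cc | Core utility class (conversion, read/write, path management) |
| Base | base/graph/manager/graph_external_weight_manager.cc | Session-level weight manager |
| RT V1 | runtime/v1/graph/load/model_manager/davinci_model.cc | PreProcessFileConstants and the full memory pre-allocation logic |
| RT V1 | runtime/v1/graph/load/model_manager/davinci_model.h | FileConstant-related data structures |
| RT V2 | runtime/v2/kernel/ge_local_kernel/file_constant_kernel.cc | FileConstantKernel / FileConstantUserMemKernel |
| RT V2 | runtime/v2/engine/gelocal/file_constant_converter.cc | Lowering-stage node conversion |
| RT V2 | runtime/v2/kernel/known_subgraph/davinci_model_kernel.cc | Runtime address-mapping lookup and weight initialization |
| RT V2 | runtime/v2/lowering/model_converter.cc | Passes file_constant_mems to LoweringGlobalData |
| Parser | parser/parser/onnx/onnx_file_constant_parser.cc | Parses the ONNX FileConstant operator |

Disclosure: parts of this article were generated with AI assistance (AIGC) and are for reference only.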
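The weight-directory priority listed under User interfaces (ge.externalWeightDir > $ASCEND_WORK_PATH/tmp_weight_{pid}_{sid} > ./tmp_weight_{pid}_{sid}) can be sketched as one small C++ function. ResolveExternalWeightDir is a hypothetical name, not a GE API; the sketch assumes the caller has already read the option value and the ASCEND_WORK_PATH environment variable, with an empty string meaning "not set". GE's real code may also normalize or validate the path.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: picks the directory for externalized weights in the order
// ge.externalWeightDir > $ASCEND_WORK_PATH/tmp_weight_{pid}_{sid} > ./tmp_weight_{pid}_{sid}.
// Empty strings mean "not set".
std::string ResolveExternalWeightDir(const std::string &option_dir,
                                     const std::string &ascend_work_path,
                                     long pid, unsigned long session_id) {
  if (!option_dir.empty()) {
    return option_dir;  // an explicit ge.externalWeightDir always wins
  }
  const std::string tmp_name =
      "tmp_weight_" + std::to_string(pid) + "_" + std::to_string(session_id);
  if (!ascend_work_path.empty()) {
    return ascend_work_path + "/" + tmp_name;  // fall back to $ASCEND_WORK_PATH
  }
  return "./" + tmp_name;  // last resort: the current working directory
}
```

A caller would typically pass std::getenv("ASCEND_WORK_PATH"), turning a null result into an empty string, together with the process and session IDs.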
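Mode 1 protects meta.json with flock(LOCK_EX) so that concurrent compilations do not interleave their updates. Below is a minimal RAII sketch of that locking pattern, assuming Linux/POSIX headers. ExclusiveFileLock is a hypothetical class; which file GE actually locks and how it reads and merges meta.json are not described in the article.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <sys/file.h>
#include <unistd.h>

// Hypothetical RAII guard: opens (or creates) a file and holds flock(LOCK_EX) on it
// for the guard's lifetime, so concurrent compile processes serialize their updates.
class ExclusiveFileLock {
 public:
  explicit ExclusiveFileLock(const std::string &path)
      : fd_(::open(path.c_str(), O_RDWR | O_CREAT, 0644)) {
    if (fd_ >= 0) {
      int rc;
      do {
        rc = ::flock(fd_, LOCK_EX);  // blocks until no other holder remains
      } while (rc != 0 && errno == EINTR);  // retry if interrupted by a signal
      locked_ = (rc == 0);
    }
  }
  ~ExclusiveFileLock() {
    if (fd_ >= 0) {
      if (locked_) {
        ::flock(fd_, LOCK_UN);
      }
      ::close(fd_);
    }
  }
  ExclusiveFileLock(const ExclusiveFileLock &) = delete;
  ExclusiveFileLock &operator=(const ExclusiveFileLock &) = delete;

  bool locked() const { return locked_; }
  int fd() const { return fd_; }

 private:
  int fd_ = -1;
  bool locked_ = false;
};
```

Because the lock is released in the destructor, an early return between reading and rewriting meta.json cannot leave other compile processes blocked. flock locks belong to the open file description, so two guards on the same file exclude each other even within one process.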
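The hash-based deduplication and the meta.json layout can be modelled with two sorted maps. This is a sketch, not GE's FileConstantMeta: FileConstantMetaSketch and Register are hypothetical, the SHA-256 hashes are assumed to be precomputed hex strings (hashing itself needs a crypto library), and the JSON writer performs no escaping.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical in-memory mirror of meta.json: hash -> file and hash -> offset.
struct FileConstantMetaSketch {
  std::map<std::string, std::string> hash_to_weight_file;
  std::map<std::string, uint64_t> hash_to_weight_offset;

  // Returns true if the weight is new and must be written; false if a weight with
  // the same hash was already recorded, in which case its file/offset is reused.
  bool Register(const std::string &hash, const std::string &file, uint64_t offset) {
    auto inserted = hash_to_weight_file.emplace(hash, file);
    if (!inserted.second) {
      return false;  // dedup hit: keep the existing entry untouched
    }
    hash_to_weight_offset[hash] = offset;
    return true;
  }

  // Serializes to the meta.json shape (no escaping: assumes plain paths and hashes).
  std::string ToJson() const {
    std::ostringstream os;
    os << "{\"hash_to_weight_file\": {";
    const char *sep = "";
    for (const auto &kv : hash_to_weight_file) {
      os << sep << '"' << kv.first << "\": \"" << kv.second << '"';
      sep = ", ";
    }
    os << "}, \"hash_to_weight_offset\": {";
    sep = "";
    for (const auto &kv : hash_to_weight_offset) {
      os << sep << '"' << kv.first << "\": " << kv.second;
      sep = ", ";
    }
    os << "}}";
    return os.str();
  }
};
```

In mode 1 every offset would be 0 and each hash points at its own file; in mode 2 all hashes point at the combined file with distinct offsets.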
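In mode 2 every weight starts on a 512-byte boundary inside the combined file, so each offset is the running total of the preceding weights' sizes, each rounded up to a multiple of 512. A sketch of that arithmetic; AlignUp and ComputeCombinedOffsets are hypothetical helpers, and whether GE also pads after the last weight is not stated in the article (this sketch does pad it).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint64_t kWeightAlign = 512;  // alignment stated in the article (DMA requirement)

// Rounds size up to the next multiple of align.
constexpr uint64_t AlignUp(uint64_t size, uint64_t align = kWeightAlign) {
  return (size + align - 1) / align * align;
}

// Hypothetical layout helper: given the byte sizes of the weights appended to one
// combined file, returns the offset at which each weight starts.
std::vector<uint64_t> ComputeCombinedOffsets(const std::vector<uint64_t> &sizes,
                                             uint64_t *total_size) {
  std::vector<uint64_t> offsets;
  offsets.reserve(sizes.size());
  uint64_t cursor = 0;
  for (uint64_t size : sizes) {
    offsets.push_back(cursor);  // each weight starts on a 512-byte boundary
    cursor += AlignUp(size);    // pad up to the next boundary
  }
  if (total_size != nullptr) {
    *total_size = cursor;
  }
  return offsets;
}
```

For weights of 100, 512 and 513 bytes the offsets are 0, 512 and 1024, and the file is 2048 bytes long.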
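The path-management steps (temporary directory, then weight/ next to the OM, then location reduced to the bare file name) map naturally onto std::filesystem. TargetWeightPath, RelativeLocation and MoveWeightNextToOm are hypothetical stand-ins for what ChangeFilePath and RefreshRelativePath are described as doing; their real signatures are not shown in the article.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Where a weight ends up after the OM is written: "<OM dir>/weight/<file name>".
fs::path TargetWeightPath(const fs::path &om_file, const fs::path &tmp_weight_file) {
  return om_file.parent_path() / "weight" / tmp_weight_file.filename();
}

// The location attribute keeps only the file name after the move.
std::string RelativeLocation(const fs::path &weight_file) {
  return weight_file.filename().string();
}

// Performs the move on disk. fs::rename assumes source and target are on the same
// filesystem; a copy + remove fallback would be needed otherwise.
fs::path MoveWeightNextToOm(const fs::path &om_file, const fs::path &tmp_weight_file) {
  const fs::path target = TargetWeightPath(om_file, tmp_weight_file);
  fs::create_directories(target.parent_path());
  fs::rename(tmp_weight_file, target);
  return target;
}
```

For an OM written to /out/model.om, a weight produced as /tmp/tmp_weight_1_0/weight_abc ends up at /out/weight/weight_abc, and its location becomes weight_abc.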
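The first step of the reverse conversion (ConvertFileConstToConst) is reading the weight bytes back from disk. This sketch covers that step only, assuming a weight is identified by a path, a byte offset (0 in mode 1) and a length. ReadWeightBytes is hypothetical; building the GeTensor and switching the node type back to Const are not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Reads `length` bytes starting at `offset` from `path` into *out.
// Returns false if the file cannot be opened or the read comes up short.
bool ReadWeightBytes(const std::string &path, uint64_t offset, uint64_t length,
                     std::vector<uint8_t> *out) {
  std::ifstream in(path, std::ios::binary);
  if (!in) {
    return false;
  }
  in.seekg(static_cast<std::streamoff>(offset), std::ios::beg);
  if (!in) {
    return false;
  }
  out->assign(length, 0);
  in.read(reinterpret_cast<char *>(out->data()), static_cast<std::streamsize>(length));
  return static_cast<uint64_t>(in.gcount()) == length;  // reject truncated weights
}
```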
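The Runtime V2 path-resolution order (location > file_path > file_id together with ge.exec.value_bins) can be sketched with std::optional. FileConstantAttrs and ResolveFileConstantPath are hypothetical; ge.exec.value_bins is modelled as an already-parsed file_id → path map because its actual option format is not described here, and joining a bare location file name with the model directory is left out.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical view of the attributes a FileConstant node may carry.
struct FileConstantAttrs {
  std::optional<std::string> location;   // private attribute written at compile time
  std::optional<std::string> file_path;  // IR attribute
  std::optional<std::string> file_id;    // IR attribute, resolved through value_bins
};

// Resolves the weight file in the order location > file_path > file_id (+ value_bins).
std::optional<std::string> ResolveFileConstantPath(
    const FileConstantAttrs &attrs, const std::map<std::string, std::string> &value_bins) {
  if (attrs.location && !attrs.location->empty()) {
    return attrs.location;
  }
  if (attrs.file_path && !attrs.file_path->empty()) {
    return attrs.file_path;
  }
  if (attrs.file_id) {
    auto it = value_bins.find(*attrs.file_id);
    if (it != value_bins.end()) {
      return it->second;
    }
  }
  return std::nullopt;  // no source for this weight: the caller reports an error
}
```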
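Both the V2 GetUserDeviceAddress lookup and the V1 GetUserDeviceMemForFileConstant check come down to "find the user buffer registered for this file name, then verify the weight fits". A sketch under stated assumptions: UserWeightMem and GetUserWeightAddress are hypothetical, device addresses are plain integers, and 0 signals "no usable user memory". The bounds test mem_size - offset >= weights_size is guarded by offset <= mem_size first, so the unsigned subtraction cannot wrap around.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical mirror of one aclmdlSetExternalWeightAddress registration
// ({fileName, devPtr, size}); the file name is the map key.
struct UserWeightMem {
  uintptr_t dev_ptr = 0;
  uint64_t mem_size = 0;
};

// Returns dev_ptr + offset if user memory is registered for file_name and a weight
// of weight_size bytes fits at offset; otherwise 0 (caller falls back to HBM).
uintptr_t GetUserWeightAddress(const std::map<std::string, UserWeightMem> &user_mems,
                               const std::string &file_name, uint64_t offset,
                               uint64_t weight_size) {
  auto it = user_mems.find(file_name);
  if (it == user_mems.end()) {
    return 0;  // no user memory for this file
  }
  const UserWeightMem &mem = it->second;
  if (offset > mem.mem_size || mem.mem_size - offset < weight_size) {
    return 0;  // the weight would overrun the user's buffer
  }
  return mem.dev_ptr + offset;
}
```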
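The ownership rule for external_weight_combined_mem_addr_ (user memory is never freed, GE memory is freed on destruction) falls out of choosing the deleter when the unique_ptr is created. This sketch uses the same unique_ptr type with a std::function deleter that davinci_model.h declares; WrapCombinedWeightMem is hypothetical, and free_fn stands in for the real device-memory free call.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

using CombinedMemPtr = std::unique_ptr<void, std::function<void(void *)>>;

// GE-allocated memory gets free_fn as its deleter; user memory gets a no-op deleter.
CombinedMemPtr WrapCombinedWeightMem(void *addr, bool is_user_mem,
                                     std::function<void(void *)> free_fn) {
  if (is_user_mem) {
    // Owned by the user: never freed by GE.
    return CombinedMemPtr(addr, std::function<void(void *)>([](void *) {}));
  }
  return CombinedMemPtr(addr, std::move(free_fn));
}
```

Storing the deleter in a std::function costs some space and an indirect call, but it lets a single member hold either ownership policy without changing its type.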
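MapNodeAddressesToCombinedWeight and the runtime lookup in davinci_model_kernel.cc both revolve around the logic offset → device address map. A sketch of filling that map with a bounds check and reading it back; CombinedWeightEntry, MapNodesToCombinedWeight and LookupFileConstantAddr are hypothetical, and addresses are plain integers rather than real HBM pointers.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// One FileConstant node's place in the combined weight buffer (hypothetical type).
struct CombinedWeightEntry {
  int64_t logic_output_offset;  // key later used by the runtime lookup
  uint64_t weight_offset;       // offset inside the combined weight file
  uint64_t weight_size;
};

// mapping[logic_output_offset] = base_addr + weight_offset, rejecting any entry that
// would extend past the combined buffer [base_addr, base_addr + total_size).
bool MapNodesToCombinedWeight(uintptr_t base_addr, uint64_t total_size,
                              const std::vector<CombinedWeightEntry> &entries,
                              std::map<int64_t, uintptr_t> *addr_mapping) {
  for (const auto &e : entries) {
    if (e.weight_offset > total_size || total_size - e.weight_offset < e.weight_size) {
      return false;  // offset out of bounds
    }
    (*addr_mapping)[e.logic_output_offset] = base_addr + e.weight_offset;
  }
  return true;
}

// Runtime-side lookup: logic offset -> device address, or 0 if unknown.
uintptr_t LookupFileConstantAddr(const std::map<int64_t, uintptr_t> &addr_mapping,
                                 int64_t logic_offset) {
  auto it = addr_mapping.find(logic_offset);
  return it == addr_mapping.end() ? 0 : it->second;
}
```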
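The individual-mode release path (walk fileconstant_addr_mapping, skip user memory, free only GE-allocated HBM) can be sketched as follows. UserRange, IsUserAddress and FreeIndividualWeights are hypothetical; free_fn stands in for the device free call, and the std::set that prevents freeing a shared address twice is an extra precaution of this sketch, not something the article describes.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <set>
#include <vector>

// A user-registered device range [base, base + size) (hypothetical type).
struct UserRange {
  uintptr_t base;
  uint64_t size;
};

// Analogue of IsUserDeviceMemForFileConstant: true if addr lies in any user range.
bool IsUserAddress(uintptr_t addr, const std::vector<UserRange> &user_ranges) {
  for (const auto &r : user_ranges) {
    if (addr >= r.base && addr - r.base < r.size) {
      return true;
    }
  }
  return false;
}

// Frees every GE-owned address in the mapping once, skips user memory, then clears it.
void FreeIndividualWeights(std::map<int64_t, uintptr_t> *addr_mapping,
                           const std::vector<UserRange> &user_ranges,
                           const std::function<void(uintptr_t)> &free_fn) {
  std::set<uintptr_t> freed;
  for (const auto &kv : *addr_mapping) {
    if (!IsUserAddress(kv.second, user_ranges) && freed.insert(kv.second).second) {
      free_fn(kv.second);
    }
  }
  addr_mapping->clear();
}
```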
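CheckAndSetWeightLoaded is described as recording loaded weights per device and file. Below is a thread-safe sketch of that check-and-set, keyed on a (device_id, file) pair. WeightLoadRegistry is hypothetical, and its return convention (true means "already loaded") is an assumption because the article does not give the real signature.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <set>
#include <string>
#include <utility>

class WeightLoadRegistry {
 public:
  // Returns true if (device_id, file) was already loaded; otherwise records it
  // as loaded and returns false, all under one lock so two threads cannot both
  // decide to load the same weight.
  bool CheckAndSetLoaded(uint32_t device_id, const std::string &file) {
    std::lock_guard<std::mutex> guard(mutex_);
    const bool newly_inserted = loaded_.emplace(device_id, file).second;
    return !newly_inserted;
  }

 private:
  std::mutex mutex_;
  std::set<std::pair<uint32_t, std::string>> loaded_;
};
```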
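The session-level structure (one manager per session, a global pool, and RemoveManager → Finalize deleting the temporary weight directory) can be sketched with a function-local static singleton and std::filesystem::remove_all. SessionWeightManager and SessionWeightManagerPool are hypothetical stand-ins for ExternalWeightManager and ExternalWeightManagerPool; their real interfaces are not shown in the article.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <map>
#include <memory>
#include <mutex>
#include <system_error>
#include <utility>

namespace fs = std::filesystem;

// Per-session manager that owns a temporary weight directory.
class SessionWeightManager {
 public:
  explicit SessionWeightManager(fs::path tmp_dir) : tmp_dir_(std::move(tmp_dir)) {}
  void Finalize() {
    std::error_code ec;
    fs::remove_all(tmp_dir_, ec);  // best effort: errors are ignored
  }

 private:
  fs::path tmp_dir_;
};

// Global pool keyed by session id.
class SessionWeightManagerPool {
 public:
  static SessionWeightManagerPool &Instance() {
    static SessionWeightManagerPool pool;  // initialization is thread-safe since C++11
    return pool;
  }
  std::shared_ptr<SessionWeightManager> GetOrCreate(uint64_t session_id, const fs::path &dir) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto &slot = managers_[session_id];
    if (!slot) {
      slot = std::make_shared<SessionWeightManager>(dir);
    }
    return slot;
  }
  // Called when a session is destroyed: drop the manager, then clean up its files.
  void RemoveManager(uint64_t session_id) {
    std::shared_ptr<SessionWeightManager> mgr;
    {
      std::lock_guard<std::mutex> guard(mutex_);
      auto it = managers_.find(session_id);
      if (it == managers_.end()) {
        return;
      }
      mgr = std::move(it->second);
      managers_.erase(it);
    }
    mgr->Finalize();  // disk cleanup runs outside the lock
  }

 private:
  std::mutex mutex_;
  std::map<uint64_t, std::shared_ptr<SessionWeightManager>> managers_;
};
```

Finalize runs after the pool's mutex is released, so slow disk cleanup for one session does not block other sessions from looking up their managers.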
