P2P Instructions in Detail: TPUT / TGET

pto-isa: Parallel Tile Operation (PTO) is a virtual instruction set architecture designed by Ascend CANN, focusing on tile-level operations. This repository offers high-performance, cross-platform tile operations across Ascend platforms.
Project address: https://gitcode.com/cann/pto-isa

TPUT: remote write

Data flow: srcGlobalData (local GM) → stagingTileData (UB) → dstGlobalData (remote GM)

When the GlobalTensor exceeds the capacity of the UB tile, TPUT automatically performs two-dimensional sliding chunking.

Interface signatures

    // Single tile (automatic chunking), atomic type fixed at compile time
    template <AtomicType atomicType = AtomicType::AtomicNone,
              typename GlobalDstData, typename GlobalSrcData,
              typename TileData, typename... WaitEvents>
    RecordEvent TPUT(GlobalDstData dst, GlobalSrcData src,
                     TileData stagingTile, WaitEvents... events);

    // Single tile, atomic type chosen at run time
    template <typename GlobalDstData, typename GlobalSrcData,
              typename TileData, typename... WaitEvents>
    RecordEvent TPUT(GlobalDstData dst, GlobalSrcData src,
                     TileData stagingTile, AtomicType atomicType,
                     WaitEvents... events);

    // Ping-pong double buffering
    template <AtomicType atomicType = AtomicType::AtomicNone,
              typename GlobalDstData, typename GlobalSrcData,
              typename TileData, typename... WaitEvents>
    RecordEvent TPUT(GlobalDstData dst, GlobalSrcData src,
                     TileData pingTile, TileData pongTile,
                     WaitEvents... events);

Constraints

- GlobalSrcData::RawDType == GlobalDstData::RawDType
- TileData::DType == GlobalSrcData::RawDType
- GlobalSrcData::layout == GlobalDstData::layout
- dstGlobalData must point to a remote address
- srcGlobalData must point to a local address
- stagingTileData must be allocated in UB beforehand
- Ping-pong mode: pingTile and pongTile must have the same type and dimensions and sit at non-overlapping UB offsets
- atomicType supports AtomicNone and AtomicAdd

Examples

    // Basic remote write
    comm::TPUT(dstG, srcG, stagingTile);

    // Remote write with atomic add
    comm::TPUT<AtomicType::AtomicAdd>(dstG, srcG, stagingTile);

    // Ping-pong double buffering with automatic chunking
    // Size of one 64 x 64 float tile, rounded up to a multiple of 1024 bytes
    constexpr size_t tileUBBytes = ((64 * 64 * sizeof(float) + 1023) / 1024) * 1024;
    TileT pingTile(64, 64);
    TileT pongTile(64, 64);
    TASSIGN(pingTile, 0);            // ping buffer at UB offset 0
    TASSIGN(pongTile, tileUBBytes);  // pong buffer directly after it, no overlap
    comm::TPUT(dstG, srcG, pingTile, pongTile);

    // Atomic type selected at run time
    comm::TPUT(dstG, srcG, stagingTile, AtomicType::AtomicAdd);

TGET: remote read

Data flow: srcGlobalData (remote GM) → stagingTileData (UB) → dstGlobalData (local GM)

Interface signatures

    // Single tile
    template <typename GlobalDstData, typename GlobalSrcData,
              typename TileData, typename... WaitEvents>
    RecordEvent TGET(GlobalDstData dst, GlobalSrcData src,
                     TileData stagingTile, WaitEvents... events);

    // Ping-pong double buffering
    template <typename GlobalDstData, typename GlobalSrcData,
              typename TileData, typename... WaitEvents>
    RecordEvent TGET(GlobalDstData dst, GlobalSrcData src,
                     TileData pingTile, TileData pongTile,
                     WaitEvents... events);

Constraints

- The constraints mirror those of TPUT with the direction reversed: srcGlobalData points to the remote side and dstGlobalData points to the local side
- TGET does not support atomic operations

Examples

    // Basic remote read
    comm::TGET(dstG, srcG, stagingTile);

    // Remote read with ping-pong double buffering
    comm::TGET(dstG, srcG, pingTile, pongTile);
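
The automatic two-dimensional sliding chunking and the ping-pong buffer layout described for TPUT can be pictured with a small host-side sketch. It is not the PTO implementation: `AlignUp`, `Chunk`, and `PlanChunks` are hypothetical names introduced here for illustration, and the row-major traversal, the clipping of edge windows, and the strict ping/pong alternation are assumptions, since the article does not state how the library walks the tensor. Only the 1024-byte rounding is taken from the `tileUBBytes` expression in the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Round `bytes` up to the next multiple of `align`; the same arithmetic as the
// tileUBBytes expression in the ping-pong example (align = 1024 there).
constexpr std::size_t AlignUp(std::size_t bytes, std::size_t align) {
    return ((bytes + align - 1) / align) * align;
}

// Hypothetical description of one tile-sized window of a larger 2D region.
struct Chunk {
    std::size_t row;   // first row of the window in the global tensor
    std::size_t col;   // first column of the window
    std::size_t rows;  // window height (edge windows may be shorter)
    std::size_t cols;  // window width (edge windows may be narrower)
    int buffer;        // staging buffer used: 0 = ping, 1 = pong
};

// Slide a tileRows x tileCols window over a rows x cols region in row-major
// order, alternating between the two staging buffers on every step.
// ASSUMPTION: the real traversal order and edge handling may differ.
std::vector<Chunk> PlanChunks(std::size_t rows, std::size_t cols,
                              std::size_t tileRows, std::size_t tileCols) {
    assert(tileRows > 0 && tileCols > 0);
    std::vector<Chunk> plan;
    int buffer = 0;
    for (std::size_t r = 0; r < rows; r += tileRows) {
        // Clip the window height at the bottom edge of the region.
        const std::size_t h = (rows - r < tileRows) ? rows - r : tileRows;
        for (std::size_t c = 0; c < cols; c += tileCols) {
            // Clip the window width at the right edge of the region.
            const std::size_t w = (cols - c < tileCols) ? cols - c : tileCols;
            plan.push_back(Chunk{r, c, h, w, buffer});
            buffer ^= 1;  // switch between ping and pong
        }
    }
    return plan;
}
```

Under these assumptions, a 100 x 130 region covered by 64 x 64 tiles is split into six windows, the last of which spans only 36 x 2 elements. Alternating the staging buffer per window is what would let the transfer of one window overlap with the staging of the next, which is why the two tiles must not overlap in UB.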