CANN/catlass: TLA One-Block Broadcast Operation
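# TileBroadcastOneBlkTla

catlass is CANN's operator template library. It provides template samples for high-performance matrix multiplication and related fused operators on NPUs. Project address: https://gitcode.com/cann/catlass

## Function description

`TileBroadcastOneBlkTla` implements a TLA-style one-block broadcast operation. It is functionally identical to `TileBroadcastOneBlk`, but exposes its interface through `tla::Tensor`.

## Applicability

- Architecture: all architectures (no architecture-specific specialization)
- Style: TLA

## Template prototype

```cpp
template <
    class ArchTag_,            // architecture tag
    class ElementCompute_,     // compute element type, passed in directly
    uint32_t COMPUTE_LENGTH_   // compute length
>
struct TileBroadcastOneBlkTla;
```

## Template parameters

| Parameter | Description |
| --- | --- |
| `ArchTag_` | Architecture tag |
| `ElementCompute_` | Compute element type, for example `half` |
| `COMPUTE_LENGTH_` | Total number of elements to broadcast |

## Call interface

```cpp
template <class TensorUbOut, class TensorUbIn>
void operator()(TensorUbOut ubOut, TensorUbIn ubIn)
```

The operator computes the offset via `ubOut.layout()(ubOut.coord())` and then calls `AscendC::Brcb`.

## Usage example

```cpp
#include "catlass/epilogue/tile/tile_broadcast_one_blk.hpp"
using namespace Catlass::Epilogue::Tile;

constexpr uint32_t COMPUTE_LENGTH = 256;

// Output and input layouts in the unified buffer (UB)
auto layoutOut = tla::MakeLayout<half, layout::RowMajor>(COMPUTE_LENGTH, 32);
auto layoutIn = tla::MakeLayout<half, layout::VectorLayout>(COMPUTE_LENGTH, 1);

// Wrap the local tensors as tla::Tensor objects
AscendC::LocalTensor<half> ubOutData, ubInData;
auto ubOut = tla::MakeTensor(ubOutData, layoutOut, Arch::PositionUB{});
auto ubIn = tla::MakeTensor(ubInData, layoutIn, Arch::PositionUB{});

// Instantiate the tile op and run the broadcast
TileBroadcastOneBlkTla<Arch::AtlasA2, half, COMPUTE_LENGTH> op;
op(ubOut, ubIn);
```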
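
To make the "one-block" semantics concrete, the following host-side sketch models what a one-block broadcast is expected to produce. It assumes that `AscendC::Brcb` replicates each input element across one 32-byte data block of the output; that assumption comes from the general description of `Brcb`, not from this page. The function `BroadcastOneBlockRef` and the constant `kBytesPerBlock` are hypothetical names introduced for illustration and are not part of catlass or AscendC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: one UB data block is 32 bytes.
constexpr std::size_t kBytesPerBlock = 32;

// Hypothetical CPU reference model (not a catlass or AscendC API):
// every input element is copied into all slots of its own 32-byte block.
template <class T>
std::vector<T> BroadcastOneBlockRef(const std::vector<T>& in) {
    static_assert(kBytesPerBlock % sizeof(T) == 0,
                  "element size must divide the block size");
    constexpr std::size_t elemsPerBlock = kBytesPerBlock / sizeof(T);

    std::vector<T> out(in.size() * elemsPerBlock);
    for (std::size_t i = 0; i < in.size(); ++i) {
        // Block i of the output holds elemsPerBlock copies of in[i].
        for (std::size_t j = 0; j < elemsPerBlock; ++j) {
            out[i * elemsPerBlock + j] = in[i];
        }
    }
    assert(out.size() == in.size() * elemsPerBlock);
    return out;
}
```

With a 2-byte element type such as `uint16_t` (the same width as `half`), each input element fills 16 consecutive output slots, so an input of length 2 yields an output of length 32. A model like this can serve as a golden reference when checking device results, provided the stated assumption about `Brcb` holds on the target architecture.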