# AGENTS.md - PTO Tile Library

pto-isa: Parallel Tile Operation (PTO) is a virtual instruction set architecture designed by Ascend CANN, focusing on tile-level operations. This repository offers high-performance, cross-platform tile operations across Ascend platforms.

Project address: https://gitcode.com/cann/pto-isa

Authoring statement: parts of this article were generated with AI assistance (AIGC) and are for reference only.

This file provides essential information for agentic coding agents working in this repository.

## Build / Lint / Test Commands

### Build Commands

```bash
# Build and run CPU simulator tests (recommended first step)
python3 tests/run_cpu.py --clean --verbose

# Build and run specific CPU demo
python3 tests/run_cpu.py --demo gemm --verbose
python3 tests/run_cpu.py --demo flash_attn --verbose

# Build NPU tests (requires Ascend CANN environment)
python3 tests/script/build_st.py -r npu -v a3 -t all

# One-click build and run scripts
./build.sh --run_all --a3 --sim      # Full ST tests on simulator
./build.sh --run_simple --a5 --npu   # Simplified ST tests on hardware
./build.sh --pkg                     # Build package
```

### Running Single Tests

```bash
# CPU simulator single test
python3 tests/run_cpu.py --testcase tadd --gtest_filter TADDTest.case_float_64x64_64x64

# NPU single test (sim or npu)
python3 tests/script/run_st.py -r sim -v a3 -t tadd -g TADDTest.case_float_64x64_64x64
python3 tests/script/run_st.py -r npu -v a3 -t tadd -g TADDTest.case_float_64x64_64x64

# Auto mode compilation
python3 tests/script/run_st.py -r sim -v a3 -a -t tadd -g TADDTest.case_float_64x64_64x64
```

### Lint / Format Commands

```bash
# Format C++ code (Google style, 120 char limit)
clang-format -i -style=Google <file>

# Format Python code (Ruff)
ruff format <file>
ruff check <file>
```

## Code Style Guidelines

### C++ Code Style

- **Style**: Google style with customizations
- **Line length**: 120 characters
- **Indentation**: 4 spaces (no tabs)
- **Pointer alignment**: Right-aligned (`int *ptr`)
- **Braces**: Functions: opening brace on a new line; other blocks: same line
- **Header guards**: `#ifndef FILENAME_H_` format

### File Headers

All source files must include the standard copyright header:

```cpp
/**
Copyright (c) 2025 Huawei Technologies Co., Ltd.
This program is free software, you can redistribute it and/or modify it under the terms and conditions of
CANN Open Software License Agreement Version 2.0 (the "License").
Please refer to the License for details. You may not use this file except in compliance with the License.
THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE.
See LICENSE in the root of the software repository for the full text of the License.
*/
```

### Naming Conventions

- **Classes/Structs**: `PascalCase` (e.g., `GlobalTensor`, `TileShape2D`)
- **Functions**: `PascalCase` for PTO instructions (e.g., `TADD`, `TMATMUL`), `camelCase` for helpers
- **Variables**: `camelCase` (e.g., `src0Tile`, `gmOffsetA`)
- **Constants/Enums**: `UPPER_SNAKE_CASE` (e.g., `BUFFER_NUM`, `PIPE_MTE1`)
- **Template parameters**: `PascalCase` (e.g., `LeftTile`, `RightTile`)
- **Macros**: `UPPER_SNAKE_CASE` (e.g., `PTO_STATIC_ASSERT`, `AICORE`)

### Import Organization

1. System C++ headers (`#include <cstdio>`)
2. Third-party headers (`#include <gtest/gtest.h>`)
3. PTO internal headers (`#include <pto/common/type.hpp>`)
4. Local headers

### PTO Instruction Patterns

```cpp
// Standard PTO instruction usage
#include <pto/pto-inst.hpp>
using namespace pto;

// Tile declaration
using TileData = Tile<TileType::Vec, T, kRows_, kCols_, BLayout::RowMajor, -1, -1>;
TileData srcTile(kRows_, kCols_);

// Global tensor declaration
using DynShape = Shape<1, 1, 1, kGRows_, kGCols_>;
using DynStride = Stride<1, 1, 1, kGCols_, 1>;
using GlobalData = GlobalTensor<T, DynShape, DynStride>;
GlobalData srcGlobal(src);

// PTO instruction pattern
TLOAD(srcTile, srcGlobal);
TADD(dstTile, src0Tile, src1Tile);
TSTORE(dstGlobal, dstTile);
```

### Template and Type Usage

- Use `constexpr` for compile-time constants
- Use `template <typename T, int kRows_, int kCols_>` for parameterized kernels
- Use the `__gm__` attribute for global memory pointers
- Use the `__out__` and `__in__` attributes for output/input parameters
- Use the `AICORE` macro for AI Core functions (expands to `[aicore]` on NPU)
- Use `PTO_INST` for public PTO instruction declarations
- Use `PTO_INTERNAL` for internal implementations

### Assertions and Error Handling

```cpp
// Compile-time assertions
PTO_STATIC_ASSERT(condition);
PTO_STATIC_ASSERT(condition, "custom message");

// Runtime assertions (CPU simulator only)
PTO_CPU_ASSERT(condition);
PTO_CPU_ASSERT(condition, "custom message");

// Google Test assertions in test files
EXPECT_TRUE(condition);
ASSERT_EQ(expected, actual);
```

### Event Synchronization Pattern

```cpp
// Set flag
set_flag(PIPE_MTE2, PIPE_V, EVENT_ID0);

// Wait for flag
wait_flag(PIPE_MTE2, PIPE_V, EVENT_ID0);

// Template-based flag helpers
template <pipe_t srcPipe, pipe_t dstPipe>
AICORE inline void SetFlag(uint32_t id)
{
    set_flag(srcPipe, dstPipe, static_cast<event_t>(id));
}
```

### Memory Layout and Buffers

- Use double buffering with the `BUFFER_NUM = 2` constant
- Use `Tile<TileType::Vec, ...>` for vector operations
- Use `Tile<TileType::Cube, ...>` for cube operations
- Use `GlobalTensor<T, Shape, Stride>` for global memory access
- Buffer sizes typically use KiB units (e.g., `32 * 1024` for 32 KiB)

### Python Code Style

- **Formatter**: Ruff (configured in `pyproject.toml`)
- **Quotes**: Double quotes
- **Line length**: 120 characters
- **Indentation**: 4 spaces

### Test File Structure

```cpp
// Test kernel file: testcase_kernel.cpp
#include <pto/pto-inst.hpp>
#include <pto/common/constants.hpp>
#include <gtest/gtest.h>

using namespace pto;

// "...params" stands in for the test-specific template arguments
template <typename T, int ...params>
AICORE void runTest(__gm__ T __out__ *out, __gm__ T __in__ *src)
{
    // Kernel implementation
}

template <typename T, int ...params>
void LaunchTest(T *out, T *src, void *stream)
{
    runTest<T, ...params>(out, src);
}

// Explicit template instantiations
template void LaunchTest<float, ...params>(float *out, float *src, void *stream);
```

### CMakeLists.txt Pattern

```cmake
# For test cases
pto_costmodel_sim_st(tadd)

# For kernel builds
pto_add_kernel(target_name)
```

## Key Directories

- `include/pto/`: Public API headers
- `include/pto/cpu/`: CPU simulator implementations
- `include/pto/npu/`: NPU implementations (a2a3, a5)
- `kernels/manual/`: Manual mode kernel implementations
- `tests/cpu/st/testcase/`: CPU simulator test cases
- `tests/npu/`: NPU test cases
- `tests/script/`: Test runner scripts
- `demos/`: Demo applications

## Important Notes

- Always test on the CPU simulator before NPU hardware
- Use the `--clean` flag with CPU tests for fresh builds
- NPU tests require the `ASCEND_HOME_PATH` environment variable
- C++20 or later is required
- bfloat16 support requires GCC 14 for the CPU simulator
- PTO instructions are case-sensitive and use the `T` prefix
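
The buffer guidance under Memory Layout and Buffers (a `BUFFER_NUM` of 2 and sizes expressed in KiB) can be illustrated with a small host-side sketch. It is plain C++ and does not use the PTO headers. `BUFFER_BYTES`, `elementsPerBuffer`, `bufferSlot`, and `slotOffsetBytes` are hypothetical names chosen for illustration, not part of the PTO API, and the 32 KiB size is only the example value mentioned above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constants for illustration; only BUFFER_NUM is named in this guide.
constexpr std::size_t BUFFER_NUM = 2;            // double buffering
constexpr std::size_t BUFFER_BYTES = 32 * 1024;  // 32 KiB per buffer

// Number of elements of type T that fit in one buffer.
template <typename T>
constexpr std::size_t elementsPerBuffer()
{
    return BUFFER_BYTES / sizeof(T);
}

// Ping-pong slot used by a given loop iteration: 0, 1, 0, 1, ...
constexpr std::size_t bufferSlot(std::size_t iteration)
{
    return iteration % BUFFER_NUM;
}

// Byte offset of a slot inside one contiguous BUFFER_NUM * BUFFER_BYTES region.
inline std::size_t slotOffsetBytes(std::size_t slot)
{
    assert(slot < BUFFER_NUM);
    return slot * BUFFER_BYTES;
}

// Compile-time checks, mirroring the "use constexpr for compile-time constants" advice.
static_assert(bufferSlot(0) == 0 && bufferSlot(1) == 1 && bufferSlot(2) == 0);
static_assert(elementsPerBuffer<char>() == BUFFER_BYTES);
```

In a real kernel, alternating slots like this would let one buffer be loaded while the other is being computed on, with the `set_flag` / `wait_flag` pairs from the Event Synchronization Pattern ordering the two pipes. That pairing is an assumption about typical usage and is not shown in this sketch.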