# FFT_1D

SiP is an efficient, reliable, high-performance signal-processing operator acceleration library provided by CANN. It is built on Huawei Ascend AI processors and designed specifically for the signal-processing domain. Project home: https://gitcode.com/cann/sip

## Product support

| Product | Supported |
| --- | --- |
| Atlas 200I/500 A2 inference products | × |
| Atlas inference series products | × |
| Atlas training series products | × |
| Atlas A3 training series products / Atlas A3 inference series products | √ |
| Atlas A2 training series products / Atlas A2 inference series products | √ |
| Ascend 950PR / Ascend 950DT | Only `asdFftExecC2C` is supported |

## Functions

| Interface | Purpose |
| --- | --- |
| `asdFftMakePlan1D` | Initializes the FFT configuration associated with the handle. |
| `asdFftExecC2C` | Executes a complex-to-complex FFT. |
| `asdFftExecC2R` | Executes a complex-to-real FFT. |
| `asdFftExecR2C` | Executes a real-to-complex FFT. |
| `asdFftExecC2CSeparated` | Executes a complex-to-complex FFT with the real and imaginary parts supplied and returned as separate tensors. |

## Formula

The Fourier transform is a linear integral transform between the time domain and the frequency domain, with many applications in physics and engineering. For a signal of length N, its discrete form (DFT, Discrete Fourier Transform) is:

$$y[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \ldots, N-1$$

Treating the coefficient matrix (N×N) and the time-domain signal (N×1) as two tensors, the DFT can be computed directly on the NPU as a matrix multiplication, but the time complexity is too high. The fast Fourier transform instead exploits the rotational symmetry of the complex exponentials to split the sequence into subsequences and reduce the complexity through butterfly operations.

## Function prototypes

```cpp
AspbStatus asdFftMakePlan1D(
    asdFftHandle handle,
    int64_t fftSize,
    asdFftType fftType,
    asdFftDirection direction,
    int64_t batchSize,
    asdFft1dDimType dimType);

AspbStatus asdFftExecC2C(
    asdFftHandle handle,
    const aclTensor *input,
    const aclTensor *output);

AspbStatus asdFftExecC2R(
    asdFftHandle handle,
    const aclTensor *input,
    const aclTensor *output);

AspbStatus asdFftExecR2C(
    asdFftHandle handle,
    const aclTensor *input,
    const aclTensor *output);

AspbStatus asdFftExecC2CSeparated(
    asdFftHandle handle,
    const aclTensor *inputReal,
    const aclTensor *inputImag,
    const aclTensor *outputReal,
    const aclTensor *outputImag);
```

## asdFftMakePlan1D parameters

| Parameter | Type | In/Out | Description |
| --- | --- | --- | --- |
| handle | asdFftHandle | Input | Operator handle; the asdFftHandle object must be created by the caller. |
| fftSize | int64_t | Input | N in the formula: the FFT signal length. |
| fftType | asdFftType | Input | FFT type: ASCEND_FFT_C2C (complex-to-complex), ASCEND_FFT_C2R (complex-to-real), or ASCEND_FFT_R2C (real-to-complex). |
| direction | asdFftDirection | Input | Transform direction: ASCEND_FFT_FORWARD (forward FFT) or ASCEND_FFT_INVERSE (inverse FFT). |
| batchSize | int64_t | Input | Number of batches in the batched FFT operation. |
| dimType | asdFft1dDimType | Input | Dimension ("direction") of the 1-D FFT, i.e. row-wise or column-wise: ASCEND_FFT_HORIZONTAL (horizontal) or ASCEND_FFT_VERTICAL (vertical). |

**Return value**: a status code; see the SiP return codes.

## asdFftExecC2C parameters

| Parameter | Type | In/Out | Description |
| --- | --- | --- | --- |
| handle | asdFftHandle | Input | Operator handle; the asdFftHandle object must be created by the caller. |
| inData | aclTensor * | Input | x in the formula. Data type: COMPLEX64. Format: ND. Shape: (batchSize, fftSize) for a horizontal FFT, (fftSize, batchSize) for a vertical FFT. |
| outData | aclTensor * | Output | y in the formula. Data type: COMPLEX64. Format: ND. Shape: (batchSize, fftSize) for a horizontal FFT, (fftSize, batchSize) for a vertical FFT. |

**Return value**: a status code; see the SiP return codes.

## asdFftExecC2R parameters

| Parameter | Type | In/Out | Description |
| --- | --- | --- | --- |
| handle | asdFftHandle | Input | Operator handle; the asdFftHandle object must be created by the caller. |
| inData | aclTensor * | Input | x in the formula. Data type: COMPLEX64. Format: ND. Shape: (batchSize, fftSize / 2 + 1) for a horizontal FFT, (fftSize / 2 + 1, batchSize) for a vertical FFT. |
| outData | aclTensor * | Output | y in the formula. Data type: FLOAT32. Format: ND. Shape: (batchSize, fftSize) for a horizontal FFT, (fftSize, batchSize) for a vertical FFT. |

**Return value**: a status code; see the SiP return codes.

## asdFftExecR2C parameters

| Parameter | Type | In/Out | Description |
| --- | --- | --- | --- |
| handle | asdFftHandle | Input | Operator handle; the asdFftHandle object must be created by the caller. |
| inData | aclTensor * | Input | x in the formula. Data type: FLOAT32. Format: ND. Shape: (batchSize, fftSize) for a horizontal FFT, (fftSize, batchSize) for a vertical FFT. |
| outData | aclTensor * | Output | y in the formula. Data type: COMPLEX64. Format: ND. Shape: (batchSize, fftSize / 2 + 1) for a horizontal FFT, (fftSize / 2 + 1, batchSize) for a vertical FFT. |

**Return value**: a status code; see the SiP return codes.

## asdFftExecC2CSeparated parameters

| Parameter | Type | In/Out | Description |
| --- | --- | --- | --- |
| handle | asdFftHandle | Input | Operator handle; the asdFftHandle object must be created by the caller. |
| inputReal | aclTensor * | Input | Real part of x in the formula. Data type: FLOAT32. Format: ND. Shape: (batchSize, fftSize). |
| inputImag | aclTensor * | Input | Imaginary part of x in the formula. Data type: FLOAT32. Format: ND. Shape: (batchSize, fftSize). |
| outputReal | aclTensor * | Output | Real part of y in the formula. Data type: FLOAT32. Format: ND. Shape: (batchSize, fftSize). |
| outputImag | aclTensor * | Output | Imaginary part of y in the formula. Data type: FLOAT32. Format: ND. Shape: (batchSize, fftSize). |

**Return value**: a status code; see the SiP return codes.

## Constraints

- `asdFftMakePlan1D`
  - Horizontal FFT: `fftSize` must not exceed $2^{27}$, and its prime factorization must not contain a prime factor greater than 199. `batchSize` has no constraint beyond available memory. The total number of input elements supported is [1, $2^{30}$]. Due to a limitation of the current implementation, a horizontal FFT whose `fftSize` is a power of two and at least 32768 modifies the input data in place; back the input up beforehand.
  - Vertical FFT: `fftSize` must be a power of two, at least 256 and at most 65536, and `batchSize` must be a multiple of 128. The total number of input elements supported is [1, $2^{30}$].
- The input must not contain `inf`, `-inf`, or `nan`; if it does, the result is undefined.
- `asdFftExecC2CSeparated`: the signal length must be in the range [2, 256].

## Examples

The samples below are minimal implementations intended for getting started with, developing, and debugging the operator quickly. Their goal is to demonstrate the core functionality with as little code as possible, not to provide production-grade safety guarantees. Do not use them directly as business code; if you apply the sample code to a real business scenario and a security problem occurs, you bear the responsibility.

### C2C_1D

```cpp
#include <iostream>
#include <vector>
#include "asdsip.h"
#include "acl/acl.h"
#include "aclnn/acl_meta.h"

using namespace AsdSip;

#define CHECK_RET(cond, return_expr) \
    do {                             \
        if (!(cond)) {               \
            return_expr;             \
        }                            \
    } while (0)

#define LOG_PRINT(message, ...)         \
    do {                                \
        printf(message, ##__VA_ARGS__); \
    } while (0)

#define ASD_STATUS_CHECK(err)                                \
    do {                                                     \
        AsdSip::AspbStatus err_ = (err);                     \
        if (err_ != AsdSip::ErrorType::ACL_SUCCESS) {        \
            std::cout << "Execute failed." << std::endl;     \
            exit(-1);                                        \
        }                                                    \
    } while (0)

int64_t GetShapeSize(const std::vector<int64_t> &shape)
{
    int64_t shapeSize = 1;
    for (auto i : shape) {
        shapeSize *= i;
    }
    return shapeSize;
}

int Init(int32_t deviceId, aclrtStream *stream)
{
    // Fixed boilerplate: AscendCL initialization
    auto ret = aclInit(nullptr);
    CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("aclInit failed. ERROR: %d\n", ret); return ret);
    ret = aclrtSetDevice(deviceId);
    CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("aclrtSetDevice failed. ERROR: %d\n", ret); return ret);
    ret = aclrtCreateStream(stream);
    CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("aclrtCreateStream failed. ERROR: %d\n", ret); return ret);
    return 0;
}

template <typename T>
int CreateAclTensor(const std::vector<T> &hostData, const std::vector<int64_t> &shape, void **deviceAddr,
                    aclDataType dataType, aclTensor **tensor)
{
    auto size = GetShapeSize(shape) * sizeof(T);
    // Allocate device-side memory with aclrtMalloc
    auto ret = aclrtMalloc(deviceAddr, size, ACL_MEM_MALLOC_HUGE_FIRST);
    CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("aclrtMalloc failed. ERROR: %d\n", ret); return ret);
    // Copy the host-side data to the device with aclrtMemcpy
    ret = aclrtMemcpy(*deviceAddr, size, hostData.data(), size, ACL_MEMCPY_HOST_TO_DEVICE);
    CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("aclrtMemcpy failed. ERROR: %d\n", ret); return ret);
    // Compute the strides of a contiguous tensor
    std::vector<int64_t> strides(shape.size(), 1);
    for (int64_t i = shape.size() - 2; i >= 0; i--) {
        strides[i] = shape[i + 1] * strides[i + 1];
    }
    // Create the aclTensor with aclCreateTensor
    *tensor = aclCreateTensor(shape.data(), shape.size(), dataType, strides.data(), 0, aclFormat::ACL_FORMAT_ND,
                              shape.data(), shape.size(), *deviceAddr);
    return 0;
}

int main()
{
    int32_t deviceId = 0;
    aclrtStream stream;
    auto ret = Init(deviceId, &stream);
    CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("Init acl failed. ERROR: %d\n", ret); return ret);

    // Host-side tensor data
    int batch = 32, Nfft = 128;        // c2c dft
    // int batch = 32, Nfft = 8192;      // c2c fftb
    // int batch = 32, Nfft = 15000;     // c2c mixed
    // int batch = 32, Nfft = 32768;     // c2c fftn
    // int batch = 32, Nfft = 199 * 199; // core any
    const int64_t tensorInSize = batch * Nfft;
    std::vector<int64_t> selfShape = {batch, Nfft};
    std::vector<int64_t> outShape = {batch, Nfft};
    std::vector<std::complex<float>> inputHostData(tensorInSize, std::complex<float>(0, 0));
    for (int i = 0; i < tensorInSize; i++) {
        inputHostData[i] = std::complex<float>(i, i + 1);
    }
    std::vector<std::complex<float>> outHostData(tensorInSize, std::complex<float>(0, 0));
    void *inputDeviceAddr = nullptr;
    void *outDeviceAddr = nullptr;
    aclTensor *input = nullptr;
    aclTensor *out = nullptr;
    ret = CreateAclTensor(inputHostData, selfShape, &inputDeviceAddr, aclDataType::ACL_COMPLEX64, &input);
    CHECK_RET(ret == ACL_SUCCESS, return ret);
    ret = CreateAclTensor(outHostData, outShape, &outDeviceAddr, aclDataType::ACL_COMPLEX64, &out);
    CHECK_RET(ret == ACL_SUCCESS, return ret);

    asdFftHandle handle;
    asdFftCreate(handle);
    asdFftMakePlan1D(handle, Nfft, asdFftType::ASCEND_FFT_C2C, asdFftDirection::ASCEND_FFT_FORWARD, batch);
    size_t work_size;
    asdFftGetWorkspaceSize(handle, work_size);
    void *workspaceAddr = nullptr;
    if (work_size > 0) {
        ret = aclrtMalloc(&workspaceAddr, static_cast<int64_t>(work_size), ACL_MEM_MALLOC_HUGE_FIRST);
        CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("allocate workspace failed. ERROR: %d\n", ret); return ret);
    }
    asdFftSetWorkspace(handle, (uint8_t *)workspaceAddr);
    asdFftSetStream(handle, stream);
    ASD_STATUS_CHECK(asdFftExecC2C(handle, input, out));
    ret = aclrtSynchronizeStream(stream);
    CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("aclrtSynchronizeStream failed. ERROR: %d\n", ret); return ret);
    asdFftDestroy(handle);

    auto size = GetShapeSize(outShape);
    std::vector<std::complex<float>> outData(size, 0);
    ret = aclrtMemcpy(outData.data(), outData.size() * sizeof(outData[0]), outDeviceAddr,
                      size * sizeof(outData[0]), ACL_MEMCPY_DEVICE_TO_HOST);
    // Print the first 16 values of the output tensor
    for (int64_t i = 0; i < 16; i++) {
        std::cout << static_cast<std::complex<float>>(outData[i]) << "\t";
    }
    std::cout << "\nend result" << std::endl;
    std::cout << "Execute successfully." << std::endl;

    aclDestroyTensor(input);
    aclDestroyTensor(out);
    aclrtFree(inputDeviceAddr);
    aclrtFree(outDeviceAddr);
    if (work_size > 0) {
        aclrtFree(workspaceAddr);
    }
    aclrtDestroyStream(stream);
    aclrtResetDevice(deviceId);
    aclFinalize();
    return 0;
}
```

### C2R_1D

```cpp
#include <iostream>
#include <vector>
#include "asdsip.h"
#include "acl/acl.h"
#include "aclnn/acl_meta.h"

using namespace AsdSip;

// CHECK_RET, LOG_PRINT, ASD_STATUS_CHECK, GetShapeSize, Init and CreateAclTensor
// are identical to the C2C_1D sample above.

int main()
{
    int32_t deviceId = 0;
    aclrtStream stream;
    auto ret = Init(deviceId, &stream);
    CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("Init acl failed. ERROR: %d\n", ret); return ret);

    // Host-side tensor data
    int batch = 32, Nfft = 128;
    // int batch = 32, Nfft = 8192;
    // int batch = 8, Nfft = 567;
    // int batch = 32, Nfft = 997;
    // int batch = 32, Nfft = 15000;
    // int batch = 32, Nfft = 199 * 199;
    const int64_t inSignal = Nfft / 2 + 1;
    const int64_t outSignal = Nfft;
    const int64_t tensorInSize = batch * inSignal;
    const int64_t tensorOutSize = batch * outSignal;
    std::vector<int64_t> selfShape = {batch, inSignal};
    std::vector<int64_t> outShape = {batch, outSignal};
    std::vector<std::complex<float>> inputHostData(tensorInSize, std::complex<float>(0, 0));
    for (int i = 0; i < tensorInSize; i++) {
        inputHostData[i] = std::complex<float>(i, i + 1);
    }
    std::vector<float> outHostData(tensorOutSize, 0);
    void *inputDeviceAddr = nullptr;
    void *outDeviceAddr = nullptr;
    aclTensor *input = nullptr;
    aclTensor *out = nullptr;
    ret = CreateAclTensor(inputHostData, selfShape, &inputDeviceAddr, aclDataType::ACL_COMPLEX64, &input);
    CHECK_RET(ret == ACL_SUCCESS, return ret);
    ret = CreateAclTensor(outHostData, outShape, &outDeviceAddr, aclDataType::ACL_FLOAT, &out);
    CHECK_RET(ret == ACL_SUCCESS, return ret);

    asdFftHandle handle;
    asdFftCreate(handle);
    asdFftMakePlan1D(handle, Nfft, asdFftType::ASCEND_FFT_C2R, asdFftDirection::ASCEND_FFT_FORWARD, batch);
    size_t work_size;
    asdFftGetWorkspaceSize(handle, work_size);
    void *workspaceAddr = nullptr;
    if (work_size > 0) {
        ret = aclrtMalloc(&workspaceAddr, static_cast<int64_t>(work_size), ACL_MEM_MALLOC_HUGE_FIRST);
        CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("allocate workspace failed. ERROR: %d\n", ret); return ret);
    }
    asdFftSetWorkspace(handle, (uint8_t *)workspaceAddr);
    asdFftSetStream(handle, stream);
    ASD_STATUS_CHECK(asdFftExecC2R(handle, input, out));
    ret = aclrtSynchronizeStream(stream);
    CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("aclrtSynchronizeStream failed. ERROR: %d\n", ret); return ret);
    asdFftDestroy(handle);

    auto size = GetShapeSize(outShape);
    std::vector<float> outData(size, 0);
    ret = aclrtMemcpy(outData.data(), outData.size() * sizeof(outData[0]), outDeviceAddr,
                      size * sizeof(outData[0]), ACL_MEMCPY_DEVICE_TO_HOST);
    // Print the first 16 values of the output tensor
    for (int64_t i = 0; i < 16; i++) {
        std::cout << static_cast<float>(outData[i]) << "\t";
    }
    std::cout << "\nend result" << std::endl;
    std::cout << "Execute successfully." << std::endl;

    aclDestroyTensor(input);
    aclDestroyTensor(out);
    aclrtFree(inputDeviceAddr);
    aclrtFree(outDeviceAddr);
    if (work_size > 0) {
        aclrtFree(workspaceAddr);
    }
    aclrtDestroyStream(stream);
    aclrtResetDevice(deviceId);
    aclFinalize();
    return 0;
}
```

### R2C_1D

```cpp
#include <iostream>
#include <vector>
#include "asdsip.h"
#include "acl/acl.h"
#include "aclnn/acl_meta.h"

using namespace AsdSip;

// CHECK_RET, LOG_PRINT, ASD_STATUS_CHECK, GetShapeSize, Init and CreateAclTensor
// are identical to the C2C_1D sample above.

int main()
{
    int32_t deviceId = 0;
    aclrtStream stream;
    auto ret = Init(deviceId, &stream);
    CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("Init acl failed. ERROR: %d\n", ret); return ret);

    // Host-side tensor data
    int batch = 32, Nfft = 256;
    // int batch = 32, Nfft = 199 * 199;
    const int64_t inSignal = Nfft;
    const int64_t outSignal = Nfft / 2 + 1;
    const int64_t tensorInSize = batch * inSignal;
    const int64_t tensorOutSize = batch * outSignal;
    std::vector<int64_t> selfShape = {batch, inSignal};
    std::vector<int64_t> outShape = {batch, outSignal};
    std::vector<float> inputHostData(tensorInSize, 0);
    for (int i = 0; i < tensorInSize; i++) {
        inputHostData[i] = i;
    }
    std::vector<std::complex<float>> outHostData(tensorOutSize, std::complex<float>(0, 0));
    void *inputDeviceAddr = nullptr;
    void *outDeviceAddr = nullptr;
    aclTensor *input = nullptr;
    aclTensor *out = nullptr;
    ret = CreateAclTensor(inputHostData, selfShape, &inputDeviceAddr, aclDataType::ACL_FLOAT, &input);
    CHECK_RET(ret == ACL_SUCCESS, return ret);
    ret = CreateAclTensor(outHostData, outShape, &outDeviceAddr, aclDataType::ACL_COMPLEX64, &out);
    CHECK_RET(ret == ACL_SUCCESS, return ret);

    asdFftHandle handle;
    asdFftCreate(handle);
    asdFftMakePlan1D(handle, Nfft, asdFftType::ASCEND_FFT_R2C, asdFftDirection::ASCEND_FFT_FORWARD, batch);
    size_t work_size;
    asdFftGetWorkspaceSize(handle, work_size);
    void *workspaceAddr = nullptr;
    if (work_size > 0) {
        ret = aclrtMalloc(&workspaceAddr, static_cast<int64_t>(work_size), ACL_MEM_MALLOC_HUGE_FIRST);
        CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("allocate workspace failed. ERROR: %d\n", ret); return ret);
    }
    asdFftSetWorkspace(handle, (uint8_t *)workspaceAddr);
    asdFftSetStream(handle, stream);
    ASD_STATUS_CHECK(asdFftExecR2C(handle, input, out));
    ret = aclrtSynchronizeStream(stream);
    CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("aclrtSynchronizeStream failed. ERROR: %d\n", ret); return ret);
    asdFftDestroy(handle);

    auto size = GetShapeSize(outShape);
    std::vector<std::complex<float>> outData(size, 0);
    ret = aclrtMemcpy(outData.data(), outData.size() * sizeof(outData[0]), outDeviceAddr,
                      size * sizeof(outData[0]), ACL_MEMCPY_DEVICE_TO_HOST);
    // Print the first 16 values of the output tensor
    for (int64_t i = 0; i < 16; i++) {
        std::cout << static_cast<std::complex<float>>(outData[i]) << "\t";
    }
    std::cout << "\nend result" << std::endl;
    std::cout << "Execute successfully." << std::endl;

    aclDestroyTensor(input);
    aclDestroyTensor(out);
    aclrtFree(inputDeviceAddr);
    aclrtFree(outDeviceAddr);
    if (work_size > 0) {
        aclrtFree(workspaceAddr);
    }
    aclrtDestroyStream(stream);
    aclrtResetDevice(deviceId);
    aclFinalize();
    return 0;
}
```

### C2C_1D_SEP

```cpp
#include <iostream>
#include <vector>
#include "asdsip.h"
#include "acl/acl.h"
#include "aclnn/acl_meta.h"

using namespace AsdSip;

// CHECK_RET, LOG_PRINT, ASD_STATUS_CHECK, GetShapeSize, Init and CreateAclTensor
// are identical to the C2C_1D sample above.

int main()
{
    int32_t deviceId = 0;
    aclrtStream stream;
    auto ret = Init(deviceId, &stream);
    CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("Init acl failed. ERROR: %d\n", ret); return ret);

    // Host-side tensor data
    int batch = 32, Nfft = 256;        // c2c dft
    // int batch = 32, Nfft = 8192;      // c2c fftb
    // int batch = 32, Nfft = 15000;     // c2c mixed
    // int batch = 32, Nfft = 32768;     // c2c fftn
    // int batch = 32, Nfft = 199 * 199; // core any
    const int64_t tensorInSize = batch * Nfft;
    std::vector<int64_t> selfShape = {batch, Nfft};
    std::vector<int64_t> outShape = {batch, Nfft};
    std::vector<float> inputRealHostData(tensorInSize, 0);
    std::vector<float> inputImagHostData(tensorInSize, 0);
    std::vector<float> outputRealHostData(tensorInSize, 0);
    std::vector<float> outputImagHostData(tensorInSize, 0);
    for (int i = 0; i < tensorInSize; i++) {
        inputRealHostData[i] = i;
        inputImagHostData[i] = i + 1;
    }
    void *inputRealDeviceAddr = nullptr;
    void *inputImagDeviceAddr = nullptr;
    void *outputRealDeviceAddr = nullptr;
    void *outputImagDeviceAddr = nullptr;
    aclTensor *inputReal = nullptr;
    aclTensor *inputImag = nullptr;
    aclTensor *outputReal = nullptr;
    aclTensor *outputImag = nullptr;
    ret = CreateAclTensor(inputRealHostData, selfShape, &inputRealDeviceAddr, aclDataType::ACL_FLOAT, &inputReal);
    CHECK_RET(ret == ACL_SUCCESS, return ret);
    ret = CreateAclTensor(inputImagHostData, selfShape, &inputImagDeviceAddr, aclDataType::ACL_FLOAT, &inputImag);
    CHECK_RET(ret == ACL_SUCCESS, return ret);
    ret = CreateAclTensor(outputRealHostData, outShape, &outputRealDeviceAddr, aclDataType::ACL_FLOAT, &outputReal);
    CHECK_RET(ret == ACL_SUCCESS, return ret);
    ret = CreateAclTensor(outputImagHostData, outShape, &outputImagDeviceAddr, aclDataType::ACL_FLOAT, &outputImag);
    CHECK_RET(ret == ACL_SUCCESS, return ret);

    asdFftHandle handle;
    asdFftCreate(handle);
    asdFftMakePlan1D(handle, Nfft, asdFftType::ASCEND_FFT_C2C_SEP, asdFftDirection::ASCEND_FFT_FORWARD, batch);
    size_t work_size;
    asdFftGetWorkspaceSize(handle, work_size);
    void *workspaceAddr = nullptr;
    if (work_size > 0) {
        ret = aclrtMalloc(&workspaceAddr, static_cast<int64_t>(work_size), ACL_MEM_MALLOC_HUGE_FIRST);
        CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("allocate workspace failed. ERROR: %d\n", ret); return ret);
    }
    asdFftSetWorkspace(handle, (uint8_t *)workspaceAddr);
    asdFftSetStream(handle, stream);
    ASD_STATUS_CHECK(asdFftExecC2CSeparated(handle, inputReal, inputImag, outputReal, outputImag));
    ret = aclrtSynchronizeStream(stream);
    CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("aclrtSynchronizeStream failed. ERROR: %d\n", ret); return ret);
    asdFftDestroy(handle);

    auto size = GetShapeSize(outShape);
    std::vector<float> outRealData(size, 0);
    std::vector<float> outImagData(size, 0);
    ret = aclrtMemcpy(outRealData.data(), outRealData.size() * sizeof(outRealData[0]), outputRealDeviceAddr,
                      size * sizeof(outRealData[0]), ACL_MEMCPY_DEVICE_TO_HOST);
    ret = aclrtMemcpy(outImagData.data(), outImagData.size() * sizeof(outImagData[0]), outputImagDeviceAddr,
                      size * sizeof(outImagData[0]), ACL_MEMCPY_DEVICE_TO_HOST);
    // Print the first 16x16 values of the output tensors
    std::cout << "real part:" << std::endl;
    for (int64_t i = 0; i < 16; i++) {
        for (int64_t j = 0; j < 16; j++) {
            std::cout << static_cast<float>(outRealData[i * Nfft + j]) << "\t";
        }
        std::cout << std::endl;
    }
    std::cout << "\nimag part:" << std::endl;
    for (int64_t i = 0; i < 16; i++) {
        for (int64_t j = 0; j < 16; j++) {
            std::cout << static_cast<float>(outImagData[i * Nfft + j]) << "\t";
        }
        std::cout << std::endl;
    }
    std::cout << "\nend result" << std::endl;
    std::cout << "Execute successfully." << std::endl;

    aclDestroyTensor(inputReal);
    aclDestroyTensor(inputImag);
    aclDestroyTensor(outputReal);
    aclDestroyTensor(outputImag);
    aclrtFree(inputRealDeviceAddr);
    aclrtFree(inputImagDeviceAddr);
    aclrtFree(outputRealDeviceAddr);
    aclrtFree(outputImagDeviceAddr);
    if (work_size > 0) {
        aclrtFree(workspaceAddr);
    }
    aclrtDestroyStream(stream);
    aclrtResetDevice(deviceId);
    aclFinalize();
    return 0;
}
```