
Ascend 910 Usage Notes

I. Compressing and Extracting Files

1. Compress a directory

tar -czvf UNITE-main.tar.gz ./UNITE-main/

2. Extract an archive

tar -xzvf UNITE-main.tar.gz
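When the tar command is not convenient (for example inside a notebook), the same round trip can be sketched with Python's standard tarfile module. The helper names compress_dir and extract_archive are made up for this illustration, and the paths are only examples.

```python
import tarfile
from pathlib import Path


def compress_dir(src_dir, archive_path):
    """Pack src_dir into a gzip-compressed tar archive (like `tar -czvf`)."""
    src = Path(src_dir)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # arcname keeps only the directory name inside the archive,
        # not the full path leading to it.
        tar.add(str(src), arcname=src.name)
    return Path(archive_path)


def extract_archive(archive_path, dest_dir="."):
    """Unpack an archive into dest_dir (like `tar -xzvf`).

    Only use this on archives you trust: extractall writes whatever
    paths the archive contains.
    """
    with tarfile.open(str(archive_path), "r:*") as tar:  # "r:*" auto-detects compression
        tar.extractall(str(dest_dir))
```

A typical call would be compress_dir("./UNITE-main", "UNITE-main.tar.gz") followed by extract_archive("UNITE-main.tar.gz") on the target machine.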

II. Changing CUDA Calls to NPU Calls

data['label'] = data['label'].cuda()
data['instance'] = data['instance'].cuda()
data['image'] = data['image'].cuda()

becomes the following (torch_npu must be imported so that .npu() is available):

data['label'] = data['label'].npu()
data['instance'] = data['instance'].npu()
data['image'] = data['image'].npu()
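Rather than editing every .cuda() call by hand, the device can be chosen once and passed to .to(device). The following is a minimal sketch under that assumption; pick_device and move_batch are hypothetical helper names, not part of SPADE, UNITE, or torch_npu.

```python
import importlib.util


def pick_device():
    """Return "npu", "cuda" or "cpu", depending on what this machine offers."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch
    if importlib.util.find_spec("torch_npu") is not None:
        import torch_npu  # importing it registers the NPU backend with PyTorch
        if torch_npu.npu.is_available():
            return "npu"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


def move_batch(data, keys, device):
    """Move the tensors stored under `keys` to `device`, leaving other entries alone."""
    for key in keys:
        data[key] = data[key].to(device)
    return data
```

With this in place, the three assignments collapse to move_batch(data, ['label', 'instance', 'image'], pick_device()), and the same script runs unchanged on a CUDA or CPU machine.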

III. Configuring Environment Variables

1. Create env.sh

touch env.sh

2. Open env.sh

vi env.sh

3. Add the environment variables

# Configure the CANN-related environment variables
CANN_INSTALL_PATH_CONF='/etc/Ascend/ascend_cann_install.info'
if [ -f $CANN_INSTALL_PATH_CONF ]; then
    DEFAULT_CANN_INSTALL_PATH=$(cat $CANN_INSTALL_PATH_CONF | grep Install_Path | cut -d "=" -f 2)
else
    DEFAULT_CANN_INSTALL_PATH="/usr/local/Ascend/"
fi
CANN_INSTALL_PATH=${1:-${DEFAULT_CANN_INSTALL_PATH}}
if [ -d ${CANN_INSTALL_PATH}/ascend-toolkit/latest ]; then
    source ${CANN_INSTALL_PATH}/ascend-toolkit/set_env.sh
else
    source ${CANN_INSTALL_PATH}/nnae/set_env.sh
fi
# Add the dependency libraries to the library search path
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/openblas/lib
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/lib/
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/lib64/
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/lib/
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/lib/aarch64_64-linux-gnu
# Custom environment variables
export HCCL_WHITELIST_DISABLE=1
# Logging
export ASCEND_SLOG_PRINT_TO_STDOUT=0 # print logs to the screen (optional)
export ASCEND_GLOBAL_LOG_LEVEL=3 # log level; common values: 1 = INFO, 3 = ERROR
export ASCEND_GLOBAL_EVENT_ENABLE=0 # event logging is disabled by default

Then save and quit with

:wq!

4. Load the environment

source env.sh
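To confirm that the variables actually reached a Python process started from that shell, a small check like the one below can help. It is a sketch only: parse_install_path mirrors the grep/cut lookup of Install_Path in the script, and both function names and the EXPECTED_VARS tuple are assumptions made for this example.

```python
import os

# Variables that env.sh exports explicitly (taken from the script above).
EXPECTED_VARS = (
    "HCCL_WHITELIST_DISABLE",
    "ASCEND_SLOG_PRINT_TO_STDOUT",
    "ASCEND_GLOBAL_LOG_LEVEL",
    "ASCEND_GLOBAL_EVENT_ENABLE",
)


def parse_install_path(info_text, default="/usr/local/Ascend/"):
    """Mimic `grep Install_Path | cut -d "=" -f 2` on the install-info file text."""
    for line in info_text.splitlines():
        if "Install_Path" in line and "=" in line:
            return line.split("=")[1].strip()
    return default


def missing_vars(env, names=EXPECTED_VARS):
    """Return the expected variable names that are absent from `env`."""
    return [name for name in names if name not in env]


if __name__ == "__main__":
    absent = missing_vars(os.environ)
    print("missing:", absent if absent else "none")
```

If the script prints any missing names, env.sh was probably executed (./env.sh) instead of sourced, so its exports never reached the current shell.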

IV. RuntimeError: ACL stream synchronize failed, error code:507018

E39999: Inner Error, Please contact support engineer!
E39999  Aicpu kernel execute failed, device_id=0, stream_id=0, task_id=6394, fault op_name=ScatterElements[FUNC:GetError][FILE:stream.cc][LINE:1044]
        TraceBack (most recent call last):
        rtStreamSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:49]
        synchronize stream failed, runtime result = 507018[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
DEVICE[0] PID[41411]:
EXCEPTION TASK:
  Exception info:TGID=2593324, model id=65535, stream id=0, stream phase=SCHEDULE, task id=742, task type=aicpu kernel, recently received task id=742, recently send task id=741, task phase=RUN
  Message info[0]:aicpu=0,slot_id=0,report_mailbox_flag=0x5a5a5a5a,state=0x5210
    Other info[0]:time=2023-10-12-11:22:01.273.951, function=proc_aicpu_task_done, line=972, error code=0x2a
EXCEPTION TASK:
  Exception info:TGID=2593324, model id=65535, stream id=0, stream phase=3, task id=6394, task type=aicpu kernel, recently received task id=6406, recently send task id=6393, task phase=RUN
  Message info[0]:aicpu=0,slot_id=0,report_mailbox_flag=0x5a5a5a5a,state=0x5210
    Other info[0]:time=2023-10-12-11:41:20.661.958, function=proc_aicpu_task_done, line=972, error code=0x2a
Traceback (most recent call last):
  File "train.py", line 40, in <module>
    trainer.run_generator_one_step(data_i)
  File "/home/ma-user/work/SPADE-master/trainers/pix2pix_trainer.py", line 35, in run_generator_one_step
    g_losses, generated = self.pix2pix_model(data, mode='generator')
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs, **kwargs)
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ma-user/work/SPADE-master/models/pix2pix_model.py", line 43, in forward
    input_semantics, real_image = self.preprocess_input(data)
  File "/home/ma-user/work/SPADE-master/models/pix2pix_model.py", line 113, in preprocess_input
    data['label'] = data['label'].npu()
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/site-packages/torch_npu/utils/device_guard.py", line 38, in wrapper
    return func(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/site-packages/torch_npu/utils/tensor_methods.py", line 66, in _npu
    return torch_npu._C.npu(self, *args, **kwargs)
RuntimeError: ACL stream synchronize failed, error code:507018
THPModule_npu_shutdown success.

My guess is that this happens because mixed precision was not enabled.
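The Python traceback ends at data['label'].npu(), while the E39999 line in the log names ScatterElements as the fault op. When comparing several runs, it may help to pull those fields out of the log automatically. The sketch below does that with two regular expressions; summarize_ascend_error is a hypothetical helper, and the patterns are only assumed to match logs shaped like the one above.

```python
import re


def summarize_ascend_error(log_text):
    """Extract the faulting operator and the runtime result code from an Ascend error log."""
    fault_op = re.search(r"fault op_name=(\w+)", log_text)
    result = re.search(r"runtime result = (\d+)", log_text)
    return {
        "fault_op": fault_op.group(1) if fault_op else None,
        "runtime_result": int(result.group(1)) if result else None,
    }


# Two excerpts copied from the log above.
sample = (
    "Aicpu kernel execute failed, device_id=0, stream_id=0, task_id=6394, "
    "fault op_name=ScatterElements[FUNC:GetError][FILE:stream.cc][LINE:1044]\n"
    "synchronize stream failed, runtime result = 507018"
    "[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]\n"
)
print(summarize_ascend_error(sample))
# {'fault_op': 'ScatterElements', 'runtime_result': 507018}
```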

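V. Enabling Mixed Precision

1. Before building the network, import the AMP module from torch_npu

import time
import torch
import torch.nn as nn
import torch_npu
from torch.utils.data import DataLoader
from torch_npu.npu import amp    # import the AMP module

2. After defining the model and the optimizer, create the GradScaler provided by AMP

# CNN, train_data, batch_size, epochs and device are assumed to be defined elsewhere
model = CNN().to(device)
train_dataloader = DataLoader(train_data, batch_size=batch_size)    # define the DataLoader
loss_func = nn.CrossEntropyLoss().to(device)    # define the loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)    # define the optimizer
scaler = amp.GradScaler()    # create the GradScaler after the model and optimizer

3. Add the AMP calls to the training loop to turn AMP on

for epo in range(epochs):
    for imgs, labels in train_dataloader:
        imgs = imgs.to(device)
        labels = labels.to(device)
        with amp.autocast():
            outputs = model(imgs)    # forward pass
            loss = loss_func(outputs, labels)    # compute the loss
        optimizer.zero_grad()
        # scale the loss around backpropagation, then update the parameters
        scaler.scale(loss).backward()    # scale the loss and backpropagate
        scaler.step(optimizer)    # update the parameters (unscales automatically)
        scaler.update()    # update the loss-scaling factor (dynamic loss scale)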
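To make the role of scaler.update() more concrete, here is a toy model of dynamic loss scaling written in plain Python. It is not the torch_npu implementation: the class name ToyLossScaler and all default values are chosen for illustration only. The idea it sketches is that the scale is cut back when a step produced inf/nan gradients and grown again after a run of clean steps.

```python
class ToyLossScaler:
    """Illustrative dynamic loss scaler (not the real GradScaler)."""

    def __init__(self, scale=1024.0, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=3):
        self.scale = scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._clean_steps = 0  # consecutive steps without overflow

    def update(self, found_inf):
        """Adjust the scale after one optimizer step and return the new value."""
        if found_inf:
            # Overflow: shrink the scale and restart the clean-step counter.
            self.scale *= self.backoff_factor
            self._clean_steps = 0
        else:
            self._clean_steps += 1
            if self._clean_steps == self.growth_interval:
                # Enough clean steps in a row: try a larger scale.
                self.scale *= self.growth_factor
                self._clean_steps = 0
        return self.scale


scaler_demo = ToyLossScaler()
history = [scaler_demo.update(found_inf=False) for _ in range(3)]
history.append(scaler_demo.update(found_inf=True))
print(history)  # [1024.0, 1024.0, 2048.0, 1024.0]
```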

VI. Unknown Error

E39999: Inner Error, Please contact support engineer!
E39999  An exception occurred during AICPU execution, stream_id:78, task_id:742, errcode:21008, msg:inner error[FUNC:ProcessAicpuErrorInfo][FILE:device_error_proc.cc][LINE:673]
        TraceBack (most recent call last):
        Kernel task happen error, retCode=0x2a, [aicpu exception].[FUNC:PreCheckTaskErr][FILE:task.cc][LINE:1068]
        Aicpu kernel execute failed, device_id=0, stream_id=78, task_id=742.[FUNC:PrintAicpuErrorInfo][FILE:task.cc][LINE:774]
        Aicpu kernel execute failed, device_id=0, stream_id=78, task_id=742, fault op_name=ScatterElements[FUNC:GetError][FILE:stream.cc][LINE:1044]
        rtStreamSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:49]
        op[Minimum], The Minimum op dtype is not same, type1:DT_FLOAT16, type2:DT_FLOAT[FUNC:CheckTwoInputDtypeSame][FILE:util.cc][LINE:116]
        Verifying Minimum failed.[FUNC:InferShapeAndType][FILE:infershape_pass.cc][LINE:135]
        Call InferShapeAndType for node:Minimum(Minimum) failed[FUNC:Infer][FILE:infershape_pass.cc][LINE:117]
        process pass InferShapePass on node:Minimum failed, ret:4294967295[FUNC:RunPassesOnNode][FILE:base_pass.cc][LINE:530]
        build graph failed, graph id:894, ret:1343242270[FUNC:BuildModel][FILE:ge_generator.cc][LINE:1484]
        [Build][SingleOpModel]call ge interface generator.BuildSingleOpModel failed. ge result = 1343242270[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        [Build][Op]Fail to build op model[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
        build op model failed, result = 500002[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
DEVICE[0] PID[189368]:
EXCEPTION TASK:
  Exception info:TGID=3114744, model id=65535, stream id=78, stream phase=SCHEDULE, task id=742, task type=aicpu kernel, recently received task id=742, recently send task id=741, task phase=RUN
  Message info[0]:aicpu=0,slot_id=0,report_mailbox_flag=0x5a5a5a5a,state=0x5210
    Other info[0]:time=2023-10-12-12:12:22.763.259, function=proc_aicpu_task_done, line=972, error code=0x2a
EXCEPTION TASK:
  Exception info:TGID=3114744, model id=65535, stream id=78, stream phase=3, task id=4347, task type=aicpu kernel, recently received task id=4354, recently send task id=4346, task phase=RUN
  Message info[0]:aicpu=0,slot_id=0,report_mailbox_flag=0x5a5a5a5a,state=0x5210
    Other info[0]:time=2023-10-12-12:13:57.997.757, function=proc_aicpu_task_done, line=972, error code=0x2a
Aborted (core dumped)
(py38) [ma-user SPADE-master]$
Process ForkServerProcess-2:
Traceback (most recent call last):
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 61, in wrapper
    raise exp
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 58, in wrapper
    func(*args, **kwargs)
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 268, in task_distribute
    key, func_name, detail = resource_proxy[TASK_QUEUE].get()
  File "<string>", line 2, in get
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/multiprocessing/managers.py", line 835, in _callmethod
    kind, result = conn.recv()
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
/home/ma-user/anaconda3/envs/py38/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 91 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
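One line in this log is more specific than the rest: the Minimum op received one DT_FLOAT16 input and one DT_FLOAT input. Assuming (this is not confirmed by the log) that the mismatch comes from model code that mixes a float16 tensor produced under autocast with a float32 tensor, one thing to try is casting both operands to a shared dtype before the call. The helper minimum_same_dtype below is hypothetical and only sketches that idea on CPU tensors.

```python
import torch


def minimum_same_dtype(a, b):
    """Elementwise minimum after casting both inputs to one shared dtype."""
    # promote_types picks the wider of the two dtypes (float16 + float32 -> float32).
    common = torch.promote_types(a.dtype, b.dtype)
    return torch.minimum(a.to(common), b.to(common))


half = torch.ones(3, dtype=torch.float16)          # stands in for an autocast output
full = torch.full((3,), 0.5, dtype=torch.float32)  # stands in for a float32 tensor
result = minimum_same_dtype(half, full)
print(result.dtype)  # torch.float32
```

Whether the call that triggers the error is torch.minimum, torch.min, or a clamp inside the model has to be found by searching the SPADE code; the log does not say.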

Reference link 1
Reference link 2: Ascend official website
