
Notes on Using the Ascend 910

I. Compressing and Extracting Files

1. Compress a directory

tar -czvf UNITE-main.tar.gz ./UNITE-main/

2. Extract an archive

tar -xvf UNITE-main.tar.gz
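When the archive step has to run from a Python script rather than a shell, the same round trip can be sketched with the standard-library tarfile module. This is only an illustration: the helper names compress_dir, extract_archive and demo are made up for this note, and the demo works in a temporary directory instead of the real UNITE-main folder.

```python
import tarfile
import tempfile
from pathlib import Path


def compress_dir(src_dir: Path, archive: Path) -> None:
    """Rough equivalent of `tar -czvf archive src_dir` (gzip-compressed tarball)."""
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps only the directory name inside the archive
        tar.add(src_dir, arcname=src_dir.name)


def extract_archive(archive: Path, dest: Path) -> None:
    """Rough equivalent of `tar -xvf archive -C dest`; compression is auto-detected."""
    with tarfile.open(archive, "r:*") as tar:
        tar.extractall(dest)


def demo() -> str:
    """Compress a throwaway directory, extract it elsewhere, return the file content."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "UNITE-main"
        src.mkdir()
        (src / "README.txt").write_text("hello")

        archive = root / "UNITE-main.tar.gz"
        compress_dir(src, archive)

        out = root / "out"
        out.mkdir()
        extract_archive(archive, out)
        return (out / "UNITE-main" / "README.txt").read_text()


if __name__ == "__main__":
    print(demo())
```

Note that extraction takes the archive file as its input, not the directory that was compressed.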

II. Switching from CUDA to NPU

The original code moves each tensor to the GPU with .cuda():

data['label'] = data['label'].cuda()
data['instance'] = data['instance'].cuda()
data['image'] = data['image'].cuda()

Change each call to .npu(), which torch_npu provides:

data['label'] = data['label'].npu()
data['instance'] = data['instance'].npu()
data['image'] = data['image'].npu()
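Hard-coding .npu() ties the code to one backend just as .cuda() did. A possible alternative is to pass the device in and call .to(device) on each entry. The sketch below assumes the batch values expose a .to(device) method the way PyTorch tensors do, and that the string "npu:0" is accepted once torch_npu has been imported; move_batch and FakeTensor are hypothetical names used only so the example runs without any accelerator.

```python
from typing import Any, Dict, Iterable


def move_batch(data: Dict[str, Any], device: str,
               keys: Iterable[str] = ("label", "instance", "image")) -> Dict[str, Any]:
    """Move the listed entries of a batch dict to `device`, in place."""
    for key in keys:
        if key in data:
            data[key] = data[key].to(device)
    return data


class FakeTensor:
    """Stand-in for a tensor (hypothetical); it only records its device."""

    def __init__(self, device: str = "cpu") -> None:
        self.device = device

    def to(self, device: str) -> "FakeTensor":
        return FakeTensor(device)


# Entries that are not tensors (here: a file path) are left untouched.
batch = {
    "label": FakeTensor(),
    "instance": FakeTensor(),
    "image": FakeTensor(),
    "path": "a.png",
}
move_batch(batch, "npu:0")
print(batch["label"].device)
```

With real tensors the call would look like move_batch(data, "npu:0") on Ascend and move_batch(data, "cuda:0") on a GPU, so the model code itself no longer names a backend.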

III. Configuring Environment Variables

1. Create env.sh

touch env.sh

2. Open env.sh

vi env.sh

3. Add the environment variables

# Configure the CANN environment variables
CANN_INSTALL_PATH_CONF='/etc/Ascend/ascend_cann_install.info'
if [ -f $CANN_INSTALL_PATH_CONF ]; then
    DEFAULT_CANN_INSTALL_PATH=$(cat $CANN_INSTALL_PATH_CONF | grep Install_Path | cut -d "=" -f 2)
else
    DEFAULT_CANN_INSTALL_PATH="/usr/local/Ascend/"
fi
CANN_INSTALL_PATH=${1:-${DEFAULT_CANN_INSTALL_PATH}}
if [ -d ${CANN_INSTALL_PATH}/ascend-toolkit/latest ]; then
    source ${CANN_INSTALL_PATH}/ascend-toolkit/set_env.sh
else
    source ${CANN_INSTALL_PATH}/nnae/set_env.sh
fi
# Add the dependency library paths
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/openblas/lib
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/lib/
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/lib64/
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/lib/
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/lib/aarch64_64-linux-gnu
# Custom environment variables
export HCCL_WHITELIST_DISABLE=1
# Logging
export ASCEND_SLOG_PRINT_TO_STDOUT=0 # print logs to the screen, optional
export ASCEND_GLOBAL_LOG_LEVEL=3 # log level; common values: 1 = INFO, 3 = ERROR
export ASCEND_GLOBAL_EVENT_ENABLE=0 # event logging is disabled by default

Then save the file and quit vi by typing

:wq!

4. Load the environment

source env.sh
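source only affects the current shell, so the training script has to be started from that same shell. A quick way to confirm that the variables actually reached the Python process is sketched below. The variable names come from env.sh above; the helpers missing_vars and has_lib_path are hypothetical and only written for this note.

```python
import os
from typing import List, Mapping, Sequence

# Variables exported by the env.sh shown above.
EXPECTED_VARS = (
    "HCCL_WHITELIST_DISABLE",
    "ASCEND_SLOG_PRINT_TO_STDOUT",
    "ASCEND_GLOBAL_LOG_LEVEL",
    "ASCEND_GLOBAL_EVENT_ENABLE",
)


def missing_vars(env: Mapping[str, str] = os.environ,
                 names: Sequence[str] = EXPECTED_VARS) -> List[str]:
    """Return the expected variables that are not set in `env`."""
    return [name for name in names if name not in env]


def has_lib_path(env: Mapping[str, str],
                 path: str = "/usr/local/openblas/lib") -> bool:
    """Check whether `path` is one of the LD_LIBRARY_PATH entries."""
    return path in env.get("LD_LIBRARY_PATH", "").split(":")


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Not set (was env.sh sourced in this shell?):", ", ".join(missing))
    else:
        print("All expected variables are set.")
    print("openblas on LD_LIBRARY_PATH:", has_lib_path(os.environ))
```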

IV. RuntimeError: ACL stream synchronize failed, error code:507018

E39999: Inner Error, Please contact support engineer!
E39999  Aicpu kernel execute failed, device_id=0, stream_id=0, task_id=6394, fault op_name=ScatterElements[FUNC:GetError][FILE:stream.cc][LINE:1044]
        TraceBack (most recent call last):
        rtStreamSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:49]
        synchronize stream failed, runtime result = 507018[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
DEVICE[0] PID[41411]:
EXCEPTION TASK:
  Exception info:TGID=2593324, model id=65535, stream id=0, stream phase=SCHEDULE, task id=742, task type=aicpu kernel, recently received task id=742, recently send task id=741, task phase=RUN
  Message info[0]:aicpu=0,slot_id=0,report_mailbox_flag=0x5a5a5a5a,state=0x5210
    Other info[0]:time=2023-10-12-11:22:01.273.951, function=proc_aicpu_task_done, line=972, error code=0x2a
EXCEPTION TASK:
  Exception info:TGID=2593324, model id=65535, stream id=0, stream phase=3, task id=6394, task type=aicpu kernel, recently received task id=6406, recently send task id=6393, task phase=RUN
  Message info[0]:aicpu=0,slot_id=0,report_mailbox_flag=0x5a5a5a5a,state=0x5210
    Other info[0]:time=2023-10-12-11:41:20.661.958, function=proc_aicpu_task_done, line=972, error code=0x2a
Traceback (most recent call last):
  File "train.py", line 40, in <module>
    trainer.run_generator_one_step(data_i)
  File "/home/ma-user/work/SPADE-master/trainers/pix2pix_trainer.py", line 35, in run_generator_one_step
    g_losses, generated = self.pix2pix_model(data, mode='generator')
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs, **kwargs)
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ma-user/work/SPADE-master/models/pix2pix_model.py", line 43, in forward
    input_semantics, real_image = self.preprocess_input(data)
  File "/home/ma-user/work/SPADE-master/models/pix2pix_model.py", line 113, in preprocess_input
    data['label'] = data['label'].npu()
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/site-packages/torch_npu/utils/device_guard.py", line 38, in wrapper
    return func(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/site-packages/torch_npu/utils/tensor_methods.py", line 66, in _npu
    return torch_npu._C.npu(self, *args, **kwargs)
RuntimeError: ACL stream synchronize failed, error code:507018
THPModule_npu_shutdown success.

My guess is that mixed precision was not enabled.
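The log above is long, but two fields do most of the work when searching for the cause: the operator named after "fault op_name=" and the number after "error code:". A small sketch for pulling both out of a saved log follows. The regular expressions are written against the log format shown in this post only and are not an official CANN log parser; summarize is a hypothetical helper name.

```python
import re
from typing import Dict, Optional

# Patterns based on the log text quoted in this post.
FAULT_OP = re.compile(r"fault op_name=(\w+)")
ERROR_CODE = re.compile(r"error code:(\d+)")


def summarize(log: str) -> Dict[str, Optional[str]]:
    """Return the first faulting operator and ACL error code found in `log`."""
    op = FAULT_OP.search(log)
    code = ERROR_CODE.search(log)
    return {
        "fault_op": op.group(1) if op else None,
        "error_code": code.group(1) if code else None,
    }


# Two lines copied from the log above.
sample = (
    "E39999  Aicpu kernel execute failed, device_id=0, stream_id=0, task_id=6394, "
    "fault op_name=ScatterElements[FUNC:GetError][FILE:stream.cc][LINE:1044]\n"
    "RuntimeError: ACL stream synchronize failed, error code:507018\n"
)

if __name__ == "__main__":
    print(summarize(sample))
```

For this log the faulting operator is ScatterElements, even though the Python traceback points at the unrelated data['label'].npu() line. A likely explanation is that operators run asynchronously and the failure only surfaces at the next stream synchronization, which matches the "stream synchronize failed" message.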

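V. Enabling Mixed Precision

1. Before building the network, import the AMP module from torch_npu

import time
import torch
import torch.nn as nn
from torch.utils.data import DataLoader    # needed for the DataLoader used below
import torch_npu
from torch_npu.npu import amp    # import the AMP module

2. After defining the model and the optimizer, create the AMP GradScaler

model = CNN().to(device)
train_dataloader = DataLoader(train_data, batch_size=batch_size)    # define the DataLoader
loss_func = nn.CrossEntropyLoss().to(device)    # define the loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)    # define the optimizer
scaler = amp.GradScaler()    # create the GradScaler after the model and optimizer

3. Add the AMP calls to the training loop to turn AMP on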

for epo in range(epochs):
    for imgs, labels in train_dataloader:
        imgs = imgs.to(device)
        labels = labels.to(device)
        with amp.autocast():
            outputs = model(imgs)    # forward pass
            loss = loss_func(outputs, labels)    # compute the loss
        optimizer.zero_grad()
        # scale the loss around backpropagation, then update the parameters
        scaler.scale(loss).backward()    # scale the loss and backpropagate
        scaler.step(optimizer)    # update the parameters (unscales automatically)
        scaler.update()    # update the loss-scaling factor (dynamic loss scale)
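The last call, scaler.update(), is where the dynamic loss scale changes. The toy class below illustrates the usual rule: shrink the scale when a step produced inf/NaN gradients, and grow it again after a run of clean steps. It is a pure-Python illustration, not the torch_npu implementation. The default numbers mirror the defaults of PyTorch's torch.cuda.amp.GradScaler, and it is an assumption that the NPU GradScaler behaves the same way. ToyLossScaler is a hypothetical name.

```python
class ToyLossScaler:
    """Toy model of dynamic loss scaling (illustration only)."""

    def __init__(self, init_scale: float = 65536.0, growth_factor: float = 2.0,
                 backoff_factor: float = 0.5, growth_interval: int = 2000) -> None:
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0  # consecutive steps without inf/NaN gradients

    def update(self, found_inf: bool) -> float:
        """Adjust the scale after one optimizer step and return the new value."""
        if found_inf:
            # Overflow: the step is skipped in real AMP, and the scale is reduced.
            self.scale *= self.backoff_factor
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps == self.growth_interval:
                # Enough clean steps in a row: try a larger scale again.
                self.scale *= self.growth_factor
                self._good_steps = 0
        return self.scale


# Small numbers so the behavior is visible in three steps.
toy = ToyLossScaler(init_scale=1024.0, growth_interval=2)
history = [toy.update(True), toy.update(False), toy.update(False)]
print(history)
```

In the demo the scale is halved after the overflow step and doubled again once two clean steps have passed.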

VI. Unknown Error

E39999: Inner Error, Please contact support engineer!
E39999  An exception occurred during AICPU execution, stream_id:78, task_id:742, errcode:21008, msg:inner error[FUNC:ProcessAicpuErrorInfo][FILE:device_error_proc.cc][LINE:673]
        TraceBack (most recent call last):
        Kernel task happen error, retCode=0x2a, [aicpu exception].[FUNC:PreCheckTaskErr][FILE:task.cc][LINE:1068]
        Aicpu kernel execute failed, device_id=0, stream_id=78, task_id=742.[FUNC:PrintAicpuErrorInfo][FILE:task.cc][LINE:774]
        Aicpu kernel execute failed, device_id=0, stream_id=78, task_id=742, fault op_name=ScatterElements[FUNC:GetError][FILE:stream.cc][LINE:1044]
        rtStreamSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:49]
        op[Minimum], The Minimum op dtype is not same, type1:DT_FLOAT16, type2:DT_FLOAT[FUNC:CheckTwoInputDtypeSame][FILE:util.cc][LINE:116]
        Verifying Minimum failed.[FUNC:InferShapeAndType][FILE:infershape_pass.cc][LINE:135]
        Call InferShapeAndType for node:Minimum(Minimum) failed[FUNC:Infer][FILE:infershape_pass.cc][LINE:117]
        process pass InferShapePass on node:Minimum failed, ret:4294967295[FUNC:RunPassesOnNode][FILE:base_pass.cc][LINE:530]
        build graph failed, graph id:894, ret:1343242270[FUNC:BuildModel][FILE:ge_generator.cc][LINE:1484]
        [Build][SingleOpModel]call ge interface generator.BuildSingleOpModel failed. ge result = 1343242270[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        [Build][Op]Fail to build op model[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
        build op model failed, result = 500002[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
DEVICE[0] PID[189368]:
EXCEPTION TASK:
  Exception info:TGID=3114744, model id=65535, stream id=78, stream phase=SCHEDULE, task id=742, task type=aicpu kernel, recently received task id=742, recently send task id=741, task phase=RUN
  Message info[0]:aicpu=0,slot_id=0,report_mailbox_flag=0x5a5a5a5a,state=0x5210
    Other info[0]:time=2023-10-12-12:12:22.763.259, function=proc_aicpu_task_done, line=972, error code=0x2a
EXCEPTION TASK:
  Exception info:TGID=3114744, model id=65535, stream id=78, stream phase=3, task id=4347, task type=aicpu kernel, recently received task id=4354, recently send task id=4346, task phase=RUN
  Message info[0]:aicpu=0,slot_id=0,report_mailbox_flag=0x5a5a5a5a,state=0x5210
    Other info[0]:time=2023-10-12-12:13:57.997.757, function=proc_aicpu_task_done, line=972, error code=0x2a
Aborted (core dumped)
(py38) [ma-user SPADE-master]$
Process ForkServerProcess-2:
Traceback (most recent call last):
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 61, in wrapper
    raise exp
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 58, in wrapper
    func(*args, **kwargs)
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 268, in task_distribute
    key, func_name, detail = resource_proxy[TASK_QUEUE].get()
  File "<string>", line 2, in get
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/multiprocessing/managers.py", line 835, in _callmethod
    kind, result = conn.recv()
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/home/ma-user/anaconda3/envs/py38/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
/home/ma-user/anaconda3/envs/py38/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 91 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Reference link 1
Reference link 2: Ascend official website
