
Deep Learning: Model Conversion and Estimating Required Compute

news 2026/2/9 18:07:09

Model Conversion

TensorFlow to ONNX

python -m tf2onnx.convert \
    --graphdef /root/autodl-tmp/warren/text-detection-ctpn/data/ctpn.pb \
    --output ./model.onnx \
    --inputs Placeholder:0 \
    --outputs Reshape_2:0,rpn_bbox_pred/Reshape_1:0
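The tensor names given to --inputs and --outputs form a comma-separated list, written with a plain ASCII comma and no spaces. As a minimal sketch, the same command line can be assembled from Python lists so the separator is never typed by hand. The helper name build_tf2onnx_command is hypothetical and only builds the argument list; it does not run the conversion.

```python
import sys


def build_tf2onnx_command(graphdef, output, inputs, outputs):
    """Return the argv list for `python -m tf2onnx.convert` on a frozen graph.

    Hypothetical helper for illustration: it only assembles the arguments.
    """
    for name in list(inputs) + list(outputs):
        # A comma (ASCII or full-width) or a space inside a single tensor
        # name usually means two names were pasted together.
        if "," in name or "\uff0c" in name or " " in name:
            raise ValueError(f"suspicious tensor name: {name!r}")
    return [
        sys.executable, "-m", "tf2onnx.convert",
        "--graphdef", graphdef,
        "--output", output,
        "--inputs", ",".join(inputs),
        "--outputs", ",".join(outputs),
    ]


cmd = build_tf2onnx_command(
    "/root/autodl-tmp/warren/text-detection-ctpn/data/ctpn.pb",
    "./model.onnx",
    ["Placeholder:0"],
    ["Reshape_2:0", "rpn_bbox_pred/Reshape_1:0"],
)
print(" ".join(cmd))
# To actually run the conversion: subprocess.run(cmd, check=True)
```

Passing the list to subprocess.run avoids shell quoting issues with the slash and colon characters in tensor names.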

PyTorch to ONNX

#!/usr/bin/env python3
import torch

from simple_net import SimpleModel

# Load the pretrained model and switch it to inference mode
model = SimpleModel()
model.eval()
checkpoint = torch.load("weight.pth", map_location="cpu")
model.load_state_dict(checkpoint)

# Prepare a dummy input: batch size 1, 1 input channel, 28x28 image
dummy_input = torch.randn(1, 1, 28, 28, requires_grad=True)

# Export the torch model as ONNX
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",  # name of the exported ONNX model
    opset_version=11,
    export_params=True,
    do_constant_folding=True,
)

Estimating the Compute a Model Requires

Manual Estimate

Network code

import torch.nn as nn
import torch.nn.functional as F


class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, 5)  # 1 input channel, 10 output channels, 5x5 kernel
        self.conv2 = nn.Conv2d(10, 20, 3)  # 10 input channels, 20 output channels, 3x3 kernel
        self.fc1 = nn.Linear(20 * 10 * 10, 500)  # in_features, out_features
        self.fc2 = nn.Linear(500, 10)  # same as above

    def forward(self, x):
        input_size = x.size(0)
        x = self.conv1(x)  # in: batch*1*28*28, out: batch*10*24*24 (28 - 5 + 1)
        x = F.relu(x)  # shape unchanged: batch*10*24*24
        x = F.max_pool2d(x, 2, 2)  # in: batch*10*24*24, out: batch*10*12*12 (24 / 2)
        x = self.conv2(x)  # in: batch*10*12*12, out: batch*20*10*10 (12 - 3 + 1)
        x = F.relu(x)
        x = x.view(input_size, -1)  # flatten; -1 infers the size, 20*10*10 = 2000
        x = self.fc1(x)  # in: batch*2000, out: batch*500
        x = F.relu(x)  # shape unchanged
        x = self.fc2(x)  # in: batch*500, out: batch*10
        output = F.log_softmax(x, dim=1)  # log-probabilities over the 10 classes
        # print("------------------------------output is ", output)
        return output
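The shapes noted in the comments of forward can be checked without PyTorch. The sketch below uses the standard output-size formula for a convolution without dilation, (size + 2 * padding - kernel) // stride + 1, and the same formula for pooling without padding. The function names are illustrative and not part of any library.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Output height/width of a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool2d_out(size, kernel, stride):
    """Output height/width of a max-pooling layer (no padding, floor mode)."""
    return (size - kernel) // stride + 1


size = 28
after_conv1 = conv2d_out(size, kernel=5)        # 28 - 5 + 1
after_pool = pool2d_out(after_conv1, 2, 2)      # 24 / 2
after_conv2 = conv2d_out(after_pool, kernel=3)  # 12 - 3 + 1
flattened = 20 * after_conv2 * after_conv2      # input size of fc1

print(after_conv1, after_pool, after_conv2, flattened)  # 24 12 10 2000
```

The last value is why fc1 is declared with 20 * 10 * 10 input features: changing the input image size or a kernel size means recomputing it.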

Calculation:

Parameter count

conv1 layer: 1 (input channels) * 10 (output channels) * 5 * 5 (kernel size) + 10 (bias) = 260 parameters

conv2 layer: 10 (input channels) * 20 (output channels) * 3 * 3 (kernel size) + 20 (bias) = 1820 parameters

fc1 fully connected layer: 20 * 10 * 10 (20 channels, each of size 10 * 10) * 500 (output size) + 500 (bias) = 1000500 parameters

fc2 fully connected layer: 500 (input size) * 10 (output size) + 10 (bias) = 5010 parameters

Total parameters: 260 + 1820 + 1000500 + 5010 = 1007590 parameters
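The per-layer arithmetic above is easy to get wrong by hand, so here is a short sketch that recomputes it. The helper names conv2d_params and linear_params are illustrative; they assume square kernels and one bias term per output channel or feature, as in the network code.

```python
def conv2d_params(in_channels, out_channels, kernel, bias=True):
    """Weights plus optional bias of a 2D convolution with a square kernel."""
    weights = in_channels * out_channels * kernel * kernel
    return weights + (out_channels if bias else 0)


def linear_params(in_features, out_features, bias=True):
    """Weights plus optional bias of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


layer_params = {
    "conv1": conv2d_params(1, 10, 5),
    "conv2": conv2d_params(10, 20, 3),
    "fc1": linear_params(20 * 10 * 10, 500),
    "fc2": linear_params(500, 10),
}
total_params = sum(layer_params.values())

for name, count in layer_params.items():
    print(f"{name}: {count}")
print("total:", total_params)  # total: 1007590
```

The sum is 1,007,590, which rounds to the 1.01 M reported by ptflops in the result section.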

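MACs

The counts below are multiply-accumulate operations (MACs), the unit ptflops reports; one MAC is commonly counted as two FLOPs.

1) MACs of the conv1 layer:

conv1 is a convolutional layer. Its input is batch * 1 * 28 * 28 (batch size B, 1 input channel, height 28, width 28) and its output is batch * 10 * 24 * 24 (10 output channels, height 24, width 24). Each output element takes one 5 * 5 multiply-accumulate window per input channel, plus one operation for the bias. The count is therefore:

With batch size B = 1:

MACs_conv1 = B * 10 * 24 * 24 * (1 * 5 * 5 + 1) = 149760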

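2) MACs of the conv2 layer:

conv2 is also a convolutional layer, with input batch * 10 * 12 * 12 and output batch * 20 * 10 * 10. Each output element takes a 3 * 3 multiply-accumulate window over each of the 10 input channels, plus one operation for the bias. The count is therefore:

MACs_conv2 = B * 20 * 10 * 10 * (10 * 3 * 3 + 1) = 182000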

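3) MACs of the fc1 fully connected layer:

Before fc1, the 2D feature maps are flattened into a vector. The input is batch * (20 * 10 * 10), the flattened size, and the output is batch * 500. Each output element needs one multiply-accumulate per input element, plus one operation for the bias. The count is therefore:

MACs_fc1 = B * ((20 * 10 * 10) * 500 + 500) = 1000500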

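4) MACs of the fc2 fully connected layer:

fc2 reduces the feature size from 500 to 10. The input is batch * 500 and the output is batch * 10. Each output element needs one multiply-accumulate per input element, plus one operation for the bias. The count is therefore:

MACs_fc2 = B * (500 * 10 + 10) = 5010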

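5) Adding the four layers gives the total:

Total MACs = conv1 + conv2 + fc1 + fc2

Total MACs = B * 10 * 24 * 24 * (1 * 5 * 5 + 1) + B * 20 * 10 * 10 * (10 * 3 * 3 + 1) + B * ((20 * 10 * 10) * 500 + 500) + B * (500 * 10 + 10)

Unlike the parameter count, which does not depend on the batch size, the operation count scales linearly with B. Setting B = 1 gives the cost of one forward pass on a single sample:

Total MACs = 10 * 24 * 24 * 26 + 20 * 10 * 10 * 91 + 1000500 + 5010

Total MACs = 149760 + 182000 + 1000500 + 5010 = 1337270

6) The "SimpleModel" network therefore needs 1337270 MACs per sample, about 1.34 MMac (1.34 million multiply-accumulate operations), or roughly 2.67 million FLOPs if one MAC is counted as two FLOPs. The ReLU, max-pooling and log-softmax operations are not included in this count.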
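The layer-by-layer count can be reproduced with a few lines of Python. This is a sketch under one stated assumption: the bias is counted as one extra operation per output element, which is the convention that reproduces the per-layer figures in the ptflops result (149.76 KMac, 182.0 KMac, 5.01 KMac). The function names are illustrative.

```python
def conv2d_macs(in_channels, out_channels, kernel, out_h, out_w, bias=True, batch=1):
    """MACs of a 2D convolution with a square kernel.

    Assumption: the bias adds one operation per output element.
    """
    per_output = in_channels * kernel * kernel + (1 if bias else 0)
    return batch * out_channels * out_h * out_w * per_output


def linear_macs(in_features, out_features, bias=True, batch=1):
    """MACs of a fully connected layer, bias counted once per output."""
    return batch * (in_features * out_features + (out_features if bias else 0))


def simple_model_macs(batch=1):
    """Per-layer MACs of SimpleModel for a 1 x 28 x 28 input."""
    return {
        "conv1": conv2d_macs(1, 10, 5, 24, 24, batch=batch),
        "conv2": conv2d_macs(10, 20, 3, 10, 10, batch=batch),
        "fc1": linear_macs(20 * 10 * 10, 500, batch=batch),
        "fc2": linear_macs(500, 10, batch=batch),
    }


layer_macs = simple_model_macs(batch=1)
total_macs = sum(layer_macs.values())
print(layer_macs)
print("total:", total_macs)  # total: 1337270

# The cost grows linearly with the batch size.
print("batch of 8:", sum(simple_model_macs(batch=8).values()))
```

Dropping the bias terms (bias=False) gives the pure multiply-accumulate count, which is smaller by 5760 + 2000 + 500 + 10 operations.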

Test code

#!/usr/bin/env python3
'''
Author: warren
Date: 2023-08-01 16:22:02
LastEditors: warren
LastEditTime: 2023-08-01 16:26:45
FilePath: /wzw/MNIST/cal_flops.py
Description:
Copyright (c) 2023 by ${git_name_email}, All Rights Reserved.
'''
import torch
from ptflops import get_model_complexity_info

from simple_net import SimpleModel

# Use the GPU when one is available, otherwise fall back to the CPU
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = SimpleModel().to(DEVICE)

# The input resolution is given without the batch dimension: (channels, height, width)
macs, params = get_model_complexity_info(
    model,
    (1, 28, 28),
    as_strings=True,
    print_per_layer_stat=True,
    verbose=True,
)
print('{:<30}  {:<8}'.format('Computational complexity: ', macs))
print('{:<30}  {:<8}'.format('Number of parameters: ', params))

Result

SimpleModel(

1.01 M, 100.000% Params, 1.34 MMac, 100.000% MACs,

(conv1): Conv2d(260, 0.026% Params, 149.76 KMac, 11.199% MACs, 1, 10, kernel_size=(5, 5), stride=(1, 1))

(conv2): Conv2d(1.82 k, 0.181% Params, 182.0 KMac, 13.610% MACs, 10, 20, kernel_size=(3, 3), stride=(1, 1))

(fc1): Linear(1.0 M, 99.296% Params, 1.0 MMac, 74.817% MACs, in_features=2000, out_features=500, bias=True)

(fc2): Linear(5.01 k, 0.497% Params, 5.01 KMac, 0.375% MACs, in_features=500, out_features=10, bias=True)

)

Computational complexity: 1.34 MMac

Number of parameters: 1.01 M

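Explanation of the output

Params is the number of parameters; MACs is the total number of multiply-accumulate operations.

Total parameters: 1.01 M (1,007,590 parameters before rounding), 100.000% of the model.

Total compute (MACs): 1.34 MMac (1,337,270 multiply-accumulate operations before rounding), 100.000% of the model.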

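Per-layer details:

conv1 layer:

Parameters: 260 (0.026% of the total)

MACs: 149.76 KMac, i.e. 149,760 multiply-accumulate operations (11.199% of the total)

conv2 layer:

Parameters: 1.82 k, i.e. 1,820 (0.181% of the total)

MACs: 182.0 KMac, i.e. 182,000 multiply-accumulate operations (13.610% of the total)

fc1 fully connected layer:

Parameters: 1.0 M, i.e. 1,000,500 before rounding (99.296% of the total)

MACs: 1.0 MMac, i.e. 1,000,500 multiply-accumulate operations before rounding (74.817% of the total)

fc2 fully connected layer:

Parameters: 5.01 k, i.e. 5,010 (0.497% of the total)

MACs: 5.01 KMac, i.e. 5,010 multiply-accumulate operations (0.375% of the total)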

The final two lines of the output repeat the totals: a computational complexity of 1.34 MMac and 1.01 M parameters.
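The percentages in the per-layer report can be recomputed from exact per-layer counts. The sketch below assumes exact values of 1,000,500 parameters and 1,000,500 MACs for fc1 (displayed as 1.0 M and 1.0 MMac); with those values every percentage matches the report to three decimals.

```python
layer_params = {"conv1": 260, "conv2": 1820, "fc1": 1000500, "fc2": 5010}
layer_macs = {"conv1": 149760, "conv2": 182000, "fc1": 1000500, "fc2": 5010}


def shares(counts):
    """Each layer's share of the total, in percent, rounded to 3 decimals."""
    total = sum(counts.values())
    return {name: round(100 * value / total, 3) for name, value in counts.items()}


param_share = shares(layer_params)
mac_share = shares(layer_macs)

for name in layer_params:
    print(f"{name}: {param_share[name]:.3f}% Params, {mac_share[name]:.3f}% MACs")
```

fc1 holds more than 99% of the parameters but only about 75% of the MACs, because convolution weights are reused at every output position while fully connected weights are used once per sample.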
