当前位置：首页 > article >正文

YOLOv8源码修改（4）- 实现YOLOv8模型剪枝（任意YOLO模型的简单剪枝）

article 2026/5/10 13:42:16

前言

1. 需修改的源码文件

1.1添加C2f_v2模块

1.2 修改模型读取方式

1.3 增加 L1 正则约束化训练

1.4 在tensorboard上增加BN层权重和偏置参数分布的可视化

1.5 增加剪枝处理文件

2. 工程目录结构

3. 源码文件修改

3.1 添加C2f_v2模块和模型读取

3.2 添加L1正则约束训练和现实约束的BN层参数分布

3.3 添加剪枝方法文件yolov8_prune.py

4. 模型训练与测试结果对比

4.1 模型结构比较

4.2 模型稀疏性对比

4.3 模型性能分析

4.4 进一步改进训练

5. 总结

前言

博主，博主，YOLO剪枝相关的实现，虽然很多，但还是太吃操作了，有没有更加简单、易上手的方法？有的兄弟，有的。

本文基于YOLOv8/YOLOv11官方源码修改，以最小改动、不需要了解剪枝算法为前提，可以实现对任意YOLO模型的剪枝、训练和微调。

本文使用的YOLO框架源码：版本为8.3.67的 ultralytics。

本文使用的剪枝框架源码：版本为1.5.1的 Torch-Pruning。

1. 需修改的源码文件

1.1添加C2f_v2模块

ultralytics/nn/modules/block.py

ultralytics/nn/modules/__init__.py

ultralytics/nn/tasks.py

1.2 修改模型读取方式

ultralytics/engine/model.py

ultralytics/cfg/default.yaml

1.3 增加 L1 正则约束化训练

ultralytics/engine/trainer.py

ultralytics/cfg/default.yaml

1.4 在tensorboard上增加BN层权重和偏置参数分布的可视化

ultralytics/engine/trainer.py

ultralytics/utils/callbacks/tensorboard.py

1.5 增加剪枝处理文件

yolov8_prune.py

2. 工程目录结构

YOLO的框架目录结构和训练启动可以参考：YOLOv8-pose（1）- 关键点检测数据集格式详解+快速训练+预测结果详解

本文修改和训练和上述文章大同小异，工程目录结构中，涉及修改的文件如下（淡蓝色）：

3. 源码文件修改

代码添加在哪，看截图中的行数。

3.1 添加C2f_v2模块和模型读取

YOLOv8模型剪枝问题主要在C2f模块，剪枝时会存在BUG：Hey Guys! I think I found the bug. When a concat module & a split module are directly connected, the index mapping system fails to compute correct idxs. I'm going to rewrite the concat & split tracing. Really thanks for this issue!

C2f_v2模块源码取自：YOLOv8 Pruning。torch-pruning官方给出的 yolov8_pruning.py 是针对 ultralytics-8.0.x版本的，在8.3.67上运行会报错（数据设备加载不一致、某些类没有初始化、自定义模型加载失败等）。这导致训练起来麻烦，所以本文就直接在8.3.67的源码上修改了，用官方的训练代码。

# C2f_v2模块，用于替换C2f模块
class C2f_v2(nn.Module):# CSP Bottleneck with 2 convolutionsdef __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansionsuper().__init__()self.c = int(c2 * e)  # hidden channelsself.cv0 = Conv(c1, self.c, 1, 1)self.cv1 = Conv(c1, self.c, 1, 1)self.cv2 = Conv((2 + n) * self.c, c2, 1)  # optional act=FReLU(c2)self.m = nn.ModuleList(Bottleneck(self.c, self.c, shortcut, g, k=((3, 3), (3, 3)), e=1.0) for _ in range(n))def forward(self, x):# y = list(self.cv1(x).chunk(2, 1))y = [self.cv0(x), self.cv1(x)]y.extend(m(y[-1]) for m in self.m)return self.cv2(torch.cat(y, 1))

在ultralytics/nn/modules/block.py两处:

在ultralytics/nn/modules/__init__.py添加两处：

在ultralytics/nn/tasks.py三处:

模型加载修改：

在 model.train() 模式下，源码会读取 model.model.yaml 中保存的结构配置文件来生成模型结构，并把加载的模型参数赋值给这个结构，所以如果不修改，每次加载模型（即使是剪枝模型），都会加载原始的YOLO的模型。

而 torch.load() 和 torch.save() 本身就可以读取和保存模型结构和参数，model.load() 和 model.save() 就是对这两个方法的封装，已经可以读取和保存剪枝模型的结构和参数了。

在ultralytics/cfg/default.yaml中，添加参数 custom_model:

用于控制是使用自己定义的读取方式（True）还是官方的读取方式（False）。

网上没找到能在8.3.67版本中有效读取的方法，自己暂时像下面这样实现了一下。思路：

self.trainer 利用回调函数实现智能加载，实现各种参数的初始化，模型用 model.model.yaml 中的参数进行初始化，而在前面已经用 model.load(model) 加载过原始结构（剪枝的）的模型了，所以这里就用原始结构的模型替换新生成的。

同时YOLOv8中 DFL 层是用来处理三个不同分辨率检测头输出的，所以不需要进行梯度回传。这部分后续在 RKNN 上部署也会涉及。

增加的代码：

# ultralytics/engine/model.py 中实现自定义结构模型加载  if not args.get("custom_model", None):self.model = self.trainer.modelelse:print(f"\033[1;32mINFO\033[0m: custom_model is True, load custom model. ")for name, param in self.model.named_parameters():if name == "model.22.dfl.conv.weight":param.requires_grad = False  # 冻结else:param.requires_grad = True  # 解冻其他层self.trainer.model.model = self.model.model

3.2 添加L1正则约束训练和现实约束的BN层参数分布

还是在ultralytics/cfg/default.yaml中，有参数：

add_L1就是控制是（True）否（False）引入L1-norm，L1_start是起始的的L1权重，最终的是L1_start * L1_end（L1_end需要小于1，否则梯度会出错），这里参考了学习率的实现（lr0和lrf）。

add_L1: False           # (bool) use L1 norm
L1_start: 0             # (float) L1 regularization weight
L1_end: 0.1             # (float) final L1 regularization weight (L1_start * L1_end), default: 0.1

L1会使权重稀疏（服从拉普拉斯分布），L2使权重在0周围平滑（服从高斯分布），因为L1是直接对变量 x 求导（x非0时，恒为1），要使梯度减小，只能让x为0。L2是对变量的平方x^2求导（为2x），要让梯度变小，只需要x减小。

加入L2正则在优化器中实现，直接修改default.yaml中的weight_decay参数即可：

weight_decay: 0.0005 # (float) optimizer weight decay 5e-4

在ultralytics/engine/trainer.py中添加L1正则相关参数：

开启L1正则时，关闭混合精度训练（看网上都关了，我自己没关貌似也没影响），注意self.amp的判断逻辑，add_L1为True，其就为False。

在梯度回传后，优化之前，增加BN层权重和偏置的L1正则：

第一个特征提取层不添加，因为第一层作为最基本的特征提取层，后续也不进行剪枝。

上述两段代码：

        # ============================== add parameter: L1 norm  =======================================================self.add_L1 = self.args.add_L1 if self.args.add_L1 is not None else Falseself.L1_start = self.args.L1_start if self.args.L1_start is not None else 0.0001self.L1_end = self.args.L1_end if self.args.L1_end is not None else 0.1# ============================== add parameter: L1 norm  =======================================================# Check AMP# self.amp = torch.tensor(self.args.amp).to(self.device)  # True or Falseself.amp = torch.tensor(self.args.amp and not self.add_L1).to(self.device)  # True or False# ============================== add parameter: L1 regularization constraint  ==========================if self.add_L1:l1_coefficient = self.L1_start * (1 - (1 - self.L1_end) * epoch / self.epochs)for modules_name, modules_layer in self.model.named_modules():if isinstance(modules_layer, nn.BatchNorm2d):# 第一个特征提取层，不进行约束if "model.0." in modules_name:continuemodules_layer.weight.grad.data.add_(l1_coefficient * torch.sign(modules_layer.weight.data))modules_layer.bias.grad.data.add_(l1_coefficient * torch.sign(modules_layer.bias.data))# ============================== add parameter: L1 regularization constraint  ==========================

在ultralytics/engine/trainer.py中添加用tensorboard查看BN层参数分布的回调函数：

在每轮训练结束前，利用回调函数 on_show_bn_weights_and_bias，计算模型中BN层的权重分布。

代码如下：

            # ============================== show bn weights and bias distribution in tensorboard ======================if self.add_L1:module_list_weight = []module_list_bias = []for modules_name, modules_layer in self.model.named_modules():if isinstance(modules_layer, nn.BatchNorm2d):module_list_weight.append(modules_layer.state_dict()['weight'])     # 获取BN层的gamma参数（权重）module_list_bias.append(modules_layer.state_dict()['bias'])         # 获取 BN 层的 beta（偏置）# 获取所有 BN 层的通道数size_list = [idx.data.shape[0] for idx in module_list_weight]# 预分配 Tensor 存放 BN 权重和偏置self.bn_weights = torch.zeros(sum(size_list))self.bn_biases = torch.zeros(sum(size_list))index = 0for idx, size in enumerate(size_list):self.bn_weights[index:(index + size)] = module_list_weight[idx].data.abs().clone()  # 取绝对值self.bn_biases[index:(index + size)] = module_list_bias[idx].data.abs().clone()index += size# 合并 BN 权重和偏置self.bn_params = torch.cat([self.bn_weights, self.bn_biases], dim=0)self.run_callbacks("on_show_bn_weights_and_bias")    # 触发回调# ============================== show bn weights and bias distribution in tensorboard ======================

在ultralytics/utils/callbacks/tensorboard.py添加回调函数：

这里需要注意numpy和protobuf的版本问题，过高会导致调用add_histogram方法失败。

代码如下：

# ============================== show bn weights and bias distribution in tensorboard ==================================
def on_show_bn_weights_and_bias(trainer):if WRITER:"""numpy>=1.18.5,<1.24.0protobuf<4.21.3"""show_error = Truetry:WRITER.add_histogram('BN/weights', trainer.bn_weights.numpy(), trainer.epoch, bins='doane')WRITER.add_histogram('BN/biases', trainer.bn_biases.numpy(), trainer.epoch, bins='doane')bn_params = torch.cat([trainer.bn_weights, trainer.bn_biases]).numpy()  # 记录合并参数 weight 和 biasWRITER.add_histogram('BN/params', bn_params, trainer.epoch, bins='doane')except Exception as e:if show_error:print(f"\033[1;31mERROR\033[0m: plot weight and bias distribution failed. Cause by: {e}")show_error = False
# ============================== show bn weights and bias distribution in tensorboard ==================================callbacks = ({"on_pretrain_routine_start": on_pretrain_routine_start,"on_train_start": on_train_start,"on_fit_epoch_end": on_fit_epoch_end,"on_train_epoch_end": on_train_epoch_end,# ============================== show bn weights and bias distribution in tensorboard =========================="on_show_bn_weights_and_bias": on_show_bn_weights_and_bias,# ============================== show bn weights and bias distribution in tensorboard ==========================}if SummaryWriterelse {}
)

3.3 添加剪枝方法文件yolov8_prune.py

该文件相比torch-pruning官方的 yolov8_pruning.py 的改动较大，或者说，仅使用了其中的剪枝方法对稀疏化训练好的模型进行了剪枝。代码如下：

# yolov8_prune.py
# This code is adapted from Issue [#147](https://github.com/VainF/Torch-Pruning/issues/147),
# implemented by @Hyunseok-Kim0.
import math
import torch
import torch.nn as nn
import torch_pruning as tp
from copy import deepcopy
from ultralytics import YOLO
from ultralytics.nn.modules import Detect, C2f, C2f_v2
from ultralytics.utils import yaml_load
from ultralytics.utils.checks import check_yaml
from ultralytics.utils.torch_utils import initialize_weightsdef infer_shortcut(bottleneck):# 判定 bottleneck 有没有残差连接c1 = bottleneck.cv1.conv.in_channelsc2 = bottleneck.cv2.conv.out_channels# a) c1 == c2        表示输入通道数与输出通道数相同# b) hasattr(bottleneck, 'add')  检查 bottleneck 对象是否有名为 'add' 的属性# c) bottleneck.add  该属性为 True 时表示确实存在捷径连接return c1 == c2 and hasattr(bottleneck, 'add') and bottleneck.adddef transfer_weights(c2f, c2f_v2):# 1. 直接把 c2f 的 cv2 和 m (ModuleList) 复制给 c2f_v2c2f_v2.cv2 = c2f.cv2c2f_v2.m = c2f.m# 2. 获取旧模型和新模型的 state_dict（权重字典）state_dict = c2f.state_dict()   # 包含 c2f 的所有参数(key)->tensor 的映射state_dict_v2 = c2f_v2.state_dict()     # 包含 c2f_v2 的所有参数(key)->tensor 的映射# 3. 处理 cv1 中的卷积权重，并拆分到 c2f_v2 的 cv0、cv1# Transfer cv1 weights from C2f to cv0 and cv1 in C2f_v2old_weight = state_dict['cv1.conv.weight']  # 取出旧模型 c2f 的 'cv1.conv.weight'half_channels = old_weight.shape[0] // 2state_dict_v2['cv0.conv.weight'] = old_weight[:half_channels]   # 把前一半的卷积核(通道维度)放到新模型的 'cv0.conv.weight'state_dict_v2['cv1.conv.weight'] = old_weight[half_channels:]   # 把后一半的卷积核(通道维度)放到新模型的 'cv1.conv.weight'# 4. 同理，把 cv1 的 BatchNorm 参数(权重、偏置、均值、方差)也拆分到 cv0.bn 和 cv1.bn# Transfer cv1 batchnorm weights and buffers from C2f to cv0 and cv1 in C2f_v2for bn_key in ['weight', 'bias', 'running_mean', 'running_var']:old_bn = state_dict[f'cv1.bn.{bn_key}']state_dict_v2[f'cv0.bn.{bn_key}'] = old_bn[:half_channels]  # 前 half_channels 个用于 cv0.bnstate_dict_v2[f'cv1.bn.{bn_key}'] = old_bn[half_channels:]  # 后 half_channels 个用于 cv1.bn# 5. 把剩余的权重直接复制过去(所有不是以 'cv1.' 开头的 key)，因为 'cv1.' 是我们刚才专门拆分处理的，其它的保持原样即可# Transfer remaining weights and buffersfor key in state_dict:if not key.startswith('cv1.'):state_dict_v2[key] = state_dict[key]# 6. 复制所有非方法(non-method)的属性给新的 c2f_v2，例如一些标量、列表、或者其他需要保留的成员# Transfer all non-method attributesfor attr_name in dir(c2f):attr_value = getattr(c2f, attr_name)if not callable(attr_value) and '_' not in attr_name:setattr(c2f_v2, attr_name, attr_value)# 7. 用修改完的 state_dict_v2 为 c2f_v2 加载权重，实际完成全部参数赋值c2f_v2.load_state_dict(state_dict_v2)def replace_c2f_with_c2f_v2(module):# 1. 遍历当前 module 的所有子模块for name, child_module in module.named_children():# 2. 如果发现子模块是 C2f 类型，就要把它替换成 C2f_v2if isinstance(child_module, C2f):# 2.1 通过第一个 Bottleneck 推断它是否存在 shortcut# Replace C2f with C2f_v2 while preserving its parametersshortcut = infer_shortcut(child_module.m[0])# 2.2 根据旧 C2f 的输入输出通道数、Bottleneck 数量等，创建一个新的 C2f_v2c2f_v2 = C2f_v2(child_module.cv1.conv.in_channels,child_module.cv2.conv.out_channels,n=len(child_module.m),shortcut=shortcut,g=child_module.m[0].cv2.conv.groups,e=child_module.c / child_module.cv2.conv.out_channels)# 2.3 调用 transfer_weights，把旧的 C2f 参数(权重、BN等)拷贝/转换到新的 c2f_v2transfer_weights(child_module, c2f_v2)# 2.4 用 setattr 把模块本身替换成新创建的 c2f_v2setattr(module, name, c2f_v2)else:# 3. 如果这个子模块不是 C2f，就递归地继续往下找replace_c2f_with_c2f_v2(child_module)def replace_silu_with_relu(module):"""Recursively replace all SiLU activation functions in the given modulewith ReLU activation functions.Args:module (nn.Module): The module to process."""# 1. 遍历当前 module 的所有子模块for name, child_module in module.named_children():# 2. 如果发现子模块是 SiLU 类型，就把它替换成 ReLUif isinstance(child_module, nn.SiLU):# 用 ReLU 替换 SiLUsetattr(module, name, nn.ReLU(inplace=True))else:# 3. 如果子模块不是 SiLU，递归地继续处理子模块replace_silu_with_relu(child_module)def replace_module_in_dict(model_dict, original_module, new_module):"""Replace a specified module type in a model dictionary (e.g., 'C2f' -> 'C2f_v2').Args:model_dict (dict): Model configuration dictionary containing 'backbone' and 'head' keys.original_module (str): The name of the module to replace (e.g., 'C2f').new_module (str): The new module name to replace with (e.g., 'C2f_v2').Returns:dict: Updated model dictionary with the specified module replaced."""def replace_modules(layers, original, new):for layer in layers:if layer[2] == original:layer[2] = newreturn layers# Replace in backbone and headif 'backbone' in model_dict:model_dict['backbone'] = replace_modules(model_dict['backbone'], original_module, new_module)if 'head' in model_dict:model_dict['head'] = replace_modules(model_dict['head'], original_module, new_module)return model_dictdef update_model_yaml(model_dict):width_multiple = model_dict['width_multiple']for section in ['backbone', 'head']:for layer in model_dict[section]:# 检查每层的参数列表是否包含32倍数的值args = layer[-1]for i in range(len(args)):if isinstance(args[i], int) and args[i] % 32 == 0:# 乘以 width_multiple 并四舍五入为最接近的 32 倍数args[i] = round(args[i] * width_multiple / 32) * 32# 将 width_multiple 更新为 1.0model_dict['width_multiple'] = 1.0return model_dictdef update_pruned_model_yaml(original_yaml, pruned_model):"""根据剪枝后的模型，更新原始模型的 model.model.yaml 字典。Args:original_yaml (dict): 原始模型的 model.model.yaml 字典。pruned_model (nn.Module): 剪枝后的模型。Returns:dict: 更新后的 model.model.yaml 字典。"""# 遍历剪枝后的模型层pruned_layers = []for idx, layer in enumerate(pruned_model.model):pruned_layers.append(layer)# 遍历原始模型的 yaml，更新通道数updated_yaml = original_yaml.copy()backbone = updated_yaml["backbone"]head = updated_yaml["head"]# 更新 backbone 的通道数for i, layer_info in enumerate(backbone):module_type = layer_info[2]  # 模块类型 (如 'Conv', 'C2f_v2', 等)if module_type == "Conv":  # 更新 Conv 的通道数pruned_out_channels = pruned_layers[i].conv.out_channelsbackbone[i][3][0] = pruned_out_channelselif module_type.startswith("C2f"):  # 更新 C2f_v2 的通道数pruned_out_channels = pruned_layers[i].cv2.conv.out_channelsbackbone[i][3][0] = pruned_out_channels# 更新 head 的通道数for i, layer_info in enumerate(head):module_type = layer_info[2]if module_type == "Conv":  # 更新 Conv 的通道数pruned_out_channels = pruned_layers[len(backbone) + i].conv.out_channelshead[i][3][0] = pruned_out_channelselif module_type.startswith("C2f"):  # 更新 C2f_v2 的通道数pruned_out_channels = pruned_layers[len(backbone) + i].cv2.conv.out_channelshead[i][3][0] = pruned_out_channels# 返回更新后的 yamlupdated_yaml["backbone"] = backboneupdated_yaml["head"] = headreturn updated_yamldef prune(prune_param: dict):# 加载yolo模型（以剪枝的也可以），并绑定自定义训练方法train_v2model = YOLO(prune_param["model"])model_name_part = os.path.basename(prune_param["model"]).rpartition('.')model_name = model_name_part[0] + f"_x{str(prune_param['target_prune_rate'])}." + model_name_part[-1]# model.train_v2 = train_v2.__get__(model, type(model))# 加载剪枝参数pruning_cfg = yaml_load(check_yaml(prune_param["cfg_path"]))# 加载模型，并把其中的C2f模块替换成C2f_v2model.to(pruning_cfg["device"])replace_c2f_with_c2f_v2(model)# 将SiLU替换成ReLU# replace_silu_with_relu(model)model.model.yaml = replace_module_in_dict(model.model.yaml, "C2f", "C2f_v2")# model.model.yaml = replace_module_in_dict(model.model.yaml, "SiLU", "ReLU")   # yaml里没有，所以不需要# 将yaml中C2f替换成C2f_v2model.model.yaml = update_model_yaml(model.model.yaml)initialize_weights(model.model)  # set BN.eps, momentum, ReLU.inplacefor name, param in model.model.named_parameters():param.requires_grad = True# 初始化Torch-Pruning剪枝的相关参数example_inputs = torch.randn(1, 3, pruning_cfg["imgsz"], pruning_cfg["imgsz"]).to(pruning_cfg["device"])model.model.cuda(pruning_cfg["device"])base_macs, base_nparams = tp.utils.count_ops_and_params(model.model, example_inputs)# do validation before pruning modelpruning_cfg['name'] = f"baseline_val"pruning_cfg['batch'] = 4validation_model = deepcopy(model)metric = validation_model.val(**pruning_cfg)init_map = metric.box.mapprint(f"Before Pruning: MACs={base_macs / 1e9: .5f} G, #Params={base_nparams / 1e6: .5f} M, mAP={init_map: .5f}")# prune same ratio of filter based on initial size# pruning_ratio = 1 - math.pow((1 - prune_param["target_prune_rate"]), 1 / prune_param["iterative_steps"])model.model.train()for name, param in model.model.named_parameters():param.requires_grad = Trueignored_layers = []# unwrapped_parameters = []# 忽略含 “model.0.conv”，不剪枝最浅层卷积for name, module in model.named_modules():if "model.0.conv" in name or "dfl" in name:ignored_layers.append(module)# 不剪枝检测头for m in model.model.modules():if isinstance(m, (Detect,)):ignored_layers.append(m)example_inputs = example_inputs.to(model.device)pruner = tp.pruner.GroupNormPruner(model.model,example_inputs,importance=tp.importance.GroupNormImportance(),  # L2 norm pruning,iterative_steps=prune_param["iterative_steps"],pruning_ratio=prune_param["target_prune_rate"],ignored_layers=ignored_layers,round_to=32,global_pruning=prune_param["global_pruning"]# unwrapped_parameters=unwrapped_parameters)pruner.step()# pre fine-tuning validationpruning_cfg['name'] = f"step_pre_val"pruning_cfg['batch'] = 1validation_model.model = deepcopy(model.model)metric = validation_model.val(**pruning_cfg)pruned_map = metric.box.mappruned_macs, pruned_nparams = tp.utils.count_ops_and_params(pruner.model, example_inputs.to(model.device))current_speed_up = base_macs / pruned_macsprint(f"After pruning: MACs={pruned_macs / 1e9} G, #Params={pruned_nparams / 1e6} M, "f"mAP={pruned_map}, speed up={current_speed_up}")model.model.yaml = update_pruned_model_yaml(model.model.yaml, model.model)model.save(os.path.join(pruning_cfg['name'], model_name))returnif __name__ == "__main__":import osos.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'prune_param = {"model": r'last.pt',# "model": r'G:\code\yolo_v8_v11\torch_pruning/best.pt',"cfg_path": 'setting.yaml',     # Pruning config file. Having same format with ultralytics/cfg/default.yaml"iterative_steps": 3,           # Total pruning iteration step"target_prune_rate": 0.8,       # Target pruning rate"global_pruning": True,}prune(prune_param)

简单说一下这个文件实现的功能：

明面上要修改的参数：指定模型路径，模型的配置文件，剪枝迭代次数（torch-pruning内部的），剪枝比例，进行全局剪枝（从全部层来考虑某些层的重要性）。

    prune_param = {"model": r'last.pt',# "model": r'G:\code\yolo_v8_v11\torch_pruning/best.pt',"cfg_path": 'setting.yaml',     # Pruning config file. Having same format with ultralytics/cfg/default.yaml"iterative_steps": 3,           # Total pruning iteration step"target_prune_rate": 0.8,       # Target pruning rate"global_pruning": True,}

在一个任务目录下（如kx_s_250126）下，复制训练用的数据配置文件data.yaml和模型配置文件setting.yaml，以及S1_L1（约束化训练的结果）中的last.pt，然后运行yolov8_prune.py即可。

data.yaml如下：

配置正负样本平衡的参数negative_setting参考文章：

YOLOv8源码修改（1）- DataLoader增加负样本数据读取+平衡训练batch中的正负样本数

不修改的话就删了或者注释掉。

# path: /path/to/datasets
train: ["G:/datasets/bank_scene/00-task/yolo_det_kx_jh_1028/yolo_det_kx_jh_1028/train"]
val: ["G:/datasets/bank_scene/00-task/yolo_det_kx_jh_1028/yolo_det_kx_jh_1028/val"]# Add negative image ---------------------------------------------------------------------------------------------------
negative_setting:neg_ratio: 1    # 小于等于0时，按原始官方配置训练，大于0时，控制正负样本。use_extra_neg: Trueextra_neg_sources: {# "G:/datasets/bank_monitor/0-贵重物品遗留/贵重物品遗留三类图片/2024-07-12-qianbao": 100}fix_dataset_length: 0    # 是否自定义每轮参与训练的图片数量# number of classes ----------------------------------------------------------------------------------------------------
nc: 2# Classes --------------------------------------------------------------------------------------------------------------
names:0: kx1: kx_dk

setting.yaml如下：

部分超参数需要自己调整，比如轮次、批次、学习率、参数大小等。

# Train settings -------------------------------------------------------------------------------------------------------
task: detect            # (str) YOLO task, i.e. detect, segment, classify, pose
mode: train             # (str) YOLO mode, i.e. train, val, predict, export, track, benchmark
data: ./data.yaml       # (str, optional) path to data file, i.e. coco8.yaml
epochs: 20              # (int) number of epochs to train for
batch: 4                # (int) number of images per batch (-1 for AutoBatch)
imgsz: 640              # (int | list) input images size as int for train and val modes, or list[w,h] for predict and export modes
patience: 1000          # (int) epochs to wait for no observable improvement for early stopping of training
device: 0               # (int | str | list, optional) device to run on, i.e. cuda device=0 or device=0,1,2,3 or device=cpu
workers: 0              # (int) number of worker threads for data loading (per RANK if DDP)
project: ./             # (str, optional) project name
name: S3_ft         # (str, optional) experiment name, results saved to 'project/name' directory
optimizer: SGD          # (str) optimizer to use, choices=[SGD, Adam, Adamax, AdamW, NAdam, RAdam, RMSProp, auto]
single_cls: False       # (bool) train multi-class data as single-class
rect: False             # (bool) rectangular training if mode='train' or rectangular validation if mode='val'
cos_lr: False           # (bool) use cosine learning rate scheduler
close_mosaic: 0         # (int) disable mosaic augmentation for final epochs (0 to disable)
resume: False           # (bool) resume training from last checkpoint
amp: True               # (bool) Automatic Mixed Precision (AMP) training, choices=[True, False], True runs AMP check
freeze: None            # (int | list, optional) freeze first n layers, or freeze list of layer indices during training
multi_scale: True       # (bool) Whether to use multiscale during training# Hyperparameters ------------------------------------------------------------------------------------------------------
lr0: 0.0005               # (float) initial learning rate (i.e. SGD=1E-2, Adam=1E-3)
lrf: 0.1               # (float) final learning rate (lr0 * lrf)
momentum: 0.937         # (float) SGD momentum/Adam beta1
weight_decay: 0.0001    # (float) optimizer weight decay 5e-4
warmup_epochs: 0.0      # (float) warmup epochs (fractions ok)
warmup_momentum: 0.8    # (float) warmup initial momentum
warmup_bias_lr: 0.1     # (float) warmup initial bias lr
box: 7.5                # (float) box loss gain, default: 7.5
cls: 0.5                # (float) cls loss gain (scale with pixels), default: 0.5
dfl: 1.5                # (float) dfl loss gain, default: 1.5
degrees: 0.0            # (float) image rotation (+/- deg)
translate: 0.1          # (float) image translation (+/- fraction)
scale: 0.5              # (float) image scale (+/- gain)
fliplr: 0.5             # (float) image flip left-right (probability)
mosaic: 0.95            # (float) image mosaic (probability)# Custom configuration, sparse training, pruning and distillation ------------------------------------------------------
custom_model: True      # (bool) load train model from .pt directory, not from model.model.yaml
add_L1: False            # (bool) use L1 norm
L1_start: 0.001         # (float) L1 regularization weight
L1_end: 0.1             # (float) final L1 regularization weight (L1_start * L1_end), default: 0.1

关于yolov8_prune.py中的部分方法：

replace_c2f_with_c2f_v2: 将原始模型中的C2f模块替换为C2f_v2。

replace_silu_with_relu: 将原始模型中的SiLU替换成ReLU。（默认没有使用）

replace_module_in_dict: 替换model.model.yaml中模块名。

update_model_yaml：根据剪枝模型每个模块的通道大小，修改原始模型model.model.yaml，并复赋值给剪枝后模型的model.model.yaml（改配置信息并不准确，只能大概计算，因为模块内部的输入、输出也会改变）

核心代码-替换模块：

# 加载模型，并把其中的C2f模块替换成C2f_v2model.to(pruning_cfg["device"])replace_c2f_with_c2f_v2(model)# 将SiLU替换成ReLU# replace_silu_with_relu(model)model.model.yaml = replace_module_in_dict(model.model.yaml, "C2f", "C2f_v2")# model.model.yaml = replace_module_in_dict(model.model.yaml, "SiLU", "ReLU")   # yaml里没有，所以不需要# 将yaml中C2f替换成C2f_v2model.model.yaml = update_model_yaml(model.model.yaml)initialize_weights(model.model)  # set BN.eps, momentum, ReLU.inplace

核心代码-模型剪枝：

详细使用参考 Torch-Pruning （应用不同剪枝算法，自定义通道剪枝，自定义比例等）。简单使用如下即可。

    example_inputs = example_inputs.to(model.device)pruner = tp.pruner.GroupNormPruner(model.model,example_inputs,importance=tp.importance.GroupNormImportance(),  # L2 norm pruning,iterative_steps=prune_param["iterative_steps"],pruning_ratio=prune_param["target_prune_rate"],ignored_layers=ignored_layers,round_to=32,global_pruning=prune_param["global_pruning"]# unwrapped_parameters=unwrapped_parameters)

忽略的层，剪枝时候忽略了第一个卷积层和检测头层已经dfl层：

    ignored_layers = []# unwrapped_parameters = []# 忽略含 “model.0.conv”，不剪枝最浅层卷积for name, module in model.named_modules():if "model.0.conv" in name or "dfl" in name:ignored_layers.append(module)# 不剪枝检测头for m in model.model.modules():if isinstance(m, (Detect,)):ignored_layers.append(m)

最终剪枝的加载效果如下（yolov8-s.pt）：

绿色框表明替换了模块，蓝色框表明减少了参数（不准确，没计算模块内部的剪枝），红色框表明了每个模块输入/输出通道数的减少（同样不准确）。为了方便部署和更好利用gpn/npu等，将剪枝通道的输入/输出均设置为32的倍数（剪枝时，round_to=32）。

上述模型，实际的测评（GFLOPs为半精度的）：

4. 模型训练与测试结果对比

4.1 模型结构比较

ONNX格式查看C2f和C2f_v2（去除了split操作）:

PT格式查看剪枝模型的通道数变化:

4.2 模型稀疏性对比

加入L1约束的训练（颜色越深轮次越大），可以看到0值有增加，但不多（L1_start=0.001, L1_end=0.1）。

量化对比非稀疏训练（左）的BN的0值明显小于稀疏训练的（右）。

回调后：

4.3 模型性能分析

如下表所示，剪枝后模型参数量减少了49.1%，FLOPs减少20.4%，推理速度提升10.6%（RTX3060 LAPTOP），在预测准确度上，几乎不变（相差均在1%以内）。

剪枝后的模型精度还几乎不变，可能是数据集太简单了，总共2个类别，用了2000+张训练，292张验证。后续换大数据集调参再测试。

模型名称	Img.	Inst.	P	R	mAP50	mAP75	mAP50-95	params (M)	time (ms)	Gflops
origin	292	476	0.987	0.992	0.995	0.989	0.943	11.13	4.7	28.4
origin_L1	292	476	0.988	0.996	0.994	0.992	0.943	11.13	4.7	28.4
origin_L1_p	292	476	0.982	0.996	0.994	0.989	0.929	5.66	4.2	22.6
origin_L1_p_ft	292	476	0.995	0.988	0.994	0.993	0.940	5.66	4.2	22.6

4.4 进一步改进训练

分析BN层的权重和偏置：模型参数还是太大了，且离群值众多，这样量化的时候，精度掉点会比较多。后续需要用更严格的限制，来调优模型参数范围。

将通道数设置为32的倍数，导致能剪枝的通道很少（很多模块，通道数本身就是64，128），导致要么减四分之一，要么减一半，试试16。

训练加入知识蒸馏，检测头实际都没有被剪枝，所以相同尺寸的输入下，不同大小的模型输出是一样的，这样就可以加入知识蒸馏，来提高一下训练精度。

5. 总结

上述操作，实际将剪枝和训练分开了，只要一个YOLO模型能被剪枝，就可以使用ultralytics-8.3.67的框架来训练，使用torch-pruning，大部分模型不需要再关心内部实现，直接剪枝就行，简化流程，提高开发效率。

前言

1. 需修改的源码文件

1.1添加C2f_v2模块

1.2 修改模型读取方式

1.3 增加 L1 正则约束化训练

1.4 在tensorboard上增加BN层权重和偏置参数分布的可视化

1.5 增加剪枝处理文件

2. 工程目录结构

3. 源码文件修改

3.1 添加C2f_v2模块和模型读取

3.2 添加L1正则约束训练和现实约束的BN层参数分布

3.3 添加剪枝方法文件yolov8_prune.py

4. 模型训练与测试结果对比

4.1 模型结构比较

4.2 模型稀疏性对比

4.3 模型性能分析

4.4 进一步改进训练

5. 总结

相关文章：