
Choosing Fine-Tuning Parameters for BERT NER

[Figure: validation score trajectories for each (batch size, learning rate) combination; produced by the plotting code below]
A convergence-speed test across combinations of batch size and learning rate. Conclusions:

  • For the same number of epochs, a batch size of 32 performs far more optimization steps than a batch size of 256, so it converges faster (see the quick step-count check below)
  • The larger the batch, the higher the learning rate can reasonably be set
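A rough step-count calculation makes the first point concrete. This is a sketch, not part of the original post; the training-set size of 7,824 is taken from the log in the appendix:

import math

train_size = 7824   # "TRAIN Dataset: 7824" from the training log in the appendix
epochs = 6

# Steps per epoch and total optimizer updates for each batch size
for batch_size in (32, 256):
    steps_per_epoch = math.ceil(train_size / batch_size)
    total_updates = steps_per_epoch * epochs
    print(f'batch={batch_size:<4} steps/epoch={steps_per_epoch:<4} total updates={total_updates}')

# batch=32   steps/epoch=245  total updates=1470
# batch=256  steps/epoch=31   total updates=186

That is roughly eight times as many parameter updates at batch size 32, which matches the faster convergence in the plot. The second point is consistent with the common heuristic of scaling the learning rate up with the batch size, although this experiment only covers the range 1e-5 to 3e-5.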

Plotting code (generated by DeepSeek):

import matplotlib.pyplot as plt

# (batch size, learning rate) -> validation score after each epoch
dic = {
    (256, 1e-5): [0,        0.185357, 0.549124, 0.649283, 0.720528, 0.743900],
    (256, 2e-5): [0.086368, 0.604535, 0.731870, 0.763409, 0.773608, 0.781042],
    (256, 3e-5): [0.415224, 0.715375, 0.753391, 0.771326, 0.784421, 0.783432],
    (32,  1e-5): [0.710058, 0.769245, 0.781832, 0.786909, 0.792920, 0.799076],
    (32,  2e-5): [0.761296, 0.766089, 0.795317, 0.801602, 0.795861, 0.799864],
    (32,  3e-5): [0.771385, 0.788055, 0.791863, 0.793491, 0.800057, 0.799527],
}

# Extract the parameter combinations and their corresponding trajectories
params = list(dic.keys())
trajectories = list(dic.values())

# Draw one line per (batch size, learning rate) combination
plt.figure(figsize=(10, 6))
for param, trajectory in zip(params, trajectories):
    plt.plot(range(1, len(trajectory) + 1), trajectory,
             label=f'{param[0]}, {param[1]}')

# Title and axis labels
plt.title('Validation Score Trajectory for Different Parameters')
plt.xlabel('Training Epochs')
plt.ylabel('Performance Metric')

# Legend and display
plt.legend()
plt.show()
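To read the same comparison off as text, the configurations can be ranked by their final-epoch score. This is a small convenience sketch, not part of the original script:

# Rank (batch size, learning rate) pairs by final-epoch validation score
for (bs, lr), traj in sorted(dic.items(), key=lambda kv: kv[1][-1], reverse=True):
    print(f'batch={bs:<4} lr={lr:.0e}  final={traj[-1]:.6f}  peak={max(traj):.6f}')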

Appendix

Fine-tuning command

!python ner_finetune.py \
--gpu_device 0 \
--train_batch_size 32 \
--valid_batch_size 32 \
--epochs 6 \
--learning_rate 3e-5 \
--train_file data/cluener2020/train.json \
--valid_file data/cluener2020/dev.json \
--allow_label "{'name': 'PER', 'organization': 'ORG', 'address': 'LOC', 'company': 'ORG', 'government': 'ORG'}" \
--pretrained_model models/bert-base-chinese \
--tokenizer models/bert-base-chinese \
--save_model_dir models/local/bert_tune_5
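Note that --allow_label is passed as a Python-dict literal mapping CLUENER's original categories onto three coarse entity types, and it shows up as an actual dict in the Namespace log below. Since ner_finetune.py itself is not shown, the following is only a plausible sketch of how the argument might be parsed and how the BIO label mapping seen in the log (the "Label mapping" line) could be derived from it; ast.literal_eval and the construction order are assumptions, not confirmed implementation details:

import argparse
import ast

parser = argparse.ArgumentParser()
# ast.literal_eval safely parses "{'name': 'PER', ...}" into a Python dict
# (an assumed parsing strategy; ner_finetune.py is not shown)
parser.add_argument('--allow_label', type=ast.literal_eval, default={})
args = parser.parse_args(
    ['--allow_label',
     "{'name': 'PER', 'organization': 'ORG', 'address': 'LOC', "
     "'company': 'ORG', 'government': 'ORG'}"])

# Build a BIO tag map from the distinct entity types, with 'O' first
entity_types = list(dict.fromkeys(args.allow_label.values()))  # ['PER', 'ORG', 'LOC']
label2id = {'O': 0}
for prefix in ('B-', 'I-'):
    for t in entity_types:
        label2id[f'{prefix}{t}'] = len(label2id)
print(label2id)
# {'O': 0, 'B-PER': 1, 'B-ORG': 2, 'B-LOC': 3, 'I-PER': 4, 'I-ORG': 5, 'I-LOC': 6}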

Training log

Namespace(allow_label={'name': 'PER', 'organization': 'ORG', 'address': 'LOC', 'company': 'ORG', 'government': 'ORG'}, epochs=6, gpu_device='0', learning_rate=3e-05, max_grad_norm=10, max_len=128, pretrained_model='models/bert-base-chinese', save_model_dir='models/local/bert_tune_5', tokenizer='models/bert-base-chinese', train_batch_size=32, train_file='data/cluener2020/train.json', valid_batch_size=32, valid_file='data/cluener2020/dev.json')
CUDA is available!
Number of CUDA devices: 1
Device name: NVIDIA GeForce RTX 2080 Ti
Device capability: (7, 5)
Label mapping: {'O': 0, 'B-PER': 1, 'B-ORG': 2, 'B-LOC': 3, 'I-PER': 4, 'I-ORG': 5, 'I-LOC': 6}
Loading dataset: data/cluener2020/train.json
  0%|                                                 | 0/10748 [00:00<?, ?it/s]
2024-05-21 14:05:00.121060: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-05-21 14:05:00.172448: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-21 14:05:00.914503: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
100%|███████████████████████████████████| 10748/10748 [00:06<00:00, 1667.09it/s]
100%|█████████████████████████████████████| 1343/1343 [00:00<00:00, 2244.82it/s]
TRAIN Dataset: 7824
VALID Dataset: 971
Loading model: models/bert-base-chinese
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
(this warning was emitted three times)
Some weights of the model checkpoint at models/bert-base-chinese were not used when initializing BertForTokenClassification: ['cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForTokenClassification were not initialized from the model checkpoint at models/bert-base-chinese and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Training epoch: 1
Training loss per 100 training steps: 2.108242988586426
Training loss per 100 training steps: 0.16535191606767108
Training loss per 100 training steps: 0.10506394136678521
Training loss epoch: 0.09411744458638892
Training accuracy epoch: 0.9225966380147197
Validation loss per 100 evaluation steps: 0.05695410072803497
Validation Loss: 0.03870751528489974
Validation Accuracy: 0.9578078217665675
               precision    recall  f1-score  support
LOC            0.544872  0.683646  0.606421    373.0
ORG            0.750225  0.841734  0.793349    992.0
PER            0.806452  0.913978  0.856855    465.0
micro avg      0.718691  0.827869  0.769426   1830.0
macro avg      0.700516  0.813119  0.752208   1830.0
weighted avg   0.722656  0.827869  0.771385   1830.0
Training epoch: 2
Training loss per 100 training steps: 0.030774801969528198
Training loss per 100 training steps: 0.03080757723033133
Training loss per 100 training steps: 0.03123850032538917
Training loss epoch: 0.03104725396450685
Training accuracy epoch: 0.965836879311368
Validation loss per 100 evaluation steps: 0.07264477759599686
Validation Loss: 0.03662088588480988
Validation Accuracy: 0.961701479064846
               precision    recall  f1-score  support
LOC            0.606635  0.686327  0.644025    373.0
ORG            0.776735  0.834677  0.804665    992.0
PER            0.821497  0.920430  0.868154    465.0
micro avg      0.752613  0.826230  0.787705   1830.0
macro avg      0.734956  0.813812  0.772281   1830.0
weighted avg   0.753439  0.826230  0.788055   1830.0
Training epoch: 3
Training loss per 100 training steps: 0.01707942970097065
Training loss per 100 training steps: 0.020070969108676555
Training loss per 100 training steps: 0.0214405001942717
Training loss epoch: 0.021760025719294744
Training accuracy epoch: 0.9760199331084162
Validation loss per 100 evaluation steps: 0.04943108558654785
Validation Loss: 0.03711987908689245
Validation Accuracy: 0.9608263101353024
               precision    recall  f1-score  support
LOC            0.596847  0.710456  0.648715    373.0
ORG            0.776328  0.839718  0.806780    992.0
PER            0.855967  0.894624  0.874869    465.0
micro avg      0.755866  0.827322  0.789982   1830.0
macro avg      0.743047  0.814932  0.776788   1830.0
weighted avg   0.759981  0.827322  0.791863   1830.0
Training epoch: 4
Training loss per 100 training steps: 0.014015918597579002
Training loss per 100 training steps: 0.015494177154827826
Training loss per 100 training steps: 0.015997812416015278
Training loss epoch: 0.016311514128607756
Training accuracy epoch: 0.9820175765149567
Validation loss per 100 evaluation steps: 0.04825771600008011
Validation Loss: 0.04313824124514095
Validation Accuracy: 0.9585233633276977
               precision    recall  f1-score  support
LOC            0.618037  0.624665  0.621333    373.0
ORG            0.794118  0.843750  0.818182    992.0
PER            0.853955  0.905376  0.878914    465.0
micro avg      0.774948  0.814754  0.794353   1830.0
macro avg      0.755370  0.791264  0.772810   1830.0
weighted avg   0.773433  0.814754  0.793491   1830.0
Training epoch: 5
Training loss per 100 training steps: 0.008429908193647861
Training loss per 100 training steps: 0.012711652241057098
Training loss per 100 training steps: 0.012486798004177747
Training loss epoch: 0.012644028145705862
Training accuracy epoch: 0.9862629694070859
Validation loss per 100 evaluation steps: 0.06491336971521378
Validation Loss: 0.049802260893967845
Validation Accuracy: 0.9582402189526026
               precision    recall  f1-score  support
LOC            0.608899  0.697051  0.650000    373.0
ORG            0.795749  0.867944  0.830280    992.0
PER            0.831643  0.881720  0.855950    465.0
micro avg      0.764735  0.836612  0.799061   1830.0
macro avg      0.745430  0.815572  0.778743   1830.0
weighted avg   0.766785  0.836612  0.800057   1830.0
Training epoch: 6
Training loss per 100 training steps: 0.009717799723148346
Training loss per 100 training steps: 0.008476002312422093
Training loss per 100 training steps: 0.008608183584903456
Training loss epoch: 0.008819052852614194
Training accuracy epoch: 0.9903819524689835
Validation loss per 100 evaluation steps: 0.023518526926636696
Validation Loss: 0.049626993015408516
Validation Accuracy: 0.9602429496287505
               precision    recall  f1-score  support
LOC            0.614251  0.670241  0.641026    373.0
ORG            0.806482  0.852823  0.829005    992.0
PER            0.848548  0.879570  0.863780    465.0
micro avg      0.776574  0.822404  0.798832   1830.0
macro avg      0.756427  0.800878  0.777937   1830.0
weighted avg   0.777989  0.822404  0.799527   1830.0
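The per-epoch "weighted avg" F1 values in this log (0.771385, 0.788055, 0.791863, 0.793491, 0.800057, 0.799527) are exactly the (32, 3e-5) trajectory in the plotting data above, so the chart's "Performance Metric" is the weighted-average F1. Below is a small sketch for extracting that series from a saved log; train.log is a hypothetical filename, and the regex targets the report lines in the format shown above:

import re

# Pull the weighted-average F1 (third number on each "weighted avg" line)
# out of a saved training log
with open('train.log', encoding='utf-8') as f:
    log = f.read()

f1_per_epoch = [float(m.group(1))
                for m in re.finditer(r'weighted avg\s+\S+\s+\S+\s+(\S+)', log)]
print(f1_per_epoch)
# [0.771385, 0.788055, 0.791863, 0.793491, 0.800057, 0.799527]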
