当前位置：首页 > news >正文

实战 - 使用 AutoAWQ 进行量化

news 2025/7/5 2:28:23

文章目录

- 一、准备
- - 1、安装 autoawq
  - 2、模型准备
- 二、量化
- - - `config.json` 文件变化
- 三、加载量化后模型
- - - 量化后的输出
    - 原始输出
    - 对比
- 四、查看模型的精度
- - 1、查看模型卡
  - 2、查看 config.json 中的 `torch_dtype`
  - 3、打印模型信息
  - 4、model.dtype 未必是模型精度

一、准备

1、安装 autoawq

pip install autoawq 
pip install transformers==4.47.1

使用的较低版本的 transformers，如果执行下面代码有问题，可以检查 transformers 版本。

目前我的测试 Python 环境为 3.9

2、模型准备

这里以 mistralai/Mistral-7B-Instruct-v0.2 为例

如果下载有问题，可以前往模型界面查看是否需要申请权限：https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2

后面代码会自动下载模型，你也可以提前下载模型：

huggingface-cli download mistralai/Mistral-7B-Instruct-v0.2

如果网络受限，可以设置镜像地址到环境变量：

export HF_ENDPOINT='https://hf-mirror.com'

二、量化

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizermodel_path = 'mistralai/Mistral-7B-Instruct-v0.2'
quant_path = 'mistral-instruct-v0.2-awq'
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }# Load model
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)# 查看模型类型
model.dtype
# torch.float32 - 32-bit（FP32） # Quantize
model.quantize(tokenizer, quant_config=quant_config)# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)print(f'Model is quantized and saved at "{quant_path}"')

quant_config 也可以写成：

from transformers import AwqConfig, AutoConfig
quantization_config = AwqConfig(bits=quant_config["w_bit"],group_size=quant_config["q_group_size"],zero_point=quant_config["zero_point"],version=quant_config["version"].lower(),
).to_dict()model.model.config.quantization_config = quantization_config

`config.json` 文件变化

config.json 文件会变成下方的样子：

相比原来的文件，多出 quantization_config 内容，其中 "quant_method": "awq"

{"_name_or_path": "/home/wx/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.2/snapshots/3ad372fc79158a2148299e3318516c786aeded6c","architectures": ["MistralForCausalLM"],"attention_dropout": 0.0,"bos_token_id": 1,"eos_token_id": 2,"head_dim": 128,"hidden_act": "silu","hidden_size": 4096,"initializer_range": 0.02,"intermediate_size": 14336,"max_position_embeddings": 32768,"model_type": "mistral","num_attention_heads": 32,"num_hidden_layers": 32,"num_key_value_heads": 8,"quantization_config": {"bits": 4,"group_size": 128,"modules_to_not_convert": null,"quant_method": "awq","version": "gemm","zero_point": true},"rms_norm_eps": 1e-05,"rope_theta": 1000000.0,"sliding_window": null,"tie_word_embeddings": false,"torch_dtype": "bfloat16","transformers_version": "4.47.1","use_cache": false,"vocab_size": 32000
}

原始 config.json

{"architectures": ["MistralForCausalLM"],"attention_dropout": 0.0,"bos_token_id": 1,"eos_token_id": 2,"hidden_act": "silu","hidden_size": 4096,"initializer_range": 0.02,"intermediate_size": 14336,"max_position_embeddings": 32768,"model_type": "mistral","num_attention_heads": 32,"num_hidden_layers": 32,"num_key_value_heads": 8,"rms_norm_eps": 1e-05,"rope_theta": 1000000.0,"sliding_window": null,"tie_word_embeddings": false,"torch_dtype": "bfloat16","transformers_version": "4.36.0","use_cache": true,"vocab_size": 32000
}

三、加载量化后模型


from transformers import AutoModelForCausalLM, AutoTokenizer
quant_dir = '/home/wx/mistral-instruct-v0.2-awq'  
model = AutoModelForCausalLM.from_pretrained(quant_dir, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(quant_dir, trust_remote_code=True)prompt = "Tell me about blackhole."
prompt_template=f'''{prompt}'''tokens = tokenizer(prompt_template, return_tensors="pt").input_ids.cuda()generated_ids = model.generate(tokens, do_sample=True,temperature=0.7,top_p=0.95,top_k=40,max_new_tokens=512)
decoded = tokenizer.decode(generated_ids[0])
print(decoded)

量化后的输出

GPU 占用：4550MiB

<s> Tell me about blackhole.A black hole is a region in space where the gravitational pull is so strong that nothing, not even light, can escape. It's called a "black" hole because it appears black due to the absence of light emanating from it.Black holes are formed when a massive star collapses in on itself after it has exhausted its nuclear fuel. The collapse causes the star to shrink down to an incredibly small size, creating an incredibly dense object. This object is so dense that its gravity warps space and time around it, forming an event horizon, which is the point of no return. Once anything crosses this event horizon, it's pulled into the black hole and cannot escape.Black holes come in different sizes, with the smallest being about the size of a star and the largest being billions of times larger than the sun. The largest black hole that has been discovered is located at the center of the galaxy, and it's estimated to be about 40 billion times the mass of the sun.Despite their intimidating name, black holes are not necessarily a threat to us. The closest known black hole to Earth is about 1,600 light-years away, which is far enough that we don't need to worry about being sucked in. However, they are fascinating objects that continue to captivate scientists and the general public alike.</s>

原始输出

mistralai/Mistral-7B-Instruct-v0.2 , GPU 占用：21988MiB

<s> Tell me about blackhole. I've heard that it is some sort of astronomical thing, but I don't really understand what it is or how it works.A black hole is an extremely dense object in space that has such strong gravitational pull that nothing, not even light, can escape from it once it gets too close. Black holes are formed when a massive star collapses in on itself after it has exhausted its nuclear fuel and can no longer produce the pressure needed to counteract the force of gravity.The boundary around a black hole from which nothing can escape is called the event horizon. This is not a physical boundary that you can see, but rather a theoretical construct based on the laws of physics. Once an object crosses the event horizon, it is considered to be inside the black hole itself.Black holes are not completely black, as they do emit some form of radiation, but they appear black because they absorb all the light that falls on them. This is due to the fact that the intense gravitational pull causes the surface of the black hole to be at a temperature so hot that it emits very little visible light.Black holes can vary in size, from small ones that are only a few times the mass of the sun, to supermassive black holes that can be millions or even billions of times the mass of the sun. The supermassive black holes are thought to be at the center of most, if not all, galaxies, including our own Milky Way.Despite their fearsome reputation, black holes are not a threat to us here on Earth, as they are typically located billions of light-years away. However, they are fascinating objects of study for astronomers and physicists, who continue to learn new things about them and their role in the universe.</s>

对比

	原始	4bit 量化后
占用磁盘大小	14G	3.9G
GPU 占用	21988MiB	4550MiB （4.8倍）

四、查看模型的精度

对于一个模型，我们想知道原始的精度是多少，可以用下面几种方式：

1、查看模型卡

如：https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
右边的 Safetensors 信息
在这里插入图片描述

2、查看 config.json 中的 `torch_dtype`

"torch_dtype": "bfloat16",

3、打印模型信息

from transformers import AutoTokenizermodel_path = 'mistralai/Mistral-7B-Instruct-v0.2'# Load model
model = AutoAWQForCausalLM.from_pretrained(model_path)for name, param in model.named_parameters():print(f"{name}: {param.dtype}")break  # 只打印第一个权重的数据类型

4、model.dtype 未必是模型精度

上述模型，model.dtype 打印的结果为 torch.float32，表示模型当前是以 32-bit 浮点数（FP32）精度加载的。
config.json 中的 "torch_dtype": "bfloat16"表示模型设计时支持或推荐使用 bfloat16 精度，但实际加载时可能由于环境或代码设置未启用 bfloat16。

2025-03-08（六）

实战 - 使用 AutoAWQ 进行量化

文章目录一、准备1、安装 autoawq2、模型准备二、量化config.json 文件变化三、加载量化后模型量化后的输出原始输出对比四、查看模型的精度1、查看模型卡2、查看 config.json 中的 torch_dtype3、打印模型信息4、model.dtype 未必是模型精度一、准备 1、安装 autoawq p…...

编程日记 2025/3/9 8:55:38

C++20 格式化库：强大的字符串格式化工具

文章目录格式化语法常见用法1. 填充和对齐2. 数值格式化3. 进制格式化4. 自定义类型示例代码注意事项 C20 的格式化库是一个强大的工具，用于处理字符串的格式化操作。它提供了类似于 Python 中 str.format() 的功能，但语法和用法更符合 C 的风格。以下…...

编程日记 2025/3/9 8:54:37

【一文学会 HTML5】

目录 HTML概述基本概念HTML 发展历程HTML 基本结构网页基本标签标题标签（<h1> - <h6>）段落标签（<p>）换行标签（<br>）水平线标签（<hr>）注释&#xff0…...

编程日记 2025/3/9 8:53:36

如何在WPS中接入DeepSeek并使用OfficeAI助手(超细！成功版本)

目录第一步：下载并安装OfficeAI助手第二步：申请API Key 第三步:两种方式导入WPS 第一种:本地大模型Ollama 第二种APIKey接入第四步：探索OfficeAI的创作功能工作进展汇报 PPT大纲设计第五步：我的使用体验(体验建议) …...

编程日记 2025/3/9 8:52:32

蓝耘智算 + 通义万相 2.1：为 AIGC 装上 “智能翅膀”，翱翔创作新天空

1. 引言：AIGC 的崛起与挑战在过去几年中，人工智能生成内容（AIGC）技术突飞猛进。AIGC 涉及了文本生成、图像创作、音乐创作、视频制作等多个领域，并逐渐渗透到日常生活的方方面面。传统的内容创作方式已经被许多人类创…...

编程日记 2025/3/9 8:46:22

电脑如何在系统默认的壁纸中切换自己喜欢的

1、声明：该切换壁纸仅支持win10。当你想去切换系统默认的壁纸，但是不知道该怎么切换，别慌，小亦教你几招帮你快速切换自定义壁纸。我们平常使用的win10桌面壁纸大部分都是简单、朴素的壁纸，但如果你想要切换自己喜…...

编程日记 2025/3/9 8:41:18

【大模型安全】安全解决方案

【大模型安全】安全解决方案 1.技术层面2.数据层面数据收集阶段训练阶段模型推理阶段 1.技术层面在使用大语言模型时，通常有几种选择：一种是采用封装好的大语言模型SaaS云服务；另一种是在公有云上部署自有的大语言模型，并通过权…...

编程日记 2025/3/9 8:39:16

Windows编译环境搭建(MSYS2\MinGW\cmake)

我的音视频/流媒体开源项目(github) 一、基础环境搭建 1.1 MSYS2\MinGW 参考：1. 基于MSYS2的Mingw-w64 GCC搭建Windows下C开发环境_msys2使用mingw64编译在Widndows系统上，使用gcc工具链（g）进行C程序开发？可以的&a…...

编程日记 2025/3/9 8:37:13

云曦春季开学考复现（2025）

Crypto 划水的dp和dq 下载附件后是简单的RSA算法题，之所以说简单是因为给了公钥e 趁热打铁，昨天刚学的RSA，既然有p有q，也有e，而np*q，可以算出欧拉函数值phi（p-1）*（q-1&…...

编程日记 2025/3/9 8:35:11

股票交易所官方api接口有哪些？获取和使用需要满足什么条件

炒股自动化：申请官方API接口，散户也可以 python炒股自动化（0），申请券商API接口 python炒股自动化（1），量化交易接口区别 Python炒股自动化（2）：获取…...

编程日记 2025/3/9 8:32:08

《WebForms 实例》

《WebForms 实例》引言 WebForms 是微软推出的一种用于构建动态Web应用程序的技术。它基于ASP.NET框架，允许开发者使用服务器端控件来构建用户界面，并通过事件驱动模型来响应用户交互。本文将通过一些实例，详细介绍WebForms的使用方法&…...

编程日记 2025/3/9 8:29:01

【每日学点HarmonyOS Next知识】状态变量、公共Page、可见区域变化回调、接收参数、拖拽排序控件

1、HarmonyOS 在定时器里面改变state修饰的变量，无法更新UI吗？ 将函数function写成了封装函数的形式就可以了 Entry Component struct Index {State acSetValve: number 0;aboutToAppear(): void {setInterval(() > {this.acSetValve 200;console…...

编程日记 2025/3/9 8:28:00

Intent3D

1. 研究背景在现实世界中，人们寻找 3D 物体的行为往往基于特定意图，例如“我想要一个可以支撑我背部的东西”（即寻找枕头）。传统 3D 视觉定位（3D-VG）主要依赖人工提供的参照信息（如“沙发上的…...

编程日记 2025/3/9 8:26:59

【Python 数据结构 10.二叉树】

目录一、二叉树的基本概念 1.二叉树的定义 2.二叉树的特点 3.特殊的二叉树 Ⅰ、斜树 Ⅱ、满二叉树 Ⅲ、完全二叉树 Ⅳ、完全二叉树和满二叉树的区别 4.二叉树的性质 5.二叉树的顺序存储 Ⅰ、完全二叉树 Ⅱ、非完全二叉树 Ⅲ、稀疏二叉树 6.二叉树的链式存储 7.二叉树的遍历概念…...

编程日记 2025/3/9 8:19:52

从0开始的操作系统手搓教程27：下一步，实现我们的用户进程

目录第一步：添加用户进程虚拟空间准备冲向我们的特权级3（用户特权级） 讨论下我们创建用户线程的基本步骤更加详细的分析代码用户进程的视图说一说BSS段继续看process.c中的函数添加用户线程激活现在，我们做好了TSS…...

编程日记 2025/3/9 8:18:51

set、LinkedHashSet和TreeSet的区别、Map接口常见方法、Collections 工具类使用

DAY7.2 Java核心基础想学习Collection、list、ArrayList、Set、HashSet部分的小伙伴可以转到 7.1集合框架、Collection、list、ArrayList、Set、HashSet和LinkedHashSet、判断两个对象是否相等文章查看 set集合在set集合中，处理LinkedHashSet是有序的&#xf…...

编程日记 2025/3/9 8:17:50

Qt开发：nativeEvent事件的使用

文章目录一、概述二、nativeEvent 的定义三、Windows 平台示例三、使用nativeEvent监测设备变化一、概述 Qt 的 nativeEvent 是一个特殊的事件处理机制，允许开发者处理操作系统级别的原生事件。通常，Qt 通过 QEvent 机制来管理事件，但有时…...

编程日记 2025/3/9 8:15:48

鸿蒙Next-应用检测、安装以及企业内部商店的实现

一、企业内部应用检测和更新升级 A应用检测是否安装B应用 canOpenApp():boolean{ try { let link schB://com.example.test/open; // 替换成你目标应用的link串儿 let canOpen bundleManager.canOpenLink(link); console.log("canOpen:"canOpen…...

编程日记 2025/3/9 8:14:46

存量思维和增量思维

在网上看一篇文章，有两种典型的阅读方式。一种，是挑刺式，眼里只有缺点。比如，有人不厌其烦地告诉作者，哪段有错别字，哪段不够严谨。闲得蛋疼。有这工夫，多看会书，不香么&…...

编程日记 2025/3/9 8:12:45

golang将大接口传递给小接口以及场景

文章目录 golang将大接口传递给小接口背景什么是大接口传递给小接口使用场景 golang将大接口传递给小接口背景在 Go 语言中，接口是一种强大的工具，它允许我们定义对象的行为而不关心其具体实现。特别是在复杂的应用程序中，将一个实现了较…...

编程日记 2025/3/9 8:10:43

Flask RESTful 示例

目录 1. 环境准备2. 安装依赖3. 修改main.py4. 运行应用5. API使用示例获取所有任务获取单个任务创建新任务更新任务删除任务中文乱码问题： 下面创建一个简单的Flask RESTful API示例。首先，我们需要创建环境，安装必要的依赖，然后…...

编程新知 2025/7/4 12:33:27

Qt/C++开发监控GB28181系统/取流协议/同时支持udp/tcp被动/tcp主动

一、前言说明在2011版本的gb28181协议中，拉取视频流只要求udp方式，从2016开始要求新增支持tcp被动和tcp主动两种方式，udp理论上会丢包的，所以实际使用过程可能会出现画面花屏的情况，而tcp肯定不丢包，起码…...

编程新知 2025/7/4 0:35:22

从零实现富文本编辑器#5-编辑器选区模型的状态结构表达

先前我们总结了浏览器选区模型的交互策略，并且实现了基本的选区操作，还调研了自绘选区的实现。那么相对的，我们还需要设计编辑器的选区表达，也可以称为模型选区。编辑器中应用变更时的操作范围，就是以模型选区为基准来…...

编程新知 2025/6/27 7:16:49

【Linux】C语言执行shell指令

在C语言中执行Shell指令在C语言中，有几种方法可以执行Shell指令： 1. 使用system()函数这是最简单的方法，包含在stdlib.h头文件中： #include <stdlib.h>int main() {system("ls -l"); // 执行ls -l命令retu…...

编程新知 2025/6/21 17:11:09

【SpringBoot】100、SpringBoot中使用自定义注解+AOP实现参数自动解密

在实际项目中，用户注册、登录、修改密码等操作，都涉及到参数传输安全问题。所以我们需要在前端对账户、密码等敏感信息加密传输，在后端接收到数据后能自动解密。 1、引入依赖 <dependency><groupId>org.springframework.boot</groupId><artifactId...

编程新知 2025/6/17 4:52:56

基于Flask实现的医疗保险欺诈识别监测模型

基于Flask实现的医疗保险欺诈识别监测模型项目截图项目简介社会医疗保险是国家通过立法形式强制实施，由雇主和个人按一定比例缴纳保险费，建立社会医疗保险基金，支付雇员医疗费用的一种医疗保险制度， 它是促进社会文明和进步的…...

编程新知 2025/7/2 22:42:45

SCAU期末笔记 - 数据分析与数据挖掘题库解析

这门怎么题库答案不全啊日来简单学一下子来一、选择题（可多选） 将原始数据进行集成、变换、维度规约、数值规约是在以下哪个步骤的任务?(C) A. 频繁模式挖掘 B.分类和预测 C.数据预处理 D.数据流挖掘 A. 频繁模式挖掘：专注于发现数据中…...

编程新知 2025/7/4 14:51:04

Python实现prophet 理论及参数优化

文章目录 Prophet理论及模型参数介绍Python代码完整实现prophet 添加外部数据进行模型优化之前初步学习prophet的时候，写过一篇简单实现，后期随着对该模型的深入研究，本次记录涉及到prophet 的公式以及参数调优，从公式可以更直观…...

编程新知 2025/7/3 3:03:10

React19源码系列之事件插件系统

事件类别事件类型定义文档 Event Event 接口表示在 EventTarget 上出现的事件。 Event - Web API | MDN UIEvent UIEvent 接口表示简单的用户界面事件。 UIEvent - Web API | MDN KeyboardEvent KeyboardEvent 对象描述了用户与键盘的交互。 KeyboardEvent - Web…...

编程新知 2025/6/26 14:52:48

【Web 进阶篇】优雅的接口设计：统一响应、全局异常处理与参数校验

系列回顾： 在上一篇中，我们成功地为应用集成了数据库，并使用 Spring Data JPA 实现了基本的 CRUD API。我们的应用现在能“记忆”数据了！但是，如果你仔细审视那些 API，会发现它们还很“粗糙”：有…...

编程新知 2025/7/4 21:00:25