
transformers-Generation with LLMs

news 2025/12/15 8:02:15

https://huggingface.co/docs/transformers/main/en/llm_tutorial

The model decides when to stop: it should learn when to output an end-of-sequence (EOS) token. If it does not, generation stops once a predefined maximum length is reached.
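The stopping rule can be sketched without a real model. The snippet below is a minimal illustration only: `generate_tokens`, `countdown_model`, `never_stops_model`, and the EOS id of 0 are hypothetical names and values invented for this sketch, not part of the transformers API.

```python
from typing import Callable, List

EOS_TOKEN_ID = 0  # hypothetical EOS id for this toy example


def generate_tokens(
    next_token: Callable[[List[int]], int],
    prompt_ids: List[int],
    max_new_tokens: int = 20,
    eos_token_id: int = EOS_TOKEN_ID,
) -> List[int]:
    """Append tokens until the model emits EOS or the length budget runs out."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        token = next_token(ids)
        ids.append(token)
        if token == eos_token_id:
            break  # the "model" decided to stop
    return ids


def countdown_model(ids: List[int]) -> int:
    """Toy stand-in for an LLM: emits the last id minus one, so it reaches EOS."""
    return max(ids[-1] - 1, 0)


def never_stops_model(ids: List[int]) -> int:
    """Toy stand-in that never emits EOS."""
    return 7


stopped_by_eos = generate_tokens(countdown_model, [3])
stopped_by_length = generate_tokens(never_stops_model, [3], max_new_tokens=5)
print(stopped_by_eos)
print(stopped_by_length)
```

The first call ends because the toy model produces the EOS id; the second ends only because the budget of new tokens is exhausted.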

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", device_map="auto", load_in_4bit=True
)

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1", padding_side="left")
model_inputs = tokenizer(["A list of colors: red, blue"], return_tensors="pt").to("cuda")

generated_ids = model.generate(**model_inputs)
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
'A list of colors: red, blue, green, yellow, orange, purple, pink,'

tokenizer.pad_token = tokenizer.eos_token  # Most LLMs don't have a pad token by default
model_inputs = tokenizer(
    ["A list of colors: red, blue", "Portugal is"], return_tensors="pt", padding=True
).to("cuda")
generated_ids = model.generate(**model_inputs)
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
['A list of colors: red, blue, green, yellow, orange, purple, pink,',
'Portugal is a country in southwestern Europe, on the Iber']

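There are many generation strategies, and the defaults do not suit every use case. The most common pitfalls are listed below.

Generated output is too short or too long

Unless specified in the GenerationConfig file, generate returns up to 20 tokens by default. It is recommended to set max_new_tokens manually in the generate call to control the maximum number of new tokens it can return. Note that LLMs (more precisely, decoder-only models) also return the input prompt as part of the output.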

model_inputs = tokenizer(["A sequence of numbers: 1, 2"], return_tensors="pt").to("cuda")

# By default, the output will contain up to 20 tokens
generated_ids = model.generate(**model_inputs)
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
'A sequence of numbers: 1, 2, 3, 4, 5'

# Setting `max_new_tokens` allows you to control the maximum length
generated_ids = model.generate(**model_inputs, max_new_tokens=50)
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
'A sequence of numbers: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,'
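Because a decoder-only model returns the prompt as part of its output, it is often useful to keep only the newly generated tokens. The sketch below shows the idea on plain Python lists; `strip_prompt` and the token ids are hypothetical and chosen purely for illustration. On a tensor of generated ids, the equivalent is a slice along the sequence dimension.

```python
from typing import List


def strip_prompt(generated_ids: List[List[int]], input_length: int) -> List[List[int]]:
    """Drop the first `input_length` ids (the echoed prompt) from every row."""
    return [row[input_length:] for row in generated_ids]


# Hypothetical ids: a 4-token prompt followed by 3 newly generated tokens.
prompt_ids = [101, 102, 103, 104]
generated_ids = [prompt_ids + [7, 8, 9]]

new_tokens = strip_prompt(generated_ids, input_length=len(prompt_ids))
print(new_tokens)
```

A single slice works for a whole batch because padded inputs all share the same length.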

Incorrect generation mode

By default, and unless specified in the GenerationConfig file, generate selects the most likely token at each iteration (greedy decoding).

# Set seed for reproducibility -- you don't need this unless you want full reproducibility
from transformers import set_seed

set_seed(42)

model_inputs = tokenizer(["I am a cat."], return_tensors="pt").to("cuda")

# LLM + greedy decoding = repetitive, boring output
generated_ids = model.generate(**model_inputs)
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
'I am a cat. I am a cat. I am a cat. I am a cat'

# With sampling, the output becomes more creative!
generated_ids = model.generate(**model_inputs, do_sample=True)
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
'I am a cat.  Specifically, I am an indoor-only cat.  I'
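The difference between the two modes comes down to how one token is picked from the next-token distribution. The following is a minimal sketch with an invented three-token distribution; `greedy_pick`, `sample_pick`, and the probabilities are assumptions for illustration and say nothing about how generate is implemented internally.

```python
import random
from typing import Dict


def greedy_pick(probs: Dict[str, float]) -> str:
    """Greedy decoding: always take the single most likely token."""
    return max(probs, key=probs.get)


def sample_pick(probs: Dict[str, float], rng: random.Random) -> str:
    """Sampling: draw a token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token distribution, not taken from any real model.
next_token_probs = {" I": 0.5, " Specifically": 0.3, " Meow": 0.2}

rng = random.Random(42)  # seeded so repeated runs draw the same samples
greedy_choices = [greedy_pick(next_token_probs) for _ in range(5)]
sampled_choices = [sample_pick(next_token_probs, rng) for _ in range(200)]
print(greedy_choices)
print(sorted(set(sampled_choices)))
```

Greedy picking returns the same token every time for the same distribution, which is one reason greedy output tends to loop; sampling also visits the less likely tokens.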

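Wrong padding side

LLMs are decoder-only architectures, meaning they continue to iterate on your input prompt. If your inputs do not have the same length, they need to be padded. Since LLMs are not trained to continue from pad tokens, your input needs to be left-padded. Make sure you also pass the attention mask to generate!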

# The tokenizer created earlier in this article uses padding_side="left", so right-padding
# is requested explicitly here to reproduce the failure: the 1st sequence, which is shorter,
# has padding on the right side. Generation fails to capture the logic.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1", padding_side="right")
tokenizer.pad_token = tokenizer.eos_token  # Most LLMs don't have a pad token by default
model_inputs = tokenizer(
    ["1, 2, 3", "A, B, C, D, E"], padding=True, return_tensors="pt"
).to("cuda")
generated_ids = model.generate(**model_inputs)
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
'1, 2, 33333333333'

# With left-padding, it works as expected!
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1", padding_side="left")
tokenizer.pad_token = tokenizer.eos_token  # Most LLMs don't have a pad token by default
model_inputs = tokenizer(
    ["1, 2, 3", "A, B, C, D, E"], padding=True, return_tensors="pt"
).to("cuda")
generated_ids = model.generate(**model_inputs)
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
'1, 2, 3, 4, 5, 6,'
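To make the shape of a left-padded batch concrete, here is a small sketch in plain Python. `left_pad`, `PAD_ID`, and the token ids are hypothetical; a real tokenizer produces the same two fields (`input_ids` and `attention_mask`) when called with padding enabled.

```python
from typing import Dict, List


def left_pad(sequences: List[List[int]], pad_token_id: int) -> Dict[str, List[List[int]]]:
    """Pad every sequence on the left to the longest length and build the mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append([pad_token_id] * n_pad + list(seq))
        # 0 marks padding the model should ignore, 1 marks real tokens.
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return {"input_ids": input_ids, "attention_mask": attention_mask}


PAD_ID = 0  # hypothetical pad id for this toy example
batch = left_pad([[11, 12, 13], [21, 22, 23, 24, 25]], pad_token_id=PAD_ID)
print(batch["input_ids"])
print(batch["attention_mask"])
```

With the padding on the left, the last position of every row is a real token, so generation continues from the prompt rather than from a pad token.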

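Wrong prompt

Some models and tasks expect a certain input prompt format to work properly. When this format is not applied, you get a silent performance degradation: the model runs, but not as well as it would with the expected prompt.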

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-alpha")
model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceH4/zephyr-7b-alpha", device_map="auto", load_in_4bit=True
)
set_seed(0)
prompt = """How many helicopters can a human eat in one sitting? Reply as a thug."""
model_inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
input_length = model_inputs.input_ids.shape[1]
generated_ids = model.generate(**model_inputs, max_new_tokens=20)
print(tokenizer.batch_decode(generated_ids[:, input_length:], skip_special_tokens=True)[0])
"I'm not a thug, but i can tell you that a human cannot eat"
# Oh no, it did not follow our instruction to reply as a thug! Let's see what happens when we write
# a better prompt and use the right template for this model (through `tokenizer.apply_chat_template`)

set_seed(0)
messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a thug",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
model_inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to("cuda")
input_length = model_inputs.shape[1]
generated_ids = model.generate(model_inputs, do_sample=True, max_new_tokens=20)
print(tokenizer.batch_decode(generated_ids[:, input_length:], skip_special_tokens=True)[0])
'None, you thug. How bout you try to focus on more useful questions?'
# As we can see, it followed a proper thug style 😎
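Conceptually, a chat template turns a list of role-tagged messages into the single string a model was trained to expect. The sketch below illustrates that idea only: `render_chat` and its `<|role|>` markers are made up for this example and are not claimed to match any real model's template. In practice the template ships with the tokenizer and is applied through `tokenizer.apply_chat_template`.

```python
from typing import Dict, List


def render_chat(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Join role-tagged messages into one prompt string (illustrative format only)."""
    parts = [f"<|{m['role']}|>\n{m['content']}\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues as the assistant.
        parts.append("<|assistant|>\n")
    return "".join(parts)


chat = [
    {"role": "system", "content": "You always respond in the style of a pirate."},
    {"role": "user", "content": "How far is the moon?"},
]
rendered = render_chat(chat)
print(rendered)
```

Ending the prompt with an open assistant turn signals that the next tokens should be the reply rather than a continuation of the user's message.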
