
[LLM Series: Instruction Fine-Tuning] The "Prompt" in LLM Instruction Fine-Tuning, in Brief

news 2025/7/9 5:36:54

1 Instruction-tuning datasets come in too many "flavors"

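Has anyone analyzed how the prompt affects training or inference? At inference time I noticed that feeding the raw input directly, without the prompt template used during training, makes the model perform worse, which is understandable. But if we train without any prompt template, can we also test without one? Another open question is how to construct multi-turn versus single-turn prompts. Different models are trained in different ways, and even their instruction data formats differ, so choice paralysis sets in again...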


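Some opinions first. When fine-tuning a large model, the instruction dataset for a single fine-tuning run should be chosen for high quality and diversity; if training resources allow, you can add more data and longer samples. A good approach is to take several high-quality datasets and build one uniformly formatted, diverse dataset for SFT, then fine-tune once. Fine-tuning several times in succession may degrade the results. If you do need to continue fine-tuning, pick a scheme that loses none (or little) of the previous model's capability. For now, one option is to fine-tune the base model with LoRA or QLoRA and then merge the trained LoRA weights into the original model, which reduces the impact of repeated fine-tuning on the model.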
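To illustrate the "merge the LoRA weights into the original model" step: LoRA keeps a weight matrix W frozen and trains two small factors B and A, so the layer computes W x + (alpha / r) · B A x. Merging folds that low-rank update into the weight, W' = W + (alpha / r) · B A, and the merged layer then produces exactly the same outputs as base plus adapter. The pure-Python sketch below checks this on made-up 2×3 matrices. The helper names and numbers are hypothetical. In real projects a library does this step (for example Hugging Face PEFT's `merge_and_unload()`), not hand-written code.

```python
# Toy illustration (hypothetical numbers): merging a LoRA update into W.
# Matrices are plain lists of rows; x is a column vector (d_in x 1).

def matmul(p, q):
    """Matrix product p @ q for lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*q)] for row in p]

def add_scaled(p, q, s):
    """Element-wise p + s * q."""
    return [[a + s * b for a, b in zip(rp, rq)] for rp, rq in zip(p, q)]

W = [[1.0, 0.0, 2.0],      # frozen base weight, d_out x d_in = 2 x 3
     [0.0, 1.0, 1.0]]
B = [[0.5], [1.0]]         # LoRA factor B, d_out x r (rank r = 1)
A = [[1.0, 1.0, 0.0]]      # LoRA factor A, r x d_in
alpha, r = 2.0, 1
scaling = alpha / r        # LoRA scales the low-rank update by alpha / r

def forward_with_adapter(x):
    """Unmerged layer: y = W x + scaling * B (A x)."""
    return add_scaled(matmul(W, x), matmul(B, matmul(A, x)), scaling)

def merge(w, b, a, s):
    """Merged weight: W' = W + s * (B A)."""
    return add_scaled(w, matmul(b, a), s)

x = [[1.0], [2.0], [3.0]]
W_merged = merge(W, B, A, scaling)
print(forward_with_adapter(x))   # [[10.0], [11.0]]
print(matmul(W_merged, x))       # [[10.0], [11.0]]
```

Once the weights are merged, no separate adapter is needed at inference time, and W' can serve as the starting point for a later fine-tuning round like any ordinary checkpoint.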

2 Common instruction-tuning templates

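After looking at several top-ranked leaderboard models and mainstream instruction-tuning datasets, I have collected some common instruction-tuning prompts: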

The most common one is the template from stanford_alpaca:

PROMPT_DICT = {
    "prompt_input": (
        "Below is an instruction that describes a task, paired with an input that provides further context. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
    ),
    "prompt_no_input": (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Response:"
    ),
}
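The two Alpaca templates are chosen per sample: stanford_alpaca's training script uses `prompt_input` when the `input` field is non-empty and `prompt_no_input` otherwise, and the `output` plus an EOS token becomes the target. A minimal sketch of that selection follows. `build_alpaca_prompt` is a hypothetical helper, and the `"</s>"` EOS string is an assumption (it is the Llama tokenizer's EOS text).

```python
PROMPT_DICT = {
    "prompt_input": (
        "Below is an instruction that describes a task, paired with an input that provides further context. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
    ),
    "prompt_no_input": (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Response:"
    ),
}

def build_alpaca_prompt(example, eos_token="</s>"):
    """Return (prompt, full_text) for a dict with instruction / input / output keys.

    Hypothetical helper: eos_token="</s>" is an assumption (Llama-style EOS).
    """
    if example.get("input", ""):
        prompt = PROMPT_DICT["prompt_input"].format_map(example)
    else:
        prompt = PROMPT_DICT["prompt_no_input"].format_map(example)
    # Training text = prompt followed by the target answer and EOS.
    return prompt, prompt + example["output"] + eos_token

prompt, full_text = build_alpaca_prompt(
    {"instruction": "Add the numbers.", "input": "2 and 3", "output": "5"}
)
print(full_text)
```

At inference time only `prompt` is fed to the model, which then generates the text after `### Response:`.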

The template in Llama 2:

instruction = """[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n\n{} [/INST]"""
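The string above has only one user slot. For multi-turn chats, Llama 2's reference chat code (as I understand it) places the system block inside the first user message and wraps every finished exchange as `<s>[INST] user [/INST] answer </s>`, ending with an open `[INST] ... [/INST]` for the new question. The sketch below reproduces that layout as plain text. `build_llama2_prompt` is a hypothetical helper. Writing `<s>`/`</s>` as literal text is only for readability; real training code adds BOS/EOS as token ids.

```python
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

def build_llama2_prompt(system, history, user_msg):
    """history: list of (user, assistant) pairs already answered; user_msg: the new question.

    Hypothetical helper; <s> and </s> stand in for the BOS/EOS token ids.
    """
    users = [u for u, _ in history] + [user_msg]
    # The system block is folded into the first user message.
    users[0] = B_SYS + system + E_SYS + users[0]
    text = ""
    for u, (_, a) in zip(users, history):   # finished exchanges
        text += f"<s>{B_INST} {u.strip()} {E_INST} {a.strip()} </s>"
    text += f"<s>{B_INST} {users[-1].strip()} {E_INST}"  # open turn for the model
    return text

print(build_llama2_prompt("Be brief.", [("Hi", "Hello!")], "Who are you?"))
```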

The Linly-AI template:

### Instruction:{prompt.strip()}  ### Response:

NousResearch, ranked first on the Open LLM Leaderboard:

Almost the same as the Alpaca template:

### Instruction:
<prompt>

### Response:
<leave a newline blank for model to respond>

### Instruction:
<prompt>

### Input:
<additional context>

### Response:
<leave a newline blank for model to respond>

The YaYi template:

https://huggingface.co/wenge-research/yayi-7b-llama2

prompt = "你是谁？"  # "Who are you?"
formatted_prompt = f"""<|System|>:
You are a helpful, respectful and honest assistant named YaYi developed by Beijing Wenge Technology Co.,Ltd. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.

<|Human|>:
{prompt}

<|YaYi|>:
"""

The StableBeluga2 template:

### System:
This is a system prompt, please behave and help the user.

### User:
Your prompt here

### Assistant:
The output of Stable Beluga 2

For example:

system_prompt = "### System:\nYou are Stable Beluga, an AI that follows instructions extremely well. Help as much as you can. Remember, be safe, and don't do anything illegal.\n\n"
message = "Write me a poem please"
prompt = f"{system_prompt}### User: {message}\n\n### Assistant:\n"

The template commonly used with the Guanaco dataset:

### Human: {prompt}
### Assistant:

prompt = "Introduce yourself"
formatted_prompt = (
    f"A chat between a curious human and an artificial intelligence assistant. "
    f"The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
    f"### Human: {prompt} ### Assistant:"
)
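As noted at the start, a model performs worse when the inference prompt differs from the training prompt. Most of the single-turn templates above differ only in role markers and separators, so one practical safeguard is to keep them in a single table and always render prompts through it, for training and inference alike. The sketch below is hypothetical: the `TEMPLATES` dict and the `render` function are illustrative names, while the strings follow the Linly, Guanaco, and StableBeluga2 examples above. Stripping every prompt, as Linly does, is a design choice of this sketch.

```python
# Hypothetical template table; strings follow the examples in this section.
TEMPLATES = {
    "linly": "### Instruction:{prompt}  ### Response:",
    "guanaco": "### Human: {prompt} ### Assistant:",
    "stable_beluga": "### System:\n{system}\n\n### User: {prompt}\n\n### Assistant:\n",
}

def render(template_name, prompt, system=""):
    """Fill one template; str.format ignores keyword fields a template does not use."""
    return TEMPLATES[template_name].format(prompt=prompt.strip(), system=system)

print(render("guanaco", "Introduce yourself"))
print(render("stable_beluga", "Write me a poem please", system="Be safe."))
```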

3 Building inputs and outputs for multi-turn dialogue

Following the yangjianxin1/Firefly and LinkSoul-AI/Chinese-Llama-2-7b projects, the usual approach is:

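The whole multi-turn conversation is fed to the model as one sequence. When computing the loss, a mask ensures that the loss on the input (prompt) tokens does not take part in parameter updates; only the loss on the "target" (assistant) tokens does. Because the model computes predictions for every position in parallel, this makes training more efficient, and every target in a multi-turn dialogue is trained on, so training is more thorough. Otherwise, an n-turn dialogue would have to be split into n samples, each computing the loss only on its last target, which greatly reduces training efficiency.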
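A minimal sketch of the masking idea in plain Python. Prompt positions carry the label -100, which is the default `ignore_index` of PyTorch's `CrossEntropyLoss`. Those positions are skipped, so the loss is averaged only over target tokens. `masked_nll` is a hypothetical helper. The sketch also assumes the log-probabilities are already aligned with the labels, whereas a real causal LM first shifts labels by one position.

```python
import math

IGNORE_INDEX = -100  # same value as PyTorch CrossEntropyLoss's default ignore_index

def masked_nll(log_probs, labels, ignore_index=IGNORE_INDEX):
    """Mean negative log-likelihood over the positions whose label is not ignored.

    log_probs[t] is a list of log-probabilities over the vocabulary at position t,
    assumed already aligned with labels[t] (a real causal LM shifts labels by one).
    """
    total, count = 0.0, 0
    for lp, y in zip(log_probs, labels):
        if y == ignore_index:
            continue  # prompt token: contributes neither loss nor gradient
        total -= lp[y]
        count += 1
    return total / count if count else 0.0

# Toy sequence of 5 positions over a 2-token vocabulary:
# two user-prompt positions (masked), then three answer positions (supervised).
labels = [IGNORE_INDEX, IGNORE_INDEX, 0, 1, 1]
log_probs = [[math.log(0.5), math.log(0.5)]] * 5
print(masked_nll(log_probs, labels))  # about 0.693, i.e. ln 2
```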

Implementation 1:

# https://github.com/LinkSoul-AI/Chinese-Llama-2-7b/blob/main/train.py
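# B_INST, E_INST, B_SYS, E_SYS, IGNORE_TOKEN_ID, dummy_message, last_index and
# safe_ids are defined elsewhere in train.py and are not shown here.
def tokenize(item, tokenizer):
    roles = {"human": "user", "gpt": "assistant"}  # unused in this excerpt
    input_ids = []
    labels = []
    # Use the sample's own system prompt if it has one, otherwise the default one.
    if "instruction" in item and len(item["instruction"]) > 0:
        system = item["instruction"]
    else:
        system = dummy_message["system"]
    system = B_SYS + system + E_SYS
    # Add the system prompt before the first turn of the conversation.
    # Copy the turns first so item (and dummy_message) are not mutated;
    # otherwise a repeated call would prepend the system prompt again.
    conversations = [dict(turn) for turn in item["conversations"]]
    conversations[0]["value"] = system + conversations[0]["value"]
    for turn in conversations:
        role = turn["from"]
        content = turn["value"].strip()
        if role == "human":
            # User turn: wrap in [INST] ... [/INST]; encode() adds BOS by default.
            # These tokens are masked out of the loss.
            content = f"{B_INST} {content} {E_INST} "
            content_ids = tokenizer.encode(content)
            labels += [IGNORE_TOKEN_ID] * len(content_ids)
        else:
            # assert role == "gpt"
            # Assistant turn: these tokens are the training targets.
            content = f"{content} "
            # add_special_tokens=False removes the BOS token; EOS is appended at the end
            content_ids = tokenizer.encode(content, add_special_tokens=False) + [tokenizer.eos_token_id]
            labels += content_ids
        input_ids += content_ids
    # Truncate to the model's maximum context length.
    input_ids = input_ids[:tokenizer.model_max_length]
    labels = labels[:tokenizer.model_max_length]
    # Drop trailing prompt tokens that have no target after them
    # (assuming last_index returns the position of the last label != IGNORE_TOKEN_ID).
    trunc_id = last_index(labels, IGNORE_TOKEN_ID) + 1
    input_ids = input_ids[:trunc_id]
    labels = labels[:trunc_id]
    # No target left at all: fall back to the dummy sample.
    if len(labels) == 0:
        return tokenize(dummy_message, tokenizer)
    # Replace out-of-vocabulary ids with the pad id (inputs) or the ignore id (labels).
    input_ids = safe_ids(input_ids, tokenizer.vocab_size, tokenizer.pad_token_id)
    labels = safe_ids(labels, tokenizer.vocab_size, IGNORE_TOKEN_ID)
    return input_ids, labels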
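To see what tokenize() produces without downloading a model, here is a self-contained toy version of the same logic. Everything it depends on is a hypothetical stand-in: `ToyTokenizer` is a whitespace tokenizer that adds a BOS id like the Llama tokenizer does. `last_index` is assumed to return the position of the last label that is not `IGNORE_TOKEN_ID`, and `safe_ids` is assumed to replace out-of-range ids. These are sketches, not the repo's code. The `dummy_message` fallback is omitted. The `B_INST`/`E_INST`/`B_SYS`/`E_SYS` values follow Llama 2's reference code.

```python
IGNORE_TOKEN_ID = -100
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

class ToyTokenizer:
    """Hypothetical whitespace tokenizer: each new word gets the next free id."""
    bos_token_id, eos_token_id, pad_token_id = 1, 2, 0
    model_max_length = 64
    vocab_size = 1000

    def __init__(self):
        self.vocab = {}

    def encode(self, text, add_special_tokens=True):
        ids = [self.vocab.setdefault(w, len(self.vocab) + 3) for w in text.split()]
        return ([self.bos_token_id] if add_special_tokens else []) + ids

def last_index(lst, value):
    """Assumed semantics: index of the last element that is NOT value (-1 if none)."""
    for i in range(len(lst) - 1, -1, -1):
        if lst[i] != value:
            return i
    return -1

def safe_ids(ids, max_value, replacement):
    """Replace ids outside the vocabulary with replacement."""
    return [i if i < max_value else replacement for i in ids]

def build_example(system, conversations, tokenizer):
    first = B_SYS + system + E_SYS + conversations[0]["value"]
    turns = [dict(conversations[0], value=first)] + conversations[1:]
    input_ids, labels = [], []
    for turn in turns:
        content = turn["value"].strip()
        if turn["from"] == "human":
            ids = tokenizer.encode(f"{B_INST} {content} {E_INST} ")
            labels += [IGNORE_TOKEN_ID] * len(ids)   # prompt tokens: masked
        else:
            ids = tokenizer.encode(f"{content} ", add_special_tokens=False) + [tokenizer.eos_token_id]
            labels += ids                            # answer tokens: supervised
        input_ids += ids
    input_ids = input_ids[:tokenizer.model_max_length]
    labels = labels[:tokenizer.model_max_length]
    cut = last_index(labels, IGNORE_TOKEN_ID) + 1   # drop trailing unsupervised tokens
    input_ids = safe_ids(input_ids[:cut], tokenizer.vocab_size, tokenizer.pad_token_id)
    labels = safe_ids(labels[:cut], tokenizer.vocab_size, IGNORE_TOKEN_ID)
    return input_ids, labels

tok = ToyTokenizer()
conversation = [
    {"from": "human", "value": "hi"},
    {"from": "gpt", "value": "hello"},
    {"from": "human", "value": "bye"},
    {"from": "gpt", "value": "see you"},
]
input_ids, labels = build_example("be nice", conversation, tok)
print(input_ids)
print(labels)
```

Both assistant answers end up supervised inside the same sample, while every user turn, including the system block, is masked with -100. A trailing user turn with no answer after it would be cut off entirely.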