当前位置：首页 > article >正文

【HuggingFace项目】：Open-R1 - DeepSeek-R1 大模型开源复现计划

article 2026/5/13 10:56:57

项目链接：https://github.com/huggingface/open-r1

概述

Open-R1 是由 HuggingFace 发布的一个完全开放的项目，旨在通过三个主要步骤复现 DeepSeek-R1 的完整训练流程。这个项目的目标是让更多人能够理解和使用 DeepSeek-R1 的技术方案，从而推动大模型技术的发展和应用。

项目步骤

知识蒸馏：通过从 DeepSeek-R1 中提取高质量的推理语料，复现 R1-Distill 模型。
强化学习：复现用于创建 R1-Zero 的纯强化学习（RL）流程，这需要建立数学、推理和代码方面的大规模数据集。
多阶段训练：展示如何通过多阶段训练，将基础模型提升到 RL 调优的水平。

项目结构

项目的核心代码位于 src/open_r1 目录下，包含以下几个主要脚本：

grpo.py：在给定数据集上用 GRPO 训练模型。
sft.py：简单的监督微调（SFT）训练。
evaluate.py：在 R1 基准测试上评估模型。
generate.py：使用 Distilabel 生成合成数据。

技术特点

并行训练：支持 DDP 或 DeepSpeed ZeRO-2/3 进行训练，并支持数据并行和张量并行。
模型评估：使用 vLLM 进行模型评估，确保评估过程的高效性和准确性。
硬件优化：针对大规模硬件（如 8×H100 GPU）进行了优化，确保在大规模计算资源上的高效运行。

安装与运行

环境设置：首先创建一个 Python 虚拟环境，并安装 vLLM 和其他依赖项。

conda create -n openr1 python=3.11 && conda activate openr1
pip install vllm==0.6.6.post1
pip install -e ".[dev]"

登录 Hugging Face 和 Weights and Biases：
```
huggingface-cli login
wandb login
```
安装 Git LFS：确保系统已安装 Git LFS，以便加载和推送模型/数据集到 Hugging Face Hub。
```
sudo apt-get install git-lfs
```

训练模型

SFT（监督微调）：使用 sft.py 脚本在特定数据集上进行监督微调。

accelerate launch --config_file=configs/zero3.yaml src/open_r1/sft.py \--model_name_or_path Qwen/Qwen2.5-Math-1.5B-Instruct \--dataset_name HuggingFaceH4/Bespoke-Stratos-17k \--learning_rate 2.0e-5 \--num_train_epochs 1 \--packing \--max_seq_length 4096 \--per_device_train_batch_size 4 \--per_device_eval_batch_size 4 \--gradient_accumulation_steps 4 \--gradient_checkpointing \--bf16 \--logging_steps 5 \--eval_strategy steps \--eval_steps 100 \--output_dir data/Qwen2.5-1.5B-Open-R1-Distill

GRPO：使用 grpo.py 脚本进行 GRPO 训练。

accelerate launch --config_file configs/zero3.yaml src/open_r1/grpo.py \--output_dir DeepSeek-R1-Distill-Qwen-7B-GRPO \--model_name_or_path deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \--dataset_name AI-MO/NuminaMath-TIR \--max_prompt_length 256 \--per_device_train_batch_size 1 \--gradient_accumulation_steps 16 \--logging_steps 10 \--bf16

模型评估

使用 evaluate.py 脚本在 R1 基准测试上评估模型。支持单 GPU 和多 GPU 并行评估。

MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
MODEL_ARGS="pretrained=$MODEL,dtype=float16,max_model_length=32768,gpu_memory_utilisation=0.8"
TASK=aime24
OUTPUT_DIR=data/evals/$MODELlighteval vllm $MODEL_ARGS "custom|$TASK|0|0" \--custom-tasks src/open_r1/evaluate.py \--use-chat-template \--system-prompt="Please reason step by step, and put your final answer within \boxed{}." \--output-dir $OUTPUT_DIR

数据生成

使用 generate.py 脚本生成合成数据。支持从蒸馏模型和 DeepSeek-R1 生成数据。

from datasets import load_dataset
from distilabel.models import vLLM
from distilabel.pipeline import Pipeline
from distilabel.steps.tasks import TextGenerationprompt_template = """\
You will be given a problem. Please reason step by step, and put your final answer within \boxed{}:
{{ instruction }}"""dataset = load_dataset("AI-MO/NuminaMath-TIR", split="train").select(range(10))model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"with Pipeline(name="distill-qwen-7b-r1",description="A pipeline to generate data from a distilled r1 model",
) as pipeline:llm = vLLM(model=model_id,tokenizer=model_id,extra_kwargs={"tensor_parallel_size": 1,"max_model_len": 8192,},generation_kwargs={"temperature": 0.6,"max_new_tokens": 8192,},)prompt_column = "problem"text_generation = TextGeneration(llm=llm, template=prompt_template,num_generations=4,input_mappings={"instruction": prompt_column} if prompt_column is not None else {})if __name__ == "__main__":distiset = pipeline.run(dataset=dataset)distiset.push_to_hub(repo_id="username/numina-deepseek-r1-qwen-7b")

总结

Open-R1 项目通过开源的方式，详细展示了如何从知识蒸馏到强化学习，再到多阶段训练，逐步复现 DeepSeek-R1 的训练流程。这不仅为研究人员提供了宝贵的技术参考，也为大模型的普及和应用奠定了坚实的基础。

概述