2023-ICLR-ReAct: the first to combine Thought and Action to improve LLM problem solving

article 2026/2/8 7:21:47

This post covers a paper from Princeton University and Google Research, Brain Team, on synergizing reasoning and acting in language models.

Paper: https://arxiv.org/abs/2210.03629
Code: https://github.com/ysymyth/ReAct.git
Another reimplementation, in LangChain: https://python.langchain.com/api_reference/langchain/agents/langchain.agents.agent.AgentExecutor.html#

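The authors observe that although LLMs are strong at understanding and generation, their abilities to reason and to act have mostly been studied separately. They propose generating reasoning traces (reason) and task-specific actions (act) in an interleaved manner, which combines the two abilities more effectively and improves the model's interpretability, trustworthiness, and ability to solve complex tasks.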

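Data:

HotpotQA: a multi-hop question answering benchmark that requires the model to reason across multiple Wikipedia pages.
FEVER: a fact verification benchmark in which the model must verify a claim against Wikipedia pages.

Methods:

Standard (standard prompting): removes all thoughts, actions, and observations from the ReAct trajectories.
CoT (chain-of-thought prompting): removes actions and observations and keeps the thoughts, serving as a reasoning-only baseline.
CoT-SC (self-consistency): applies the self-consistency method [1], sampling 21 CoT trajectories at inference time with a decoding temperature of 0.7 and taking the majority answer.
Act: keeps only the acting prompt, removing the Thought steps from the ReAct trajectories; it loosely resembles WebGPT.
ReAct: the Thought + Action method proposed in the paper.
ReAct → CoT-SC: when ReAct fails to return an answer within the given number of steps, fall back to the CoT-SC result.
CoT-SC → ReAct: when the majority answer among the n CoT-SC samples occurs fewer than n/2 times (that is, internal knowledge may not support the task confidently), fall back to the ReAct result.
Finetuning
- 3,000 trajectories with correct answers, generated by ReAct, are used to finetune smaller language models.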
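
The two fallback rules in the method list reduce to a majority vote plus a threshold check. Below is a minimal Python sketch, assuming each CoT-SC sample has already been reduced to a final answer string. The names `majority_vote`, `cot_sc_then_react`, `react_then_cot_sc`, and the `run_react` callback are hypothetical and chosen for illustration; they are not taken from the paper's code.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple


def majority_vote(answers: List[str]) -> Tuple[str, int]:
    """Return the most frequent answer and the number of samples that gave it."""
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count


def cot_sc_then_react(
    cot_answers: List[str], run_react: Callable[[], Optional[str]]
) -> Optional[str]:
    """CoT-SC -> ReAct: keep the vote unless the top answer occurs fewer than n/2 times."""
    answer, count = majority_vote(cot_answers)
    if count < len(cot_answers) / 2:
        # Internal knowledge is not confident enough; use ReAct instead.
        return run_react()
    return answer


def react_then_cot_sc(
    run_react: Callable[[], Optional[str]], cot_answers: List[str]
) -> str:
    """ReAct -> CoT-SC: use the vote when ReAct returns no answer (modelled here as None)."""
    react_answer = run_react()
    if react_answer is None:
        return majority_vote(cot_answers)[0]
    return react_answer
```

Here `run_react` stands in for whatever runs the ReAct agent, and returning `None` is an assumed convention for "no answer within the step budget".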

The prompt as implemented in LangChain

PREFIX = """Answer the following questions as best you can. You have access to the following tools:"""
FORMAT_INSTRUCTIONS = """Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question"""
SUFFIX = """Begin!

Question: {input}
Thought:{agent_scratchpad}"""
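
To make the role of the three placeholders concrete, here is a small sketch of how the template strings above could be combined into one prompt. The `build_prompt` helper and the tool descriptions are hypothetical, written only for illustration. This is not LangChain's own code, and it assumes the sections are joined with blank lines.

```python
from typing import Dict

PREFIX = """Answer the following questions as best you can. You have access to the following tools:"""
FORMAT_INSTRUCTIONS = """Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question"""
SUFFIX = """Begin!

Question: {input}
Thought:{agent_scratchpad}"""


def build_prompt(tools: Dict[str, str], question: str, scratchpad: str = "") -> str:
    """Hypothetical helper: fill the placeholders and join the four sections."""
    # One "name: description" line per tool, so the model knows what it can call.
    tool_lines = "\n".join(f"{name}: {desc}" for name, desc in tools.items())
    instructions = FORMAT_INSTRUCTIONS.format(tool_names=", ".join(tools))
    # The scratchpad carries the Thought/Action/Observation history so far.
    suffix = SUFFIX.format(input=question, agent_scratchpad=scratchpad)
    return "\n\n".join([PREFIX, tool_lines, instructions, suffix])


prompt = build_prompt(
    {"Search": "look things up", "Calculator": "do math"},
    "What is 10 to the power of 3?",
)
print(prompt)
```

On the first call the scratchpad is empty, so the prompt ends with `Thought:` and the model continues from there. On later calls the scratchpad holds the earlier steps.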

Example

# Requires an OpenAI API key in the OPENAI_API_KEY environment variable.
from langchain.agents import initialize_agent
from langchain.llms import OpenAI
from langchain.tools import BaseTool


# Search tool
class SearchTool(BaseTool):
    # Type annotations are added because pydantic-based BaseTool versions require them.
    name: str = "Search"
    description: str = "Use this for questions about the weather or about '鸡你太美'"
    return_direct: bool = True  # return the tool result directly as the final answer

    def _run(self, query: str) -> str:
        print("\nSearchTool query: " + query)
        return "This is a generic response"  # stub result

    async def _arun(self, query: str) -> str:
        raise NotImplementedError("Async is not supported yet")


# Calculator tool
class CalculatorTool(BaseTool):
    name: str = "Calculator"
    description: str = "Use this for questions about mathematical calculation"

    def _run(self, query: str) -> str:
        print("\nCalculatorTool query: " + query)
        return "100"  # stub: always returns 100, whatever the query

    async def _arun(self, query: str) -> str:
        raise NotImplementedError("Async is not supported yet")


llm = OpenAI(temperature=0.5)
tools = [SearchTool(), CalculatorTool()]
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)

questions = [
    "Look up this week's weather",
    "Tell me what '鸡你太美' means",
    "Tell me what 'hello world' means",
    "What is 10 to the power of 3?",
]
for question in questions:
    print("Question: " + question)
    print("Answer: " + agent.run(question))
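
The example above needs an API key and hides the loop inside the agent. To show the Thought / Action / Observation cycle without either, here is a minimal sketch that replaces the LLM with a scripted stand-in. Everything in it (`react_loop`, `make_scripted_llm`, `calculator`, the regular expression, and the stop message) is hypothetical and written for illustration. It is not how LangChain's `AgentExecutor` is implemented.

```python
import re
from typing import Callable, Dict, List

# Matches "Action: <tool>" followed by "Action Input: <input>" in the model output.
ACTION_RE = re.compile(r"Action\s*:\s*(.*?)\s*\n\s*Action Input\s*:\s*(.*)", re.DOTALL)
FINAL = "Final Answer:"


def react_loop(
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> str:
    scratchpad = ""
    for _ in range(max_steps):
        # A real model would also need a stop sequence such as "\nObservation:"
        # so that it does not invent the observation itself.
        output = llm(f"Question: {question}\nThought:{scratchpad}")
        if FINAL in output:
            return output.split(FINAL, 1)[1].strip()
        match = ACTION_RE.search(output)
        if match is None:
            raise ValueError(f"Could not parse LLM output: {output!r}")
        tool_name, tool_input = match.group(1).strip(), match.group(2).strip()
        tool = tools.get(tool_name)
        observation = tool(tool_input) if tool else f"{tool_name} is not a valid tool"
        # Append this step so the next call sees the full history.
        scratchpad += f"{output}\nObservation: {observation}\nThought:"
    return "Agent stopped: step limit reached"


def make_scripted_llm(replies: List[str]) -> Callable[[str], str]:
    """Stand-in for an LLM: returns the given replies one per call."""
    remaining = iter(replies)
    return lambda prompt: next(remaining)


def calculator(expr: str) -> str:
    """Toy tool that only understands 'a**b' with integer operands."""
    base, exponent = expr.split("**")
    return str(int(base) ** int(exponent))


llm = make_scripted_llm([
    " I need to compute a power.\nAction: Calculator\nAction Input: 10**3",
    " I now know the final answer\nFinal Answer: 1000",
])
answer = react_loop(llm, {"Calculator": calculator}, "What is 10 to the power of 3?")
print(answer)  # 1000
```

The first scripted reply triggers a tool call, its result is appended to the scratchpad as an Observation, and the second reply ends the loop with a final answer.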
