
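Study Notes on "LangChain for LLM Application Development" (Part 4)

Preface

These are my study notes for the video course "LangChain for LLM Application Development", taught by Harrison Chase (the creator of LangChain) and Andrew Ng. The original course is an English-only video course and is slow to access from mainland China, so I have also reorganized and replaced some of the content to make it easier for learners there to follow. Reading these notes is a quick way to work through the course material.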

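Course Introduction

The course introduces LangChain, a powerful and easily extensible open-source framework for building large language model (LLM) applications. It simplifies LLM application development through prompts, memory, chains, agents, and related components. LangChain is still evolving quickly, some of its APIs are not yet stable, and part of the course code is already outdated, so these notes use v0.2, the latest version at the time of writing; all code here can be run under v0.2. In addition, the OpenAI service used in the course is hard to reach from mainland China, so I have replaced it with the Kimi model and a self-hosted open-source Ollama deployment, which makes no difference for learning purposes.

See this article for how to obtain a Kimi API token.
See this article for how to deploy your own model with Ollama.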

The course is divided into five parts:

  • Part 1
  • Part 2
  • Part 3
  • Part 4
  • Part 5


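Course link

Part 4

Evaluation

Building a Q&A application

When building a complex LLM application, an important but difficult question is how to evaluate how well it performs. Likewise, when we switch to a different LLM, how do we judge which model is better? And when we change the vector database or its parameters, did the results get better or worse? Next, we look at how to evaluate whether the output of an LLM application is correct.

First, we create the question-answering chain used earlier.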

from langchain.chains import RetrievalQA
from langchain_ollama import ChatOllama
from langchain_community.document_loaders import CSVLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import DocArrayInMemorySearch
from langchain.indexes import VectorstoreIndexCreator
from langchain.evaluation.qa import QAGenerateChain

# Ollama service address
base_url = 'http://localhost:11434'
# Model name
llm_model = 'qwen2'
# Test data file
file_path = 'product.csv'
# Create the model
llm = ChatOllama(base_url=base_url, model=llm_model)
# Load the test data
loader = CSVLoader(file_path=file_path)
data = loader.load()
# Create the embeddings
embeddings = OllamaEmbeddings(base_url=base_url, model=llm_model)
# Create the vector index
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
    embedding=embeddings,
).from_loaders([loader])
# Create the question-answering chain
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=index.vectorstore.as_retriever(),
    verbose=True,
    chain_type_kwargs={"document_separator": "<<<<>>>>>"},
)

Adding test data

We can add some test data by picking a few rows from product.csv. For example, rows 11 and 12 look like this:

11,高清投影仪,"高亮度,高对比度,支持高清视频播放,适合家庭影院和商务演示。"
12,智能手环,"监测心率、计步、睡眠,智能提醒,是健康生活的好伴侣。"

Because this data was generated by an LLM, your own copy of the file may contain different rows.
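As the debug log later in these notes shows, each CSV row reaches the model as a small text document made of `column: value` lines. A minimal sketch of that conversion, using only the standard library, might look like the following. The header names `no`, `name`, and `description` are inferred from that log, and `row_to_page_content` is a hypothetical helper written for illustration, not LangChain's own code.

```python
import csv
import io

# Two sample rows from the article; the header line is an assumption
# inferred from the "no: / name: / description:" keys in the debug log.
SAMPLE_CSV = """no,name,description
11,高清投影仪,"高亮度,高对比度,支持高清视频播放,适合家庭影院和商务演示。"
12,智能手环,"监测心率、计步、睡眠,智能提醒,是健康生活的好伴侣。"
"""


def row_to_page_content(row: dict) -> str:
    """Render one CSV row as 'column: value' lines (hypothetical helper)."""
    return "\n".join(f"{key}: {value}" for key, value in row.items())


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
documents = [row_to_page_content(row) for row in reader]
print(documents[0])
```

Keeping the column names in the text gives the model a hint about what each value means, which is probably why the generated questions can refer to a product's name and description.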

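We write the questions and supply their answers. This is a list of dictionaries, each containing a query and an answer.

examples = [
    {"query": "高清投影仪支持高清视频播放吗?", "answer": "是"},
    {"query": "哪一款产品能监测心率?", "answer": "智能手环"},
]

We have created two test cases here, but that is not enough, and writing them by hand is time-consuming. Is there a more automatic way? We can let the LLM generate them itself: in LangChain, QAGenerateChain has the LLM produce a test question and answer for each document.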

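# Create the chain that generates the test set
example_gen_chain = QAGenerateChain.from_llm(llm)
# Generate and parse the results (each item calls the LLM, so we only take the first 5 documents)
new_examples = example_gen_chain.apply_and_parse(
    [{"doc": t} for t in data[:5]]
)
print(new_examples[0])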

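Let's look at the first generated test case, which looks roughly like the following. We can review each generated test case to check that it is correct and appropriate.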

{'qa_pairs': {'query': 'What features does the high-definition smart television have?', 'answer': 'The high-definition smart television has a 4K ultra-high definition resolution, an integrated intelligent system, and supports voice control. It provides a rich entertainment experience.'}}

We can also turn on debug mode to see how it works.

import langchain
langchain.debug = True

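Put the code above at the top of the script and run it again. The output below is long, so focus on the first part: QAGenerateChain starts an LLM run for each document and builds a prompt asking the LLM to act as a teacher and write a question and answer based on the document that follows. The reply comes back in a fixed format, which LangChain then parses into a dictionary.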

[chain/start] [chain:QAGenerateChain] Entering Chain run with input:
[inputs]
[llm/start] [chain:QAGenerateChain > llm:ChatOllama] Entering LLM run with input:
{"prompts": ["Human: You are a teacher coming up with questions to ask on a quiz. \nGiven the following document, please generate a question and answer based on that document.\n\nExample Format:\n<Begin Document>\n...\n<End Document>\nQUESTION: question here\nANSWER: answer here\n\nThese questions should be detailed and be based explicitly on information in the document. Begin!\n\n<Begin Document>\npage_content='no: 1\nname: 高清智能电视\ndescription: 这款高清智能电视拥有4K超高清分辨率,内置智能系统,支持语音控制,提供丰富的娱乐体验。' metadata={'source': 'product.csv', 'row': 0}\n<End Document>"]
}
[llm/start] [chain:QAGenerateChain > llm:ChatOllama] Entering LLM run with input:
{"prompts": ["Human: You are a teacher coming up with questions to ask on a quiz. \nGiven the following document, please generate a question and answer based on that document.\n\nExample Format:\n<Begin Document>\n...\n<End Document>\nQUESTION: question here\nANSWER: answer here\n\nThese questions should be detailed and be based explicitly on information in the document. Begin!\n\n<Begin Document>\npage_content='no: 2\nname: 多功能料理机\ndescription: 集搅拌、打蛋、榨汁等多种功能于一身,操作简便,是厨房里的得力助手。' metadata={'source': 'product.csv', 'row': 1}\n<End Document>"]
}
[llm/start] [chain:QAGenerateChain > llm:ChatOllama] Entering LLM run with input:
{"prompts": ["Human: You are a teacher coming up with questions to ask on a quiz. \nGiven the following document, please generate a question and answer based on that document.\n\nExample Format:\n<Begin Document>\n...\n<End Document>\nQUESTION: question here\nANSWER: answer here\n\nThese questions should be detailed and be based explicitly on information in the document. Begin!\n\n<Begin Document>\npage_content='no: 3\nname: 无线蓝牙耳机\ndescription: 轻巧舒适,音质清晰,支持长时间续航,适合运动和日常使用。' metadata={'source': 'product.csv', 'row': 2}\n<End Document>"]
}
[llm/start] [chain:QAGenerateChain > llm:ChatOllama] Entering LLM run with input:
{"prompts": ["Human: You are a teacher coming up with questions to ask on a quiz. \nGiven the following document, please generate a question and answer based on that document.\n\nExample Format:\n<Begin Document>\n...\n<End Document>\nQUESTION: question here\nANSWER: answer here\n\nThese questions should be detailed and be based explicitly on information in the document. Begin!\n\n<Begin Document>\npage_content='no: 4\nname: 智能扫地机器人\ndescription: 自动规划清扫路线,智能避障,解放双手,保持家中清洁。' metadata={'source': 'product.csv', 'row': 3}\n<End Document>"]
}
[llm/start] [chain:QAGenerateChain > llm:ChatOllama] Entering LLM run with input:
{"prompts": ["Human: You are a teacher coming up with questions to ask on a quiz. \nGiven the following document, please generate a question and answer based on that document.\n\nExample Format:\n<Begin Document>\n...\n<End Document>\nQUESTION: question here\nANSWER: answer here\n\nThese questions should be detailed and be based explicitly on information in the document. Begin!\n\n<Begin Document>\npage_content='no: 5\nname: 便携式榨汁机\ndescription: 小巧便携,操作简便,快速榨汁,适合健康生活需求。' metadata={'source': 'product.csv', 'row': 4}\n<End Document>"]
}
[llm/end] [chain:QAGenerateChain > llm:ChatOllama] [75.50s] Exiting LLM run with output:
{"generations": [[{"text": "QUESTION: What features does the high-definition smart television have?\nANSWER: The high-definition smart television has a 4K ultra-high definition resolution, an integrated intelligent system, and supports voice control. It provides a rich entertainment experience.","generation_info": {"model": "qwen2","created_at": "2024-09-12T02:27:28.132404919Z","message": {"role": "assistant","content": ""},"done_reason": "stop","done": true,"total_duration": 15075322258,"load_duration": 4068642657,"prompt_eval_count": 146,"prompt_eval_duration": 3419985000,"eval_count": 48,"eval_duration": 7545190000},"type": "ChatGeneration","message": {"lc": 1,"type": "constructor","id": ["langchain","schema","messages","AIMessage"],"kwargs": {"content": "QUESTION: What features does the high-definition smart television have?\nANSWER: The high-definition smart television has a 4K ultra-high definition resolution, an integrated intelligent system, and supports voice control. It provides a rich entertainment experience.","response_metadata": {"model": "qwen2","created_at": "2024-09-12T02:27:28.132404919Z","message": {"role": "assistant","content": ""},"done_reason": "stop","done": true,"total_duration": 15075322258,"load_duration": 4068642657,"prompt_eval_count": 146,"prompt_eval_duration": 3419985000,"eval_count": 48,"eval_duration": 7545190000},"type": "ai","id": "run-e2282df6-a2bb-4b75-bd94-c6ee8338b339-0","usage_metadata": {"input_tokens": 146,"output_tokens": 48,"total_tokens": 194},"tool_calls": [],"invalid_tool_calls": []}}}]],"llm_output": null,"run": null
}
[llm/end] [chain:QAGenerateChain > llm:ChatOllama] [75.51s] Exiting LLM run with output:
{"generations": [[{"text": "QUESTION: What is the multifunctional kitchen appliance mentioned in the document capable of doing?\nANSWER: The multifunctional kitchen appliance, known as '多功能料理机', can perform various tasks such as blending, whisking eggs, and juicing. It's designed for ease of operation and serves as a helpful tool in the kitchen.","generation_info": {"model": "qwen2","created_at": "2024-09-12T02:27:40.655928024Z","message": {"role": "assistant","content": ""},"done_reason": "stop","done": true,"total_duration": 12512086251,"load_duration": 62599702,"prompt_eval_count": 145,"prompt_eval_duration": 1594234000,"eval_count": 69,"eval_duration": 10853358000},"type": "ChatGeneration","message": {"lc": 1,"type": "constructor","id": ["langchain","schema","messages","AIMessage"],"kwargs": {"content": "QUESTION: What is the multifunctional kitchen appliance mentioned in the document capable of doing?\nANSWER: The multifunctional kitchen appliance, known as '多功能料理机', can perform various tasks such as blending, whisking eggs, and juicing. It's designed for ease of operation and serves as a helpful tool in the kitchen.","response_metadata": {"model": "qwen2","created_at": "2024-09-12T02:27:40.655928024Z","message": {"role": "assistant","content": ""},"done_reason": "stop","done": true,"total_duration": 12512086251,"load_duration": 62599702,"prompt_eval_count": 145,"prompt_eval_duration": 1594234000,"eval_count": 69,"eval_duration": 10853358000},"type": "ai","id": "run-db59bd5a-e8c5-4ce4-be93-477b1f7beeeb-0","usage_metadata": {"input_tokens": 145,"output_tokens": 69,"total_tokens": 214},"tool_calls": [],"invalid_tool_calls": []}}}]],"llm_output": null,"run": null
}
[llm/end] [chain:QAGenerateChain > llm:ChatOllama] [75.51s] Exiting LLM run with output:
{"generations": [[{"text": "QUESTION: What are the features of the product with the name \"无线蓝牙耳机\" (Wireless Bluetooth Earphones)?\n\nANSWER: The product named \"无线蓝牙耳机\" offers several features including:\n1. **Lightweight and Comfortable**: The earphones are designed to be lightweight, ensuring comfort during use.\n2. **Crystal Clear Sound Quality**: It provides clear sound quality for an enjoyable listening experience.\n3. **Long Battery Life**: The headphones support a long duration of battery usage, making them suitable for both sports activities and everyday use.\n4. **Versatile Use**: They can be used while exercising or in daily routines without any issues due to their versatile design and functionality.\n\nThese features highlight the product's suitability for users who value convenience, comfort, and audio quality in their listening devices.","generation_info": {"model": "qwen2","created_at": "2024-09-12T02:28:06.427487738Z","message": {"role": "assistant","content": ""},"done_reason": "stop","done": true,"total_duration": 25761127075,"load_duration": 63109381,"prompt_eval_count": 139,"prompt_eval_duration": 1397453000,"eval_count": 162,"eval_duration": 24259968000},"type": "ChatGeneration","message": {"lc": 1,"type": "constructor","id": ["langchain","schema","messages","AIMessage"],"kwargs": {"content": "QUESTION: What are the features of the product with the name \"无线蓝牙耳机\" (Wireless Bluetooth Earphones)?\n\nANSWER: The product named \"无线蓝牙耳机\" offers several features including:\n1. **Lightweight and Comfortable**: The earphones are designed to be lightweight, ensuring comfort during use.\n2. **Crystal Clear Sound Quality**: It provides clear sound quality for an enjoyable listening experience.\n3. **Long Battery Life**: The headphones support a long duration of battery usage, making them suitable for both sports activities and everyday use.\n4. **Versatile Use**: They can be used while exercising or in daily routines without any issues due to their versatile design and functionality.\n\nThese features highlight the product's suitability for users who value convenience, comfort, and audio quality in their listening devices.","response_metadata": {"model": "qwen2","created_at": "2024-09-12T02:28:06.427487738Z","message": {"role": "assistant","content": ""},"done_reason": "stop","done": true,"total_duration": 25761127075,"load_duration": 63109381,"prompt_eval_count": 139,"prompt_eval_duration": 1397453000,"eval_count": 162,"eval_duration": 24259968000},"type": "ai","id": "run-3dc06185-da4e-4b56-b615-dcf831157fb2-0","usage_metadata": {"input_tokens": 139,"output_tokens": 162,"total_tokens": 301},"tool_calls": [],"invalid_tool_calls": []}}}]],"llm_output": null,"run": null
}
[llm/end] [chain:QAGenerateChain > llm:ChatOllama] [75.51s] Exiting LLM run with output:
{"generations": [[{"text": "QUESTION: What is the product being described and what are its main features?\n\nANSWER: The product being described is a \"智能扫地机器人\" (intelligent floor sweeping robot). Its main features include automatic planning of cleaning routes, intelligent obstacle avoidance, freeing up hands, and maintaining home cleanliness.","generation_info": {"model": "qwen2","created_at": "2024-09-12T02:28:17.028896159Z","message": {"role": "assistant","content": ""},"done_reason": "stop","done": true,"total_duration": 10589442660,"load_duration": 26054599,"prompt_eval_count": 139,"prompt_eval_duration": 1401741000,"eval_count": 61,"eval_duration": 9159878000},"type": "ChatGeneration","message": {"lc": 1,"type": "constructor","id": ["langchain","schema","messages","AIMessage"],"kwargs": {"content": "QUESTION: What is the product being described and what are its main features?\n\nANSWER: The product being described is a \"智能扫地机器人\" (intelligent floor sweeping robot). Its main features include automatic planning of cleaning routes, intelligent obstacle avoidance, freeing up hands, and maintaining home cleanliness.","response_metadata": {"model": "qwen2","created_at": "2024-09-12T02:28:17.028896159Z","message": {"role": "assistant","content": ""},"done_reason": "stop","done": true,"total_duration": 10589442660,"load_duration": 26054599,"prompt_eval_count": 139,"prompt_eval_duration": 1401741000,"eval_count": 61,"eval_duration": 9159878000},"type": "ai","id": "run-a489b5fe-7798-41f0-8380-e9bde0e8a889-0","usage_metadata": {"input_tokens": 139,"output_tokens": 61,"total_tokens": 200},"tool_calls": [],"invalid_tool_calls": []}}}]],"llm_output": null,"run": null
}
[llm/end] [chain:QAGenerateChain > llm:ChatOllama] [75.51s] Exiting LLM run with output:
{"generations": [[{"text": "QUESTION: What is the product described in this document?\nANSWER: The product described in this document is a portable juicer named \"便携式榨汁机\" (portable juice extractor). It's characterized as compact, easy to operate, and capable of quickly extracting juice, making it suitable for needs related to health living.","generation_info": {"model": "qwen2","created_at": "2024-09-12T02:28:28.529352164Z","message": {"role": "assistant","content": ""},"done_reason": "stop","done": true,"total_duration": 11484086566,"load_duration": 62195060,"prompt_eval_count": 140,"prompt_eval_duration": 1362653000,"eval_count": 68,"eval_duration": 10018610000},"type": "ChatGeneration","message": {"lc": 1,"type": "constructor","id": ["langchain","schema","messages","AIMessage"],"kwargs": {"content": "QUESTION: What is the product described in this document?\nANSWER: The product described in this document is a portable juicer named \"便携式榨汁机\" (portable juice extractor). It's characterized as compact, easy to operate, and capable of quickly extracting juice, making it suitable for needs related to health living.","response_metadata": {"model": "qwen2","created_at": "2024-09-12T02:28:28.529352164Z","message": {"role": "assistant","content": ""},"done_reason": "stop","done": true,"total_duration": 11484086566,"load_duration": 62195060,"prompt_eval_count": 140,"prompt_eval_duration": 1362653000,"eval_count": 68,"eval_duration": 10018610000},"type": "ai","id": "run-5709894f-ab18-4a1e-9e7b-0b8acd1eeb6a-0","usage_metadata": {"input_tokens": 140,"output_tokens": 68,"total_tokens": 208},"tool_calls": [],"invalid_tool_calls": []}}}]],"llm_output": null,"run": null
}
[chain/end] [chain:QAGenerateChain] [75.51s] Exiting Chain run with output:
{"outputs": [{"qa_pairs": {"query": "What features does the high-definition smart television have?","answer": "The high-definition smart television has a 4K ultra-high definition resolution, an integrated intelligent system, and supports voice control. It provides a rich entertainment experience."}},{"qa_pairs": {"query": "What is the multifunctional kitchen appliance mentioned in the document capable of doing?","answer": "The multifunctional kitchen appliance, known as '多功能料理机', can perform various tasks such as blending, whisking eggs, and juicing. It's designed for ease of operation and serves as a helpful tool in the kitchen."}},{"qa_pairs": {"query": "What are the features of the product with the name \"无线蓝牙耳机\" (Wireless Bluetooth Earphones)?","answer": "The product named \"无线蓝牙耳机\" offers several features including:"}},{"qa_pairs": {"query": "What is the product being described and what are its main features?","answer": "The product being described is a \"智能扫地机器人\" (intelligent floor sweeping robot). Its main features include automatic planning of cleaning routes, intelligent obstacle avoidance, freeing up hands, and maintaining home cleanliness."}},{"qa_pairs": {"query": "What is the product described in this document?","answer": "The product described in this document is a portable juicer named \"便携式榨汁机\" (portable juice extractor). It's characterized as compact, easy to operate, and capable of quickly extracting juice, making it suitable for needs related to health living."}}]
}
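The last step in the log above, turning the model's QUESTION/ANSWER text into a dictionary, can be sketched with a regular expression. This is a minimal sketch: the pattern is an assumption chosen to fit the format requested in the prompt, and `parse_qa` is a hypothetical helper. Because `.` does not match a newline, a multi-line answer is cut at its first line, which would explain why the third generated answer in the log ends at "offers several features including:".

```python
import re

# Assumed pattern: a QUESTION line, one or more newlines, then an ANSWER line.
# "." does not match a newline, so only the first line of the answer is kept.
QA_PATTERN = re.compile(r"QUESTION: (.*?)\n+ANSWER: (.*)")


def parse_qa(text: str) -> dict:
    """Extract a {'query', 'answer'} dict from QUESTION/ANSWER text."""
    match = QA_PATTERN.search(text)
    if match is None:
        raise ValueError(f"Could not parse QA pair from: {text!r}")
    return {"query": match.group(1), "answer": match.group(2)}


single_line = (
    "QUESTION: What is the product described in this document?\n"
    "ANSWER: A portable juicer."
)
multi_line = (
    "QUESTION: What are the features of the earphones?\n\n"
    "ANSWER: The product offers several features including:\n"
    "1. Lightweight and comfortable\n"
    "2. Long battery life"
)

print(parse_qa(single_line))
print(parse_qa(multi_line)["answer"])
```

If truncated answers like this matter for your test set, it may be worth reviewing generated pairs by hand or asking the model for single-line answers.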

Next, we merge the manually created test data with the automatically generated data.

all_examples = examples + [ex['qa_pairs'] for ex in new_examples]
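Since generated pairs can come back empty or malformed, a light sanity check during the merge may be useful. The sketch below uses shortened stand-in data shaped like the article's; `is_usable` is a hypothetical helper, and the rule it applies (non-empty string query and answer) is just one reasonable assumption.

```python
# Hand-written examples and shortened generated ones, shaped like the article's data.
examples = [
    {"query": "高清投影仪支持高清视频播放吗?", "answer": "是"},
    {"query": "哪一款产品能监测心率?", "answer": "智能手环"},
]
new_examples = [
    {"qa_pairs": {"query": "What is the product described in this document?",
                  "answer": "A portable juicer."}},
    {"qa_pairs": {"query": "", "answer": "An answer without a question."}},
]


def is_usable(example: dict) -> bool:
    """Keep only examples whose query and answer are non-empty strings."""
    return all(
        isinstance(example.get(key), str) and example[key].strip()
        for key in ("query", "answer")
    )


merged = examples + [ex["qa_pairs"] for ex in new_examples]
all_examples = [ex for ex in merged if is_usable(ex)]
print(len(merged), len(all_examples))
```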

Manual evaluation

We have the LLM answer the questions in our test set, starting with the first manually added question.

response = qa.run(examples[0]["query"])
print(response)

In debug mode, the output looks similar to the following.

[chain/start] [chain:RetrievalQA] Entering Chain run with input:
{"query": "高清投影仪支持高清视频播放吗?"
}
[chain/start] [chain:RetrievalQA > chain:StuffDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [chain:RetrievalQA > chain:StuffDocumentsChain > chain:LLMChain] Entering Chain run with input:
{"question": "高清投影仪支持高清视频播放吗?","context": "no: 11\nname: 高清投影仪\ndescription: 高亮度,高对比度,支持高清视频播放,适合家庭影院和商务演示。<<<<>>>>>no: 22\nname: 智能跑步机\ndescription: 多种运动模式,智能记录运动数据,适合家庭健身。<<<<>>>>>no: 12\nname: 智能手环\ndescription: 监测心率、计步、睡眠,智能提醒,是健康生活的好伴侣。<<<<>>>>>no: 4\nname: 智能扫地机器人\ndescription: 自动规划清扫路线,智能避障,解放双手,保持家中清洁。"
}
[llm/start] [chain:RetrievalQA > chain:StuffDocumentsChain > chain:LLMChain > llm:ChatOllama] Entering LLM run with input:
{"prompts": ["System: Use the following pieces of context to answer the user's question. \nIf you don't know the answer, just say that you don't know, don't try to make up an answer.\n----------------\nno: 11\nname: 高清投影仪\ndescription: 高亮度,高对比度,支持高清视频播放,适合家庭影院和商务演示。<<<<>>>>>no: 22\nname: 智能跑步机\ndescription: 多种运动模式,智能记录运动数据,适合家庭健身。<<<<>>>>>no: 12\nname: 智能手环\ndescription: 监测心率、计步、睡眠,智能提醒,是健康生活的好伴侣。<<<<>>>>>no: 4\nname: 智能扫地机器人\ndescription: 自动规划清扫路线,智能避障,解放双手,保持家中清洁。\nHuman: 高清投影仪支持高清视频播放吗?"]
}
[llm/end] [chain:RetrievalQA > chain:StuffDocumentsChain > chain:LLMChain > llm:ChatOllama] [6.70s] Exiting LLM run with output:
{"generations": [[{"text": "是的,高清投影仪支持高清视频播放。","generation_info": {"model": "qwen2","created_at": "2024-09-12T02:45:31.841247748Z","message": {"role": "assistant","content": ""},"done_reason": "stop","done": true,"total_duration": 6682410396,"load_duration": 25734266,"prompt_eval_count": 211,"prompt_eval_duration": 5067573000,"eval_count": 12,"eval_duration": 1532113000},"type": "ChatGeneration","message": {"lc": 1,"type": "constructor","id": ["langchain","schema","messages","AIMessage"],"kwargs": {"content": "是的,高清投影仪支持高清视频播放。","response_metadata": {"model": "qwen2","created_at": "2024-09-12T02:45:31.841247748Z","message": {"role": "assistant","content": ""},"done_reason": "stop","done": true,"total_duration": 6682410396,"load_duration": 25734266,"prompt_eval_count": 211,"prompt_eval_duration": 5067573000,"eval_count": 12,"eval_duration": 1532113000},"type": "ai","id": "run-6ef3e8d8-425e-4f61-9c1d-9925a2277e8f-0","usage_metadata": {"input_tokens": 211,"output_tokens": 12,"total_tokens": 223},"tool_calls": [],"invalid_tool_calls": []}}}]],"llm_output": null,"run": null
}
[chain/end] [chain:RetrievalQA > chain:StuffDocumentsChain > chain:LLMChain] [6.70s] Exiting Chain run with output:
{"text": "是的,高清投影仪支持高清视频播放。"
}
[chain/end] [chain:RetrievalQA > chain:StuffDocumentsChain] [6.71s] Exiting Chain run with output:
{"output_text": "是的,高清投影仪支持高清视频播放。"
}
[chain/end] [chain:RetrievalQA] [7.29s] Exiting Chain run with output:
{"result": "是的,高清投影仪支持高清视频播放。"
}
是的,高清投影仪支持高清视频播放。

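As we can see, the stuff chain builds a prompt and submits the retrieved rows to the LLM together with the question. The LLM's answer is "是的,高清投影仪支持高清视频播放。" ("Yes, the HD projector supports HD video playback."). It is not word-for-word identical to our expected answer "是" ("Yes"), but the meaning is the same.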
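The way the stuff chain assembles its prompt can be sketched in a few lines: join every retrieved document with the separator, place the result under the system instructions, and append the question. The prompt wording and the separator are copied from the debug log above; the template layout and the helper `build_stuff_prompt` are assumptions made for illustration, not LangChain's implementation.

```python
SEPARATOR = "<<<<>>>>>"  # same value as document_separator in the article

# Prompt wording copied from the debug log; the template layout is an assumption.
SYSTEM_TEMPLATE = (
    "Use the following pieces of context to answer the user's question. \n"
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n"
    "----------------\n"
    "{context}"
)


def build_stuff_prompt(documents: list[str], question: str) -> str:
    """Join all retrieved documents into one context and append the question."""
    context = SEPARATOR.join(documents)
    return f"System: {SYSTEM_TEMPLATE.format(context=context)}\nHuman: {question}"


retrieved = [
    "no: 11\nname: 高清投影仪\ndescription: 高亮度,高对比度,支持高清视频播放,适合家庭影院和商务演示。",
    "no: 12\nname: 智能手环\ndescription: 监测心率、计步、睡眠,智能提醒,是健康生活的好伴侣。",
]
prompt = build_stuff_prompt(retrieved, "高清投影仪支持高清视频播放吗?")
print(prompt)
```

Because everything is placed into a single prompt, this approach only works while the retrieved documents fit in the model's context window.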

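Letting the LLM evaluate itself

What if we want to test every example? Do we have to compare them one by one? We can have the LLM help here too: LangChain provides QAEvalChain to grade the results automatically.

We can first turn debug mode off with langchain.debug = False to avoid excessive output.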

from langchain.evaluation.qa import QAEvalChain

# Get the predicted answers for all test examples
predictions = qa.apply(all_examples)
# We can reuse the earlier LLM or use a new model
llm = ChatOllama(base_url=base_url, model=llm_model)
# Create the evaluation chain
eval_chain = QAEvalChain.from_llm(llm)
# Get the grading results
graded_outputs = eval_chain.evaluate(all_examples, predictions)
# Iterate over the results and print them
for i, eg in enumerate(all_examples):
    print(f"Example {i}:")
    print("Question: " + predictions[i]['query'])
    print("Real Answer: " + predictions[i]['answer'])
    print("Predicted Answer: " + predictions[i]['result'])
    print("Predicted Grade: " + graded_outputs[i]['results'])
    print()

The output looks similar to the following.

Example 0:
Question: 高清投影仪支持高清视频播放吗?
Real Answer: 是
Predicted Answer: 是的,高清投影仪支持高清视频播放。
Predicted Grade: CORRECT

Example 1:
Question: 哪一款产品能监测心率?
Real Answer: 智能手环
Predicted Answer: 智能手环能监测心率。
Predicted Grade: CORRECT

Example 2:
Question: What features does the high-definition smart TV have according to the document?
Real Answer: The high-definition smart TV mentioned in the document has several notable features. It boasts a 4K ultra-high definition resolution, indicating an exceptionally clear picture quality. Additionally, it is equipped with an internal smart system which allows for various interactive functionalities. One of these capabilities includes voice control, suggesting users can operate or navigate through its features using their voice commands. Lastly, the TV offers a rich entertainment experience, implying that it may include access to streaming services, internet connectivity, and other multimedia content options to ensure users enjoy a varied range of programming.
Predicted Answer: I'm sorry, but I don't know the answer because the provided context doesn't mention a "high-definition smart TV". The context includes information about a high-definition projector, an automatic coffee machine, and an intelligent treadmill.
Predicted Grade: INCORRECT

Example 3:
Question: What is the product described in this document?
Real Answer: The product described in this document is a "multifunctional kitchen appliance" which combines various functions such as mixing, beating eggs and juicing. It's noted for its ease of use, making it a helpful tool in the kitchen.
Predicted Answer: The document describes several different products:
1. 高清投影仪 - A high-definition projector with high brightness and contrast, suitable for home cinema and business presentations.
2. 无线蓝牙耳机 - Wireless Bluetooth headphones that are lightweight, comfortable to wear, have clear sound quality, and offer long battery life, suitable for sports and daily use.
3. 全自动咖啡机 - An automated coffee machine that allows one-button operation and offers multiple coffee flavor choices, providing a professional coffee experience.
4. 智能跑步机 - A smart treadmill with various exercise modes and the ability to record workout data automatically, suitable for home fitness routines.
Each product has been characterized by its unique features and application scenarios as detailed in their descriptions.
Predicted Grade: INCORRECT

Example 4:
Question: What are the features of the product described in the document?
Real Answer: The product, named "wireless bluetooth headphones," is characterized by being lightweight and comfortable to wear. It offers clear sound quality and supports long-lasting battery life, making it suitable for both sports activities and everyday use.
Predicted Answer: The product described is an "高清投影仪" (High Definition Projector), which features high brightness, high contrast ratio, and support for high-definition video playback. It's suitable for both家庭影院 (home cinema) and 商务演示 (business presentations).
Another product mentioned is an "全自动咖啡机" (Fully Automatic Coffee Machine). This machine allows for one-touch operation with a variety of coffee taste choices, providing a professional coffee experience.
A third item highlighted is the "智能跑步机" (Smart Treadmill), which offers various exercise modes and can intelligently record workout data. It's ideal for家庭健身 (home fitness).
Lastly, there's an "智能扫地机器人" (Smart Vacuum Cleaning Robot) that autonomously plans its cleaning routes, has intelligent obstacle avoidance, frees up hands, and helps keep the home clean.
Predicted Grade: INCORRECT

Example 5:
Question: What is the description of the product "智能扫地机器人"?
Real Answer: The description of the product "智能扫地机器人" is that it automatically plans cleaning routes, has intelligent obstacle avoidance, frees up your hands, and keeps the house clean.
Predicted Answer: The description of the product "智能扫地机器人" is: 自动规划清扫路线,智能避障,解放双手,保持家中清洁。
Predicted Grade: CORRECT

Example 6:
Question: What is the description of the product '便携式榨汁机'?
Real Answer: The '便携式榨汁机' is described as being small, portable, easy to operate, fast at juicing and suitable for health living needs.
Predicted Answer: I don't know the answer to that question because there is no specific context provided for a '便携式榨汁机' (portable juicer).
Predicted Grade: CORRECT

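The output above covers our 7 test examples (2 written by hand plus 5 generated). Each one prints four lines: Question, Real Answer (the reference answer), Predicted Answer, and Predicted Grade. Real Answer is the reference answer from the test set, either written by hand or generated earlier by QAGenerateChain. Predicted Answer is the answer produced by the RetrievalQA chain, and Predicted Grade is QAEvalChain's verdict on whether the two agree. In this run some examples passed, but not all of them. The grader is itself an LLM and can be wrong: in Example 6 the predicted answer says it does not know, yet it is graded CORRECT, so the grades are worth spot-checking.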
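To turn a list of grades into a single number, the results can be tallied. The sketch below hard-codes the grades from the sample run above (4 of the 7 examples are graded CORRECT) in the same shape as the per-example results; `normalize_grade` is a hypothetical helper, and its handling of a leading "GRADE:" prefix is an assumption about what a grader model might emit.

```python
from collections import Counter

# Grades copied from the sample run above, one dict per example.
graded_outputs = [
    {"results": "CORRECT"},
    {"results": "CORRECT"},
    {"results": "INCORRECT"},
    {"results": "INCORRECT"},
    {"results": "INCORRECT"},
    {"results": "CORRECT"},
    {"results": "CORRECT"},
]


def normalize_grade(text: str) -> str:
    """Trim whitespace and an optional 'GRADE:' prefix, then upper-case."""
    label = text.strip().upper()
    if label.startswith("GRADE:"):
        label = label[len("GRADE:"):].strip()
    return label


counts = Counter(normalize_grade(g["results"]) for g in graded_outputs)
accuracy = counts["CORRECT"] / len(graded_outputs)
print(counts)
print(f"accuracy = {accuracy:.2f}")
```

Comparing labels for equality rather than searching for the substring "CORRECT" matters here, because "INCORRECT" contains "CORRECT".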

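The two answers come from two independent chain calls, so they do not influence each other. Our questions are often open-ended, with no single fixed wording for the answer, so we also need an LLM to judge whether the two answers agree.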
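To see why a plain string comparison is not enough, consider a naive exact-match grader applied to the reference and predicted answers from Examples 0 and 1 above. This is only an illustrative sketch; `exact_match` is a hypothetical helper, not part of LangChain.

```python
def exact_match(reference: str, predicted: str) -> bool:
    """Naive grader: the answers agree only if the strings are identical."""
    return reference.strip() == predicted.strip()


# Reference and predicted answers taken from Examples 0 and 1 above.
pairs = [
    ("是", "是的,高清投影仪支持高清视频播放。"),
    ("智能手环", "智能手环能监测心率。"),
]

for reference, predicted in pairs:
    print(reference, "|", predicted, "|", exact_match(reference, predicted))
```

Both pairs fail the exact match even though a human reader, and the LLM grader in the run above, would accept them as correct.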

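Here we learned how to use an LLM to build an automated test pipeline that generates test data and grades the answers automatically. This makes it easy to produce test data in bulk and evaluate the results quickly.

(To be continued)

Next: Part 5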
