HippoRAG 2 原理精读
提示:文章写完后,目录可以自动生成,如何生成可参考右边的帮助文档
文章目录
- 整体流程
- 离线索引阶段
- 在线检索和问答阶段
- 总结
整体流程

从上图可以看出,整个流程分为两个阶段
1、离线索引阶段
2、在线检索和问答阶段
离线索引阶段
1、对文档进行分段
2、抽取实体
对应的prompt
ner_system = """Your task is to extract named entities from the given paragraph.
Respond with a JSON list of entities.
"""one_shot_ner_paragraph = """Radio City
Radio City is India's first private FM radio station and was started on 3 July 2001.
It plays Hindi, English and regional songs.
Radio City recently forayed into New Media in May 2008 with the launch of a music portal - PlanetRadiocity.com that offers music related news, videos, songs, and other music-related features."""one_shot_ner_output = """{"named_entities":["Radio City", "India", "3 July 2001", "Hindi", "English", "May 2008", "PlanetRadiocity.com"]
}
"""prompt_template = [{"role": "system", "content": ner_system},{"role": "user", "content": one_shot_ner_paragraph},{"role": "assistant", "content": one_shot_ner_output},{"role": "user", "content": "${passage}"}
]
LLM的输出如
{"named_entities": ["迦楼罗","大鹏金翅鸟","岳飞"]
}
3、根据实体和文本生成三元组
from .ner import one_shot_ner_paragraph, one_shot_ner_output
from ...utils.llm_utils import convert_format_to_templatener_conditioned_re_system = """Your task is to construct an RDF (Resource Description Framework) graph from the given passages and named entity lists.
Respond with a JSON list of triples, with each triple representing a relationship in the RDF graph. Pay attention to the following requirements:
- Each triple should contain at least one, but preferably two, of the named entities in the list for each passage.
- Clearly resolve pronouns to their specific names to maintain clarity."""ner_conditioned_re_frame = """Convert the paragraph into a JSON dict, it has a named entity list and a triple list.
Paragraph:
{passage}
{named_entity_json}
"""ner_conditioned_re_input = ner_conditioned_re_frame.format(passage=one_shot_ner_paragraph, named_entity_json=one_shot_ner_output)ner_conditioned_re_output = """{"triples": [["Radio City", "located in", "India"],["Radio City", "is", "private FM radio station"],["Radio City", "started on", "3 July 2001"],["Radio City", "plays songs in", "Hindi"],["Radio City", "plays songs in", "English"],["Radio City", "forayed into", "New Media"],["Radio City", "launched", "PlanetRadiocity.com"],["PlanetRadiocity.com", "launched in", "May 2008"],["PlanetRadiocity.com", "is", "music portal"],["PlanetRadiocity.com", "offers", "news"],["PlanetRadiocity.com", "offers", "videos"],["PlanetRadiocity.com", "offers", "songs"]]
}
"""prompt_template = [{"role": "system", "content": ner_conditioned_re_system},{"role": "user", "content": ner_conditioned_re_input},{"role": "assistant", "content": ner_conditioned_re_output},{"role": "user", "content": convert_format_to_template(original_string=ner_conditioned_re_frame, placeholder_mapping=None, static_values=None)}
]
这一步会根据上面抽取出来的实体、原始文本内容生成三元组。
格式如:[主语, 谓语, 宾语]
4、对原始文本、三元组、实体进行向量化
使用的是 nvidia/NV-Embed-v2 模型,参数量为7B
对不同类型的数据,使用的prompt不同,如下:
instructions = {'ner_to_node': 'Given a phrase, retrieve synonymous or relevant phrases that best match this phrase.','query_to_node': 'Given a question, retrieve relevant phrases that are mentioned in this question.','query_to_fact': 'Given a question, retrieve relevant triplet facts that matches this question.','query_to_sentence': 'Given a question, retrieve relevant sentences that best answer the question.','query_to_passage': 'Given a question, retrieve relevant documents that best answer the question.',
}
5、增加同义词边
连接语义相近的实体
在线检索和问答阶段
1、检索topn 的三元组,用LLM对检索结果过滤无关三元组
* 如果没有检索到三元组,则直接检索相关文档
* 如果检索到三元组,则从构建的图中基于个性化的pagerank算法,检索出topn的相关文档
2、利用LLM基于检索结果回答
对应prompt
one_shot_ircot_demo_docs = ("""Wikipedia Title: Milk and Honey (album)\nMilk and Honey is an album by John Lennon and Yoko Ono released in 1984. Following the compilation "The John Lennon Collection", it is Lennon's eighth and final studio album, and the first posthumous release of new Lennon music, having been recorded in the last months of his life during and following the sessions for their 1980 album "Double Fantasy". It was assembled by Yoko Ono in association with the Geffen label.\n\n""""""Wikipedia Title: John Lennon Museum\nJohn Lennon Museum (ジョン・レノン・ミュージアム , Jon Renon Myūjiamu ) was a museum located inside the Saitama Super Arena in Chūō-ku, Saitama, Saitama Prefecture, Japan. It was established to preserve knowledge of John Lennon's life and musical career. It displayed Lennon's widow Yoko Ono's collection of his memorabilia as well as other displays. The museum opened on October 9, 2000, the 60th anniversary of Lennon’s birth, and closed on September 30, 2010, when its exhibit contract with Yoko Ono expired. A tour of the museum began with a welcoming message and short film narrated by Yoko Ono (in Japanese with English headphones available), and ended at an avant-garde styled "reflection room" full of chairs facing a slide show of moving words and images. After this room there was a gift shop with John Lennon memorabilia available.\n\n""""""Wikipedia Title: Walls and Bridges\nWalls and Bridges is the fifth studio album by English musician John Lennon. It was issued by Apple Records on 26 September 1974 in the United States and on 4 October in the United Kingdom. Written, recorded and released during his 18-month separation from Yoko Ono, the album captured Lennon in the midst of his "Lost Weekend". "Walls and Bridges" was an American "Billboard" number-one album and featured two hit singles, "Whatever Gets You thru the Night" and "#9 Dream". The first of these was Lennon's first number-one hit in the United States as a solo artist, and his only chart-topping single in either the US or Britain during his lifetime.\n\n""""""Wikipedia Title: Nobody Loves You (When You're Down and Out)\n"Nobody Loves You (When You're Down and Out)" is a song written by John Lennon released on his 1974 album "Walls and Bridges". The song is included on the 1986 compilation "Menlove Ave.", the 1990 boxset "Lennon", the 1998 boxset "John Lennon Anthology", the 2005 two-disc compilation "", and the 2010 boxset "Gimme Some Truth".\n\n""""""Wikipedia Title: Give Peace a Chance\n"Give Peace a Chance" is an anti-war song written by John Lennon (credited to Lennon–McCartney), and performed with Yoko Ono in Montreal, Quebec, Canada. Released as a single in 1969 by the Plastic Ono Band on Apple Records (catalogue Apple 13 in the United Kingdom, Apple 1809 in the United States), it is the first solo single issued by Lennon, released when he was still a member of the Beatles, and became an anthem of the American anti-war movement during the 1970s. It peaked at number 14 on the "Billboard" Hot 100 and number 2 on the British singles chart.\n"""
)one_shot_ircot_demo = (f'{one_shot_ircot_demo_docs}''\n\nQuestion: 'f"Nobody Loves You was written by John Lennon and released on what album that was issued by Apple Records, and was written, recorded, and released during his 18 month separation from Yoko Ono?"'\nThought: 'f"The album issued by Apple Records, and written, recorded, and released during John Lennon's 18 month separation from Yoko Ono is Walls and Bridges. Nobody Loves You was written by John Lennon on Walls and Bridges album. So the answer is: Walls and Bridges."'\n\n'
)ircot_system = ('You serve as an intelligent assistant, adept at facilitating users through complex, multi-hop reasoning across multiple documents. This task is illustrated through demonstrations, each consisting of a document set paired with a relevant question and its multi-hop reasoning thoughts. Your task is to generate one thought for current step, DON\'T generate the whole thoughts at once! If you reach what you believe to be the final step, start with "So the answer is:".''\n\n'f'{one_shot_ircot_demo}'
)prompt_template = [{"role": "system", "content": ircot_system},{"role": "user", "content": "${prompt_user}"}
]
总结
1、只是用三元组来协助检索,并没有利用图
2、难以相信这种做法能超越普通RAG几十个点
参考:
HippoRAG 2: From RAGtoMemory: Non-Parametric Continual Learning for Large Language Models
相关文章:
HippoRAG 2 原理精读
提示:文章写完后,目录可以自动生成,如何生成可参考右边的帮助文档 文章目录 整体流程离线索引阶段在线检索和问答阶段 总结 整体流程 从上图可以看出,整个流程分为两个阶段 1、离线索引阶段 2、在线检索和问答阶段 离线索引阶段…...
三:FFMPEG拉流读取模块的讲解
FFMPEG拉流读取模块在远程监控项目最核心的作用是读取UVC摄像头传输的H264码流,并对其码流进行帧的提取,提取完成之后则把数据传输到VDEC解码模块进行解码。而在我们这个项目中,UVC推流的功能由FFMPEG的命令完成。 FFMPEG拉流读取模块的API…...
linux makefile tutorial
一个makefile的教程,几个小时就能看完,对makefile有个总体加细节的系统了解,非常不错: Learn Makefiles With the tastiest examples 中文翻译版: 起步 - Makefile 教程 (gavinliu6.github.io) gcc官网手册&#x…...
【从零开始学习计算机科学】操作系统(五)处理器调度
【从零开始学习计算机科学】操作系统(五)处理器调度 处理器调度一些简单的短程调度算法的思路先来先服务(First-Come-First-Served,FCFS)优先级调度及其变种最短作业优先调度算法(SJF)--非抢占式最短作业优先调度算法(SJF)--抢占式最高响应比优先调度算法轮转调度算法…...
视觉图像处理
在MATLAB中进行视觉图像处理仿真通常涉及图像增强、滤波、分割、特征提取等操作。以下是一个分步指南和示例代码,帮助您快速入门: 1. MATLAB图像处理基础步骤 1.1 读取和显示图像 % 读取图像(替换为实际文件路径) img = imread(lena.jpg); % 显示原图 figure; subplot(2…...
从零开始设计一个完整的网站:HTML、CSS、PHP、MySQL 和 JavaScript 实战教程
前言 本文将从实战角度出发,带你一步步设计一个完整的网站。我们将从 静态网页 开始,然后加入 动态功能(使用 PHP),连接 数据库,最后加入 JavaScript 实现交互功能。通过这个教程,你将掌握一个…...
JavaScript(Web APIs)
这个阶段两天也能看完 目录 壹_DOM-获取元素 00、获取DOM元素(根据CS选择器来获取DOM元素) 01、修改元素内容 02、修改CSS 03、H5自定义属性 04、定时器 贰_DOM-事件基础 00、事件监听 01、事件类型 02、事件对象 03、环境对象 04、回调函数 叁_DOM-事…...
Global top sap abap 和deepseek对话,测试其abap推理能力
我提交给deepseek一段代码 FUNCTION zXXX_hr_pafm_pannnn_up. *"---------------------------------------------------------------------- *"*"Local Interface: *" IMPORTING *" VALUE(IS_PRELP) TYPE PRELP OPTIONAL *" VALUE(IV…...
Android DUKPT - 3DES
一、DUKPT概述 DUKPT 即Derived Unique Key Per Transaction(每个事务的派生唯一密钥)。ANSI X9.24规范定义的密钥管理体系,主要用于对称密钥加密场景(如MAC、PIN等敏感数据保护)。通过动态生成唯一交易密钥ÿ…...
机器学习数学基础:45.多重响应分析
多重响应分析超详细教程:手把手教你分析多选题数据 一、深入理解多重响应分析的背景 问卷调查中,问题分为单选题与多选题: 单选题:如“你的性别?1.男 2.女”,答题者仅选一个选项,分析时直接统…...
《苍穹外卖》SpringBoot后端开发项目核心知识点与常见问题整理(DAY1 to DAY3)
目录 一、在本地部署并启动Nginx服务1. 解压Nginx压缩包2. 启动Nginx服务3. 验证Nginx是否启动成功: 二、导入接口文档1. 黑马程序员提供的YApi平台2. YApi Pro平台3. 推荐工具:Apifox 三、Swagger1. 常用注解1.1 Api与ApiModel1.2 ApiModelProperty与Ap…...
企业安全—对数据和资产进行识别和分类
0x00 前言 针对数据和资产的保护刻不容缓,这个是每一个做企业安全建设不容放过的一环,那么在识别数据和资产已经对这些数据分类就是必须要了解和掌握的内容。 这里不仅是针对商业机密,还有用户数据,前者在于保护公司利益&#x…...
QT系列教程(20) Qt 项目视图便捷类
视频连接 https://www.bilibili.com/video/BV1XY41127t3/?vd_source8be9e83424c2ed2c9b2a3ed1d01385e9 Qt项目视图便捷类 Qt项目视图提供了一些便捷类,包括QListWidget, QTableWidget, QTreeWidget等。我们分别介绍这几个便捷类。 我们先创建一个Qt …...
Spring Boot 调用DeepSeek API的详细教程
目录 前置准备步骤1:创建Spring Boot项目步骤2:配置API参数步骤3:创建请求/响应DTO步骤4:实现API客户端步骤5:创建控制器步骤6:异常处理步骤7:测试验证单元测试示例Postman测试请求 常见问题排查…...
动态扩缩容引发的JVM堆内存震荡:从原理到实践的GC调优指南
目录 一、典型案例:系统发布后的GC雪崩事件 (一)故障现象 1. 刚刚启动时 GC 次数较多 2. 堆内存锯齿状波动 3. GC日志特征:Allocation Failure (二)问题定位 二、原理深度解析:JVM内存弹…...
AI智能眼镜主控芯片:技术演进与产业生态的深度解析
一、AI智能眼镜的技术挑战与主控芯片核心诉求 AI智能眼镜作为XR(扩展现实)技术的代表产品,其核心矛盾在于性能、功耗与体积的三角平衡。主控芯片作为设备的“大脑”,需在有限空间内实现复杂计算、多模态交互与全天候续航…...
微服务拆分-远程调用
我们在查询购物车列表的时候,它有一个需求,就是不仅仅要查出购物车当中的这些商品信息,同时还要去查到购物车当中这些商品的最新的价格和状态信息,跟购物车当中的快照进行一个对比,从而去提醒用户。 现在我们已经做了服…...
[网络爬虫] 动态网页抓取 — Selenium 介绍 环境配置
🌟想系统化学习爬虫技术?看看这个:[数据抓取] Python 网络爬虫 - 学习手册-CSDN博客 0x01:Selenium 工具介绍 Selenium 是一个开源的便携式自动化测试工具。它最初是为网站自动化测试而开发的,类似于我们玩游戏用的按…...
【RAGFlow】windows本地pycharm运行
原因 由于官方只提供了docker部署,基于开源代码需要实现自己内部得逻辑,所以需要本地pycharm能访问,且docker运行依赖得其余组件,均需要使用开发服务器得配置。 修改过程 安装python 项目依赖于Python 版本:>3.1…...
git子仓库管理的两种方式
在团队协作中选择使用 Git Submodule 还是 Git Subtree 取决于项目的需求和团队的工作方式。以下是两者的对比和适用场景分析,帮助你做出选择: Git Submodule 优点 独立性高 子模块是一个独立的仓库,拥有自己的提交历史和分支。这使得子模…...
树莓派5首次开机保姆级教程(无显示器通过VNC连接树莓派桌面)
第一次开机详细步骤 步骤一:树莓派系统烧录1 搜索打开烧录软件“Raspberry Pi Imager”2 选择合适的设备、系统、SD卡3 烧录配置选项 步骤二:SSH远程树莓派1 树莓派插电2 网络连接(有线或无线)3 确定树莓派IP地址 步骤三ÿ…...
html-表格标签
一、表格标签 1. 表格的主要作用 表格主要用于显示、展示数据,因为它可以让数据显示的非常的规整,可读性非常好。特别是后台展示数据 的时候,能够熟练运用表格就显得很重要。一个清爽简约的表格能够把繁杂的数据表现得很有条理。 总…...
大模型安全新范式:DeepSeek一体机内容安全卫士发布
2月以来,DeepSeek一体机几乎成为了政企市场AI消费的最强热点。 通过一体机的方式能够缩短大模型部署周期,深度结合业务场景,降低中小企业对于大模型的使用门槛。据不完全统计,已约有超过60家企业基于DeepSeek推出一体机产品。 但…...
数据分析绘制随时间顺序变化图加入线性趋势线——numpy库的polyfit计算一次多项式拟合
import pandas as pd import numpy as np import matplotlib.pyplot as plt# 导入数据 data pd.read_csv(rC:\Users\11712\notebooktrain1.csv)# 假设数据包含 date_time 和 speed 列 data[date_time] pd.to_datetime(data[date_time]) # 确保时间列是 datetime 类型 data.s…...
密闭空间可燃气体监测终端:守护城市命脉,智驭燃气安全!
近年来,陕西省高度重视燃气安全,出台了一系列政策文件,旨在全面加强城镇燃气安全监管,防范化解重大安全风险。2023年,陕西省安委会印发《全省城镇燃气安全专项整治工作方案》,明确要求聚焦燃气经营、输送配…...
阿里千问大模型(Qwen2.5-VL-7B-Instruct)部署
参考链接 知乎帖子 B站视频 huggingface 镜像网站(不太全,比如 Qwen/Qwen2.5-VL-7B-Instruct就没有) huggingface 5种下载方式汇总 通过huggingface-cli下载模型 不一样的部分是预训练权重的下载和demo 首先安装huggingface_hub pip insta…...
【人工智能】随机森林的智慧:集成学习的理论与实践
随机森林(Random Forest)是一种强大的集成学习算法,通过构建多棵决策树并结合投票或平均预测提升模型性能。本文深入探讨了随机森林的理论基础,包括决策树的构建、Bagging方法和特征随机选择机制,并通过LaTeX公式推导其偏差-方差分解和误差分析。接着,我们详细描述了随机…...
Javascript基础语法详解
面向对象的语言.脚本语言,不需要编译,浏览器解释即可运行 .用于控制网页的行为.浏览器的source可以打断点调试, console输入代码可以执行 use strict指令: 在“严格模式”下运行js代码, 防止意外创建全局变量等, 提高代码安全性和执行效率. 使用: 全局严格模式:…...
前端状态管理 pinia和vuex高频面试题
前端状态管理 Pinia 和 Vuex 是 Vue 生态中常用的状态管理方案,在面试中经常涉及 基本概念、对比、最佳实践、性能优化 等多个方面。以下是 高频面试题 详细答案,共 20 题,助你轻松应对面试!🚀 🔥 基础概念…...
【Go学习实战】03-3-文章评论及写文章
【Go学习实战】03-3-文章评论及写文章 文章评论注册valine获取凭证加载评论页面 写文章修改cdn位置完善功能查看页面 发布文章POST发布文章发布文章测试 查询文章详情查询详情测试 修改文章修改文章测试 写文章图片上传前端后端逻辑测试 文章评论 这里我们的博客因为是个轻量级…...
