Langchain 集成 FAISS
Langchain 集成 FAISS
- 1. FAISS
- 2. Similarity Search with score
- 3. Saving and loading
- 4. Merging
- 5. Similarity Search with filtering
1. FAISS
Facebook AI Similarity Search (Faiss)是一个用于高效相似性搜索和密集向量聚类的库。它包含的算法可以搜索任意大小的向量集,甚至可能无法容纳在 RAM 中的向量集。它还包含用于评估和参数调整的支持代码。
Faiss 文档地址在这里.
本笔记本展示了如何使用与 FAISS 矢量数据库相关的功能。
示例代码,
# !pip install faiss
# OR
# !pip install faiss-cpu
import os
import getpassos.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")# 如果需要在没有 AVX2 优化的情况下初始化 FAISS,请取消注释以下一行
# os.environ['FAISS_NO_AVX2'] = '1'
# from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.embeddings.cohere import CohereEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.document_loaders import TextLoader
输出结果,
from langchain.document_loaders import TextLoaderloader = TextLoader("./state_of_the_union_en.txt", encoding="utf-8")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)# embeddings = OpenAIEmbeddings
embeddings = CohereEmbeddings()
示例代码,
db = FAISS.from_documents(docs, embeddings)query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
print(docs[0].page_content)
输出结果,
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
2. Similarity Search with score
有一些 FAISS 特定方法。其中之一是 similarity_search_with_score ,它不仅允许您返回文档,还允许返回查询到它们的距离分数。返回的距离分数是L2距离。因此,分数越低越好。
示例代码,
docs_and_scores = db.similarity_search_with_score(query)
docs_and_scores[0]
输出结果,
(Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': './state_of_the_union_en.txt'}),7172.888)
refer: https://python.langchain.com/docs/integrations/vectorstores/faiss 文档的分数是 0.36913747,
(Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt'}),0.36913747)
还可以使用 similarity_search_by_vector 搜索与给定嵌入向量类似的文档,它接受嵌入向量作为参数而不是字符串。
示例代码,
embedding_vector = embeddings.embed_query(query)
docs_and_scores = db.similarity_search_by_vector(embedding_vector)
docs_and_scores
输出结果如下,
[Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': './state_of_the_union_en.txt'}),Document(page_content='We can’t change how divided we’ve been. But we can change how we move forward—on COVID-19 and other issues we must face together. \n\nI recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \n\nThey were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \n\nOfficer Mora was 27 years old. \n\nOfficer Rivera was 22. \n\nBoth Dominican Americans who’d grown up on the same streets they later chose to patrol as police officers. \n\nI spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \n\nI’ve worked on these issues a long time. \n\nI know what works: Investing in crime preventionand community police officers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and safety.', metadata={'source': './state_of_the_union_en.txt'}),Document(page_content='And for our LGBTQ+ Americans, let’s finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \n\nAs I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \n\nWhile it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. \n\nAnd soon, we’ll strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \n\nSo tonight I’m offering a Unity Agenda for the Nation. Four big things we can do together. \n\nFirst, beat the opioid epidemic.', metadata={'source': './state_of_the_union_en.txt'}),Document(page_content='Tonight, I’m announcing a crackdown on these companies overcharging American businesses and consumers. \n\nAnd as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up. \n\nThat ends on my watch. \n\nMedicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. \n\nWe’ll also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. \n\nLet’s pass the Paycheck Fairness Act and paid leave. \n\nRaise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. \n\nLet’s increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls America’s best-kept secret: community colleges.', metadata={'source': './state_of_the_union_en.txt'})]
3. Saving and loading
您还可以保存和加载 FAISS 索引。这很有用,因此您不必每次使用它时都重新创建它。
示例代码,
db.save_local("faiss_index")
new_db = FAISS.load_local("faiss_index", embeddings)
docs = new_db.similarity_search(query)
docs[0]
输出结果,
Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': './state_of_the_union_en.txt'})
4. Merging
您还可以合并两个 FAISS 矢量存储。
示例代码,
db1 = FAISS.from_texts(["foo"], embeddings)
db2 = FAISS.from_texts(["bar"], embeddings)
db1.docstore._dict
输出结果,
{'43f79c6d-6bb3-4a62-979d-58e011dcb086': Document(page_content='foo', metadata={})}
示例代码,
db1.docstore._dict
输出结果,
{'43f79c6d-6bb3-4a62-979d-58e011dcb086': Document(page_content='foo', metadata={})}
示例代码,
db2.docstore._dict
输出结果,
{'8dcb4556-8eb5-43be-9eaa-0bff9a6e7997': Document(page_content='bar', metadata={})}
示例代码,
db1.docstore._dict
输出结果,
{'43f79c6d-6bb3-4a62-979d-58e011dcb086': Document(page_content='foo', metadata={})}
示例代码,
db1.merge_from(db2)
输出结果,
db1.docstore._dict
输出结果,
{'43f79c6d-6bb3-4a62-979d-58e011dcb086': Document(page_content='foo', metadata={}),'8dcb4556-8eb5-43be-9eaa-0bff9a6e7997': Document(page_content='bar', metadata={})}
5. Similarity Search with filtering
FAISS vectorstore 还可以支持过滤,因为 FAISS 本身不支持过滤,我们必须手动执行。这是通过首先获取比 k 更多的结果然后过滤它们来完成的。您可以根据元数据过滤文档。您还可以在调用任何搜索方法时设置 fetch_k 参数,以设置在过滤之前要获取的文档数量。这是一个小例子:
示例代码,
from langchain.schema import Documentlist_of_documents = [Document(page_content="foo", metadata=dict(page=1)),Document(page_content="bar", metadata=dict(page=1)),Document(page_content="foo", metadata=dict(page=2)),Document(page_content="barbar", metadata=dict(page=2)),Document(page_content="foo", metadata=dict(page=3)),Document(page_content="bar burr", metadata=dict(page=3)),Document(page_content="foo", metadata=dict(page=4)),Document(page_content="bar bruh", metadata=dict(page=4)),
]
db = FAISS.from_documents(list_of_documents, embeddings)
results_with_scores = db.similarity_search_with_score("foo")
for doc, score in results_with_scores:print(f"Content: {doc.page_content}, Metadata: {doc.metadata}, Score: {score}")
输出结果,
Content: foo, Metadata: {'page': 1}, Score: 0.018019594252109528
Content: foo, Metadata: {'page': 2}, Score: 0.018019594252109528
Content: foo, Metadata: {'page': 3}, Score: 0.018019594252109528
Content: foo, Metadata: {'page': 4}, Score: 0.018019594252109528
现在我们进行相同的查询调用,但我们仅过滤 page = 1,
results_with_scores = db.similarity_search_with_score("foo", filter=dict(page=1))
for doc, score in results_with_scores:print(f"Content: {doc.page_content}, Metadata: {doc.metadata}, Score: {score}")
输出结果,
Content: foo, Metadata: {'page': 1}, Score: 0.018019594252109528
Content: bar, Metadata: {'page': 1}, Score: 10266.8544921875
同样的事情也可以用 max_marginal_relevance_search 来完成。
示例代码,
results = db.max_marginal_relevance_search("foo", filter=dict(page=1))
for doc in results:print(f"Content: {doc.page_content}, Metadata: {doc.metadata}")
输出结果,
Content: foo, Metadata: {'page': 1}
Content: bar, Metadata: {'page': 1}
以下是调用 similarity_search 时如何设置 fetch_k 参数的示例。通常您需要 fetch_k 参数 >> k 参数。这是因为 fetch_k 参数是过滤之前将获取的文档数。如果将 fetch_k 设置为较小的数字,您可能无法获得足够的文档进行过滤。
示例代码,
results = db.similarity_search("foo", filter=dict(page=1), k=1, fetch_k=4)
for doc in results:print(f"Content: {doc.page_content}, Metadata: {doc.metadata}")
输出结果,
Content: foo, Metadata: {'page': 1}
完结!
相关文章:
Langchain 集成 FAISS
Langchain 集成 FAISS 1. FAISS2. Similarity Search with score3. Saving and loading4. Merging5. Similarity Search with filtering 1. FAISS Facebook AI Similarity Search (Faiss)是一个用于高效相似性搜索和密集向量聚类的库。它包含的算法可以搜索任意大小的向量集&a…...
科技与人元宇宙论坛跨界对话
近来,“元宇宙”成为热门话题,越来越频繁地出现在人们的视野里。大家都在谈论它,但似 乎还没有一个被所有人认同的定义。元宇宙究竟是什么?未来它会对我们的工作和生活带来什么样 的改变?当谈论虚拟现实(VR…...
JAVA-生成二维码图片
使用hutool工具包,主动一个简单方便,pom添加依赖 <dependency><groupId>cn.hutool</groupId><artifactId>hutool-all</artifactId><version>5.8.12</version> </dependency> 直接上代码 //设置像素宽高 QrConfig config new…...
【iOS】iOS持久化
文章目录 一. 数据持久化的目的二. iOS中数据持久化方案三. 数据持有化方式的分类1. 内存缓存2. 磁盘缓存SDWebImage缓存 四. 沙盒机制的介绍五. 沙盒目录结构1. 获取应用程序的沙盒路径2. 访问沙盒目录常用C函数介绍3. 沙盒目录介绍 六. 持久化数据存储方式1. XML属性列表2. P…...
基于Javaweb+Vue3实现淘宝卖鞋前后端分离项目
前端技术栈:HTMLCSSJavaScriptVue3 后端技术栈:JavaSEMySQLJDBCJavaWeb 文章目录 前言1️⃣登录功能登录后端登录前端 2️⃣商家管理查询商家查询商家后端查询商家前端 增加商家增加商家后端增加商家前端 删除商家删除商家后端删除商家前端 修改商家修改…...
bat一键批量、有序启动jar
将脚本文件后缀改为 bat,脚本文件和 jar 包放在同一个目录 echo offstart cmd /c "java -jar register.jar " ping 192.0.2.2 -n 1 -w 10000 > nulstart cmd /c "java -jar admin.jar " ping 192.0.2.2 -n 1 -w 30000 > nulstart cmd /c…...
centos7安装mysql数据库详细教程及常见问题解决
mysql数据库详细安装步骤 1.在root身份下输入执行命令: yum -y update 2.检查是否已经安装MySQL,输入以下命令并执行: mysql -v 如出现-bash: mysql: command not found 则说明没有安装mysql 也可以输入rpm -qa | grep -i mysql 查看是否已…...
C++ STL sort函数的底层实现
C STL sort函数的底层实现 sort函数的底层用到的是内省式排序以及插入排序,内省排序首先从快速排序开始,当递归深度超过一定深度(深度为排序元素数量的对数值)后转为堆排序。 先来回顾一下以上提到的3中排序方法: 快…...
ICP算法和优化问题详细公式推导
1. 介绍 ICP(Iterative Closest Point):求一组平移和旋转使得两个点云之间重合度尽可能高。 2. 算法流程 找最近邻关联点,求解 R , t R , t R , t R , t R,tR,tR,tR,t R,tR,tR,tR,t,如此反复直到重合程度足够高。 3. 数学描述 X { x 1 ,…...
【安全狗】linux免费服务器防护软件安全狗详细安装教程
在费用有限的基础上,复杂密码云服务器基础防护常见端口替换安全软件,可以防护绝大多数攻击 第一步:下载服务器安全狗Linux版(下文以64位版本为例) 官方提供了两个下载方式,本文采用的是 方式2 wget安装 方…...
【iOS】自定义字体
文章目录 前言一、下载字体二、添加字体三、检查字体四、使用字体 前言 在设计App的过程中我们常常会想办法去让我们的界面变得美观,使用好看的字体是我们美化界面的一个方法。接下来笔者将会讲解App中添加自定义字体 一、下载字体 我们要使用自定义字体&#x…...
WPF实战学习笔记06-设置待办事项界面
设置待办事项界面 创建待办待办事项集合并初始化 TodoViewModel: using Mytodo.Common.Models; using Prism.Commands; using Prism.Mvvm; using System; using System.Collections.Generic; using System.Collections.ObjectModel; using System.Linq; using Sy…...
推荐几个不错的免费配色工具网站
1. Paletton专业的配色套件,提供色轮理论及调色功能。可查看配色预览效果。 网站:http://paletton.com 2. Colormind一个基于机器学习的智能配色工具。可以一键生成配色方案。 网站:http://colormind.io 3. Adobe ColorAdobe官方的配色工具,可以从图片中取色,也可以随机生成配色…...
gitee page发布的静态网站,无法播放目录中的mp4视频
起因是希望在gitee上部署静态网站,利用three.js VideoTexture 环境贴图播放视频。 但是试了多几次 mp4均提示404,资源无法获取; 找了很多方案,最后发现将视频转为ogv 就可以完美适配了; mp4转ogv 附threejs使用ogv进…...
opencv-26 图像几何变换04- 重映射-函数 cv2.remap()
什么是重映射? 重映射(Remapping)是图像处理中的一种操作,用于将图像中的像素从一个位置映射到另一个位置。重映射可以实现图像的平移、旋转、缩放和透视变换等效果。它是一种基于像素级的图像变换技术,可以通过定义映…...
SkyWalking链路追踪中span全解
基本概念 在SkyWalking链路追踪中,Span(跨度)是Trace(追踪)的组成部分之一。Span代表一次调用或操作的单个组件,可以是一个方法调用、一个HTTP请求或者其他类型的操作。 每个Span都包含了一些关键的信息&am…...
【前端知识】React 基础巩固(三十一)——Redux的简介
React 基础巩固(三十一)——Redux 一、Redux是个纯函数 概念 纯函数(确定的输入一定产生确定的输出,函数在执行过程中不产生副作用): 在程序设计中,若一个函数符合以下条件,那么这个函数就被称为纯函数…...
拦截Bean使用之前各个时机的Spring组件
拦截Bean使用之前各个时机的Spring组件 之前使用过的BeanPostProcessor就是在Bean实例化之后,注入属性值之前的时机。 Spring Bean的生命周期本次演示的是在Bean实例化之前的时机,使用BeanFactoryPostProcessor进行验证,以及在加载Bean之前进…...
RT thread 之 Nand flash 读写过程分析
文章目录 前言:什么是Nand Flash?1、Nand Flash 读取步骤2、从主存读到Cache2.1 在标准spi接口下读取过程2.2 测试时序(SPI频率30MHz) 3.从Cache读取数据3.1在标准spi接口读取过程测试时序 前言:什么是Nand Flash&…...
独立站最全出单营销指南,新手卖家赶紧学起来吧!
这是一个需要投入大量时间和精力的挑战,但只有经过筛选在众多品牌和渠道中找到最适合自己的营销策略,才能成功。 新手商家经常会发现自己有很多可以改进的地方:品牌的颜色、字体以及其他一些细节。但真正走向成熟的商家会意识到,…...
51c自动驾驶~合集58
我自己的原文哦~ https://blog.51cto.com/whaosoft/13967107 #CCA-Attention 全局池化局部保留,CCA-Attention为LLM长文本建模带来突破性进展 琶洲实验室、华南理工大学联合推出关键上下文感知注意力机制(CCA-Attention),…...
在rocky linux 9.5上在线安装 docker
前面是指南,后面是日志 sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo sudo dnf install docker-ce docker-ce-cli containerd.io -y docker version sudo systemctl start docker sudo systemctl status docker …...
第25节 Node.js 断言测试
Node.js的assert模块主要用于编写程序的单元测试时使用,通过断言可以提早发现和排查出错误。 稳定性: 5 - 锁定 这个模块可用于应用的单元测试,通过 require(assert) 可以使用这个模块。 assert.fail(actual, expected, message, operator) 使用参数…...
听写流程自动化实践,轻量级教育辅助
随着智能教育工具的发展,越来越多的传统学习方式正在被数字化、自动化所优化。听写作为语文、英语等学科中重要的基础训练形式,也迎来了更高效的解决方案。 这是一款轻量但功能强大的听写辅助工具。它是基于本地词库与可选在线语音引擎构建,…...
云原生玩法三问:构建自定义开发环境
云原生玩法三问:构建自定义开发环境 引言 临时运维一个古董项目,无文档,无环境,无交接人,俗称三无。 运行设备的环境老,本地环境版本高,ssh不过去。正好最近对 腾讯出品的云原生 cnb 感兴趣&…...
Spring是如何解决Bean的循环依赖:三级缓存机制
1、什么是 Bean 的循环依赖 在 Spring框架中,Bean 的循环依赖是指多个 Bean 之间互相持有对方引用,形成闭环依赖关系的现象。 多个 Bean 的依赖关系构成环形链路,例如: 双向依赖:Bean A 依赖 Bean B,同时 Bean B 也依赖 Bean A(A↔B)。链条循环: Bean A → Bean…...
Caliper 负载(Workload)详细解析
Caliper 负载(Workload)详细解析 负载(Workload)是 Caliper 性能测试的核心部分,它定义了测试期间要执行的具体合约调用行为和交易模式。下面我将全面深入地讲解负载的各个方面。 一、负载模块基本结构 一个典型的负载模块(如 workload.js)包含以下基本结构: use strict;/…...
脑机新手指南(七):OpenBCI_GUI:从环境搭建到数据可视化(上)
一、OpenBCI_GUI 项目概述 (一)项目背景与目标 OpenBCI 是一个开源的脑电信号采集硬件平台,其配套的 OpenBCI_GUI 则是专为该硬件设计的图形化界面工具。对于研究人员、开发者和学生而言,首次接触 OpenBCI 设备时,往…...
Web后端基础(基础知识)
BS架构:Browser/Server,浏览器/服务器架构模式。客户端只需要浏览器,应用程序的逻辑和数据都存储在服务端。 优点:维护方便缺点:体验一般 CS架构:Client/Server,客户端/服务器架构模式。需要单独…...
前端中slice和splic的区别
1. slice slice 用于从数组中提取一部分元素,返回一个新的数组。 特点: 不修改原数组:slice 不会改变原数组,而是返回一个新的数组。提取数组的部分:slice 会根据指定的开始索引和结束索引提取数组的一部分。不包含…...
