Langchain 集成 FAISS
Langchain 集成 FAISS
- 1. FAISS
- 2. Similarity Search with score
- 3. Saving and loading
- 4. Merging
- 5. Similarity Search with filtering
1. FAISS
Facebook AI Similarity Search (Faiss)是一个用于高效相似性搜索和密集向量聚类的库。它包含的算法可以搜索任意大小的向量集,甚至可能无法容纳在 RAM 中的向量集。它还包含用于评估和参数调整的支持代码。
Faiss 文档地址在这里.
本笔记本展示了如何使用与 FAISS 矢量数据库相关的功能。
示例代码,
# !pip install faiss
# OR
# !pip install faiss-cpu
import os
import getpassos.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")# 如果需要在没有 AVX2 优化的情况下初始化 FAISS,请取消注释以下一行
# os.environ['FAISS_NO_AVX2'] = '1'
# from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.embeddings.cohere import CohereEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.document_loaders import TextLoader
输出结果,
from langchain.document_loaders import TextLoaderloader = TextLoader("./state_of_the_union_en.txt", encoding="utf-8")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)# embeddings = OpenAIEmbeddings
embeddings = CohereEmbeddings()
示例代码,
db = FAISS.from_documents(docs, embeddings)query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
print(docs[0].page_content)
输出结果,
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
2. Similarity Search with score
有一些 FAISS 特定方法。其中之一是 similarity_search_with_score ,它不仅允许您返回文档,还允许返回查询到它们的距离分数。返回的距离分数是L2距离。因此,分数越低越好。
示例代码,
docs_and_scores = db.similarity_search_with_score(query)
docs_and_scores[0]
输出结果,
(Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': './state_of_the_union_en.txt'}),7172.888)
refer: https://python.langchain.com/docs/integrations/vectorstores/faiss 文档的分数是 0.36913747,
(Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt'}),0.36913747)
还可以使用 similarity_search_by_vector 搜索与给定嵌入向量类似的文档,它接受嵌入向量作为参数而不是字符串。
示例代码,
embedding_vector = embeddings.embed_query(query)
docs_and_scores = db.similarity_search_by_vector(embedding_vector)
docs_and_scores
输出结果如下,
[Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': './state_of_the_union_en.txt'}),Document(page_content='We can’t change how divided we’ve been. But we can change how we move forward—on COVID-19 and other issues we must face together. \n\nI recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \n\nThey were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \n\nOfficer Mora was 27 years old. \n\nOfficer Rivera was 22. \n\nBoth Dominican Americans who’d grown up on the same streets they later chose to patrol as police officers. \n\nI spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \n\nI’ve worked on these issues a long time. \n\nI know what works: Investing in crime preventionand community police officers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and safety.', metadata={'source': './state_of_the_union_en.txt'}),Document(page_content='And for our LGBTQ+ Americans, let’s finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \n\nAs I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \n\nWhile it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. \n\nAnd soon, we’ll strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \n\nSo tonight I’m offering a Unity Agenda for the Nation. Four big things we can do together. \n\nFirst, beat the opioid epidemic.', metadata={'source': './state_of_the_union_en.txt'}),Document(page_content='Tonight, I’m announcing a crackdown on these companies overcharging American businesses and consumers. \n\nAnd as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up. \n\nThat ends on my watch. \n\nMedicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. \n\nWe’ll also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. \n\nLet’s pass the Paycheck Fairness Act and paid leave. \n\nRaise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. \n\nLet’s increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls America’s best-kept secret: community colleges.', metadata={'source': './state_of_the_union_en.txt'})]
3. Saving and loading
您还可以保存和加载 FAISS 索引。这很有用,因此您不必每次使用它时都重新创建它。
示例代码,
db.save_local("faiss_index")
new_db = FAISS.load_local("faiss_index", embeddings)
docs = new_db.similarity_search(query)
docs[0]
输出结果,
Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': './state_of_the_union_en.txt'})
4. Merging
您还可以合并两个 FAISS 矢量存储。
示例代码,
db1 = FAISS.from_texts(["foo"], embeddings)
db2 = FAISS.from_texts(["bar"], embeddings)
db1.docstore._dict
输出结果,
{'43f79c6d-6bb3-4a62-979d-58e011dcb086': Document(page_content='foo', metadata={})}
示例代码,
db1.docstore._dict
输出结果,
{'43f79c6d-6bb3-4a62-979d-58e011dcb086': Document(page_content='foo', metadata={})}
示例代码,
db2.docstore._dict
输出结果,
{'8dcb4556-8eb5-43be-9eaa-0bff9a6e7997': Document(page_content='bar', metadata={})}
示例代码,
db1.docstore._dict
输出结果,
{'43f79c6d-6bb3-4a62-979d-58e011dcb086': Document(page_content='foo', metadata={})}
示例代码,
db1.merge_from(db2)
输出结果,
db1.docstore._dict
输出结果,
{'43f79c6d-6bb3-4a62-979d-58e011dcb086': Document(page_content='foo', metadata={}),'8dcb4556-8eb5-43be-9eaa-0bff9a6e7997': Document(page_content='bar', metadata={})}
5. Similarity Search with filtering
FAISS vectorstore 还可以支持过滤,因为 FAISS 本身不支持过滤,我们必须手动执行。这是通过首先获取比 k 更多的结果然后过滤它们来完成的。您可以根据元数据过滤文档。您还可以在调用任何搜索方法时设置 fetch_k 参数,以设置在过滤之前要获取的文档数量。这是一个小例子:
示例代码,
from langchain.schema import Documentlist_of_documents = [Document(page_content="foo", metadata=dict(page=1)),Document(page_content="bar", metadata=dict(page=1)),Document(page_content="foo", metadata=dict(page=2)),Document(page_content="barbar", metadata=dict(page=2)),Document(page_content="foo", metadata=dict(page=3)),Document(page_content="bar burr", metadata=dict(page=3)),Document(page_content="foo", metadata=dict(page=4)),Document(page_content="bar bruh", metadata=dict(page=4)),
]
db = FAISS.from_documents(list_of_documents, embeddings)
results_with_scores = db.similarity_search_with_score("foo")
for doc, score in results_with_scores:print(f"Content: {doc.page_content}, Metadata: {doc.metadata}, Score: {score}")
输出结果,
Content: foo, Metadata: {'page': 1}, Score: 0.018019594252109528
Content: foo, Metadata: {'page': 2}, Score: 0.018019594252109528
Content: foo, Metadata: {'page': 3}, Score: 0.018019594252109528
Content: foo, Metadata: {'page': 4}, Score: 0.018019594252109528
现在我们进行相同的查询调用,但我们仅过滤 page = 1,
results_with_scores = db.similarity_search_with_score("foo", filter=dict(page=1))
for doc, score in results_with_scores:print(f"Content: {doc.page_content}, Metadata: {doc.metadata}, Score: {score}")
输出结果,
Content: foo, Metadata: {'page': 1}, Score: 0.018019594252109528
Content: bar, Metadata: {'page': 1}, Score: 10266.8544921875
同样的事情也可以用 max_marginal_relevance_search 来完成。
示例代码,
results = db.max_marginal_relevance_search("foo", filter=dict(page=1))
for doc in results:print(f"Content: {doc.page_content}, Metadata: {doc.metadata}")
输出结果,
Content: foo, Metadata: {'page': 1}
Content: bar, Metadata: {'page': 1}
以下是调用 similarity_search 时如何设置 fetch_k 参数的示例。通常您需要 fetch_k 参数 >> k 参数。这是因为 fetch_k 参数是过滤之前将获取的文档数。如果将 fetch_k 设置为较小的数字,您可能无法获得足够的文档进行过滤。
示例代码,
results = db.similarity_search("foo", filter=dict(page=1), k=1, fetch_k=4)
for doc in results:print(f"Content: {doc.page_content}, Metadata: {doc.metadata}")
输出结果,
Content: foo, Metadata: {'page': 1}
完结!
相关文章:
Langchain 集成 FAISS
Langchain 集成 FAISS 1. FAISS2. Similarity Search with score3. Saving and loading4. Merging5. Similarity Search with filtering 1. FAISS Facebook AI Similarity Search (Faiss)是一个用于高效相似性搜索和密集向量聚类的库。它包含的算法可以搜索任意大小的向量集&a…...
科技与人元宇宙论坛跨界对话
近来,“元宇宙”成为热门话题,越来越频繁地出现在人们的视野里。大家都在谈论它,但似 乎还没有一个被所有人认同的定义。元宇宙究竟是什么?未来它会对我们的工作和生活带来什么样 的改变?当谈论虚拟现实(VR…...
JAVA-生成二维码图片
使用hutool工具包,主动一个简单方便,pom添加依赖 <dependency><groupId>cn.hutool</groupId><artifactId>hutool-all</artifactId><version>5.8.12</version> </dependency> 直接上代码 //设置像素宽高 QrConfig config new…...
【iOS】iOS持久化
文章目录 一. 数据持久化的目的二. iOS中数据持久化方案三. 数据持有化方式的分类1. 内存缓存2. 磁盘缓存SDWebImage缓存 四. 沙盒机制的介绍五. 沙盒目录结构1. 获取应用程序的沙盒路径2. 访问沙盒目录常用C函数介绍3. 沙盒目录介绍 六. 持久化数据存储方式1. XML属性列表2. P…...
基于Javaweb+Vue3实现淘宝卖鞋前后端分离项目
前端技术栈:HTMLCSSJavaScriptVue3 后端技术栈:JavaSEMySQLJDBCJavaWeb 文章目录 前言1️⃣登录功能登录后端登录前端 2️⃣商家管理查询商家查询商家后端查询商家前端 增加商家增加商家后端增加商家前端 删除商家删除商家后端删除商家前端 修改商家修改…...
bat一键批量、有序启动jar
将脚本文件后缀改为 bat,脚本文件和 jar 包放在同一个目录 echo offstart cmd /c "java -jar register.jar " ping 192.0.2.2 -n 1 -w 10000 > nulstart cmd /c "java -jar admin.jar " ping 192.0.2.2 -n 1 -w 30000 > nulstart cmd /c…...
centos7安装mysql数据库详细教程及常见问题解决
mysql数据库详细安装步骤 1.在root身份下输入执行命令: yum -y update 2.检查是否已经安装MySQL,输入以下命令并执行: mysql -v 如出现-bash: mysql: command not found 则说明没有安装mysql 也可以输入rpm -qa | grep -i mysql 查看是否已…...
C++ STL sort函数的底层实现
C STL sort函数的底层实现 sort函数的底层用到的是内省式排序以及插入排序,内省排序首先从快速排序开始,当递归深度超过一定深度(深度为排序元素数量的对数值)后转为堆排序。 先来回顾一下以上提到的3中排序方法: 快…...
ICP算法和优化问题详细公式推导
1. 介绍 ICP(Iterative Closest Point):求一组平移和旋转使得两个点云之间重合度尽可能高。 2. 算法流程 找最近邻关联点,求解 R , t R , t R , t R , t R,tR,tR,tR,t R,tR,tR,tR,t,如此反复直到重合程度足够高。 3. 数学描述 X { x 1 ,…...
【安全狗】linux免费服务器防护软件安全狗详细安装教程
在费用有限的基础上,复杂密码云服务器基础防护常见端口替换安全软件,可以防护绝大多数攻击 第一步:下载服务器安全狗Linux版(下文以64位版本为例) 官方提供了两个下载方式,本文采用的是 方式2 wget安装 方…...
【iOS】自定义字体
文章目录 前言一、下载字体二、添加字体三、检查字体四、使用字体 前言 在设计App的过程中我们常常会想办法去让我们的界面变得美观,使用好看的字体是我们美化界面的一个方法。接下来笔者将会讲解App中添加自定义字体 一、下载字体 我们要使用自定义字体&#x…...
WPF实战学习笔记06-设置待办事项界面
设置待办事项界面 创建待办待办事项集合并初始化 TodoViewModel: using Mytodo.Common.Models; using Prism.Commands; using Prism.Mvvm; using System; using System.Collections.Generic; using System.Collections.ObjectModel; using System.Linq; using Sy…...
推荐几个不错的免费配色工具网站
1. Paletton专业的配色套件,提供色轮理论及调色功能。可查看配色预览效果。 网站:http://paletton.com 2. Colormind一个基于机器学习的智能配色工具。可以一键生成配色方案。 网站:http://colormind.io 3. Adobe ColorAdobe官方的配色工具,可以从图片中取色,也可以随机生成配色…...
gitee page发布的静态网站,无法播放目录中的mp4视频
起因是希望在gitee上部署静态网站,利用three.js VideoTexture 环境贴图播放视频。 但是试了多几次 mp4均提示404,资源无法获取; 找了很多方案,最后发现将视频转为ogv 就可以完美适配了; mp4转ogv 附threejs使用ogv进…...
opencv-26 图像几何变换04- 重映射-函数 cv2.remap()
什么是重映射? 重映射(Remapping)是图像处理中的一种操作,用于将图像中的像素从一个位置映射到另一个位置。重映射可以实现图像的平移、旋转、缩放和透视变换等效果。它是一种基于像素级的图像变换技术,可以通过定义映…...
SkyWalking链路追踪中span全解
基本概念 在SkyWalking链路追踪中,Span(跨度)是Trace(追踪)的组成部分之一。Span代表一次调用或操作的单个组件,可以是一个方法调用、一个HTTP请求或者其他类型的操作。 每个Span都包含了一些关键的信息&am…...
【前端知识】React 基础巩固(三十一)——Redux的简介
React 基础巩固(三十一)——Redux 一、Redux是个纯函数 概念 纯函数(确定的输入一定产生确定的输出,函数在执行过程中不产生副作用): 在程序设计中,若一个函数符合以下条件,那么这个函数就被称为纯函数…...
拦截Bean使用之前各个时机的Spring组件
拦截Bean使用之前各个时机的Spring组件 之前使用过的BeanPostProcessor就是在Bean实例化之后,注入属性值之前的时机。 Spring Bean的生命周期本次演示的是在Bean实例化之前的时机,使用BeanFactoryPostProcessor进行验证,以及在加载Bean之前进…...
RT thread 之 Nand flash 读写过程分析
文章目录 前言:什么是Nand Flash?1、Nand Flash 读取步骤2、从主存读到Cache2.1 在标准spi接口下读取过程2.2 测试时序(SPI频率30MHz) 3.从Cache读取数据3.1在标准spi接口读取过程测试时序 前言:什么是Nand Flash&…...
独立站最全出单营销指南,新手卖家赶紧学起来吧!
这是一个需要投入大量时间和精力的挑战,但只有经过筛选在众多品牌和渠道中找到最适合自己的营销策略,才能成功。 新手商家经常会发现自己有很多可以改进的地方:品牌的颜色、字体以及其他一些细节。但真正走向成熟的商家会意识到,…...
基于距离变化能量开销动态调整的WSN低功耗拓扑控制开销算法matlab仿真
目录 1.程序功能描述 2.测试软件版本以及运行结果展示 3.核心程序 4.算法仿真参数 5.算法理论概述 6.参考文献 7.完整程序 1.程序功能描述 通过动态调整节点通信的能量开销,平衡网络负载,延长WSN生命周期。具体通过建立基于距离的能量消耗模型&am…...
8k长序列建模,蛋白质语言模型Prot42仅利用目标蛋白序列即可生成高亲和力结合剂
蛋白质结合剂(如抗体、抑制肽)在疾病诊断、成像分析及靶向药物递送等关键场景中发挥着不可替代的作用。传统上,高特异性蛋白质结合剂的开发高度依赖噬菌体展示、定向进化等实验技术,但这类方法普遍面临资源消耗巨大、研发周期冗长…...
django filter 统计数量 按属性去重
在Django中,如果你想要根据某个属性对查询集进行去重并统计数量,你可以使用values()方法配合annotate()方法来实现。这里有两种常见的方法来完成这个需求: 方法1:使用annotate()和Count 假设你有一个模型Item,并且你想…...
STM32标准库-DMA直接存储器存取
文章目录 一、DMA1.1简介1.2存储器映像1.3DMA框图1.4DMA基本结构1.5DMA请求1.6数据宽度与对齐1.7数据转运DMA1.8ADC扫描模式DMA 二、数据转运DMA2.1接线图2.2代码2.3相关API 一、DMA 1.1简介 DMA(Direct Memory Access)直接存储器存取 DMA可以提供外设…...
如何为服务器生成TLS证书
TLS(Transport Layer Security)证书是确保网络通信安全的重要手段,它通过加密技术保护传输的数据不被窃听和篡改。在服务器上配置TLS证书,可以使用户通过HTTPS协议安全地访问您的网站。本文将详细介绍如何在服务器上生成一个TLS证…...
Cloudflare 从 Nginx 到 Pingora:性能、效率与安全的全面升级
在互联网的快速发展中,高性能、高效率和高安全性的网络服务成为了各大互联网基础设施提供商的核心追求。Cloudflare 作为全球领先的互联网安全和基础设施公司,近期做出了一个重大技术决策:弃用长期使用的 Nginx,转而采用其内部开发…...
2025盘古石杯决赛【手机取证】
前言 第三届盘古石杯国际电子数据取证大赛决赛 最后一题没有解出来,实在找不到,希望有大佬教一下我。 还有就会议时间,我感觉不是图片时间,因为在电脑看到是其他时间用老会议系统开的会。 手机取证 1、分析鸿蒙手机检材&#x…...
Unsafe Fileupload篇补充-木马的详细教程与木马分享(中国蚁剑方式)
在之前的皮卡丘靶场第九期Unsafe Fileupload篇中我们学习了木马的原理并且学了一个简单的木马文件 本期内容是为了更好的为大家解释木马(服务器方面的)的原理,连接,以及各种木马及连接工具的分享 文件木马:https://w…...
MFC 抛体运动模拟:常见问题解决与界面美化
在 MFC 中开发抛体运动模拟程序时,我们常遇到 轨迹残留、无效刷新、视觉单调、物理逻辑瑕疵 等问题。本文将针对这些痛点,详细解析原因并提供解决方案,同时兼顾界面美化,让模拟效果更专业、更高效。 问题一:历史轨迹与小球残影残留 现象 小球运动后,历史位置的 “残影”…...
腾讯云V3签名
想要接入腾讯云的Api,必然先按其文档计算出所要求的签名。 之前也调用过腾讯云的接口,但总是卡在签名这一步,最后放弃选择SDK,这次终于自己代码实现。 可能腾讯云翻新了接口文档,现在阅读起来,清晰了很多&…...
