Langchain 集成 FAISS
Langchain 集成 FAISS
- 1. FAISS
- 2. Similarity Search with score
- 3. Saving and loading
- 4. Merging
- 5. Similarity Search with filtering
1. FAISS
Facebook AI Similarity Search (Faiss)是一个用于高效相似性搜索和密集向量聚类的库。它包含的算法可以搜索任意大小的向量集,甚至可能无法容纳在 RAM 中的向量集。它还包含用于评估和参数调整的支持代码。
Faiss 文档地址在这里.
本笔记本展示了如何使用与 FAISS
矢量数据库相关的功能。
示例代码,
# !pip install faiss
# OR
# !pip install faiss-cpu
import os
import getpassos.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")# 如果需要在没有 AVX2 优化的情况下初始化 FAISS,请取消注释以下一行
# os.environ['FAISS_NO_AVX2'] = '1'
# from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.embeddings.cohere import CohereEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.document_loaders import TextLoader
输出结果,
from langchain.document_loaders import TextLoaderloader = TextLoader("./state_of_the_union_en.txt", encoding="utf-8")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)# embeddings = OpenAIEmbeddings
embeddings = CohereEmbeddings()
示例代码,
db = FAISS.from_documents(docs, embeddings)query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
print(docs[0].page_content)
输出结果,
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
2. Similarity Search with score
有一些 FAISS 特定方法。其中之一是 similarity_search_with_score
,它不仅允许您返回文档,还允许返回查询到它们的距离分数。返回的距离分数是L2距离。因此,分数越低越好。
示例代码,
docs_and_scores = db.similarity_search_with_score(query)
docs_and_scores[0]
输出结果,
(Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': './state_of_the_union_en.txt'}),7172.888)
refer: https://python.langchain.com/docs/integrations/vectorstores/faiss 文档的分数是 0.36913747
,
(Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt'}),0.36913747)
还可以使用 similarity_search_by_vector
搜索与给定嵌入向量类似的文档,它接受嵌入向量作为参数而不是字符串。
示例代码,
embedding_vector = embeddings.embed_query(query)
docs_and_scores = db.similarity_search_by_vector(embedding_vector)
docs_and_scores
输出结果如下,
[Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': './state_of_the_union_en.txt'}),Document(page_content='We can’t change how divided we’ve been. But we can change how we move forward—on COVID-19 and other issues we must face together. \n\nI recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \n\nThey were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \n\nOfficer Mora was 27 years old. \n\nOfficer Rivera was 22. \n\nBoth Dominican Americans who’d grown up on the same streets they later chose to patrol as police officers. \n\nI spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \n\nI’ve worked on these issues a long time. \n\nI know what works: Investing in crime preventionand community police officers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and safety.', metadata={'source': './state_of_the_union_en.txt'}),Document(page_content='And for our LGBTQ+ Americans, let’s finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \n\nAs I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \n\nWhile it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. \n\nAnd soon, we’ll strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \n\nSo tonight I’m offering a Unity Agenda for the Nation. Four big things we can do together. \n\nFirst, beat the opioid epidemic.', metadata={'source': './state_of_the_union_en.txt'}),Document(page_content='Tonight, I’m announcing a crackdown on these companies overcharging American businesses and consumers. \n\nAnd as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up. \n\nThat ends on my watch. \n\nMedicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. \n\nWe’ll also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. \n\nLet’s pass the Paycheck Fairness Act and paid leave. \n\nRaise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. \n\nLet’s increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls America’s best-kept secret: community colleges.', metadata={'source': './state_of_the_union_en.txt'})]
3. Saving and loading
您还可以保存和加载 FAISS 索引。这很有用,因此您不必每次使用它时都重新创建它。
示例代码,
db.save_local("faiss_index")
new_db = FAISS.load_local("faiss_index", embeddings)
docs = new_db.similarity_search(query)
docs[0]
输出结果,
Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': './state_of_the_union_en.txt'})
4. Merging
您还可以合并两个 FAISS 矢量存储。
示例代码,
db1 = FAISS.from_texts(["foo"], embeddings)
db2 = FAISS.from_texts(["bar"], embeddings)
db1.docstore._dict
输出结果,
{'43f79c6d-6bb3-4a62-979d-58e011dcb086': Document(page_content='foo', metadata={})}
示例代码,
db1.docstore._dict
输出结果,
{'43f79c6d-6bb3-4a62-979d-58e011dcb086': Document(page_content='foo', metadata={})}
示例代码,
db2.docstore._dict
输出结果,
{'8dcb4556-8eb5-43be-9eaa-0bff9a6e7997': Document(page_content='bar', metadata={})}
示例代码,
db1.docstore._dict
输出结果,
{'43f79c6d-6bb3-4a62-979d-58e011dcb086': Document(page_content='foo', metadata={})}
示例代码,
db1.merge_from(db2)
输出结果,
db1.docstore._dict
输出结果,
{'43f79c6d-6bb3-4a62-979d-58e011dcb086': Document(page_content='foo', metadata={}),'8dcb4556-8eb5-43be-9eaa-0bff9a6e7997': Document(page_content='bar', metadata={})}
5. Similarity Search with filtering
FAISS vectorstore 还可以支持过滤,因为 FAISS 本身不支持过滤,我们必须手动执行。这是通过首先获取比 k 更多的结果然后过滤它们来完成的。您可以根据元数据过滤文档。您还可以在调用任何搜索方法时设置 fetch_k 参数,以设置在过滤之前要获取的文档数量。这是一个小例子:
示例代码,
from langchain.schema import Documentlist_of_documents = [Document(page_content="foo", metadata=dict(page=1)),Document(page_content="bar", metadata=dict(page=1)),Document(page_content="foo", metadata=dict(page=2)),Document(page_content="barbar", metadata=dict(page=2)),Document(page_content="foo", metadata=dict(page=3)),Document(page_content="bar burr", metadata=dict(page=3)),Document(page_content="foo", metadata=dict(page=4)),Document(page_content="bar bruh", metadata=dict(page=4)),
]
db = FAISS.from_documents(list_of_documents, embeddings)
results_with_scores = db.similarity_search_with_score("foo")
for doc, score in results_with_scores:print(f"Content: {doc.page_content}, Metadata: {doc.metadata}, Score: {score}")
输出结果,
Content: foo, Metadata: {'page': 1}, Score: 0.018019594252109528
Content: foo, Metadata: {'page': 2}, Score: 0.018019594252109528
Content: foo, Metadata: {'page': 3}, Score: 0.018019594252109528
Content: foo, Metadata: {'page': 4}, Score: 0.018019594252109528
现在我们进行相同的查询调用,但我们仅过滤 page = 1
,
results_with_scores = db.similarity_search_with_score("foo", filter=dict(page=1))
for doc, score in results_with_scores:print(f"Content: {doc.page_content}, Metadata: {doc.metadata}, Score: {score}")
输出结果,
Content: foo, Metadata: {'page': 1}, Score: 0.018019594252109528
Content: bar, Metadata: {'page': 1}, Score: 10266.8544921875
同样的事情也可以用 max_marginal_relevance_search
来完成。
示例代码,
results = db.max_marginal_relevance_search("foo", filter=dict(page=1))
for doc in results:print(f"Content: {doc.page_content}, Metadata: {doc.metadata}")
输出结果,
Content: foo, Metadata: {'page': 1}
Content: bar, Metadata: {'page': 1}
以下是调用 similarity_search
时如何设置 fetch_k
参数的示例。通常您需要 fetch_k
参数 >> k
参数。这是因为 fetch_k
参数是过滤之前将获取的文档数。如果将 fetch_k
设置为较小的数字,您可能无法获得足够的文档进行过滤。
示例代码,
results = db.similarity_search("foo", filter=dict(page=1), k=1, fetch_k=4)
for doc in results:print(f"Content: {doc.page_content}, Metadata: {doc.metadata}")
输出结果,
Content: foo, Metadata: {'page': 1}
完结!
相关文章:
Langchain 集成 FAISS
Langchain 集成 FAISS 1. FAISS2. Similarity Search with score3. Saving and loading4. Merging5. Similarity Search with filtering 1. FAISS Facebook AI Similarity Search (Faiss)是一个用于高效相似性搜索和密集向量聚类的库。它包含的算法可以搜索任意大小的向量集&a…...

科技与人元宇宙论坛跨界对话
近来,“元宇宙”成为热门话题,越来越频繁地出现在人们的视野里。大家都在谈论它,但似 乎还没有一个被所有人认同的定义。元宇宙究竟是什么?未来它会对我们的工作和生活带来什么样 的改变?当谈论虚拟现实(VR…...
JAVA-生成二维码图片
使用hutool工具包,主动一个简单方便,pom添加依赖 <dependency><groupId>cn.hutool</groupId><artifactId>hutool-all</artifactId><version>5.8.12</version> </dependency> 直接上代码 //设置像素宽高 QrConfig config new…...

【iOS】iOS持久化
文章目录 一. 数据持久化的目的二. iOS中数据持久化方案三. 数据持有化方式的分类1. 内存缓存2. 磁盘缓存SDWebImage缓存 四. 沙盒机制的介绍五. 沙盒目录结构1. 获取应用程序的沙盒路径2. 访问沙盒目录常用C函数介绍3. 沙盒目录介绍 六. 持久化数据存储方式1. XML属性列表2. P…...

基于Javaweb+Vue3实现淘宝卖鞋前后端分离项目
前端技术栈:HTMLCSSJavaScriptVue3 后端技术栈:JavaSEMySQLJDBCJavaWeb 文章目录 前言1️⃣登录功能登录后端登录前端 2️⃣商家管理查询商家查询商家后端查询商家前端 增加商家增加商家后端增加商家前端 删除商家删除商家后端删除商家前端 修改商家修改…...
bat一键批量、有序启动jar
将脚本文件后缀改为 bat,脚本文件和 jar 包放在同一个目录 echo offstart cmd /c "java -jar register.jar " ping 192.0.2.2 -n 1 -w 10000 > nulstart cmd /c "java -jar admin.jar " ping 192.0.2.2 -n 1 -w 30000 > nulstart cmd /c…...

centos7安装mysql数据库详细教程及常见问题解决
mysql数据库详细安装步骤 1.在root身份下输入执行命令: yum -y update 2.检查是否已经安装MySQL,输入以下命令并执行: mysql -v 如出现-bash: mysql: command not found 则说明没有安装mysql 也可以输入rpm -qa | grep -i mysql 查看是否已…...
C++ STL sort函数的底层实现
C STL sort函数的底层实现 sort函数的底层用到的是内省式排序以及插入排序,内省排序首先从快速排序开始,当递归深度超过一定深度(深度为排序元素数量的对数值)后转为堆排序。 先来回顾一下以上提到的3中排序方法: 快…...
ICP算法和优化问题详细公式推导
1. 介绍 ICP(Iterative Closest Point):求一组平移和旋转使得两个点云之间重合度尽可能高。 2. 算法流程 找最近邻关联点,求解 R , t R , t R , t R , t R,tR,tR,tR,t R,tR,tR,tR,t,如此反复直到重合程度足够高。 3. 数学描述 X { x 1 ,…...

【安全狗】linux免费服务器防护软件安全狗详细安装教程
在费用有限的基础上,复杂密码云服务器基础防护常见端口替换安全软件,可以防护绝大多数攻击 第一步:下载服务器安全狗Linux版(下文以64位版本为例) 官方提供了两个下载方式,本文采用的是 方式2 wget安装 方…...

【iOS】自定义字体
文章目录 前言一、下载字体二、添加字体三、检查字体四、使用字体 前言 在设计App的过程中我们常常会想办法去让我们的界面变得美观,使用好看的字体是我们美化界面的一个方法。接下来笔者将会讲解App中添加自定义字体 一、下载字体 我们要使用自定义字体&#x…...
WPF实战学习笔记06-设置待办事项界面
设置待办事项界面 创建待办待办事项集合并初始化 TodoViewModel: using Mytodo.Common.Models; using Prism.Commands; using Prism.Mvvm; using System; using System.Collections.Generic; using System.Collections.ObjectModel; using System.Linq; using Sy…...

推荐几个不错的免费配色工具网站
1. Paletton专业的配色套件,提供色轮理论及调色功能。可查看配色预览效果。 网站:http://paletton.com 2. Colormind一个基于机器学习的智能配色工具。可以一键生成配色方案。 网站:http://colormind.io 3. Adobe ColorAdobe官方的配色工具,可以从图片中取色,也可以随机生成配色…...
gitee page发布的静态网站,无法播放目录中的mp4视频
起因是希望在gitee上部署静态网站,利用three.js VideoTexture 环境贴图播放视频。 但是试了多几次 mp4均提示404,资源无法获取; 找了很多方案,最后发现将视频转为ogv 就可以完美适配了; mp4转ogv 附threejs使用ogv进…...

opencv-26 图像几何变换04- 重映射-函数 cv2.remap()
什么是重映射? 重映射(Remapping)是图像处理中的一种操作,用于将图像中的像素从一个位置映射到另一个位置。重映射可以实现图像的平移、旋转、缩放和透视变换等效果。它是一种基于像素级的图像变换技术,可以通过定义映…...

SkyWalking链路追踪中span全解
基本概念 在SkyWalking链路追踪中,Span(跨度)是Trace(追踪)的组成部分之一。Span代表一次调用或操作的单个组件,可以是一个方法调用、一个HTTP请求或者其他类型的操作。 每个Span都包含了一些关键的信息&am…...
【前端知识】React 基础巩固(三十一)——Redux的简介
React 基础巩固(三十一)——Redux 一、Redux是个纯函数 概念 纯函数(确定的输入一定产生确定的输出,函数在执行过程中不产生副作用): 在程序设计中,若一个函数符合以下条件,那么这个函数就被称为纯函数…...

拦截Bean使用之前各个时机的Spring组件
拦截Bean使用之前各个时机的Spring组件 之前使用过的BeanPostProcessor就是在Bean实例化之后,注入属性值之前的时机。 Spring Bean的生命周期本次演示的是在Bean实例化之前的时机,使用BeanFactoryPostProcessor进行验证,以及在加载Bean之前进…...

RT thread 之 Nand flash 读写过程分析
文章目录 前言:什么是Nand Flash?1、Nand Flash 读取步骤2、从主存读到Cache2.1 在标准spi接口下读取过程2.2 测试时序(SPI频率30MHz) 3.从Cache读取数据3.1在标准spi接口读取过程测试时序 前言:什么是Nand Flash&…...

独立站最全出单营销指南,新手卖家赶紧学起来吧!
这是一个需要投入大量时间和精力的挑战,但只有经过筛选在众多品牌和渠道中找到最适合自己的营销策略,才能成功。 新手商家经常会发现自己有很多可以改进的地方:品牌的颜色、字体以及其他一些细节。但真正走向成熟的商家会意识到,…...
设计模式和设计原则回顾
设计模式和设计原则回顾 23种设计模式是设计原则的完美体现,设计原则设计原则是设计模式的理论基石, 设计模式 在经典的设计模式分类中(如《设计模式:可复用面向对象软件的基础》一书中),总共有23种设计模式,分为三大类: 一、创建型模式(5种) 1. 单例模式(Sing…...

C++_核心编程_多态案例二-制作饮品
#include <iostream> #include <string> using namespace std;/*制作饮品的大致流程为:煮水 - 冲泡 - 倒入杯中 - 加入辅料 利用多态技术实现本案例,提供抽象制作饮品基类,提供子类制作咖啡和茶叶*//*基类*/ class AbstractDr…...
云计算——弹性云计算器(ECS)
弹性云服务器:ECS 概述 云计算重构了ICT系统,云计算平台厂商推出使得厂家能够主要关注应用管理而非平台管理的云平台,包含如下主要概念。 ECS(Elastic Cloud Server):即弹性云服务器,是云计算…...
React Native 导航系统实战(React Navigation)
导航系统实战(React Navigation) React Navigation 是 React Native 应用中最常用的导航库之一,它提供了多种导航模式,如堆栈导航(Stack Navigator)、标签导航(Tab Navigator)和抽屉…...
day52 ResNet18 CBAM
在深度学习的旅程中,我们不断探索如何提升模型的性能。今天,我将分享我在 ResNet18 模型中插入 CBAM(Convolutional Block Attention Module)模块,并采用分阶段微调策略的实践过程。通过这个过程,我不仅提升…...

解决Ubuntu22.04 VMware失败的问题 ubuntu入门之二十八
现象1 打开VMware失败 Ubuntu升级之后打开VMware上报需要安装vmmon和vmnet,点击确认后如下提示 最终上报fail 解决方法 内核升级导致,需要在新内核下重新下载编译安装 查看版本 $ vmware -v VMware Workstation 17.5.1 build-23298084$ lsb_release…...

新能源汽车智慧充电桩管理方案:新能源充电桩散热问题及消防安全监管方案
随着新能源汽车的快速普及,充电桩作为核心配套设施,其安全性与可靠性备受关注。然而,在高温、高负荷运行环境下,充电桩的散热问题与消防安全隐患日益凸显,成为制约行业发展的关键瓶颈。 如何通过智慧化管理手段优化散…...
【AI学习】三、AI算法中的向量
在人工智能(AI)算法中,向量(Vector)是一种将现实世界中的数据(如图像、文本、音频等)转化为计算机可处理的数值型特征表示的工具。它是连接人类认知(如语义、视觉特征)与…...
Rust 异步编程
Rust 异步编程 引言 Rust 是一种系统编程语言,以其高性能、安全性以及零成本抽象而著称。在多核处理器成为主流的今天,异步编程成为了一种提高应用性能、优化资源利用的有效手段。本文将深入探讨 Rust 异步编程的核心概念、常用库以及最佳实践。 异步编程基础 什么是异步…...
高防服务器能够抵御哪些网络攻击呢?
高防服务器作为一种有着高度防御能力的服务器,可以帮助网站应对分布式拒绝服务攻击,有效识别和清理一些恶意的网络流量,为用户提供安全且稳定的网络环境,那么,高防服务器一般都可以抵御哪些网络攻击呢?下面…...