
Natural Language Processing from Introduction to Application: LangChain Models, Large Language Models (LLMs): Caching the Results of LLM Calls

news 2025/7/2 21:00:27

Category: master table of contents for "Natural Language Processing from Introduction to Application"

from langchain.llms import OpenAI

In-Memory Cache

import langchain
from langchain.cache import InMemoryCache

langchain.llm_cache = InMemoryCache()

# To make the caching really obvious, let's use a slower model.
llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)

Timing the first run:

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")

Log output:

CPU times: user 35.9 ms, sys: 28.6 ms, total: 64.6 ms
Wall time: 4.83 s

Output:

"\n\nWhy couldn't the bicycle stand up by itself? It was...two tired!"

Timing the second run:

%%time
# The second time it is, so it goes faster
llm("Tell me a joke")

Log output:

CPU times: user 238 µs, sys: 143 µs, total: 381 µs
Wall time: 1.76 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
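
The speed-up above comes from looking up an exact key before the model is called. As a rough illustration only (this is a toy sketch, not LangChain's implementation; `ToyInMemoryCache`, `cached_call`, and `fake_model` are hypothetical names), an exact-match cache keyed by the prompt plus a string describing the model settings might look like this:

```python
from typing import Callable, Dict, List, Optional, Tuple


class ToyInMemoryCache:
    """Hypothetical exact-match cache keyed by (prompt, llm_string)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], str] = {}

    def lookup(self, prompt: str, llm_string: str) -> Optional[str]:
        # Returns None on a cache miss.
        return self._store.get((prompt, llm_string))

    def update(self, prompt: str, llm_string: str, value: str) -> None:
        self._store[(prompt, llm_string)] = value


def cached_call(
    cache: ToyInMemoryCache,
    model: Callable[[str], str],
    llm_string: str,
    prompt: str,
) -> str:
    """Return the cached completion if present; otherwise call the model and store the result."""
    hit = cache.lookup(prompt, llm_string)
    if hit is not None:
        return hit
    result = model(prompt)
    cache.update(prompt, llm_string, result)
    return result


calls: List[str] = []  # records every prompt that actually reaches the "model"


def fake_model(prompt: str) -> str:
    # Stand-in for a slow, non-deterministic LLM call.
    calls.append(prompt)
    return f"completion #{len(calls)} for {prompt!r}"


cache = ToyInMemoryCache()
first = cached_call(cache, fake_model, "text-davinci-002|n=2", "Tell me a joke")
second = cached_call(cache, fake_model, "text-davinci-002|n=2", "Tell me a joke")
other = cached_call(cache, fake_model, "text-davinci-002|n=1", "Tell me a joke")

print(first == second)  # same prompt and settings: served from the cache
print(len(calls))       # only the two distinct keys reached the model
```

Including the model settings in the key means that changing a parameter such as `n` does not return a completion that was generated under different settings.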

SQLite Cache

!rm .langchain.db

# We can do the same thing with a SQLite cache
from langchain.cache import SQLiteCache
langchain.llm_cache = SQLiteCache(database_path=".langchain.db")

Timing the first run:

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")

Log output:

CPU times: user 17 ms, sys: 9.76 ms, total: 26.7 ms
Wall time: 825 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

Timing the second run:

%%time
# The second time it is, so it goes faster
llm("Tell me a joke")

Log output:

CPU times: user 2.46 ms, sys: 1.23 ms, total: 3.7 ms
Wall time: 2.67 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
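
What a file-backed cache adds over the in-memory one is persistence: entries survive after the process exits. The sketch below illustrates that idea using only the standard-library `sqlite3` module. It is an assumption-laden toy, not `SQLiteCache` itself; the class name, table name, and column names are all hypothetical.

```python
import os
import sqlite3
import tempfile
from typing import Optional


class ToySQLiteCache:
    """Hypothetical on-disk cache; table and column names are illustrative only."""

    def __init__(self, database_path: str) -> None:
        self._conn = sqlite3.connect(database_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS llm_cache ("
            "prompt TEXT, llm TEXT, response TEXT, PRIMARY KEY (prompt, llm))"
        )
        self._conn.commit()

    def lookup(self, prompt: str, llm_string: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT response FROM llm_cache WHERE prompt = ? AND llm = ?",
            (prompt, llm_string),
        ).fetchone()
        return row[0] if row else None

    def update(self, prompt: str, llm_string: str, response: str) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO llm_cache VALUES (?, ?, ?)",
            (prompt, llm_string, response),
        )
        self._conn.commit()

    def close(self) -> None:
        self._conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "toy_cache.db")

writer = ToySQLiteCache(db_path)
writer.update("Tell me a joke", "text-davinci-002", "Why did the chicken cross the road?")
writer.close()

# A fresh connection, standing in for a restarted process, still sees the entry.
reader = ToySQLiteCache(db_path)
persisted = reader.lookup("Tell me a joke", "text-davinci-002")
missing = reader.lookup("Tell me a riddle", "text-davinci-002")
reader.close()

print(persisted)
print(missing)
```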

Redis Cache

We can also use Redis to cache prompts and responses in the same way:

# (Make sure your local Redis instance is running before you run this example)
from redis import Redis
from langchain.cache import RedisCache

langchain.llm_cache = RedisCache(redis_=Redis())

Timing the first run:

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")

Log output:

CPU times: user 6.88 ms, sys: 8.75 ms, total: 15.6 ms
Wall time: 1.04 s

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

Timing the second run:

%%time
# The second time it is, so it goes faster
llm("Tell me a joke")

Log output:

CPU times: user 1.59 ms, sys: 610 µs, total: 2.2 ms
Wall time: 5.58 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'
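
A key-value store such as Redis needs a single string key per cache entry. One plausible scheme, shown here purely as an assumption and not as what `RedisCache` does internally, is to hash the model settings together with the prompt. The `cache_key` function, the `llm_cache:` prefix, and the `DictKeyValueStore` stand-in (used so the sketch runs without a Redis server) are all hypothetical.

```python
import hashlib
from typing import Dict, Optional


def cache_key(prompt: str, llm_string: str) -> str:
    """Derive a fixed-length key from the model settings and the prompt (hypothetical scheme)."""
    digest = hashlib.sha256()
    digest.update(llm_string.encode("utf-8"))
    digest.update(b"\x00")  # separator between the two fields
    digest.update(prompt.encode("utf-8"))
    return "llm_cache:" + digest.hexdigest()


class DictKeyValueStore:
    """In-process stand-in for a key-value server; only get/set are modelled."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


store = DictKeyValueStore()
key = cache_key("Tell me a joke", "text-davinci-002")
store.set(key, "Why did the chicken cross the road?")

same_key = cache_key("Tell me a joke", "text-davinci-002")
other_key = cache_key("Tell me a joke", "text-davinci-003")

print(store.get(same_key))   # identical inputs map to the same key
print(store.get(other_key))  # different model settings map to a different key
```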

Semantic Cache

We can also use Redis to cache prompts and responses, and evaluate cache hits based on semantic similarity:

from langchain.embeddings import OpenAIEmbeddings
from langchain.cache import RedisSemanticCache

langchain.llm_cache = RedisSemanticCache(
    redis_url="redis://localhost:6379",
    embedding=OpenAIEmbeddings()
)

Timing the first run:

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")

Log output:

CPU times: user 351 ms, sys: 156 ms, total: 507 ms
Wall time: 3.37 s

Output:

"\n\nWhy don't scientists trust atoms?\nBecause they make up everything."

Timing the second run:

%%time
# The second time, while not a direct hit, the question is semantically similar to the original question,
# so it uses the cached result!
llm("Tell me one joke")

Log output:

CPU times: user 6.25 ms, sys: 2.72 ms, total: 8.97 ms
Wall time: 262 ms

Output:

"\n\nWhy don't scientists trust atoms?\nBecause they make up everything."

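A semantic cache replaces the exact key comparison with a similarity test between embeddings. The sketch below illustrates the idea with a deliberately crude stand-in embedding (word counts) and cosine similarity. The `embed` function, the `ToySemanticCache` class, and the 0.7 threshold are assumptions made for illustration; a real setup would use an embedding model such as the `OpenAIEmbeddings` shown above and a tuned threshold.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real setup would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToySemanticCache:
    """Hypothetical cache that returns the closest stored response above a threshold."""

    def __init__(self, threshold: float = 0.7) -> None:
        self.threshold = threshold  # illustrative value, not a recommended setting
        self._entries: List[Tuple[Counter, str]] = []

    def update(self, prompt: str, response: str) -> None:
        self._entries.append((embed(prompt), response))

    def lookup(self, prompt: str) -> Optional[str]:
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self._entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


semantic_cache = ToySemanticCache()
semantic_cache.update("Tell me a joke", "Why don't scientists trust atoms?")

near_hit = semantic_cache.lookup("Tell me one joke")         # similar wording
miss = semantic_cache.lookup("What is the capital of France?")  # unrelated prompt

print(near_hit)
print(miss)
```

The threshold controls the trade-off: a lower value yields more hits but raises the chance of returning an answer to a different question.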
GPTCache

We can use GPTCache for exact-match caching or to cache results based on semantic similarity. Let's start with an exact-match example:

from gptcache import Cache
from gptcache.manager.factory import manager_factory
from gptcache.processor.pre import get_prompt
from langchain.cache import GPTCache
import hashlib


def get_hashed_name(name):
    return hashlib.sha256(name.encode()).hexdigest()


def init_gptcache(cache_obj: Cache, llm: str):
    hashed_llm = get_hashed_name(llm)
    cache_obj.init(
        pre_embedding_func=get_prompt,
        data_manager=manager_factory(manager="map", data_dir=f"map_cache_{hashed_llm}"),
    )


langchain.llm_cache = GPTCache(init_gptcache)

Timing the first run:

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")

Log output:

CPU times: user 21.5 ms, sys: 21.3 ms, total: 42.8 ms
Wall time: 6.2 s

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

Timing the second run:

%%time
# The second time it is, so it goes faster
llm("Tell me a joke")

Log output:

CPU times: user 571 µs, sys: 43 µs, total: 614 µs
Wall time: 635 µs

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'
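
The `init_gptcache` callback above receives an LLM string and derives a separate data directory from its hash, so each model configuration gets its own cache. The sketch below illustrates that pattern without any GPTCache dependency. `ToyCache`, `init_toy_cache`, and `PerLLMCaches` are hypothetical stand-ins, and the assumption that the wrapper calls the callback once per distinct LLM string on first use is mine, not something the text above states.

```python
import hashlib
from typing import Callable, Dict, Optional


def get_hashed_name(name: str) -> str:
    return hashlib.sha256(name.encode()).hexdigest()


class ToyCache:
    """Stand-in for a cache object; it only remembers where it would store data."""

    def __init__(self) -> None:
        self.data_dir: Optional[str] = None


def init_toy_cache(cache_obj: ToyCache, llm: str) -> None:
    # Mirrors the shape of init_gptcache: one directory per hashed LLM string.
    cache_obj.data_dir = f"map_cache_{get_hashed_name(llm)}"


class PerLLMCaches:
    """Hypothetical wrapper: creates one cache per distinct LLM string, on first use."""

    def __init__(self, init_func: Callable[[ToyCache, str], None]) -> None:
        self._init_func = init_func
        self._caches: Dict[str, ToyCache] = {}

    def get(self, llm_string: str) -> ToyCache:
        if llm_string not in self._caches:
            cache_obj = ToyCache()
            self._init_func(cache_obj, llm_string)
            self._caches[llm_string] = cache_obj
        return self._caches[llm_string]


caches = PerLLMCaches(init_toy_cache)
davinci_a = caches.get("model_name=text-davinci-002, n=2")
davinci_b = caches.get("model_name=text-davinci-002, n=2")
other_llm = caches.get("model_name=text-davinci-002, n=1")

print(davinci_a is davinci_b)                   # same settings reuse one cache
print(davinci_a.data_dir == other_llm.data_dir)  # different settings get a different directory
```

Hashing the LLM string keeps the directory name short and filesystem-safe even when the settings string contains spaces or punctuation.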

Now let's look at an example of similarity caching.

from gptcache import Cache
from gptcache.adapter.api import init_similar_cache
from langchain.cache import GPTCache
import hashlib


def get_hashed_name(name):
    return hashlib.sha256(name.encode()).hexdigest()


def init_gptcache(cache_obj: Cache, llm: str):
    hashed_llm = get_hashed_name(llm)
    init_similar_cache(cache_obj=cache_obj, data_dir=f"similar_cache_{hashed_llm}")


langchain.llm_cache = GPTCache(init_gptcache)

Timing the first run:

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")

Log output:

CPU times: user 1.42 s, sys: 279 ms, total: 1.7 s
Wall time: 8.44 s

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

Timing the second run:

%%time
# This is an exact match, so it finds it in the cache
llm("Tell me a joke")

Log output:

CPU times: user 866 ms, sys: 20 ms, total: 886 ms
Wall time: 226 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

Timing the third run:

%%time
# This is not an exact match, but it is semantically within distance, so it hits!
llm("Tell me joke")

Log output:

CPU times: user 853 ms, sys: 14.8 ms, total: 868 ms
Wall time: 224 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

SQLAlchemy Cache

We can use SQLAlchemyCache to cache with any SQL database supported by SQLAlchemy:

# from langchain.cache import SQLAlchemyCache
# from sqlalchemy import create_engine

# engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")
# langchain.llm_cache = SQLAlchemyCache(engine)

Custom SQLAlchemy Schemas

We can define our own declarative SQLAlchemyCache subclass to customize the schema used for caching. For example, to support high-speed full-text prompt indexing with Postgres, we could use:

from sqlalchemy import Column, Integer, String, Computed, Index, Sequence
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy_utils import TSVectorType
from langchain.cache import SQLAlchemyCache

Base = declarative_base()


class FulltextLLMCache(Base):  # type: ignore
    """Postgres table for fulltext-indexed LLM Cache"""

    __tablename__ = "llm_cache_fulltext"
    id = Column(Integer, Sequence('cache_id'), primary_key=True)
    prompt = Column(String, nullable=False)
    llm = Column(String, nullable=False)
    idx = Column(Integer)
    response = Column(String)
    prompt_tsv = Column(TSVectorType(), Computed("to_tsvector('english', llm || ' ' || prompt)", persisted=True))
    __table_args__ = (
        Index("idx_fulltext_prompt_tsv", prompt_tsv, postgresql_using="gin"),
    )


engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")
langchain.llm_cache = SQLAlchemyCache(engine, FulltextLLMCache)

Optional Caching

We can also turn off caching for specific LLMs. In the example below, even though global caching is enabled, we turn it off for one specific LLM:

llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2, cache=False)

Timing the first run:

%%time
llm("Tell me a joke")

Log output:

CPU times: user 5.8 ms, sys: 2.71 ms, total: 8.51 ms
Wall time: 745 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

Timing the second run:

%%time
llm("Tell me a joke")

Log output:

CPU times: user 4.91 ms, sys: 2.64 ms, total: 7.55 ms
Wall time: 623 ms

Output:

'\n\nTwo guys stole a calendar. They got six months each.'
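
Both runs above take a similar amount of time and return different jokes, which is what an uncached model looks like. The following toy sketch shows how a per-instance flag can override a global cache. `GLOBAL_CACHE` and `ToyLLM` are hypothetical, and the boolean flag is a simplification assumed for illustration rather than a description of how the `cache` parameter is handled internally.

```python
import itertools
from typing import Dict, Optional

# Hypothetical stand-in for a globally configured cache (None would mean "disabled").
GLOBAL_CACHE: Optional[Dict[str, str]] = {}


class ToyLLM:
    """Toy model whose every real generation returns a new, numbered 'joke'."""

    def __init__(self, cache: bool = True) -> None:
        self.cache = cache
        self._counter = itertools.count(1)

    def _generate(self, prompt: str) -> str:
        return f"joke #{next(self._counter)}"

    def __call__(self, prompt: str) -> str:
        # The instance flag wins: with cache=False the global cache is neither read nor written.
        use_cache = self.cache and GLOBAL_CACHE is not None
        if use_cache and prompt in GLOBAL_CACHE:
            return GLOBAL_CACHE[prompt]
        result = self._generate(prompt)
        if use_cache:
            GLOBAL_CACHE[prompt] = result
        return result


cached_llm = ToyLLM()
uncached_llm = ToyLLM(cache=False)

cached_results = [cached_llm("Tell me a joke"), cached_llm("Tell me a joke")]
uncached_results = [uncached_llm("Tell me a joke"), uncached_llm("Tell me a joke")]

print(cached_results)    # the second call repeats the first answer
print(uncached_results)  # each call generates a new answer
```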

Optional Caching in Chains

We can also turn off caching for specific nodes in a chain. Note that, because of certain interfaces, it is often easier to construct the chain first and edit the LLM afterwards. As an example, we will load a summarizer map-reduce chain. We will cache the results of the map step, but not those of the combine step:

llm = OpenAI(model_name="text-davinci-002")
no_cache_llm = OpenAI(model_name="text-davinci-002", cache=False)

from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.mapreduce import MapReduceChain

text_splitter = CharacterTextSplitter()

with open('../../../state_of_the_union.txt') as f:
    state_of_the_union = f.read()
texts = text_splitter.split_text(state_of_the_union)

from langchain.docstore.document import Document
docs = [Document(page_content=t) for t in texts[:3]]

from langchain.chains.summarize import load_summarize_chain
chain = load_summarize_chain(llm, chain_type="map_reduce", reduce_llm=no_cache_llm)

Timing the first run:

%%time
chain.run(docs)

Log output:

CPU times: user 452 ms, sys: 60.3 ms, total: 512 ms
Wall time: 5.09 s

Output:

'\n\nPresident Biden is discussing the American Rescue Plan and the Bipartisan Infrastructure Law, which will create jobs and help Americans. He also talks about his vision for America, which includes investing in education and infrastructure. In response to Russian aggression in Ukraine, the United States is joining with European allies to impose sanctions and isolate Russia. American forces are being mobilized to protect NATO countries in the event that Putin decides to keep moving west. The Ukrainians are bravely fighting back, but the next few weeks will be hard for them. Putin will pay a high price for his actions in the long run. Americans should not be alarmed, as the United States is taking action to protect its interests and allies.'

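When we run it again, it runs substantially faster, but the final answer is different. This is because the map step is cached while the reduce step is not.

Timing the second run:

%%time
chain.run(docs)

Log output:

CPU times: user 11.5 ms, sys: 4.33 ms, total: 15.8 ms
Wall time: 1.04 s

Output: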

'\n\nPresident Biden is discussing the American Rescue Plan and the Bipartisan Infrastructure Law, which will create jobs and help Americans. He also talks about his vision for America, which includes investing in education and infrastructure.'
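
The split between a cached map step and an uncached reduce step can be illustrated with a toy pipeline that counts how often each step does real work. All names here (`summarize_chunk`, `combine`, `run_chain`, `sample_chunks`) are hypothetical. Unlike a real LLM, this toy reduce step is deterministic, so it shows only the call pattern, not the changing final answer.

```python
from typing import Dict, List

map_cache: Dict[str, str] = {}
map_calls: List[str] = []           # chunks that actually reached the map "model"
reduce_calls: List[List[str]] = []  # every invocation of the reduce step


def summarize_chunk(chunk: str) -> str:
    """Map step (cached): keep the first sentence as a toy 'summary'."""
    if chunk in map_cache:
        return map_cache[chunk]
    map_calls.append(chunk)
    summary = chunk.split(".")[0].strip() + "."
    map_cache[chunk] = summary
    return summary


def combine(summaries: List[str]) -> str:
    """Reduce step (never cached): runs on every invocation."""
    reduce_calls.append(summaries)
    return " ".join(summaries)


def run_chain(chunks: List[str]) -> str:
    return combine([summarize_chunk(chunk) for chunk in chunks])


sample_chunks = [
    "Caching saves money. It avoids repeated API calls.",
    "Caching saves time. Responses come back faster.",
]

first_summary = run_chain(sample_chunks)
second_summary = run_chain(sample_chunks)

print(first_summary)
print(len(map_calls))     # map work happened only during the first run
print(len(reduce_calls))  # the reduce step ran on both runs
```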

Finally, remember to clean up the cache files:

!rm .langchain.db sqlite.db

