Solved: Problem exceeding the maximum token limit in Azure OpenAI (with Java)

news 2026/2/10 16:43:19

Background:

I'm building a chat that returns SQL queries based on the question you ask it about a specific database. For this I use Azure OpenAI with Java in Spring Boot.

Here is my problem:

How can I make the AI remember previous questions without passing the context back to it? My goal is to greatly reduce token consumption. If a question contains a keyword, for example 'users', I pass that table's information in the context, and that information is huge (field names, data types and descriptions). After several questions, token usage rises to more than 10,000.

I can't show all the code since it's a project for my company.

What I'm currently doing is adding the referenced table and the main context ("you are a SQL-based chat...") to the context. To make the chat remember, I tried saving the history in Java and passing the whole history again, but this exceeds the token limit pretty fast.

This is what I'm currently doing (the AI does not remember anything):

chatMessages.add(new ChatMessage(ChatRole.SYSTEM, context));
chatMessages.add(new ChatMessage(ChatRole.USER, question));
ChatCompletions chatCompletions = client.getChatCompletions(deploymentOrModelId, new ChatCompletionsOptions(chatMessages));
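
The question builds `context` by appending a table's schema whenever the question mentions a keyword such as 'users'. Below is a minimal sketch of that selection step. It assumes a hypothetical in-memory map from lowercase keyword to schema text. `SchemaContextBuilder` and its methods are illustrative names, not part of any SDK. The sketch includes only the schemas whose keyword actually appears in the current question.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

public class SchemaContextBuilder {

    // Hypothetical catalogue: lowercase keyword -> table description
    // (field names, data types, descriptions).
    private final Map<String, String> schemasByKeyword;

    public SchemaContextBuilder(Map<String, String> schemasByKeyword) {
        this.schemasByKeyword = schemasByKeyword;
    }

    /** Returns the schema snippets whose keyword occurs (as a substring) in the question. */
    public List<String> relevantSchemas(String question) {
        String normalized = question.toLowerCase(Locale.ROOT);
        return schemasByKeyword.entrySet().stream()
                .filter(e -> normalized.contains(e.getKey()))
                .map(Map.Entry::getValue)
                .collect(Collectors.toList());
    }

    /** Base instructions followed by the matching table descriptions only. */
    public String buildContext(String basePrompt, String question) {
        StringBuilder sb = new StringBuilder(basePrompt);
        for (String schema : relevantSchemas(question)) {
            sb.append("\n\n").append(schema);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> schemas = new LinkedHashMap<>();
        schemas.put("users", "Table users: id INT (primary key), name VARCHAR (full name)");
        schemas.put("orders", "Table orders: id INT, user_id INT (references users.id)");
        SchemaContextBuilder builder = new SchemaContextBuilder(schemas);
        System.out.println(builder.buildContext("You are a SQL assistant.", "How many users signed up?"));
    }
}
```

Plain substring matching is crude. For example, "superusers" also matches "users". In practice it only limits which schemas are sent. It does not give the model memory of earlier turns.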

Solution:

As far as I know, there is no way to make the LLM (Azure OpenAI in this case) remember your context cheaply. As you said, sending context (and a huge chunk of it) on each call gets pricey really fast. That being said, you could change the approach and use other techniques to mimic memory. One option is to summarize the previous questions and send that summary as context. Instead of a long string with 20 questions and answers, you send a short summary of what the user has been asking for. This keeps your prompt short while keeping the model somewhat "aware" of the conversation.
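
Below is a minimal sketch of the summary approach. The request carries the system prompt, one short running summary and the new question, instead of the full transcript. All class and method names here are hypothetical. `Message` is a stand-in, not the Azure SDK's `ChatMessage`. `Summarizer` is an assumed hook. In a real application it would presumably be a separate, cheap model call that condenses the conversation. The `main` method uses a naive placeholder that only keeps the user's recent questions.

```java
import java.util.ArrayList;
import java.util.List;

public class SummaryMemory {

    /** Minimal stand-in for a chat message (hypothetical, not the SDK type). */
    public static final class Message {
        public final String role;
        public final String content;

        public Message(String role, String content) {
            this.role = role;
            this.content = content;
        }
    }

    /** Hypothetical hook that folds one question/answer turn into the running summary. */
    @FunctionalInterface
    public interface Summarizer {
        String update(String previousSummary, String question, String answer);
    }

    private final Summarizer summarizer;
    private String summary = "";

    public SummaryMemory(Summarizer summarizer) {
        this.summarizer = summarizer;
    }

    /** System prompt, then the short summary (if any), then the new question. */
    public List<Message> buildMessages(String systemPrompt, String question) {
        List<Message> messages = new ArrayList<>();
        messages.add(new Message("system", systemPrompt));
        if (!summary.isEmpty()) {
            messages.add(new Message("system", "Conversation so far: " + summary));
        }
        messages.add(new Message("user", question));
        return messages;
    }

    /** Call after each answer so the next request sees an updated summary. */
    public void recordTurn(String question, String answer) {
        summary = summarizer.update(summary, question, answer);
    }

    public String summary() {
        return summary;
    }

    public static void main(String[] args) {
        // Naive placeholder: join the user's questions and keep only the last 200 characters.
        Summarizer naive = (prev, q, a) -> {
            String next = prev.isEmpty() ? q : prev + " | " + q;
            return next.length() <= 200 ? next : next.substring(next.length() - 200);
        };
        SummaryMemory memory = new SummaryMemory(naive);
        memory.recordTurn("How many users are there?", "SELECT COUNT(*) FROM users;");
        for (Message m : memory.buildMessages("You are a SQL assistant.", "And how many are active?")) {
            System.out.println(m.role + ": " + m.content);
        }
    }
}
```

The token cost per request stays roughly constant, because only the summary grows, and it is capped. The trade-off is that details lost in the summary are gone for good.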

There are also conversation buffers, which keep the chat history in memory and send it to the LLM on each call, as you did. The history gets long pretty fast, though. To handle that, you could configure a buffer window, for example limiting the conversation memory to the last 3 questions. That should help keep the token count manageable.
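
Below is a minimal sketch of a buffer window. It keeps only the last N question/answer exchanges, so the replayed history never exceeds N turns. `WindowMemory` and `Exchange` are hypothetical names used for illustration. Before sending a request, each retained exchange would presumably be added between the system context and the new question as a user message followed by an assistant message.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class WindowMemory {

    /** One question/answer exchange (hypothetical helper type). */
    public static final class Exchange {
        public final String question;
        public final String answer;

        public Exchange(String question, String answer) {
            this.question = question;
            this.answer = answer;
        }
    }

    private final int maxExchanges;
    private final Deque<Exchange> window = new ArrayDeque<>();

    public WindowMemory(int maxExchanges) {
        if (maxExchanges < 1) {
            throw new IllegalArgumentException("maxExchanges must be >= 1");
        }
        this.maxExchanges = maxExchanges;
    }

    /** Stores the latest exchange and evicts the oldest ones once the window is full. */
    public void add(String question, String answer) {
        window.addLast(new Exchange(question, answer));
        while (window.size() > maxExchanges) {
            window.removeFirst();
        }
    }

    /** Retained exchanges, oldest first, ready to replay as user/assistant messages. */
    public List<Exchange> recent() {
        return new ArrayList<>(window);
    }

    public static void main(String[] args) {
        WindowMemory memory = new WindowMemory(3);
        for (int i = 1; i <= 5; i++) {
            memory.add("question " + i, "answer " + i);
        }
        // Only exchanges 3, 4 and 5 remain.
        for (Exchange e : memory.recent()) {
            System.out.println(e.question + " -> " + e.answer);
        }
    }
}
```

A window bounds the history by turn count, not by tokens. With schemas as large as the ones described in the question, a token-based limit may still be needed on top of it.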

There are several ways to manage this, but as far as I know there is no "perfect memory", at least not one that is worth paying for. If you could tell us a bit more about how good the bot's memory needs to be, or about your specific use case, maybe we can be more precise. Good luck!
