
[Notes] LMDeploy study notes and problems encountered

article 2026/2/28 7:35:08

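LMDeploy is a Python library for compressing, deploying, and serving large language models (LLMs) and vision-language models (VLMs). Its core inference engines are the TurboMind engine and the PyTorch engine. The former is written in C++ and CUDA and focuses on inference performance, while the latter is written in pure Python to lower the barrier for developers.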

LMDeploy supports deploying LLMs and VLMs on Linux and Windows and requires CUDA 11.3 or later. It is also compatible with the following NVIDIA GPUs:

Volta (sm70): V100
Turing (sm75): 20 series, T4
Ampere (sm80, sm86): 30 series, A10, A16, A30, A100
Ada Lovelace (sm89): 40 series
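The compatibility list above can be expressed as a small lookup table. This is only a sketch and not part of LMDeploy: the names `SUPPORTED_ARCHS` and `arch_for_capability` are made up for illustration. On a machine with a CUDA build of PyTorch, the `(major, minor)` tuple could come from `torch.cuda.get_device_capability(0)`.

```python
# Compute capabilities from the compatibility list above, keyed by (major, minor).
# Hypothetical helper for illustration; LMDeploy does not ship this table.
SUPPORTED_ARCHS = {
    (7, 0): "Volta",
    (7, 5): "Turing",
    (8, 0): "Ampere",
    (8, 6): "Ampere",
    (8, 9): "Ada Lovelace",
}


def arch_for_capability(capability):
    """Return the architecture name for a (major, minor) tuple, or None if it is not listed."""
    return SUPPORTED_ARCHS.get(tuple(capability))
```

A return value of None only means the GPU is not in the list quoted here; it does not prove the card cannot run LMDeploy.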

LMDeploy optimizes GPU memory usage better than vLLM does.

nvitop  # view GPU memory usage
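If the same numbers are needed inside a script instead of an interactive monitor, one option is to call `nvidia-smi` and parse its CSV output. The sketch below assumes `nvidia-smi` is on the PATH; the function names `parse_memory_csv` and `query_gpu_memory` are hypothetical and unrelated to nvitop.

```python
import subprocess


def parse_memory_csv(text):
    """Parse 'used, total' pairs in MiB, one GPU per line."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(part.strip()) for part in line.split(","))
        rows.append({"used_mib": used, "total_mib": total})
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for per-GPU memory usage (requires an NVIDIA driver)."""
    output = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return parse_memory_csv(output)
```

Calling `query_gpu_memory()` before and after starting the server gives a rough idea of how much memory the model takes.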

Install lmdeploy in a clean conda environment (Python 3.8 - 3.12).

1. Installation

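**Python 3.12 is currently not recommended on Linux.** Oddly, the installation does not fail on Windows, but the torch installed there does not come with CUDA, so startup fails; the error message is shown further below.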

conda create -n lmdeploy python=3.12 -y
conda activate lmdeploy
pip install lmdeploy

2. Error 1

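Running pip install lmdeploy fails while downloading fire-0.7.0.tar.gz. This is a compatibility problem: this version of fire does not install under Python 3.12. The log shows that setup.py could not be executed because setuptools is not available in the build environment.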

(lmdeploy) root@dsw-942822-5c5dcbf687-85ktw:/mnt/workspace/Anaconda3/envs# pip install lmdeploy
Collecting lmdeploy
  Downloading lmdeploy-0.7.2.post1-cp312-cp312-manylinux2014_x86_64.whl.metadata (17 kB)
Collecting accelerate>=0.29.3 (from lmdeploy)
  Downloading accelerate-1.5.2-py3-none-any.whl.metadata (19 kB)
Collecting einops (from lmdeploy)
  Downloading einops-0.8.1-py3-none-any.whl.metadata (13 kB)
Collecting fastapi (from lmdeploy)
  Downloading fastapi-0.115.11-py3-none-any.whl.metadata (27 kB)
Collecting fire (from lmdeploy)
  Downloading fire-0.7.0.tar.gz (87 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [3 lines of output]
      /mnt/workspace/Anaconda3/envs/lmdeploy/lib/python3.12/site-packages/_distutils_hack/__init__.py:53: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
        warnings.warn(
      ERROR: Can not execute `setup.py` since setuptools is not available in the build environment.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Python 3.8 - 3.11 is recommended:
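To catch the wrong interpreter before running pip, the recommended range can be checked from Python itself. A minimal sketch, assuming the 3.8 - 3.11 range stated above; the function name `is_recommended_python` is hypothetical.

```python
import sys


def is_recommended_python(version=None):
    """Return True if the version is within the 3.8 - 3.11 range recommended in these notes."""
    major, minor = (version or sys.version_info)[:2]
    return major == 3 and 8 <= minor <= 11
```

Called without arguments inside the activated conda environment, it checks the running interpreter; a tuple such as `(3, 12)` can be passed to test other versions.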

conda create -n lmdeploy python=3.11 -y
conda activate lmdeploy
pip install lmdeploy

The error no longer occurs.

3. Startup

On Linux, pass the absolute path of the downloaded model:

lmdeploy serve api_server /mnt/workspace/llm/Qwen/Qwen2.5-0.5B-Instruct

4. Error 2

Startup fails with the following error:

(lmdeploy) root@dsw-942822-5c5dcbf687-85ktw:/mnt/workspace/Anaconda3/envs# lmdeploy serve api_server /mnt/workspace/llm/Qwen/Qwen2.5-0.5B-Instruct
Traceback (most recent call last):
  File "/mnt/workspace/Anaconda3/envs/lmdeploy/bin/lmdeploy", line 8, in <module>
    sys.exit(run())
             ^^^^^
  File "/mnt/workspace/Anaconda3/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/cli/entrypoint.py", line 14, in run
    SubCliServe.add_parsers()
  File "/mnt/workspace/Anaconda3/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/cli/serve.py", line 361, in add_parsers
    SubCliServe.add_parser_api_server()
  File "/mnt/workspace/Anaconda3/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/cli/serve.py", line 142, in add_parser_api_server
    ArgumentHelper.tool_call_parser(parser_group)
  File "/mnt/workspace/Anaconda3/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/cli/utils.py", line 375, in tool_call_parser
    from lmdeploy.serve.openai.tool_parser import ToolParserManager
  File "/mnt/workspace/Anaconda3/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/serve/openai/tool_parser/__init__.py", line 2, in <module>
    from .internlm2_parser import Internlm2ToolParser
  File "/mnt/workspace/Anaconda3/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/serve/openai/tool_parser/internlm2_parser.py", line 6, in <module>
    import partial_json_parser
ModuleNotFoundError: No module named 'partial_json_parser'

Cause: the partial_json_parser module is missing. It is one of lmdeploy's dependencies, but it may not have been installed automatically. The fix is as follows:

1. Install the missing dependency

pip install partial-json-parser

2. Re-run the `lmdeploy serve` command

lmdeploy serve api_server /mnt/workspace/llm/Qwen/Qwen2.5-0.5B-Instruct

This time the server starts without errors.
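A quick way to confirm that the server is really up is to ask it which models it serves. The sketch below makes two assumptions: the server listens on port 23333, the port used by the client code in these notes, and it exposes an OpenAI-style `/v1/models` endpoint. The function names are hypothetical, and only the standard library is used.

```python
import json
import urllib.request


def models_url(base_url):
    """Build the model-listing URL from an OpenAI-style base URL."""
    return base_url.rstrip("/") + "/models"


def parse_model_ids(payload):
    """Extract model ids from an OpenAI-style model-list JSON body."""
    return [entry["id"] for entry in json.loads(payload).get("data", [])]


def list_served_models(base_url="http://localhost:23333/v1/", timeout=5):
    """Query a running server; raises urllib.error.URLError if nothing is listening."""
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
        return parse_model_ids(resp.read().decode("utf-8"))
```

If `list_served_models()` returns a list containing the model path, the returned id is the value to pass as `model` in client requests.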
If openai is not installed yet, remember to install it:

pip install openai
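Both missing-package problems above can be detected before starting anything by checking whether the modules are importable. A minimal sketch using only the standard library; `missing_modules` and `REQUIRED` are hypothetical names. Note that the import name `partial_json_parser` differs from the pip package name `partial-json-parser`.

```python
import importlib.util

# Import names (not pip package names) of the two packages discussed above.
REQUIRED = ["partial_json_parser", "openai"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Running `missing_modules(REQUIRED)` inside the lmdeploy environment should return an empty list once both packages are installed.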

5. Code test (on Linux)

# Multi-turn chat
from openai import OpenAI


# Define the multi-turn chat loop
def run_chat_session():
    # Initialize the client
    client = OpenAI(base_url="http://localhost:23333/v1/", api_key="123456")
    # Initialize the chat history
    chat_history = []
    # Start the chat loop
    while True:
        # Read the user input
        user_input = input("User: ")
        if user_input.lower() == "exit":
            print("Exiting chat.")
            break
        # Update the chat history (add the user input)
        chat_history.append({"role": "user", "content": user_input})
        # Ask the model for a reply
        try:
            chat_completion = client.chat.completions.create(
                messages=chat_history,
                model="/mnt/workspace/llm/Qwen/Qwen2.5-0.5B-Instruct",
            )
            # Get the latest reply
            model_response = chat_completion.choices[0]
            print("AI:", model_response.message.content)
            # Update the chat history (add the model's reply)
            chat_history.append({"role": "assistant", "content": model_response.message.content})
        except Exception as e:
            print("An error occurred:", e)
            break


if __name__ == '__main__':
    run_chat_session()
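The core of the loop above is that every turn appends two entries to the history: the user message and the model's reply. The sketch below isolates that step in a function so it can be tried without a running server. `ask` and `make_echo_client` are hypothetical names; the echo client is only a stand-in that mimics the shape of `client.chat.completions.create(...)` and is not part of the openai package.

```python
from types import SimpleNamespace


def ask(client, chat_history, user_input, model):
    """Send one user turn, record both sides in chat_history, and return the reply text."""
    chat_history.append({"role": "user", "content": user_input})
    completion = client.chat.completions.create(messages=chat_history, model=model)
    reply = completion.choices[0].message.content
    chat_history.append({"role": "assistant", "content": reply})
    return reply


def make_echo_client():
    """Offline stand-in for OpenAI(...): echoes the last user message back."""
    def create(messages, model):
        message = SimpleNamespace(content="echo: " + messages[-1]["content"])
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])

    completions = SimpleNamespace(create=create)
    return SimpleNamespace(chat=SimpleNamespace(completions=completions))
```

With a real server, the same `ask` function should work when given `OpenAI(base_url="http://localhost:23333/v1/", api_key="123456")` and the model path used above.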

6. On Windows, the installed torch does not come with CUDA, so startup fails with the error below

(lmdeploy) PS C:\Users\fengxinzi> lmdeploy serve api_server "D:\Program Files\python\PycharmProjects\AiStudyProject\demo06\models\Qwen\Qwen2___5-0___5B-Instruct"
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "D:\envs\lmdeploy\Scripts\lmdeploy.exe\__main__.py", line 7, in <module>
  File "D:\envs\lmdeploy\Lib\site-packages\lmdeploy\cli\entrypoint.py", line 39, in run
    args.run(args)
  File "D:\envs\lmdeploy\Lib\site-packages\lmdeploy\cli\serve.py", line 283, in api_server
    else get_max_batch_size(args.device)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\envs\lmdeploy\Lib\site-packages\lmdeploy\utils.py", line 338, in get_max_batch_size
    device_name = torch.cuda.get_device_name(0).lower()
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\envs\lmdeploy\Lib\site-packages\torch\cuda\__init__.py", line 493, in get_device_name
    return get_device_properties(device).name
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\envs\lmdeploy\Lib\site-packages\torch\cuda\__init__.py", line 523, in get_device_properties
    _lazy_init()  # will define _get_device_properties
    ^^^^^^^^^^^^
  File "D:\envs\lmdeploy\Lib\site-packages\torch\cuda\__init__.py", line 310, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

The output of conda list also shows no CUDA.

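The error indicates that the installed PyTorch does not have CUDA support enabled.
We therefore need to install a CUDA-enabled build of torch, for CUDA 11.8 or later: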

# Choose one of the two
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
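Whether the reinstall actually produced a CUDA-enabled torch can be verified before launching the server again. A minimal sketch: `cuda_status` is a hypothetical helper, while `torch.version.cuda` and `torch.cuda.is_available()` are standard PyTorch attributes. The function also runs in an environment without torch.

```python
import importlib.util


def cuda_status():
    """Report whether torch is installed, which CUDA version it was built for, and whether a GPU is usable."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "cuda_build": None, "cuda_available": False}
    import torch

    return {
        "torch_installed": True,
        "cuda_build": torch.version.cuda,  # None for CPU-only wheels
        "cuda_available": torch.cuda.is_available(),
    }
```

A result with `cuda_build` set to None corresponds to the "Torch not compiled with CUDA enabled" error shown in this section.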

After the installation, start the server again; it now runs without errors.
With that, you can connect to the server from code and run the model.