
MLLMS_KNOW: A First Try

article 2026/2/7 21:39:02

Background (a personal ramble; feel free to skip)

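A small-object detection task came up in a recent project. During last night's discussion someone floated the idea of zooming in on the region of interest, but we didn't pursue it. Then, by coincidence, I came across this paper today, and who could resist trying it out, haha.
The bigger reason, though, is that I still have an unfinished CLIP write-up from a while back, and this paper uses an upgraded version of BLIP, so this is a good chance to fill that gap. I can't help marveling at how fast things move: when I was using CLIP I thought I was right on trend, and now it already feels like a relic of a bygone era.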

install

Nothing new here, but worth jotting down anyway.

Following the steps in the README:

# Create and activate conda environment
conda create -n mllms_know python=3.10
conda activate mllms_know
# Install dependencies
pip install -r requirements.txt
# Install modified transformers library
cd transformers
pip install -e .
cd ..

True to form, something went wrong. The error:

ImportError: cannot import name 'Tensor' from 'torch' (unknown location)

A quick search suggested a torch version mismatch, so I downgraded:

Installing collected packages: triton, torch, torchvision, torchaudio
  Attempting uninstall: triton
    Found existing installation: triton 3.2.0
    Uninstalling triton-3.2.0:
      Successfully uninstalled triton-3.2.0
  Attempting uninstall: torch
    Found existing installation: torch 2.6.0
    Uninstalling torch-2.6.0:
      Successfully uninstalled torch-2.6.0
  Attempting uninstall: torchvision
    Found existing installation: torchvision 0.21.0
    Uninstalling torchvision-0.21.0:
      Successfully uninstalled torchvision-0.21.0
Successfully installed torch-2.5.1 torchaudio-2.5.1 torchvision-0.20.1 triton-3.1.0

Check the torch version:

import torch

# Check the PyTorch version
print("PyTorch version:", torch.__version__)
# Check whether CUDA is available
print("CUDA available:", torch.cuda.is_available())
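Since the fix came down to getting a matching set of versions, here is a small sketch that compares what is installed against the combination that worked for me. The version table is copied from the pip output above; whether other combinations also work is something I have not checked, and the helper names (`find_mismatches`, `installed_versions`) are made up for this note.

```python
from importlib import metadata

# Versions taken from the pip output above. This is an assumption about what
# this particular setup needs, not an official requirement of the repo.
EXPECTED = {
    "torch": "2.5.1",
    "torchvision": "0.20.1",
    "torchaudio": "2.5.1",
    "triton": "3.1.0",
}


def base_version(version: str) -> str:
    """Drop a local build suffix, so '2.5.1+cu124' compares equal to '2.5.1'."""
    return version.split("+", 1)[0]


def find_mismatches(installed: dict, expected: dict = EXPECTED) -> dict:
    """Return {package: (installed_or_None, expected)} for every mismatch."""
    mismatches = {}
    for name, want in expected.items():
        have = installed.get(name)
        if have is None or base_version(have) != want:
            mismatches[name] = (have, want)
    return mismatches


def installed_versions(names) -> dict:
    """Look up installed versions; packages that are not installed are left out."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return found


if __name__ == "__main__":
    report = find_mismatches(installed_versions(EXPECTED))
    for name, (have, want) in report.items():
        print(f"{name}: installed {have}, expected {want}")
```

Running it inside the `mllms_know` environment prints nothing when all four packages match the table.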

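Reproducing the Benchmark Evaluation section

Preparing the TextVQA dataset

Download it and process it following the steps. One thing worth mentioning: the dataset is not small, a bit over 6 GB.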

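Using 7z

I ran into a situation specific to my setup: the data had to be moved from one host to another, with no more than 4 GB per transfer. I don't really know what that limit is for, haha.
Steps: split into volumes with 7z -> run 7z x on the first volume (which merges the volumes back together) -> extract with 7z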
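The split-and-rejoin steps can be sketched as a pair of 7z command lines built from Python. The `-v<size>` switch of `7z a` writes numbered volumes (`.001`, `.002`, ...), and `7z x` on the first volume reads the rest automatically. The file names and the 3500m volume size are assumptions chosen to stay safely under the 4 GB limit; the function names are just for illustration.

```python
import subprocess


def split_command(source: str, archive: str, volume_size: str = "3500m") -> list:
    """Build '7z a' with -v<size>; 7z writes archive.001, archive.002, ..."""
    return ["7z", "a", f"-v{volume_size}", archive, source]


def extract_command(first_volume: str, out_dir: str) -> list:
    """Build '7z x' on the first volume; the other volumes must sit beside it."""
    return ["7z", "x", first_volume, f"-o{out_dir}"]


def run(command: list) -> None:
    """Run a command and raise CalledProcessError if 7z reports a failure."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical file names, used only to show the shape of the commands.
    print(" ".join(split_command("textvqa.zip", "textvqa.7z")))
    print(" ".join(extract_command("textvqa.7z.001", "restored")))
```

If the thing being split is itself an archive, as in my case, the `7z x` step gives back that archive, and one more extraction is needed to get at the data.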

Running Evaluations

If your network is unreliable, download the model ahead of time:

huggingface-cli download --resume-download --local-dir-use-symlinks False llava-hf/llava-1.5-7b-hf --local-dir ./llava/
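The same download can also be done from Python with `snapshot_download` from the `huggingface_hub` package, which is handy inside a script or notebook. This is a minimal sketch: the wrapper name `download_model` is mine, and the defaults simply mirror the repo id and target folder from the command above.

```python
def download_model(
    repo_id: str = "llava-hf/llava-1.5-7b-hf",
    local_dir: str = "./llava/",
) -> str:
    """Download all files of a Hub repo into local_dir and return the folder path."""
    # Imported here so this file can be loaded even where huggingface_hub
    # is not installed; the call itself still requires it.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


if __name__ == "__main__":
    print("Model files are in:", download_model())
```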

It won't run on the 4070

CUDA_VISIBLE_DEVICES=0 python run.py --chunk_id 0 --total_chunks 1 --model llava --task textvqa --method rel_att
llava-hf/llava-1.5-7b-hf
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:07<00:00,  2.62s/it]
Traceback (most recent call last):
  File "/home/stardust/Documents/a_20250113_toilet_clean/small_object/mllms_know/run.py", line 248, in <module>
    main(args)
  File "/home/stardust/Documents/a_20250113_toilet_clean/small_object/mllms_know/run.py", line 175, in main
    model = LlavaForConditionalGeneration.from_pretrained(args.model_id, torch_dtype=torch.bfloat16, low_cpu_mem_usage=True, attn_implementation="eager").to(args.device)
  File "/home/stardust/Documents/a_20250113_toilet_clean/small_object/mllms_know/transformers/src/transformers/modeling_utils.py", line 3096, in to
    return super().to(*args, **kwargs)
  File "/home/stardust/.aip_conda/mllms_know/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1340, in to
    return self._apply(convert)
  File "/home/stardust/.aip_conda/mllms_know/lib/python3.10/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/home/stardust/.aip_conda/mllms_know/lib/python3.10/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/home/stardust/.aip_conda/mllms_know/lib/python3.10/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  [Previous line repeated 3 more times]
  File "/home/stardust/.aip_conda/mllms_know/lib/python3.10/site-packages/torch/nn/modules/module.py", line 927, in _apply
    param_applied = fn(param)
  File "/home/stardust/.aip_conda/mllms_know/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1326, in convert
    return t.to(
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB. GPU 0 has a total capacity of 11.71 GiB of which 90.44 MiB is free. Including non-PyTorch memory, this process has 10.21 GiB memory in use. Of the allocated memory 10.02 GiB is allocated by PyTorch, and 16.20 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Is there any workaround? If someone more experienced has a trick for this, I'd love to hear it.
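A back-of-the-envelope estimate at least explains why the load fails. The sketch below only counts the model weights, assumes roughly 7 billion parameters for llava-1.5-7b (an approximation based on the model name, not a measured number), and ignores activations and the CUDA context, so the real requirement is higher. The 11.71 GiB capacity comes from the error message above.

```python
GIB = 1024 ** 3


def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return num_params * bytes_per_param / GIB


APPROX_PARAMS = 7e9        # assumption: "7b" taken at face value
GPU_CAPACITY_GIB = 11.71   # total capacity reported in the OOM message

if __name__ == "__main__":
    # bytes per parameter: bfloat16 = 2, int8 = 1, 4-bit = 0.5
    for label, width in [("bfloat16", 2), ("int8", 1), ("4-bit", 0.5)]:
        need = weight_memory_gib(APPROX_PARAMS, width)
        verdict = "fits" if need < GPU_CAPACITY_GIB else "does not fit"
        print(f"{label}: about {need:.1f} GiB of weights, {verdict} in {GPU_CAPACITY_GIB} GiB")
```

In bfloat16 the weights alone come to about 13 GiB, which is more than the card has, and that matches the log: the process had already allocated about 10 GiB when moving the model to the GPU failed. On paper a lower-precision load would fit, but I have not tried whether this repo's modified transformers and the attention-based method still work that way.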

Moving to a server

It is still a bit slow, but at least it runs.

Jupyter issues

Setting the IP and port
Switching the conda environment
Loading the model hangs; I haven't found the cause yet.
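For the conda switch in particular, a quick way to confirm which environment a notebook kernel is really using is to look at the interpreter path from inside a cell. A minimal sketch; `kernel_environment` is just a name I picked, and `CONDA_DEFAULT_ENV` may be unset if the kernel was not started from an activated conda shell.

```python
import os
import sys


def kernel_environment() -> dict:
    """Collect the interpreter path, its prefix, and the active conda env name."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in kernel_environment().items():
        print(f"{key}: {value}")
```

If the executable path does not point into the `mllms_know` environment, the notebook is still running on a different kernel.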

Running the program and analyzing the results

To be updated.