
Qwen2-VL multimodal vision-language model: image and video usage examples

news 2026/2/9 9:54:50

Reference:
https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct

Model:

export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download --resume-download --local-dir-use-symlinks False Qwen/Qwen2-VL-2B-Instruct --local-dir qwen2-vl

Install (versions used here):
transformers-4.45.0.dev0
accelerate-0.34.2
safetensors-0.4.5

pip install git+https://github.com/huggingface/transformers
pip install 'accelerate>=0.26.0'

Code:

Single image

from PIL import Image
import requests
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

# Load the model in half-precision on the available device(s)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "/ai/qwen2-vl", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("/ai/qwen2-vl")

# Image
url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
image = Image.open(requests.get(url, stream=True).raw)

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Preprocess the inputs
text_prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
# Expected output: '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>Describe this image.<|im_end|>\n<|im_start|>assistant\n'

inputs = processor(
    text=[text_prompt], images=[image], padding=True, return_tensors="pt"
)
inputs = inputs.to("cuda")

# Inference: generate the output, then strip the prompt tokens from each sequence
output_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(inputs.input_ids, output_ids)
]
output_text = processor.batch_decode(
    generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True
)
print(output_text)
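The list comprehension near the end of the script is there because `generate` returns the prompt tokens followed by the newly generated ones, so the prompt has to be cut off before decoding. Here is a minimal sketch of that step, using plain Python lists in place of tensors. The helper name `trim_prompt` and the token values are made up for illustration.

```python
def trim_prompt(input_ids, output_ids):
    """Drop the leading prompt tokens from each generated sequence."""
    return [out[len(inp):] for inp, out in zip(input_ids, output_ids)]


# Toy batch of two sequences: each output starts with its own prompt.
prompts = [[101, 7, 8], [101, 9, 10]]
outputs = [[101, 7, 8, 42, 43], [101, 9, 10, 44, 45]]

new_tokens = trim_prompt(prompts, outputs)
print(new_tokens)
```

Only the trimmed lists are passed to `batch_decode`, which is why the printed answer does not repeat the question.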

[Figure: the input image, demo.jpeg from the URL above]


Asking in Chinese


# Image
url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
image = Image.open(requests.get(url, stream=True).raw)

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "描述下这张图片."},
        ],
    }
]

# Preprocess the inputs
text_prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
# Expected output: '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>描述下这张图片.<|im_end|>\n<|im_start|>assistant\n'

inputs = processor(
    text=[text_prompt], images=[image], padding=True, return_tensors="pt"
)
inputs = inputs.to("cuda")

# Inference: generate the output, then strip the prompt tokens from each sequence
output_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(inputs.input_ids, output_ids)
]
output_text = processor.batch_decode(
    generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True
)
print(output_text)


Multiple images

def load_images(image_info):
    """Open each image entry, from a URL or from a local path."""
    images = []
    for info in image_info:
        if "image" in info:
            if info["image"].startswith("http"):
                image = Image.open(requests.get(info["image"], stream=True).raw)
            else:
                image = Image.open(info["image"])
            images.append(image)
    return images

# Messages containing multiple images and a text query
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "/ai/fight.png"},
            {"type": "image", "image": "/ai/long.png"},
            {"type": "text", "text": "描述下这两张图片"},
        ],
    }
]

# Load images
image_info = messages[0]["content"][:2]  # Extract image info from the message
images = load_images(image_info)

# Preprocess the inputs
text_prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

inputs = processor(
    text=[text_prompt], images=images, padding=True, return_tensors="pt"
)
inputs = inputs.to("cuda")

# Inference: generate the output, then strip the prompt tokens from each sequence
output_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(inputs.input_ids, output_ids)
]
output_text = processor.batch_decode(
    generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True
)
print(output_text)
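The script above picks the image entries with a fixed slice (`[:2]`), which only works while the two images come first in the content list. Here is a minimal sketch of a more general selection that filters on the `type` field instead. `collect_image_entries` is a hypothetical helper written for this post, not part of transformers or qwen-vl-utils.

```python
def collect_image_entries(messages):
    """Return every content entry of type "image" that carries an "image" path or URL."""
    entries = []
    for message in messages:
        content = message.get("content", [])
        if isinstance(content, str):
            # Plain-text messages have no image entries.
            continue
        for item in content:
            if item.get("type") == "image" and "image" in item:
                entries.append(item)
    return entries


example_messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "描述下这两张图片"},
            {"type": "image", "image": "/ai/fight.png"},
            {"type": "image", "image": "/ai/long.png"},
        ],
    }
]

image_entries = collect_image_entries(example_messages)
print([entry["image"] for entry in image_entries])
```

The returned list can be passed to `load_images` in place of `messages[0]["content"][:2]`.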


Video
Install

pip install qwen-vl-utils

from qwen_vl_utils import process_vision_info

# Messages containing a list of images as a video and a text query
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "video",
                "video": [
                    "file:///path/to/frame1.jpg",
                    "file:///path/to/frame2.jpg",
                    "file:///path/to/frame3.jpg",
                    "file:///path/to/frame4.jpg",
                ],
                "fps": 1.0,
            },
            {"type": "text", "text": "Describe this video."},
        ],
    }
]

# Messages containing a video and a text query (this overrides the frame-list example above)
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "video",
                "video": "/ai/血液从上肢流入上腔静脉.mp4",
                "max_pixels": 360 * 420,
                "fps": 1.0,
            },
            {"type": "text", "text": "描述下这个视频"},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")

# Inference: generate the output, then strip the prompt tokens from each sequence
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
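Both video variants above use the same message layout: one `video` entry (a file path or a list of frame paths) followed by one `text` entry. Here is a minimal sketch of a builder for that layout. `build_video_messages` is a hypothetical helper for illustration only, and it uses just the keys that appear in the messages above.

```python
def build_video_messages(video, prompt, fps=1.0, max_pixels=None):
    """Build a single-turn message list with one video entry and one text query.

    `video` is either a path string or a list of frame paths.
    """
    video_entry = {"type": "video", "video": video, "fps": fps}
    if max_pixels is not None:
        # Only set the key when a limit is requested, as in the second example above.
        video_entry["max_pixels"] = max_pixels
    return [
        {
            "role": "user",
            "content": [video_entry, {"type": "text", "text": prompt}],
        }
    ]


video_messages = build_video_messages(
    "/ai/血液从上肢流入上腔静脉.mp4", "描述下这个视频", max_pixels=360 * 420
)
print(video_messages[0]["content"][0])
```

The result has the same structure as the `messages` list passed to `process_vision_info` above.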

