
YOLOE in Practice: Quickly Recognizing Any Object in an Image with Text Prompts

article 2026/3/16 8:42:26

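Have you ever looked at a photo and wanted to know what is in it, only to find that a conventional object recognition tool can identify just the few dozen or few hundred categories it was built with? If the photo contains something the tool has never seen, or you are looking for something very specific, it cannot help.

That is the limitation of traditional object detection models: they are tied to a closed vocabulary. YOLOE, the subject of this article, removes that restriction. You describe what you want in plain text and it looks for matching objects in the image, whether or not it saw that category during training.

Better still, with the official YOLOE image you do not have to wrestle with environment setup or download dependencies by hand. A few minutes is enough to try open-vocabulary detection on your own machine. Let's walk through it step by step.

1. Why you need YOLOE's text prompts

Before getting into technical details, let's look at the practical problems that YOLOE's text prompts can solve.

1.1 Pain points of traditional detection models

Imagine you work in operations for an e-commerce platform and handle thousands of product images every day. A traditional detector may recognize "shoes", "clothes", and "bags", but if a user uploads "vintage leather Dr. Martens boots with metal buckles", it will probably fail, because it never learned descriptions such as "metal buckle", "vintage", or "Dr. Martens boots".

Or imagine you are a content moderator who has to find violations in a huge pile of images. A traditional model detects broad classes such as "person", "car", and "building". If you need "a person wearing a particular outfit" or "a car of a particular brand", it has nothing to offer.

1.2 Advantages of YOLOE text prompts

YOLOE's text prompts address these problems directly:

- Flexibility: you do not retrain the model; you tell it what to look for at inference time. "Red fire truck" today, "person wearing a hat" tomorrow, "damaged tire" the day after, all with the same model.
- Zero-shot transfer: even if the model never saw a "drone" or a "smartwatch" during training, it can attempt to recognize one from your text description. This is especially useful where requirements change quickly.
- Semantic understanding: prompts are matched through text embeddings rather than keyword lookup, so a prompt such as "vehicle" can pick up cars, bicycles, and motorcycles, and "fruit" can pick up apples, bananas, and oranges.
- Real-time efficiency: YOLOE keeps the traditional strength of the YOLO family, which is speed. It can reach real-time detection on an ordinary GPU, so it is usable in production.

2. Deploying the official YOLOE image in 5 minutes

Enough theory. Here is the simplest way to get YOLOE running in about five minutes.

2.1 Prerequisites

Make sure your machine meets the following conditions:

- Docker is installed (if not, download it from the Docker website; installation is straightforward).
- You have an NVIDIA GPU (without one you can run on CPU, only more slowly).
- At least 8 GB of free disk space.

If you have an NVIDIA GPU, you also need the NVIDIA Container Toolkit so that Docker can use GPU acceleration. Install it as follows:

```bash
# Add the NVIDIA container toolkit repository
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install the toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```

2.2 Starting the YOLOE container

With the prerequisites in place, a single command starts the YOLOE environment:

```bash
docker run -it --gpus all \
  --name yoloe-demo \
  -v $(pwd)/my_data:/root/data \
  -p 7860:7860 \
  registry.cn-hangzhou.aliyuncs.com/csdn_mirrors/yoloe:latest
```

What each part does:

- `--gpus all`: lets the container use all GPUs (drop this flag if you run on CPU).
- `--name yoloe-demo`: names the container so it is easy to manage.
- `-v $(pwd)/my_data:/root/data`: maps the local `my_data` folder to `/root/data` in the container, so you can prepare images locally and use them inside the container.
- `-p 7860:7860`: maps container port 7860 to the host; this is the port of the Gradio web interface.

The command drops you into a shell inside the container. Activate the YOLOE environment first:

```bash
# Activate the conda environment
conda activate yoloe

# Enter the project directory
cd /root/yoloe
```

The YOLOE environment is now ready.

3. Text prompts in practice: three ways to use them

YOLOE can be driven by text prompts in several ways: from the command line, through the Python API, or through a web interface.

3.1 Option 1: quick test from the command line (simplest)

If you only want a quick look at the results, the command line is the most direct route. The YOLOE image ships with some sample images that we can use right away.

List the sample images in the project:

```bash
ls ultralytics/assets/
```

You will see common test images such as bus.jpg (a bus) and zidane.jpg (a footballer). Let's test with bus.jpg:

```bash
python predict_text_prompt.py \
  --source ultralytics/assets/bus.jpg \
  --checkpoint pretrain/yoloe-v8l-seg.pt \
  --names person bus wheel \
  --device cuda:0
```

The arguments mean:

- `--source`: path of the image to detect.
- `--checkpoint`: model file to use; here the largest one, v8l.
- `--names`: what to look for: person, bus, wheel.
- `--device cuda:0`: run on the first GPU.

After it runs, the detections are printed in the terminal and an annotated image is written under the runs/detect directory. Open it and you should see the people, the bus, and the wheels marked.

3.2 Option 2: the Python API (most common)

For developers, calling the Python API is more flexible, because YOLOE can be built into your own application. Text prompts are registered on the model with `set_classes`, together with the embeddings returned by `get_text_pe`; after that, `predict` is called as usual.

Create a new Python file, for example demo.py:

```python
from ultralytics import YOLOE
import cv2
import matplotlib.pyplot as plt

# Option 1: download the model from Hugging Face automatically (needs network access)
print("Loading model...")
model = YOLOE.from_pretrained("jameslahm/yoloe-v8s-seg")

# Option 2: use a local model file (no network needed)
# model = YOLOE("pretrain/yoloe-v8s-seg.pt")

# Test image
image_path = "ultralytics/assets/bus.jpg"

# Objects to look for: plain words...
simple_names = ["person", "bus", "wheel"]
# ...or more specific descriptions
detailed_names = ["person in red clothes", "double-decker bus", "car tire"]

# Register the text prompts as the model's classes
model.set_classes(simple_names, model.get_text_pe(simple_names))

print("Running detection...")
results = model.predict(image_path)

for result in results:
    # Draw the detections (returns a BGR array)
    annotated_image = result.plot()

    # Show the image
    plt.figure(figsize=(10, 6))
    plt.imshow(cv2.cvtColor(annotated_image, cv2.COLOR_BGR2RGB))
    plt.axis("off")
    plt.title("YOLOE text-prompt detection")
    plt.show()

    # Print the details
    print("Detected objects:")
    for box in result.boxes:
        class_id = int(box.cls[0])
        confidence = float(box.conf[0])
        print(f"  - {simple_names[class_id]}: confidence {confidence:.2%}")
```

This covers the basics of the YOLOE Python API:

- Loading is simple: one line downloads the model from Hugging Face, or loads it from a local file.
- Prompts are flexible: use single words or more specific descriptions.
- Results are easy to process: each result carries bounding boxes, confidences, and class indices.

3.3 Option 3: the Gradio web interface (most intuitive)

If you are not comfortable with the command line, or want to demo to non-technical people, the Gradio interface is the best choice. The YOLOE image includes the web app; just run:

```bash
python app.py
```

Then open http://localhost:7860 in a browser. The page has three areas:

- Left: image upload.
- Middle: text input for the objects to find, separated by commas.
- Right: the result.

Steps:

1. Click Upload and choose an image.
2. Type the objects to detect, for example `person, dog, car, tree`.
3. Click Submit.
4. After a few seconds the annotated image appears on the right.

The interface is well suited to:

- Quick demos: showing YOLOE to customers or managers.
- Repeated testing: uploading several images one after another.
- Parameter tuning: adjusting the confidence threshold and similar settings and watching the effect.

4. Case studies: solving real problems with text prompts

With the basics covered, let's look at a few application scenarios.

4.1 Case 1: e-commerce image review

Suppose you are responsible for image review on a second-hand trading platform and must make sure listings follow the rules. The traditional approach trains several classifiers; with YOLOE a single model is enough.

```python
from ultralytics import YOLOE
import os


class EcommerceImageChecker:
    def __init__(self):
        # Load the model
        self.model = YOLOE.from_pretrained("jameslahm/yoloe-v8m-seg")

        # Review rules: category -> text prompts
        self.rules = {
            "prohibited items": ["knife", "gun", "drugs", "fireworks"],
            "sensitive content": ["nudity", "gore and violence", "politically sensitive content"],
            "information violations": ["phone number", "QR code", "URL"],
        }

    def check_image(self, image_path):
        """Check a single image against every rule category."""
        violations = []

        for category, items in self.rules.items():
            # Register this category's prompts, then detect
            self.model.set_classes(items, self.model.get_text_pe(items))
            results = self.model.predict(image_path, conf=0.3)  # confidence threshold

            # Any detection means a possible violation
            for result in results:
                if len(result.boxes) > 0:
                    violations.append({
                        "category": category,
                        "items": items,
                        "count": len(result.boxes),
                    })
                    break  # one hit is enough for this category

        return violations

    def batch_check(self, image_dir):
        """Check every image in a directory."""
        all_results = {}

        for filename in os.listdir(image_dir):
            if filename.lower().endswith((".png", ".jpg", ".jpeg")):
                image_path = os.path.join(image_dir, filename)
                violations = self.check_image(image_path)
                if violations:
                    all_results[filename] = violations

        return all_results


# Usage
checker = EcommerceImageChecker()

# Check one image
result = checker.check_image("product_image.jpg")
print(f"Review result: {result}")

# Check a directory
batch_result = checker.batch_check("uploaded_images/")
print(f"Batch review found {len(batch_result)} problem images")
```

Advantages of this approach:

- Flexible: add a review rule at any time by editing the text prompts.
- Easy to maintain: no retraining for each new rule.
- Broad coverage: the open-vocabulary capability handles a wide range of descriptions.

4.2 Case 2: smart photo album

Many people have thousands of photos on their phone and finding one is a chore. YOLOE makes a searchable album easy to build.

```python
import json
import os
from datetime import datetime

from PIL import Image
from ultralytics import YOLOE


class SmartPhotoAlbum:
    def __init__(self, photo_dir):
        self.photo_dir = photo_dir
        self.model = YOLOE.from_pretrained("jameslahm/yoloe-v8s-seg")
        self.index_file = "photo_index.json"

        # Common tags for people, scenes, and objects
        self.common_tags = {
            "people": ["person", "face", "crowd", "child", "elderly person", "man", "woman"],
            "animals": ["dog", "cat", "bird", "fish", "pet", "wild animal"],
            "scenes": ["beach", "mountain", "city", "forest", "indoor", "night scene"],
            "activities": ["sports", "eating", "travel", "party", "work", "study"],
            "objects": ["car", "building", "food", "electronics", "furniture", "clothing"],
        }

    def index_photos(self):
        """Build an index of all photos."""
        photo_index = {}

        for filename in os.listdir(self.photo_dir):
            if filename.lower().endswith((".png", ".jpg", ".jpeg")):
                print(f"Processing: {filename}")
                image_path = os.path.join(self.photo_dir, filename)
                tags = self.analyze_image(image_path)

                # Image metadata
                with Image.open(image_path) as img:
                    size = img.size
                create_time = datetime.fromtimestamp(os.path.getctime(image_path))

                photo_index[filename] = {
                    "path": image_path,
                    "tags": tags,
                    "size": size,
                    "create_time": create_time.strftime("%Y-%m-%d %H:%M:%S"),
                }

        # Save the index file
        with open(self.index_file, "w", encoding="utf-8") as f:
            json.dump(photo_index, f, ensure_ascii=False, indent=2)

        return photo_index

    def analyze_image(self, image_path):
        """Return the tag categories found in one image."""
        all_tags = []

        for category, items in self.common_tags.items():
            try:
                self.model.set_classes(items, self.model.get_text_pe(items))
                results = self.model.predict(
                    image_path,
                    conf=0.25,   # lower threshold to catch more content
                    max_det=10,  # at most 10 objects
                )
                for result in results:
                    if len(result.boxes) > 0:
                        all_tags.append(category)
                        break  # one detection is enough to add the tag
            except Exception as e:
                print(f"Error while analyzing {category}: {e}")

        return list(set(all_tags))  # de-duplicate

    def search_photos(self, query):
        """Search indexed photos with a text query."""
        if not os.path.exists(self.index_file):
            print("Build the index first")
            return []

        with open(self.index_file, "r", encoding="utf-8") as f:
            photo_index = json.load(f)

        # Use the query text itself as the only prompt
        self.model.set_classes([query], self.model.get_text_pe([query]))

        matching_photos = []
        for filename, info in photo_index.items():
            results = self.model.predict(info["path"], conf=0.3)
            for result in results:
                if len(result.boxes) > 0:
                    matching_photos.append({
                        "filename": filename,
                        "path": info["path"],
                        "confidence": float(result.boxes[0].conf[0]),
                    })
                    break

        # Sort by confidence, highest first
        matching_photos.sort(key=lambda x: x["confidence"], reverse=True)
        return matching_photos


# Usage
album = SmartPhotoAlbum("my_photos/")

# Build the index (the first run takes a while)
print("Building photo index...")
index = album.index_photos()
print(f"Indexed {len(index)} photos")

# Search
print("\nSearching for photos with a beach...")
beach_photos = album.search_photos("beach")
for photo in beach_photos[:5]:  # top 5 results
    print(f"  - {photo['filename']} (confidence: {photo['confidence']:.1%})")

print("\nSearching for photos with a birthday cake...")
cake_photos = album.search_photos("birthday cake")
for photo in cake_photos:
    print(f"  - {photo['filename']} (confidence: {photo['confidence']:.1%})")
```

Features of this album:

- Natural-language search: describe the photo content in words.
- Automatic tagging: category tags are added to each photo.
- Zero-shot recognition: it can attempt "birthday cake" even if the model was never trained on that class.
- Local processing: once the model has been downloaded, everything runs on your machine, which protects privacy.

4.3 Case 3: automated industrial inspection

Manufacturing often requires checking products for defects. The traditional approach trains a dedicated model per defect type, while YOLOE can look for many defect types with one model.

```python
import os
import time

import cv2
import numpy as np
from ultralytics import YOLOE


class IndustrialInspector:
    def __init__(self):
        # Medium-sized model: a balance of speed and accuracy
        self.model = YOLOE.from_pretrained("jameslahm/yoloe-v8m-seg")

        # Common defect types
        self.defect_types = {
            "surface defects": ["scratch", "dent", "crack", "rust", "stain"],
            "assembly problems": ["missing part", "misalignment", "loose part", "skew"],
            "appearance problems": ["uneven color", "bubble", "impurity", "burr"],
            "dimension problems": ["oversized", "undersized", "deformation", "asymmetry"],
        }

    def inspect_product(self, image_path, product_type):
        """Inspect a single product image."""
        inspection_report = {
            "product_id": os.path.basename(image_path).split(".")[0],
            "product_type": product_type,
            "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
            "defects": [],
            "overall_status": "PASS",
        }

        # Read the image
        image = cv2.imread(image_path)
        if image is None:
            return {"error": "cannot read image"}

        # Pick the checks that apply to this product type
        if product_type == "metal part":
            defect_list = self.defect_types["surface defects"] + self.defect_types["dimension problems"]
        elif product_type == "plastic product":
            defect_list = self.defect_types["appearance problems"] + ["deformation", "shrinkage"]
        elif product_type == "electronic component":
            defect_list = ["bad solder joint", "bent pin", "wrong marking", "contamination"]
        else:
            defect_list = sum(self.defect_types.values(), [])  # all defect types

        # Detect defects with text prompts
        print(f"Inspecting {product_type} for defects...")
        self.model.set_classes(defect_list, self.model.get_text_pe(defect_list))
        results = self.model.predict(
            image_path,
            conf=0.4,   # higher threshold to reduce false alarms
            imgsz=640,  # fixed input size
        )

        # Analyze the detections
        defect_count = 0
        for result in results:
            for i, box in enumerate(result.boxes):
                defect_type = defect_list[int(box.cls[0])]
                confidence = float(box.conf[0])

                # Bounding box coordinates
                x1, y1, x2, y2 = box.xyxy[0].tolist()

                # Segmentation mask, if present (not used further in this example)
                mask = None
                if result.masks is not None and i < len(result.masks.data):
                    mask = result.masks.data[i].cpu().numpy()

                inspection_report["defects"].append({
                    "type": defect_type,
                    "confidence": confidence,
                    "location": {"x1": x1, "y1": y1, "x2": x2, "y2": y2},
                    "area": (x2 - x1) * (y2 - y1),  # bounding box area
                })
                defect_count += 1

        # Decide pass or fail
        if defect_count > 0:
            inspection_report["overall_status"] = "FAIL"
            inspection_report["defect_count"] = defect_count

            # Severity from the share of the image covered by defect boxes
            total_area = image.shape[0] * image.shape[1]
            defect_area = sum(d["area"] for d in inspection_report["defects"])
            severity = defect_area / total_area

            if severity > 0.1:     # more than 10%
                inspection_report["severity"] = "CRITICAL"
            elif severity > 0.05:  # 5% to 10%
                inspection_report["severity"] = "MAJOR"
            else:                  # under 5%
                inspection_report["severity"] = "MINOR"

        return inspection_report

    def visualize_results(self, image_path, report, output_path):
        """Draw the inspection result onto the image."""
        image = cv2.imread(image_path)

        for defect in report["defects"]:
            x1, y1, x2, y2 = map(int, [
                defect["location"]["x1"], defect["location"]["y1"],
                defect["location"]["x2"], defect["location"]["y2"],
            ])

            # Color by severity (BGR)
            if report.get("severity") == "CRITICAL":
                color = (0, 0, 255)    # red
            elif report.get("severity") == "MAJOR":
                color = (0, 165, 255)  # orange
            else:
                color = (0, 255, 255)  # yellow

            cv2.rectangle(image, (x1, y1), (x2, y2), color, 2)

            label = f"{defect['type']}: {defect['confidence']:.1%}"
            cv2.putText(image, label, (x1, y1 - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

        # Overall status
        status_text = f"Status: {report['overall_status']}"
        if report["overall_status"] == "FAIL":
            status_text += (f" | Defects: {report['defect_count']}"
                            f" | Severity: {report.get('severity', 'N/A')}")

        status_color = (0, 255, 0) if report["overall_status"] == "PASS" else (0, 0, 255)
        cv2.putText(image, status_text, (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.7, status_color, 2)

        cv2.imwrite(output_path, image)
        return output_path


# Usage
inspector = IndustrialInspector()

# Inspect one product
report = inspector.inspect_product("product_001.jpg", "metal part")
print(f"Inspection report: {report}")

# Visualize failures
if report.get("overall_status") == "FAIL":
    output_image = inspector.visualize_results(
        "product_001.jpg", report, "inspection_result.jpg"
    )
    print(f"Result saved to: {output_image}")
```

Advantages of this inspection system:

- Unified detection: one model covers many defect types, which lowers maintenance cost.
- Adaptable: new products and new defects only require changes to the text prompts.
- Detailed report: it grades severity as well as listing defects.
- Visual output: annotated images make manual review easier.

5. Performance tuning and practical tips

YOLOE is already fast, but a few techniques can improve both speed and results in real applications.

5.1 Choosing a model size

YOLOE comes in several sizes; choose according to your needs:

```python
# Comparison of model sizes
models = {
    "yoloe-v8s-seg": {
        "description": "small model, fastest",
        "use_case": "real-time detection, mobile deployment",
        "params": "about 9M",
        "speed": "142 FPS (RTX 3090)",
    },
    "yoloe-v8m-seg": {
        "description": "medium model, balances speed and accuracy",
        "use_case": "most applications",
        "params": "about 25M",
        "speed": "98 FPS (RTX 3090)",
    },
    "yoloe-v8l-seg": {
        "description": "large model, highest accuracy",
        "use_case": "accuracy-critical scenarios",
        "params": "about 44M",
        "speed": "58 FPS (RTX 3090)",
    },
}


def select_model(requirements):
    """Pick a model name from simple requirement flags."""
    if requirements.get("real_time", False):
        return "yoloe-v8s-seg"
    elif requirements.get("high_accuracy", False):
        return "yoloe-v8l-seg"
    else:
        return "yoloe-v8m-seg"


# Example: choose a model for real-time detection
requirements = {"real_time": True, "high_accuracy": False}
selected_model = select_model(requirements)
print(f"Recommended model: {selected_model}")
print(f"Model info: {models[selected_model]}")
```

5.2 Tips for better text prompts

Prompt quality directly affects detection quality. Some practical techniques:

```python
from ultralytics import YOLOE


class TextPromptOptimizer:
    @staticmethod
    def get_better_prompts(target_object):
        """Generate candidate text prompts for a target object."""
        prompt_strategies = {
            # Strategy 1: synonym-style variants
            "synonyms": lambda obj: [obj, f"a {obj}", f"this {obj}", f"{obj} object"],
            # Strategy 2: add attributes
            "with_attributes": lambda obj: [
                f"red {obj}", f"large {obj}", f"small {obj}", f"round {obj}", f"square {obj}",
            ],
            # Strategy 3: add scene context
            "with_context": lambda obj: [
                f"{obj} on a table", f"{obj} in a hand", f"{obj} on the floor", f"{obj} on a wall",
            ],
            # Strategy 4: domain terms
            "technical_terms": {
                "car": ["automobile", "sedan", "vehicle", "small car", "motor vehicle"],
                "person": ["figure", "human body", "personnel", "pedestrian", "human"],
                "dog": ["canine", "doggy", "pet dog", "puppy", "hound"],
            },
        }

        all_prompts = [target_object]  # basic prompt
        all_prompts.extend(prompt_strategies["synonyms"](target_object))
        all_prompts.extend(prompt_strategies["with_attributes"](target_object))
        all_prompts.extend(prompt_strategies["with_context"](target_object))
        if target_object in prompt_strategies["technical_terms"]:
            all_prompts.extend(prompt_strategies["technical_terms"][target_object])

        # De-duplicate while keeping the order, so the slice below is stable
        return list(dict.fromkeys(all_prompts))

    @staticmethod
    def test_prompts(model, image_path, target_object):
        """Try several prompts and return the one with the highest confidence."""
        prompts = TextPromptOptimizer.get_better_prompts(target_object)
        results = []

        print(f"Testing prompts for {target_object}:")
        for prompt in prompts[:5]:  # test the first 5
            try:
                model.set_classes([prompt], model.get_text_pe([prompt]))
                detection_results = model.predict(image_path, conf=0.3)

                confidence = 0.0
                boxes = detection_results[0].boxes
                if boxes is not None and len(boxes) > 0:
                    confidence = float(boxes[0].conf[0])

                results.append({
                    "prompt": prompt,
                    "confidence": confidence,
                    "detected": confidence > 0.3,
                })
                status = "✓" if confidence > 0.3 else "✗"
                print(f"  {status} {prompt}: {confidence:.1%}")
            except Exception as e:
                print(f"  Error: {prompt} - {e}")

        # Pick the best prompt
        valid_results = [r for r in results if r["detected"]]
        if valid_results:
            best = max(valid_results, key=lambda x: x["confidence"])
            print(f"\nBest prompt: {best['prompt']} (confidence: {best['confidence']:.1%})")
            return best["prompt"]
        else:
            print("\nWarning: no effective prompt found")
            return target_object


# Usage
model = YOLOE.from_pretrained("jameslahm/yoloe-v8s-seg")
optimizer = TextPromptOptimizer()
best_prompt = optimizer.test_prompts(model, "test_image.jpg", "cup")
print(f"Prompt to use: {best_prompt}")
```

5.3 Batch processing and performance monitoring

Real applications often process large numbers of images. The example below shares one model between worker threads and serializes inference with a lock, so the threads overlap only annotation and disk writes:

```python
import concurrent.futures
import json
import os
import threading
import time

import cv2
from tqdm import tqdm
from ultralytics import YOLOE


class BatchProcessor:
    def __init__(self, model_name="yoloe-v8m-seg", max_workers=4):
        self.model = YOLOE.from_pretrained(f"jameslahm/{model_name}")
        self.max_workers = max_workers
        self._lock = threading.Lock()  # the shared model runs one image at a time

    def process_single_image(self, args):
        """Process one image."""
        image_path, prompts, output_dir = args
        try:
            with self._lock:
                results = self.model.predict(
                    image_path,
                    conf=0.3,
                    save=False,     # we save the output ourselves
                    verbose=False,  # no per-image logging
                )

            detections = []
            for result in results:
                if result.boxes is not None:
                    for box in result.boxes:
                        detections.append({
                            "image": os.path.basename(image_path),
                            "class": prompts[int(box.cls[0])],
                            "confidence": float(box.conf[0]),
                            "bbox": box.xyxy[0].tolist(),
                        })

            # Save the annotated image
            if detections:
                annotated = results[0].plot()
                output_path = os.path.join(
                    output_dir, f"annotated_{os.path.basename(image_path)}"
                )
                cv2.imwrite(output_path, annotated)

            return {"image": image_path, "detections": detections, "success": True}
        except Exception as e:
            return {"image": image_path, "error": str(e), "success": False}

    def process_batch(self, image_paths, prompts, output_dir="output"):
        """Process a list of images."""
        os.makedirs(output_dir, exist_ok=True)

        # Register the prompts once for the whole batch
        self.model.set_classes(prompts, self.model.get_text_pe(prompts))

        tasks = [(path, prompts, output_dir) for path in image_paths]

        all_results = []
        start_time = time.time()

        with concurrent.futures.ThreadPoolExecutor(max_workers=self.max_workers) as executor:
            future_to_image = {
                executor.submit(self.process_single_image, task): task[0] for task in tasks
            }
            with tqdm(total=len(tasks), desc="Progress") as pbar:
                for future in concurrent.futures.as_completed(future_to_image):
                    all_results.append(future.result())
                    pbar.update(1)

        # Statistics
        total_time = time.time() - start_time
        successful = sum(1 for r in all_results if r["success"])
        total_detections = sum(len(r.get("detections", [])) for r in all_results if r["success"])

        print("\nBatch finished!")
        print(f"Images processed: {len(image_paths)}")
        print(f"Succeeded: {successful}")
        print(f"Failed: {len(image_paths) - successful}")
        print(f"Total detections: {total_detections}")
        print(f"Total time: {total_time:.2f} s")
        print(f"Average per image: {total_time / len(image_paths):.2f} s")
        print(f"Output directory: {output_dir}")

        return all_results


# Usage
processor = BatchProcessor(max_workers=2)  # 2 threads

# Collect the images
image_dir = "batch_images/"
image_paths = [
    os.path.join(image_dir, f)
    for f in os.listdir(image_dir)
    if f.lower().endswith((".png", ".jpg", ".jpeg"))
]

# Objects to detect
prompts = ["person", "car", "tree", "building", "animal"]

# Process a first batch of 10 as a test
results = processor.process_batch(image_paths[:10], prompts)

# Save the results as JSON
with open("batch_results.json", "w", encoding="utf-8") as f:
    json.dump(results, f, ensure_ascii=False, indent=2)
```

6. Summary

After working through these examples you should have a good grasp of how to use YOLOE's text prompts. Let's review the key points.

6.1 Core advantages

YOLOE's text prompts are powerful mainly because of:

- A genuinely open vocabulary: you are no longer limited to predefined classes and can describe the target in your own words.
- Zero-shot transfer: objects the model has never seen can still be attempted, provided the description is accurate.
- Real-time efficiency: accuracy stays high while speed remains sufficient for real-time use.
- Ease of use: a few lines of code are enough for fairly complex functionality.

6.2 Application scenarios

Open-vocabulary detection driven by text prompts fits most scenarios that need object recognition:

- Content moderation: describe the violation in natural language and detect it automatically.
- Smart search: find specific objects in images by text.
- Industrial inspection: describe the defect types and detect them with one model.
- Retail analytics: count shelf products and analyze customer behavior.
- Security monitoring: look for people or objects with particular features.
- Medical imaging: assist doctors in locating lesions.

6.3 Where to go next

Once you are comfortable with the basics, you can go further:

- Try visual prompts: besides text, YOLOE accepts an image as the prompt, which enables search by example.
- Explore prompt-free mode: let the model find every object in the image on its own.
- Fine-tune the model: adapt it to data from your own domain for better results.
- Deploy to production: learn how to run YOLOE on a server or an edge device.

Most importantly, start experimenting now. The official YOLOE image has everything prepared: no complicated configuration and no long training runs, only your ideas and the problem you want to solve. The best way to learn is to use it. Pick a scenario you care about, try to solve one real problem with YOLOE, and build experience from the issues you hit and the fixes you find along the way.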
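
One detail from the industrial inspection case in 4.3 is worth isolating: the severity grade comes from the share of the image covered by defect boxes (more than 10% is CRITICAL, 5% to 10% is MAJOR, anything less is MINOR). Below is a minimal sketch of that rule as a standalone function, so it can be unit-tested without loading a model. The name `grade_severity` and its argument layout are assumptions made for illustration and are not part of YOLOE or Ultralytics. Like the case-study code, it simply sums box areas, so overlapping boxes are counted twice.

```python
def grade_severity(boxes, image_width, image_height):
    """Grade severity from the fraction of the image covered by defect boxes.

    boxes: iterable of (x1, y1, x2, y2) pixel tuples (hypothetical layout,
    matching the xyxy coordinates used in the inspection example).
    """
    total_area = image_width * image_height
    if total_area <= 0:
        raise ValueError("image dimensions must be positive")

    # Sum of bounding-box areas; degenerate boxes contribute zero
    defect_area = sum(
        max(0.0, x2 - x1) * max(0.0, y2 - y1) for x1, y1, x2, y2 in boxes
    )
    ratio = defect_area / total_area

    if ratio > 0.10:   # more than 10% of the image
        return "CRITICAL"
    if ratio > 0.05:   # between 5% and 10%
        return "MAJOR"
    return "MINOR"     # 5% or less


if __name__ == "__main__":
    # A 200x100 box on a 640x480 image covers about 6.5% of the frame
    print(grade_severity([(0, 0, 200, 100)], 640, 480))
```

Keeping the rule in a pure function makes it easy to adjust the thresholds per product line, or to switch later to mask areas instead of box areas.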
