OpenAI CLIP (code usage examples)

news 2026/2/10 6:12:09

OpenAI CLIP (bridging NLP and CV)

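OpenAI released Contrastive Language-Image Pre-training (CLIP) in January 2021. It is a multimodal model trained by contrastive learning on text-image pairs: by contrasting each image with its corresponding text description, the model learns which texts match which images. It is open source and multimodal, and it can be used zero-shot, few-shot, or with supervised training.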
Schematic from the original paper: [image]
Pseudocode of the algorithm from the original paper:
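The pseudocode image is not reproduced here, so the following is a minimal NumPy sketch of the symmetric contrastive objective it describes, not the authors' training code. The helper names (`l2_normalize`, `cross_entropy`, `clip_contrastive_loss`), the fixed `temperature=0.07`, and the random embeddings are assumptions made for illustration; in the paper the temperature is a learned parameter and the embeddings come from the image and text encoders.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale each row to unit length."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels under a row-wise softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over n matched image-text pairs (assumed sketch)."""
    image_emb = l2_normalize(image_emb)
    text_emb = l2_normalize(text_emb)
    logits = image_emb @ text_emb.T / temperature  # [n, n] scaled cosine similarities
    labels = np.arange(logits.shape[0])            # pair i matches pair i
    loss_i = cross_entropy(logits, labels)         # image -> text direction
    loss_t = cross_entropy(logits.T, labels)       # text -> image direction
    return (loss_i + loss_t) / 2

# Hypothetical embeddings: 4 pairs, 8 dimensions.
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(4, 8))
aligned_loss = clip_contrastive_loss(text_emb.copy(), text_emb)         # images match their texts
shuffled_loss = clip_contrastive_loss(text_emb[::-1].copy(), text_emb)  # pairs deliberately mismatched
print("aligned:", aligned_loss, "shuffled:", shuffled_loss)
```

The loss is low when each image embedding is closest to its own text embedding and high when the pairing is scrambled, which is the matching behaviour described above.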

Original OpenAI CLIP project:

https://github.com/openai/CLIP

Usage

(1) Original version
Installation:

$ conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
$ pip install ftfy regex tqdm
$ pip install git+https://github.com/openai/CLIP.git

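A GPU and CUDA are not required; the model also runs on CPU (on a machine without a GPU, the upstream README suggests replacing cudatoolkit=11.0 with cpuonly in the conda command).
Source code: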

import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Load the image and turn it into a batch of one preprocessed tensor.
# (The upstream README uses CLIP.png, the CLIP flow diagram; this example uses cat.png.)
image = preprocess(Image.open("cat.png")).unsqueeze(0).to(device)
# Tokenize the candidate texts.
text = clip.tokenize(["cat in basket", "python", "a cute cat", "pytorch", "code of CLIP", "code of pytorch ", "code"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)  # encode the image
    text_features = model.encode_text(text)     # encode the texts
    # print("image_features shape:", image_features.shape, image_features.size(), image_features.ndim)
    # print("text_features shape:", text_features.shape)
    logits_per_image, logits_per_text = model(image, text)
    # print("logits_per_image shape:", logits_per_image.shape)
    # print("logits_per_text shape:", logits_per_text.shape)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

# Prints one probability per candidate text (seven here).
# For reference, the upstream README example (CLIP.png with the texts
# ["a diagram", "a dog", "a cat"]) prints [[0.9927937  0.00421068 0.00299572]],
# i.e. "a diagram" gets probability 0.9927937.
print("Label probs:", probs)

# (2) Continuing from above: classification by matrix multiplication
import pandas as pd

with torch.no_grad():
    score = []
    image_features = model.encode_image(image)  # encode the image
    image_features /= image_features.norm(dim=-1, keepdim=True)  # L2-normalize
    text_features = model.encode_text(text)     # encode the texts
    text_features /= text_features.norm(dim=-1, keepdim=True)
    # texts = ["cat in basket", "python", "a cute cat", "pytorch", "code of CLIP", "code of pytorch ", "code"]
    texts = ["cat in basket", "python", "a cat", "pytorch", "code", "pytorch code"]
    for t in texts:  # renamed from `text` so the tokenized tensor above is not overwritten
        textp = clip.tokenize(t).to(device)  # tokens must be on the same device as the model
        # Encode the query text
        textp_embeddings = model.encode_text(textp)
        textp_embeddings /= textp_embeddings.norm(dim=-1, keepdim=True)
        # Matching score between the image and the query text
        # (matrix product of unit vectors, i.e. cosine similarity)
        sc = (image_features @ textp_embeddings.T).item()
        score.append(sc)

print(pd.DataFrame({'texts': texts, 'score': score}).sort_values('score', ascending=False))
print('')
print('-------------------------')
print('')

(2) Transformers library version
Basic usage of the Transformers library:
https://blog.csdn.net/benzhujie1245com/article/details/125279229
Installation:

pip install transformers

CLIP source code:

# Basic usage 2: using the transformers library
import torch
from PIL import Image
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
# Put the path to your own image here
image = Image.open('cat.png')
# Put the candidate class labels here
text = ["cat in basket", "python", "a cute cat", "pytorch", "code of CLIP", "code of pytorch ", "code"]
inputs = processor(text=text, images=image, return_tensors="pt", padding=True)
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # image-text similarity scores
probs = logits_per_image.softmax(dim=1)      # probabilities over the labels
for i in range(len(text)):
    print(text[i], ":", probs[0][i].item())
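When there are many candidate labels, it can be easier to read the output sorted by probability. The helper below is a small sketch of that idea; `top_k_labels` is a hypothetical name, not part of CLIP or Transformers, and the probabilities shown are made up for illustration.

```python
def top_k_labels(labels, probs, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    if len(labels) != len(probs):
        raise ValueError("labels and probs must have the same length")
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical probabilities, not real model output.
demo_labels = ["cat in basket", "python", "a cute cat", "pytorch"]
demo_probs = [0.62, 0.01, 0.35, 0.02]
for label, p in top_k_labels(demo_labels, demo_probs, k=2):
    print(f"{label}: {p:.2%}")
```

With the Transformers example above, the probabilities for the single input image could be passed as `probs[0].tolist()`.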

Input image: [image]
Result:

However, CLIP does not necessarily perform well on more abstract images or tasks. For example, the image code.png: [image]

PLUS：

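Even so, CLIP remains an important AI breakthrough, especially when applied to CV-related tasks such as style-driven outfit swapping, CLIPBERT, CLIP4Clip, CLIP2Video, CLIPTV, image captioning, and so on.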
