[Computer Vision] 24-Object Detection
Table of Contents
- 24-Object Detection
- 1. Introduction
- 2. Methods
- 2.1 Sliding Window
- 2.2 R-CNN: Region-Based CNN
- 2.3 Fast R-CNN
- 2.4 Faster R-CNN: Learnable Region Proposals
- 2.5 Results of Object Detection
- 3. Summary
- Reference
24-Object Detection
1. Introduction
- Task Definition
Input: a single RGB image
Output: a set of detected objects
For each object, predict:
- Category label (from a fixed, known set of categories)
- Bounding box (four numbers: x, y, width, height)
- Challenges
- Multiple outputs: need to output a variable number of objects per image
- Multiple types of output: need to predict "what" (category label) as well as "where" (bounding box)
- Large images: classification works at 224x224; detection needs higher resolution, often ~800x600
- Detecting a single object
Use a network with two branches: one outputs the category label, the other outputs the bounding box.
Problem: images can contain more than one object, and running a single-object detector many times is inefficient.
2. Methods
2.1 Sliding Window
Apply a CNN to many different crops of the image; the CNN classifies each crop as an object or as background.
Problem: far too many crops to evaluate
- Consider an image of size H x W and a box of size h x w
- Total possible boxes: $\sum_{h=1}^{H}\sum_{w=1}^{W}(W-w+1)(H-h+1)=\frac{H(H+1)}{2}\cdot\frac{W(W+1)}{2}$
- An 800 x 600 image has ~58 billion boxes (about $5.8\times10^{10}$)! No way we can evaluate them all.
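The count above is easy to check numerically. The following is a minimal sketch (the function names are hypothetical, chosen only for illustration) that evaluates the closed form and compares it against the double sum on a small image:

```python
def count_boxes_closed_form(H, W):
    """Number of boxes with integer corners inside an H x W image (closed form)."""
    return (H * (H + 1) // 2) * (W * (W + 1) // 2)


def count_boxes_brute_force(H, W):
    """Same count, summing the number of placements for every box size h x w."""
    total = 0
    for h in range(1, H + 1):
        for w in range(1, W + 1):
            total += (W - w + 1) * (H - h + 1)
    return total


# The two counts agree on a small image.
assert count_boxes_brute_force(6, 8) == count_boxes_closed_form(6, 8)

print(count_boxes_closed_form(600, 800))  # 57768120000
```

For H = 600 and W = 800 the closed form gives 57,768,120,000, i.e. tens of billions of candidate boxes, which is why exhaustive sliding-window evaluation with a CNN is impractical.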
2.2 R-CNN: Region-Based CNN
- Region Proposals (Selective Search)
Selective Search is a region proposal algorithm used in object detection. It hierarchically groups similar regions based on color, texture, size, and shape compatibility.
Selective Search starts by over-segmenting the image based on pixel intensity, using the graph-based segmentation method of Felzenszwalb and Huttenlocher.

The Selective Search algorithm takes these over-segments as its initial input and performs the following steps:
1. Add all bounding boxes corresponding to the segmented parts to the list of region proposals
2. Group adjacent segments based on similarity
3. Go to step 1
At each iteration, larger segments are formed and added to the list of region proposals. Hence region proposals are created from smaller segments to larger segments, in a bottom-up fashion.
For how the similarity measures based on color, texture, size, and shape compatibility are computed, see Selective Search for Object Detection (C++ / Python) | LearnOpenCV
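The bottom-up grouping loop can be sketched in a few lines. This is a toy illustration, not the real Selective Search: it assumes each initial segment is summarized by a bounding box `(x1, y1, x2, y2)`, a mean intensity, and a pixel count. It uses the difference in mean intensity as a stand-in for the color/texture/size/shape similarity, and it treats boxes that overlap or share an edge as adjacent. All names are hypothetical.

```python
def union_box(a, b):
    """Smallest box (x1, y1, x2, y2) enclosing boxes a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def touches(a, b):
    """True if two boxes overlap or share an edge (a crude adjacency test)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def group_segments(segments):
    """segments: list of (box, mean_intensity, n_pixels) from an over-segmentation.

    Returns proposal boxes, from the small initial segments up to large merged regions.
    """
    regions = list(segments)
    proposals = [box for box, _, _ in regions]       # step 1: add the current boxes
    while len(regions) > 1:
        # Step 2: find the most similar pair of adjacent regions.
        best = None
        for i in range(len(regions)):
            for j in range(i + 1, len(regions)):
                if touches(regions[i][0], regions[j][0]):
                    diff = abs(regions[i][1] - regions[j][1])
                    if best is None or diff < best[0]:
                        best = (diff, i, j)
        if best is None:
            break                                    # nothing adjacent left to merge
        _, i, j = best
        (bi, mi, ni), (bj, mj, nj) = regions[i], regions[j]
        merged = (union_box(bi, bj), (mi * ni + mj * nj) / (ni + nj), ni + nj)
        regions = [r for k, r in enumerate(regions) if k not in (i, j)] + [merged]
        proposals.append(merged[0])                  # step 3: the new box joins the list
    return proposals


segments = [((0, 0, 10, 10), 0.20, 100),
            ((10, 0, 20, 10), 0.25, 100),
            ((0, 10, 20, 20), 0.90, 200)]
print(group_segments(segments))
```

In this example the two top segments have the closest intensities and are merged first, then the result is merged with the bottom segment, so the proposals grow from three small boxes to the full 20 x 20 region.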
- Architecture of the network
Each of the roughly two thousand proposed regions is warped to the input size required by the classification network; after passing through the convolutional network, the model outputs a category and a box offset for that region.
- Steps
1. Run a region proposal method to compute ~2000 region proposals
2. Resize each region to 224x224 and run it independently through the CNN to predict class scores and a bbox transform
3. Use the scores to select a subset of region proposals to output (many choices here: threshold on the background score, or per category? Or take the top K proposals per image?)
4. Compare with ground-truth boxes
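The "bbox transform" in step 2 is not spelled out above. A minimal sketch, assuming the parameterization used in the R-CNN paper (center shifts scaled by the proposal size, width and height scaled in log space) and assuming proposals are given as `(cx, cy, w, h)` with the center point, could look like this:

```python
import math


def apply_bbox_transform(proposal, transform):
    """proposal: (cx, cy, w, h); transform: (tx, ty, tw, th) predicted by the CNN."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = transform
    bx = px + pw * tx          # shift the center by a fraction of the proposal size
    by = py + ph * ty
    bw = pw * math.exp(tw)     # scale width and height in log space
    bh = ph * math.exp(th)
    return (bx, by, bw, bh)


# A zero transform leaves the proposal unchanged.
print(apply_bbox_transform((50.0, 40.0, 20.0, 10.0), (0.0, 0.0, 0.0, 0.0)))
# (50.0, 40.0, 20.0, 10.0)
```

Predicting a relative transform rather than absolute coordinates means the network output does not depend on where the proposal sits in the image or how large it is.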
- Details (focus on steps 3 and 4)
- Intersection over Union (IoU)
$IoU=\frac{\text{Area of Intersection}}{\text{Area of Union}}$
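As a minimal sketch of this formula, assume boxes use the (x, y, width, height) format from the task definition, with (x, y) taken to be the top-left corner (an assumption; the text does not say which corner is meant):

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x, y, width, height), (x, y) = top-left corner."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (5, 5, 10, 10)))  # 25 / 175, about 0.143
```

Two 10 x 10 boxes offset by 5 pixels in each direction overlap in a 5 x 5 square, so the IoU is 25 / (100 + 100 - 25).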

- Non-Max Suppression (NMS)
1. Select the next highest-scoring box
2. Eliminate lower-scoring boxes whose IoU with the selected box exceeds a threshold (e.g. 0.7)
3. If any boxes remain, go to step 1
Problem: NMS may eliminate "good" boxes when objects overlap heavily.
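The NMS loop can be sketched directly from these steps. This is a plain-Python illustration that assumes (x, y, width, height) boxes with a top-left (x, y); the helper names are hypothetical, and practical implementations are vectorized:

```python
def iou(a, b):
    """IoU of two (x, y, width, height) boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.7):
    """Return the indices of the boxes that are kept, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)                      # 1. next highest-scoring box
        keep.append(best)
        # 2. drop lower-scoring boxes that overlap it too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
        # 3. repeat while any boxes remain
    return keep


boxes = [(0, 0, 10, 10), (1, 0, 10, 10), (50, 50, 10, 10)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with IoU of about 0.82, so it is suppressed. If those two boxes actually covered two different, heavily overlapping objects, one correct detection would be lost, which is exactly the problem noted above.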
- Mean Average Precision (mAP)

An animated GIF helps to understand it (but I only have the final image):
For example, a detector might reach an mAP of about 0.4 on the COCO dataset.
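Since the animation is missing, here is a hedged sketch of the usual computation. For each category, detections are sorted by score and each is marked as a true or false positive (typically by matching it to a ground-truth box with IoU above some threshold). Average precision is the area under the resulting precision-recall curve, and mAP is the mean over categories. The code assumes the matching has already been done, and the function names are hypothetical:

```python
def average_precision(detections, num_gt):
    """detections: list of (score, is_true_positive); num_gt: number of ground-truth boxes."""
    detections = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions, recalls = [], []
    tp = fp = 0
    for _, is_tp in detections:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Replace each precision by the best precision at any higher recall
    # (the usual envelope of the precision-recall curve).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the rectangle areas under the envelope.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """per_class: dict mapping category -> (detections, num_gt)."""
    aps = [average_precision(d, n) for d, n in per_class.values()]
    return sum(aps) / len(aps)


dog = ([(0.9, True), (0.8, False), (0.7, True)], 2)   # AP = 5/6
cat = ([(0.9, True), (0.8, True)], 2)                 # AP = 1
print(mean_average_precision({"dog": dog, "cat": cat}))  # about 0.917
```

Benchmarks differ in the details (interpolation scheme, IoU thresholds averaged over), so this should be read as one common variant rather than the exact COCO protocol.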
- Problem: Very slow! Each image needs ~2k forward passes, one per region proposal.
Solution: Run the CNN before warping, so the convolutional features are computed once per image.
2.3 Fast R-CNN
- Architecture:
- Most of the computation happens in the backbone network; this saves work for overlapping region proposals
- The per-region network is relatively lightweight
- The concrete architectures with AlexNet and ResNet backbones:
- Details:
How to crop features?
This cropping step (RoI Pooling) introduces two quantization errors:
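Suppose the input image is downsampled 32x by a series of convolutional layers, giving an 8x8 feature map. An RoI has top-left and bottom-right corners (in x, y form) at (0, 100) and (198, 224). Mapped onto the feature map, these become (0, 100 / 32) and (198 / 32, 224 / 32). Because pixel positions are discrete, the coordinates are rounded down to (0, 3) and (6, 7). This is the first quantization error.
Suppose the RoI must finally be reduced to a fixed 2x2 size. The RoI is then divided evenly into 2x2 regions, each with width (6 - 0 + 1) / 2 and height (7 - 3 + 1) / 2, i.e. 3.5 and 2.5. Again, because pixels are discrete, some regions get a width of 3 and others 4, while some get a height of 2 and others 3. This is the second quantization error.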
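The two rounding steps in this example can be reproduced with a short sketch. The helper names are hypothetical, and the way bins are split here is just one possible integer split that yields the sizes described above; actual RoI Pooling implementations may place the bin boundaries differently.

```python
def quantize_roi(top_left, bottom_right, stride):
    """Map image-space corners onto the feature map, rounding down (first quantization)."""
    (x1, y1), (x2, y2) = top_left, bottom_right
    return (x1 // stride, y1 // stride), (x2 // stride, y2 // stride)


def split_bins(start, end, n_bins):
    """Split the inclusive cell range [start, end] into n_bins integer ranges
    (second quantization)."""
    length = end - start + 1
    edges = [start + (i * length) // n_bins for i in range(n_bins + 1)]
    return [(edges[i], edges[i + 1] - 1) for i in range(n_bins)]


tl, br = quantize_roi((0, 100), (198, 224), stride=32)
print(tl, br)                        # (0, 3) (6, 7)
print(split_bins(tl[0], br[0], 2))   # [(0, 2), (3, 6)]  widths 3 and 4
print(split_bins(tl[1], br[1], 2))   # [(3, 4), (5, 7)]  heights 2 and 3
```

Both steps move the pooled region away from where the RoI really lies on the feature map, which is the misalignment that RoI Align is designed to remove.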
- RoI Align in Mask R-CNN

Note: RoI Align has a hyperparameter for the number of sampling points in each region, which is usually set to 4.
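A minimal sketch of the idea behind RoI Align: instead of rounding, each bin is sampled at real-valued locations using bilinear interpolation, and the samples are aggregated. The code below assumes a single-channel feature map stored as a list of rows, a regular 2 x 2 grid of sampling points (4 per bin), and averaging as the aggregation. The names and the exact sample placement are illustrative assumptions.

```python
import math


def bilinear_sample(feature, x, y):
    """Sample a 2D feature map (list of rows) at a real-valued (x, y) location."""
    h, w = len(feature), len(feature[0])
    x = min(max(x, 0.0), w - 1.0)                # clamp to the feature map
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align_bin(feature, x_start, y_start, bin_w, bin_h, samples_per_side=2):
    """Average bilinear samples on a regular grid inside one bin (2 x 2 = 4 points by default)."""
    values = []
    for i in range(samples_per_side):
        for j in range(samples_per_side):
            sx = x_start + (j + 0.5) * bin_w / samples_per_side
            sy = y_start + (i + 0.5) * bin_h / samples_per_side
            values.append(bilinear_sample(feature, sx, sy))
    return sum(values) / len(values)


feature = [[0.0, 1.0],
           [2.0, 3.0]]
print(bilinear_sample(feature, 0.5, 0.5))          # 1.5
print(roi_align_bin(feature, 0.0, 0.0, 1.0, 1.0))  # 1.5
```

Because the bin boundaries stay real-valued (for example 100 / 32 = 3.125 is kept as is), neither of the two quantization errors described for RoI Pooling occurs.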
- Speed

Fast R-CNN is enormously faster than R-CNN, but computing the region proposals now takes up a large share of the runtime.
2.4 Faster R-CNN: Learnable Region Proposals
- Architecture:
Insert a Region Proposal Network (RPN) to predict proposals from the backbone features

- Details:

At each point of the feature map, predict whether the corresponding anchor contains an object, using a logistic regression loss for each anchor. The scores are predicted with a conv layer.
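To make the anchor idea concrete, here is a hedged sketch that places K anchor boxes at every feature-map point and evaluates a per-anchor logistic (binary cross-entropy) loss on an objectness score. The anchor sizes, aspect ratios, and function names are illustrative assumptions, not values taken from the text.

```python
import math


def generate_anchors(feat_h, feat_w, stride, sizes=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (cx, cy, w, h) in image coordinates, K per feature-map point."""
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride   # center of this cell
            for size in sizes:
                for ratio in ratios:                          # ratio = height / width
                    w = size / math.sqrt(ratio)
                    h = size * math.sqrt(ratio)
                    anchors.append((cx, cy, w, h))
    return anchors


def objectness_loss(logit, is_object):
    """Numerically stable logistic loss for one anchor's object / not-object score."""
    y = 1.0 if is_object else 0.0
    return max(logit, 0.0) - logit * y + math.log1p(math.exp(-abs(logit)))


anchors = generate_anchors(feat_h=2, feat_w=3, stride=16)
print(len(anchors))                  # 54 = 2 * 3 positions * 9 anchors each
print(objectness_loss(0.0, True))    # log(2), about 0.693
```

In a real RPN the logits come from a conv layer applied to the backbone features, with one score per anchor at each position.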
- Evaluation

- Improvement
Faster R-CNN is a two-stage object detector:

But we would like an end-to-end design that eliminates the second stage. So we change the region proposal network to predict the class label directly.

2.5 Results of Object Detection

- Two-stage methods (Faster R-CNN) get the best accuracy but are slower
- Single-stage methods (SSD) are much faster but don't perform as well
- Bigger backbones improve performance, but are slower
- Diminishing returns for slower methods

These results are a few years old. Since then GPUs have gotten faster, and we've improved performance with many tricks:
- Train longer!
- Multiscale backbone: Feature Pyramid Networks
- Better backbone: ResNeXt
- Single-stage methods have improved
- Very big models work better
- Test-time augmentation pushes numbers up
- Big ensembles, more data, etc.
3. Summary

Reference
[1] Introduction to the RoI Pooling family of methods (source code included at the end) - Zhihu (zhihu.com)
[2] Selective Search for Object Detection (C++ / Python) | LearnOpenCV