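[Paper Close Reading] Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection
Paper link: [2304.08876] Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection (arxiv.org)
Paper code: https://github.com/ChaselTsui/mmrotate-dcfl
The English here is typed entirely by hand, as a summary and paraphrase of the original paper. Some spelling and grammar mistakes are hard to avoid; if you spot any, corrections in the comments are welcome! This post leans toward personal notes, so read with caution.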
Contents
1. TL;DR
1.1. Thoughts
1.2. Paper Summary Figure
2. Section-by-Section Close Reading
2.1. Abstract
2.2. Introduction
2.3. Related Work
2.3.1. Oriented Object Detection
2.3.2. Tiny Object Detection
2.4. Method
2.4.1. Dynamic Prior
2.4.2. Coarse Prior Matching
2.4.3. Finer Dynamic Posterior Matching
2.5. Experiments
2.5.1. Datasets
2.5.2. Implementation Details
2.5.3. Main Results
2.5.4. Ablation Study
2.6. Analysis
2.7. Conclusion
3. Reference List
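1. TL;DR
1.1. Thoughts
(1)Why am I, someone who studies brain science, reading this? May there be no unpaid grunt work in this world
(2)I noticed it as soon as I started writing the subheadings: the paper is split into very fine sections, which earns it extra points from me
(3)As an outsider, I feel this paper proposes a lot of things
1.2. Paper Summary Figure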

2. Section-by-Section Close Reading
2.1. Abstract
①The extreme geometric shapes and limited features (few pixels) of tiny oriented objects cause serious mismatch (inaccurate positional prior?) and imbalance (inaccurate positive sample features?) issues
②They propose a dynamic prior together with a coarse-to-fine assigner, called DCFL
posterior adj. located at the back; rear n. buttocks
2.2. Introduction
①Oriented bounding boxes greatly reduce the redundant background area, especially in aerial images
②Comparison figure:

where M* denotes the matching function;
green, blue and red boxes are true positive, false positive, and false negative predictions respectively;
the left set of figures shows the static scheme and the right set shows the dynamic one
③Figure of mismatch and imbalance issues:

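each point in the left figure denotes a prior location (that is a lot of prior points... and why are they laid out so neatly, is this some kind of one-stage detector?)
Does the pie chart describe the case where every box sits at one given angle, so that when no box is rotated the average number of positive samples is 5.2? Or does it mean the boxes rotate freely and the chart reports the number of positive samples for boxes at each particular angle? The pie chart makes no comparison against other methods; it only compares within the figure itself.
The bar chart shows the average number of positive samples for different anchor box sizes
④They introduce a dynamic Prior Capturing Block (PCB) to generate the priors. On top of it, they use Cross-FPN-layer Coarse Positive Sample (CPS) candidates to assign labels coarsely. They then re-rank these candidates by their predictions (posterior) and represent each gt with a finer Dynamic Gaussian Mixture Model (DGMM)
eradicate vt. to root out; to eliminate; to put an end to n. eradicator; ink remover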
2.3. Related Work
2.3.1. Oriented Object Detection
(1)Prior for Oriented Objects
(2)Label Assignment
2.3.2. Tiny Object Detection
(1)Multi-scale Learning
(2)Label Assignment
(3)Context Information
(4)Feature Enhancement
2.4. Method
(1)Overview
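①For a set of dense priors $P \in \mathbb{R}^{W \times H \times C}$, where $W$ denotes width, $H$ denotes height and $C$ denotes the number of shape information (what is this, is it those points?), the detector maps it to the detection results $D$ by a Deep Neural Network (DNN):
$$D = \mathrm{DNN}_h(P)$$
where $\mathrm{DNN}_h$ represents the detection head (detection head... as an outsider I don't quite get it, but it feels like it's just a function?);
one part of $D$, $D_{cls} \in \mathbb{R}^{W \times H \times A}$, denotes the classification scores, where $A$ means the class number (will the values in $D_{cls}$ be larger at the layer where a sample is more likely considered positive?);
the other part of $D$, $D_{reg} \in \mathbb{R}^{W \times H \times B}$, denotes the box locations, where $B$ means the box parameter number (from what I looked up, box parameters are things like w, h, x, y, a)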
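②In static methods, the pos labels assigned for $P$ are $G = \mathcal{M}_s(P, GT)$
③In dynamic methods, the pos labels set integrates posterior information: $G = \mathcal{M}_d(P, D, GT)$
④The loss function:
$$\mathcal{L} = \sum_{i=1}^{N_{pos}} \mathcal{L}_{pos}(D_i, G_i) + \sum_{j=1}^{N_{neg}} \mathcal{L}_{neg}(D_j, y_j)$$
where $N_{pos}$ and $N_{neg}$ represent the number of positive and negative samples,
$y_j$ is the neg labels set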
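⑤Modelling the prior, the label assignment and the gt representation dynamically:
$$\tilde{D} = \mathrm{DNN}_h(\mathrm{DNN}_p(P)), \qquad \tilde{G} = \mathcal{M}_d(\mathcal{M}_s(\tilde{P}, GT), \widetilde{GT})$$
where $\tilde{P} = \mathrm{DNN}_p(P)$ is the dynamic prior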
2.4.1. Dynamic Prior
①Making the prior flexible may alleviate the mismatch problem
②Each prior represents a feature point
③The structure of Prior Capturing Block (PCB):

the surrounding information is taken into account by a dilated convolution. The dynamic prior is then captured by a Deformable Convolution Network (DCN). Moreover, the offsets learned from the regression branch are used to guide feature extraction in the classification branch, which improves the alignment between the two tasks.
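④To achieve dynamic prior capturing, each prior location is initialized with the spatial location $s$ of its feature point. In each iteration, the offset set $\Delta o$ of each prior position is captured to update $s$:
$$\tilde{s} = s + st \sum_{i=1}^{n} \Delta o_i / 2n$$
where $st$ denotes the stride of the feature map,
$n$ denotes the number of offsets;
a 2D Gaussian distribution $\mathcal{N}_p(\mu_p, \Sigma_p)$ is regarded as the prior distribution;
the dynamic $\tilde{s}$ serves as the Gaussian's mean vector $\mu_p$ (what on earth is this??);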
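⑤A square prior $(w, h, \theta)$ is preset on each feature point
⑥The co-variance matrix:
$$\Sigma_p = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} \frac{w^2}{4} & 0 \\ 0 & \frac{h^2}{4} \end{bmatrix} \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$$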
dilate v. to expand; to swell; to enlarge deformable adj. able to be deformed; strained; easily deformed
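The prior update in ④ and the covariance in ⑥ can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the offset sum is read as being divided by 2n, the covariance is taken to be the usual rotation times diag(w²/4, h²/4) times rotation-transpose form for an oriented box, and the function names `update_prior_location` and `box_to_covariance` are made up for this note rather than taken from the authors' code.

```python
import math


def update_prior_location(s, offsets, stride):
    """Shift a prior location s = (x, y) using its learned offsets.

    Assumed reading of the update rule: s~ = s + stride * sum(offsets) / (2n),
    applied to the x and y components separately.
    """
    n = len(offsets)
    dx = sum(o[0] for o in offsets)
    dy = sum(o[1] for o in offsets)
    return (s[0] + stride * dx / (2 * n), s[1] + stride * dy / (2 * n))


def box_to_covariance(w, h, theta):
    """Covariance R * diag(w^2/4, h^2/4) * R^T of an oriented box.

    theta is the rotation angle in radians; the result is a 2x2 nested list.
    """
    c, s = math.cos(theta), math.sin(theta)
    a, b = w * w / 4.0, h * h / 4.0
    off_diag = (a - b) * c * s
    return [[a * c * c + b * s * s, off_diag],
            [off_diag, a * s * s + b * c * c]]


# Usage: a prior on a stride-8 feature map with two learned offsets,
# and the Gaussian shape of an axis-aligned 4x2 box.
new_location = update_prior_location((16.0, 16.0), [(1.0, 0.0), (0.0, 1.0)], 8)
covariance = box_to_covariance(4.0, 2.0, 0.0)
```

With theta = 0 the covariance is diagonal, and rotating the box by 90 degrees swaps the two diagonal entries, which is a quick sanity check on the rotation convention.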
2.4.2. Coarse Prior Matching
①For prior matching, limiting a gt to a single FPN layer may cause sub-optimal layer selection, while releasing the gt to all layers may cause slow convergence
②Therefore, they propose Cross-FPN-layer Coarse Positive Sample (CPS) candidates, expanding the candidate range to the gt's nearby spatial locations and adjacent FPN layers
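③The Generalized Jensen-Shannon Divergence (GJSD) is used to construct the CPS between the Gaussian prior $\mathcal{N}_p(\mu_p, \Sigma_p)$ and the Gaussian gt $\mathcal{N}_g(\mu_g, \Sigma_g)$:
$$\mathrm{GJSD}(\mathcal{N}_p, \mathcal{N}_g) = (1 - \alpha)\,\mathrm{KL}(\mathcal{N}_\alpha, \mathcal{N}_p) + \alpha\,\mathrm{KL}(\mathcal{N}_\alpha, \mathcal{N}_g)$$
which yields a closed-form solution;
where $\Sigma_\alpha = \left((1 - \alpha)\Sigma_p^{-1} + \alpha\Sigma_g^{-1}\right)^{-1}$ and $\mu_\alpha = \Sigma_\alpha\left((1 - \alpha)\Sigma_p^{-1}\mu_p + \alpha\Sigma_g^{-1}\mu_g\right)$;
and due to the homogeneity of $\mathcal{N}_p$ and $\mathcal{N}_g$, $\alpha = 0.5$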
④Choosing the top $K$ priors with the highest GJSD for each gt
(i.e., picking the ones that differ the most)
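To make the closed-form claim in ③ concrete, here is a minimal pure-Python sketch of a generalized Jensen-Shannon divergence between two 2D Gaussians. It assumes the form (1 - α)·KL(N_α, N_p) + α·KL(N_α, N_g), with N_α built from the weighted precisions of the two Gaussians; the helper names (`det2`, `inv2`, `matvec`, `kl_gauss`, `gjsd`) are hypothetical, and the authors' implementation may normalize or transform the value differently before ranking.

```python
import math


def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


def matvec(m, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def kl_gauss(mu0, s0, mu1, s1):
    """KL(N(mu0, s0) || N(mu1, s1)) for 2D Gaussians."""
    s1_inv = inv2(s1)
    trace = (s1_inv[0][0] * s0[0][0] + s1_inv[0][1] * s0[1][0]
             + s1_inv[1][0] * s0[0][1] + s1_inv[1][1] * s0[1][1])
    d = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    sd = matvec(s1_inv, d)
    maha = d[0] * sd[0] + d[1] * sd[1]
    return 0.5 * (trace + maha - 2.0 + math.log(det2(s1) / det2(s0)))


def gjsd(mu_p, s_p, mu_g, s_g, alpha=0.5):
    """(1 - alpha) * KL(N_a || N_p) + alpha * KL(N_a || N_g)."""
    sp_inv, sg_inv = inv2(s_p), inv2(s_g)
    # Precision of the intermediate Gaussian is the weighted sum of precisions.
    prec = [[(1 - alpha) * sp_inv[i][j] + alpha * sg_inv[i][j]
             for j in range(2)] for i in range(2)]
    s_a = inv2(prec)
    vp, vg = matvec(sp_inv, mu_p), matvec(sg_inv, mu_g)
    mu_a = matvec(s_a, [(1 - alpha) * vp[k] + alpha * vg[k] for k in range(2)])
    return ((1 - alpha) * kl_gauss(mu_a, s_a, mu_p, s_p)
            + alpha * kl_gauss(mu_a, s_a, mu_g, s_g))


# Usage: a prior and a gt with the same shape, four pixels apart along x.
shape = [[4.0, 0.0], [0.0, 1.0]]
divergence = gjsd([0.0, 0.0], shape, [4.0, 0.0], shape)
```

For two Gaussians with identical covariance and α = 0.5, the intermediate Gaussian sits at the midpoint, so the value reduces to one eighth of the squared Mahalanobis distance between the two centers.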
2.4.3. Finer Dynamic Posterior Matching
①This stage consists of two main steps: a posterior re-ranking strategy and a Dynamic Gaussian Mixture Model (DGMM) constraint
②The Possibility of becoming True predictions (PT) of the $i$-th sample $D_i$ is:
$$PT_i = \frac{1}{2}\,Cls(D_i) + \frac{1}{2}\,IoU(D_i, gt_i)$$
choosing the top $Q$ samples with the highest scores as Medium Positive Sample (MPS) candidates
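③They apply DGMM, which contains both the geometry center and the semantic center of an object, to filter out samples that are too far away
④For a specific instance $gt_i$, the mean vector $\mu_{i,1}$ of the first Gaussian is the geometry center $(cx_i, cy_i)$, and the deduced $(sx_i, sy_i)$ in MPS serves as the mean vector $\mu_{i,2}$ of the second Gaussian and denotes the semantic center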
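⑤Parameterizing an instance:
$$DGMM_i(s \mid x, y) = \sum_{m=1}^{2} w_{i,m} \sqrt{2\pi|\Sigma_{i,m}|}\, \mathcal{N}_{i,m}(\mu_{i,m}, \Sigma_{i,m})$$
where $w_{i,m}$ denotes the weight of each Gaussian distribution and their summation is 1;
$\Sigma_{i,m}$ equals $gt_i$'s $\Sigma_g$
(what is this? But m can be 1 or 2, so doesn't g's covariance then belong to both the semantic center and the geometry center?)
⑥For any $gt$, setting negative masks for the samples with $DGMM(s \mid MPS) < e^{-g}$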
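The two steps of this section, re-ranking by a posterior score and then filtering with a two-center Gaussian mixture, can be sketched as follows. Everything specific here is an assumption made for illustration: PT is taken as an equal-weight average of the classification score and the IoU, each mixture component is an unnormalized Gaussian kernel that peaks at 1, the two components get equal weights, the semantic center is the location of the highest-PT sample, and the threshold is exp(-g) with a placeholder g = 0.8. The names `pt_score`, `top_q`, `gauss_kernel` and `dgmm_keep` are hypothetical.

```python
import math


def pt_score(cls_score, iou):
    """Assumed PT: equal-weight mix of classification score and IoU."""
    return 0.5 * cls_score + 0.5 * iou


def top_q(candidates, q):
    """Keep the q candidates with the highest PT, best first.

    Each candidate is a dict with keys 'cls', 'iou' and 'xy'.
    """
    ranked = sorted(candidates,
                    key=lambda c: pt_score(c["cls"], c["iou"]),
                    reverse=True)
    return ranked[:q]


def gauss_kernel(xy, mu, cov):
    """Unnormalized 2D Gaussian exp(-0.5 * d^T cov^-1 d), equal to 1 at mu."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    dx, dy = xy[0] - mu[0], xy[1] - mu[1]
    maha = (cov[1][1] * dx * dx
            - (cov[0][1] + cov[1][0]) * dx * dy
            + cov[0][0] * dy * dy) / det
    return math.exp(-0.5 * maha)


def dgmm_keep(mps, geo_center, cov, g=0.8, weights=(0.5, 0.5)):
    """Drop MPS candidates whose mixture score falls below exp(-g).

    mps must be sorted best first; its first entry gives the semantic center.
    """
    sem_center = mps[0]["xy"]
    threshold = math.exp(-g)
    kept = []
    for c in mps:
        score = (weights[0] * gauss_kernel(c["xy"], geo_center, cov)
                 + weights[1] * gauss_kernel(c["xy"], sem_center, cov))
        if score >= threshold:
            kept.append(c)
    return kept


# Usage: four coarse candidates around a gt centered at (10, 10).
coarse = [
    {"cls": 0.9, "iou": 0.7, "xy": (10.0, 10.0)},
    {"cls": 0.2, "iou": 0.3, "xy": (30.0, 30.0)},
    {"cls": 0.6, "iou": 0.6, "xy": (11.0, 10.0)},
    {"cls": 0.1, "iou": 0.1, "xy": (10.0, 11.0)},
]
medium = top_q(coarse, 3)
final = dgmm_keep(medium, (10.0, 10.0), [[4.0, 0.0], [0.0, 4.0]])
```

In this toy case the far-away candidate at (30, 30) survives the top-Q step only because Q is generous, and it is then removed by the mixture constraint, which is the role the notes describe for DGMM.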
2.5. Experiments
2.5.1. Datasets
①Datasets: DOTA-v1.0/v1.5/v2.0, DIOR-R, VisDrone, and MS COCO
②Ablation dataset: DOTA-v2.0, which contains the largest number of tiny objects
③Comparison datasets: DOTA-v1.0, DOTA-v1.5, DOTA-v2.0, VisDrone2019, MS COCO and DIOR-R
2.5.2. Implementation Details
①Batch size: 4
②Framework based: MMDetection and MMRotate
③Backbone: ImageNet pre-trained models
④Learning rate: 0.005 with SGD
⑤Momentum: 0.9
⑥Weight decay: 0.0001
⑦Default backbone: ResNet-50 with FPN
⑧Loss: Focal loss for classification and IoU loss for regression
⑨Data augmentation: random flipping
⑩On DOTA-v1.0 and DOTA-v2.0, images are cropped to 1024×1024 with an overlap of 200 following the official setting, and models are trained for 12 epochs
⑪On the other datasets, the input size is 1024×1024 (overlap 200), 800×800, 1333×800, and 1333×800 for DOTA-v1.5, DIOR-R, VisDrone, and COCO respectively. The number of epochs is 40, 40, 12, and 12 for DOTA-v1.5, DIOR-R, COCO, and VisDrone respectively
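For quick reference, the settings listed above can be collected in the dict style that MMDetection and MMRotate config files use. This is a summary written for these notes, not a copy of the repository's config: the `optimizer` fields follow the common MMDetection 2.x convention, while the `schedules` table and its key names (`input_size`, `overlap`, `epochs`) are purely illustrative.

```python
# Optimizer settings from the list above (MMDetection 2.x style dict).
optimizer = dict(type="SGD", lr=0.005, momentum=0.9, weight_decay=0.0001)

# Hypothetical per-dataset summary; key names are illustrative only.
schedules = {
    "DOTA-v1.0": dict(input_size=(1024, 1024), overlap=200, epochs=12),
    "DOTA-v2.0": dict(input_size=(1024, 1024), overlap=200, epochs=12),
    "DOTA-v1.5": dict(input_size=(1024, 1024), overlap=200, epochs=40),
    "DIOR-R": dict(input_size=(800, 800), epochs=40),
    "VisDrone": dict(input_size=(1333, 800), epochs=12),
    "COCO": dict(input_size=(1333, 800), epochs=12),
}

# Datasets trained with the longer 40-epoch schedule.
long_schedule = sorted(name for name, cfg in schedules.items()
                       if cfg["epochs"] == 40)
```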
2.5.3. Main Results
(1)Results on DOTA series
①Comparison table on DOTA-v2.0 OBB:

where red marks the best and blue marks the second-best result on each metric
②Comparison table on DOTA-v1.0 OBB:

③Comparison table on DOTA-v1.5 OBB:

(2)Results on DIOR-R
①Comparison table on DIOR-R:

②Results on typical tiny objects: vehicle, bridge, and windmill:

(3)Results on HBB Datasets
①Comparison table on VisDrone, MS COCO and DOTA-v2.0 HBB:

2.5.4. Ablation Study
(1)Effects of Individual Strategy
①Employ prior on each feature point
②Individual effectiveness:

(2)Comparisons of Different CPS
①Ablation:

(3)Fixed Prior and Dynamic Prior
①Ablation:

(4)Detailed Design in PCB
①Using the offsets from the regression head to guide the offsets of the classification head aligns the two tasks better than applying DCN to the regression branch alone
(5)Effects of Parameters
①Parameter adjustment of $K$ and $Q$:

②Parameter adjustment of $g$:

attenuate v. to weaken; to make thin or sparse adj. weakened; thin; fine
2.6. Analysis
(1)Reconciliation of imbalance problems
①The mean predicted IoU and the mean positive sample number of gt boxes with different angles and different scales (absolute size):

where the left column shows the quality imbalance and the right column shows the quantity imbalance. The coarse-to-fine dynamic learning proposed in the paper addresses the sample mismatch, and more positive samples are compensated to the angles and scales that were previously disadvantaged; that is, rotated small-scale gt boxes are assigned more positive samples than before
dissection n. dissection, cutting open; dissected specimen; detailed examination delve vi./vt. to research; to probe; to dig; n. pit; hole
(2)Visualization
①Visualization of the elimination of False Negative and False Positive predictions:

where the first and second rows are the results of RetinaNet-OBB and DCFL respectively, and TP, FN and FP predictions are drawn as green, red and blue boxes. It can be seen that DCFL effectively locates tiny oriented objects with extreme shapes
②Visualization of sampled dynamic priors:

(3)Speed
①R3Det, S²A-Net and RetinaNet run at 16.2, 18.9 and 20.8 FPS respectively, while DCFL reaches 20.9 FPS, which shows its high efficiency
②Parameters and GFLOPs of DCFL:

2.7. Conclusion
To address the mismatched feature prior and the imbalanced positive samples, the authors propose the DCFL model, which combines a dynamic prior with a coarse-to-fine assigner. It achieves remarkable performance across the evaluated datasets
3. Reference List
Xu, C. et al. (2023) 'Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection', CVPR. doi: https://doi.org/10.48550/arXiv.2304.08876