
(Self-supervised learning) Event Camera Data Pre-training

news 2026/2/9 1:25:32

Publisher: ICCV 2023

MOTIVATION FOR READING: self-supervised learning; sparse events = NILM

link: https://arxiv.org/pdf/2301.01928.pdf

Code: GitHub - Yan98/Event-Camera-Data-Pre-training

1. Overview

Contributions are summarized as follows:

1. A self-supervised framework for event camera data pre-training. The pre-trained model can be transferred to diverse downstream tasks;

2. A family of event data augmentations, generating meaningful event images;

3. A conditional masking strategy, sampling informative event patches for network training;

4. An embedding projection loss, using paired RGB embeddings to regularize event embeddings to avoid model collapse;

5. A probability distribution alignment loss for aligning embeddings from the paired event and RGB images;

6. State-of-the-art performance on standard event benchmark datasets.

2. Related work

The SSL frameworks can be generally divided into two categories: contrastive learning and masked modeling.

2.1 Contrastive learning

This approach generally assumes that images are invariant to augmentation. One notable drawback of contrastive learning is that it suffers from model collapse and training instability.

2.2 Masked modeling

Reconstructing masked inputs from the visible (i.e., unmasked) ones is a popular self-supervised learning objective motivated by the idea of autoencoding. (BERT, GPT)

3. Methodology

For pre-training, our method takes event data E and its paired natural RGB image I as inputs, and outputs a pre-trained network fe.

Firstly, data augmentation, event image generation, and conditional masking are applied in sequence to obtain two patch sets (xq, xk).
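To make the event image generation and patch sampling steps more concrete, here is a minimal, dependency-free sketch. It assumes events arrive as (x, y, t, polarity) tuples and accumulates them into a two-channel count image, then draws patches with probability proportional to their event count. The function names and the count-proportional sampling rule are illustrative assumptions, not the paper's actual augmentations or masking strategy.

```python
import random


def events_to_image(events, height, width):
    """Accumulate (x, y, t, polarity) events into a 2-channel count image.

    Channel 0 counts negative-polarity events, channel 1 positive ones.
    """
    image = [[[0] * width for _ in range(height)] for _ in range(2)]
    for x, y, _t, polarity in events:
        channel = 1 if polarity > 0 else 0
        image[channel][y][x] += 1
    return image


def patch_event_counts(image, patch):
    """Count the events in each non-overlapping patch, in row-major order."""
    height, width = len(image[0]), len(image[0][0])
    counts = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            counts.append(sum(
                image[c][y][x]
                for c in range(2)
                for y in range(top, top + patch)
                for x in range(left, left + patch)
            ))
    return counts


def sample_informative_patches(counts, keep, rng):
    """Hypothetical rule: draw `keep` distinct patches, weighted by event count."""
    candidates = list(range(len(counts)))
    weights = [c + 1e-6 for c in counts]  # small floor keeps empty patches drawable
    chosen = []
    for _ in range(min(keep, len(candidates))):
        pick = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        chosen.append(candidates.pop(pick))
        weights.pop(pick)
    return chosen


events = [(0, 0, 0.0, 1), (0, 0, 0.1, 1), (3, 2, 0.2, -1)]
image = events_to_image(events, height=4, width=4)
counts = patch_event_counts(image, patch=2)
kept = sample_informative_patches(counts, keep=2, rng=random.Random(0))
```

Calling the sampler twice with different random states would give two patch sets that play the roles of xq and xk.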

Secondly, fe extracts features from event patch set xq, and he_img and he_evt separately project features from fe to latent embeddings q_img and q_evt.

fm and hm_evt are the momentum counterparts of fe and he_evt, and are updated by the exponential moving average (EMA). (For the meaning of momentum, see the MoCo paper.)
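The EMA update can be sketched in a few lines. This is a minimal illustration in which parameters are flat lists of floats; the coefficient m = 0.99 is an assumed value, not one taken from the paper.

```python
def ema_update(momentum_params, online_params, m=0.99):
    """One EMA step: new_momentum = m * old_momentum + (1 - m) * online."""
    return [m * k + (1.0 - m) * q
            for k, q in zip(momentum_params, online_params)]


# The momentum weights drift slowly toward the online weights.
updated = ema_update([1.0, 0.0], [0.0, 1.0], m=0.9)
```

Because m is close to 1, the momentum network changes much more slowly than the online network, which keeps its output embeddings stable targets across training steps.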

The momentum network takes patch set xk as input and generates an embedding k_evt.

At the same time, the natural RGB image I is embedded into y = f1(h1(I)).

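Finally, we perform event discrimination, and event and natural RGB image discrimination to train our model. InfoNCE is not applied directly to the similarity between q_evt and k_evt here, because doing so leads to embedding collapse: the embeddings become too similar to one another. The cause is that event images are sparse and discrete, so the projection of the RGB image is used instead.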

L_evt is an event embedding projection loss aiming to pull together paired event embeddings q_evt and k_evt, for event discrimination.

L_RGB aims to pull together paired event and RGB embeddings q_evt and y, for event and natural RGB image discrimination.

L_kl aims to drive fe to learn discriminative event embeddings, moving them toward the well-structured embedding space of natural RGB images.

InfoNCE loss

Contrastive learning aims to pull together embeddings q and k+, and push away embeddings q and {k−}.
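In its usual form, InfoNCE is the negative log of exp(sim(q, k+)/τ) divided by the sum of exp(sim(q, k)/τ) over the positive and all negatives. The sketch below computes it for a single query in plain Python. Cosine similarity and the temperature tau = 0.2 are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(q, k_pos, k_negs, tau=0.2):
    """InfoNCE for one query: -log softmax over [positive, negatives] at index 0."""
    logits = [cosine(q, k_pos) / tau] + [cosine(q, k) / tau for k in k_negs]
    top = max(logits)  # subtract the max for numerical stability
    log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_denominator - logits[0]


# A well-aligned positive gives a lower loss than a misaligned one.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce([1.0, 0.0], [-1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]])
```

The loss falls as q moves toward k+ and away from the negatives, which is the pull-together and push-away behaviour described above.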

Event embedding projection loss

ζ(v1, v2) is the projection function.

Event and RGB image discrimination

Considering the sparsity of the event image, a single event image is less informative than an RGB image, which makes self-supervised training of an event network difficult.

We pull together embeddings of paired event and RGB images, xq and I.

We first compute the pairwise embedding similarity and then fit an exponential kernel to the similarities to compute probability scores. The probability score of the (i, j)-th pair is given by,

Our probability distribution alignment loss is given by,
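The formulas themselves are missing from these notes. As a rough sketch of the idea rather than the paper's exact definition, the code below turns pairwise cosine similarities into row-wise probability scores with an exponential kernel, then compares the RGB and event distributions with a KL divergence. The temperature tau, the exclusion of self-pairs, and the KL direction are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pairwise_probabilities(embeddings, tau=0.2):
    """Row i is a distribution over j != i built from exp(sim(i, j) / tau)."""
    n = len(embeddings)  # needs at least two embeddings
    rows = []
    for i in range(n):
        kernel = [
            0.0 if i == j else math.exp(cosine(embeddings[i], embeddings[j]) / tau)
            for j in range(n)
        ]
        total = sum(kernel)
        rows.append([k / total for k in kernel])
    return rows


def kl_alignment(p_rgb, p_evt, eps=1e-12):
    """Mean row-wise KL(p_rgb || p_evt); the direction is an assumption."""
    total = 0.0
    for row_p, row_q in zip(p_rgb, p_evt):
        total += sum(
            p * math.log((p + eps) / (q + eps))
            for p, q in zip(row_p, row_q) if p > 0
        )
    return total / len(p_rgb)


rgb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
evt = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.9]]
p_rgb = pairwise_probabilities(rgb)
p_evt = pairwise_probabilities(evt)
loss_same = kl_alignment(p_rgb, p_rgb)
loss_diff = kl_alignment(p_rgb, p_evt)
```

The loss is zero when the event embeddings reproduce the neighbourhood structure of the RGB embeddings and grows as the two structures diverge.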

Total Loss

where λ1 is a hyper-parameter for balancing the losses.

4. Experiment

We evaluate our method on three downstream tasks: object recognition, optical flow estimation, and semantic segmentation.
