A 3D Omnidirectional Sensor For Mobile Robot Applications
International audience
RGB-Event Fusion for Moving Object Detection in Autonomous Driving
Moving Object Detection (MOD) is a critical vision task for achieving safe autonomous driving. Despite the plausible results of deep learning methods, most existing approaches are frame-based only and may fail to reach reasonable performance when dealing with dynamic traffic participants. Recent advances in sensor technologies, especially the event camera, can naturally complement the conventional camera approach to better model moving objects. However, event-based works often adopt a pre-defined time window for event representation and simply integrate it to estimate image intensities from events, neglecting much of the rich temporal information carried by the available asynchronous events. Therefore, from a new perspective, we propose RENet, a novel RGB-Event fusion network that jointly exploits the two complementary modalities to achieve more robust MOD under challenging scenarios for autonomous driving. Specifically, we first design a temporal multi-scale aggregation module to fully leverage event frames from both the RGB exposure time and larger intervals. Then we introduce a bi-directional fusion module to attentively calibrate and fuse multi-modal features. To evaluate the performance of our network, we carefully select and annotate a sub-MOD dataset from the commonly used DSEC dataset. Extensive experiments demonstrate that our proposed method performs significantly better than state-of-the-art RGB-Event fusion alternatives.
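The abstract above gives no implementation details; the following is a minimal, hypothetical sketch of what an attention-based bi-directional calibration-and-fusion step between RGB and event features could look like, assuming equally shaped backbone feature maps (the module name, gating design, and shapes are illustrative assumptions, not taken from the paper).

```python
# Hypothetical sketch of a bi-directional RGB-Event fusion step (assumed
# design, not the authors' implementation): each modality produces a
# channel-attention gate that re-weights the other before merging.
import torch
import torch.nn as nn


class BiDirectionalFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # One attention gate per direction (Event->RGB and RGB->Event).
        self.rgb_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.event_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, rgb_feat: torch.Tensor, event_feat: torch.Tensor) -> torch.Tensor:
        # Calibrate each stream with attention computed from the other stream.
        rgb_calibrated = rgb_feat * self.event_gate(event_feat)
        event_calibrated = event_feat * self.rgb_gate(rgb_feat)
        return self.merge(torch.cat([rgb_calibrated, event_calibrated], dim=1))


if __name__ == "__main__":
    fusion = BiDirectionalFusion(channels=64)
    rgb = torch.randn(1, 64, 32, 32)     # RGB backbone features
    events = torch.randn(1, 64, 32, 32)  # event-frame backbone features
    print(fusion(rgb, events).shape)     # torch.Size([1, 64, 32, 32])
```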
PHROG: A Multimodal Feature for Place Recognition
Long-term place recognition in outdoor environments remains a challenge due to strong appearance changes in the environment. The problem becomes even more difficult when the matching between two scenes has to be made with information coming from different visual sources, particularly with different spectral ranges. For instance, an infrared camera is helpful for night vision in combination with a visible camera. In this paper, we focus on testing the usual feature point extractors under both constraints: repeatability across spectral ranges and long-term appearance changes. We develop a new feature extraction method dedicated to improving repeatability across spectral ranges. We conduct an evaluation of feature robustness on long-term datasets coming from different imaging sources (optics, sensor sizes and spectral ranges) with a Bag-of-Words approach. The tests we perform demonstrate that our method brings a significant improvement to the image retrieval task in a visual place recognition context, particularly when images from various spectral ranges such as infrared and visible have to be associated: we evaluated our approach using visible, Near InfraRed (NIR), Short Wavelength InfraRed (SWIR) and Long Wavelength InfraRed (LWIR) images.
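The evaluation described above relies on a Bag-of-Words retrieval pipeline; below is a minimal, generic sketch of such a pipeline (PHROG itself is not reimplemented here; random arrays stand in for local feature descriptors, and all parameters are illustrative assumptions).

```python
# Generic Bag-of-Words retrieval sketch (illustrative assumptions only):
# cluster local descriptors into a visual vocabulary, describe each image
# as a normalized word histogram, then retrieve by cosine similarity.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Placeholder local descriptors "extracted" from five database images.
db_descriptors = [rng.normal(size=(200, 64)) for _ in range(5)]

# 1. Build the visual vocabulary by clustering all local descriptors.
vocab = KMeans(n_clusters=32, n_init=10, random_state=0)
vocab.fit(np.vstack(db_descriptors))

def bow_histogram(descriptors: np.ndarray) -> np.ndarray:
    """Quantize descriptors to visual words; return an L2-normalized histogram."""
    words = vocab.predict(descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / (np.linalg.norm(hist) + 1e-12)

db_hists = np.stack([bow_histogram(d) for d in db_descriptors])

# 2. Retrieve the database image most similar to a query image.
query_hist = bow_histogram(rng.normal(size=(180, 64)))
best_match = int(np.argmax(db_hists @ query_hist))
print("best matching database image:", best_match)
```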
Towards Online Action Recognition from RGB-D Cameras
International audience
Event-Free Moving Object Segmentation from Moving Ego Vehicle
Moving object segmentation (MOS) in dynamic scenes is challenging for autonomous driving, especially for sequences obtained from moving ego vehicles. Most state-of-the-art methods leverage motion cues obtained from optical flow maps. However, since these methods often rely on optical flow pre-computed from successive RGB frames, they neglect the temporal information between frames, which limits their practicality in real-life situations. To address these limitations, we propose to exploit event cameras, which provide rich motion cues without relying on optical flow, for better video understanding. To foster research in this area, we first introduce DSEC-MOS, a novel large-scale dataset for moving object segmentation from moving ego vehicles. Subsequently, we devise EmoFormer, a novel network able to exploit the event data. For this purpose, we fuse the event prior with spatial semantic maps to distinguish moving objects from the static background, adding another level of dense supervision around our objects of interest: the moving ones. Our proposed network relies only on event data for training but does not require event input during inference, making it directly comparable to frame-only methods in terms of efficiency and more widely usable in many application cases. An exhaustive comparison with 8 state-of-the-art video object segmentation methods highlights a significant performance improvement of our method over all of them.
Project Page: https://github.com/ZZY-Zhou/DSEC-MOS
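The abstract describes using an event-derived prior as extra dense supervision during training only; below is a small, hypothetical sketch of how raw events could be accumulated into such a per-pixel motion prior mask (the helper name, thresholds, and shapes are illustrative assumptions, not the EmoFormer pipeline).

```python
# Hypothetical event-prior sketch (assumed design, not the authors' code):
# accumulate events into a per-pixel count map and threshold it into a
# binary motion prior used only as an auxiliary training target.
import numpy as np

def event_prior_mask(events: np.ndarray, height: int, width: int,
                     threshold: int = 3) -> np.ndarray:
    """Events are rows of (x, y, timestamp, polarity); return a binary mask
    of pixels that received at least `threshold` events."""
    counts = np.zeros((height, width), dtype=np.int32)
    xs = events[:, 0].astype(int)
    ys = events[:, 1].astype(int)
    np.add.at(counts, (ys, xs), 1)  # unbuffered per-pixel accumulation
    return (counts >= threshold).astype(np.uint8)

# Toy example: 500 random events on a 64x64 sensor.
rng = np.random.default_rng(0)
toy_events = np.stack([
    rng.integers(0, 64, 500).astype(float),  # x
    rng.integers(0, 64, 500).astype(float),  # y
    rng.random(500),                         # timestamp
    rng.integers(0, 2, 500).astype(float),   # polarity
], axis=1)

mask = event_prior_mask(toy_events, height=64, width=64)
print("active pixels:", int(mask.sum()))
```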
Kinematic Spline Curves: A temporal invariant descriptor for fast action recognition
International audience
3D real-time human action recognition using a spline interpolation approach
International audience
An extension of kernel learning methods using a modified Log-Euclidean distance for fast and accurate skeleton-based Human Action Recognition
International audience
A fast and accurate motion descriptor for human action recognition applications
International audience
- …