Contrastive Initial State Buffer for Reinforcement Learning
In Reinforcement Learning, the trade-off between exploration and exploitation
poses a complex challenge for achieving efficient learning from limited
samples. While recent works have been effective in leveraging past experiences
for policy updates, they often overlook the potential of reusing past
experiences for data collection. Independent of the underlying RL algorithm, we
introduce the concept of a Contrastive Initial State Buffer, which
strategically selects states from past experiences and uses them to initialize
the agent in the environment in order to guide it toward more informative
states. We validate our approach on two complex robotic tasks without relying
on any prior information about the environment: (i) locomotion of a quadruped
robot traversing challenging terrains and (ii) a quadcopter drone racing
through a track. The experimental results show that our initial state buffer
achieves higher task performance than the nominal baseline while also speeding
up training convergence.
Bridging the Gap Between Events and Frames Through Unsupervised Domain Adaptation
Reliable perception during fast motion maneuvers or in high dynamic range environments is crucial for robotic systems. Since event cameras are robust to these challenging conditions, they have great potential to increase the reliability of robot vision. However, event-based vision has been held back by the shortage of labeled datasets due to the novelty of event cameras. To overcome this drawback, we propose a task transfer method to train models directly with labeled images and unlabeled event data. Compared to previous approaches, our method (i) transfers from single images to events instead of high frame rate videos, and (ii) does not rely on paired sensor data. To achieve this, we leverage the generative event model to split event features into content and motion features. This split enables efficient matching between latent spaces for events and images, which is crucial for successful task transfer. Thus, our approach unlocks the vast amount of existing image datasets for the training of event-based neural networks. Our task transfer method consistently outperforms methods targeting Unsupervised Domain Adaptation for object detection by 0.26 mAP (a 93% increase) and for classification by 2.7% accuracy.
ESS: Learning Event-Based Semantic Segmentation from Still Images
Retrieving accurate semantic information in challenging high dynamic range (HDR) and high-speed conditions remains an open challenge for image-based algorithms due to severe image degradations. Event cameras promise to address these challenges since they feature a much higher dynamic range and are resilient to motion blur. Nonetheless, semantic segmentation with event cameras is still in its infancy, chiefly due to the lack of high-quality, labeled datasets. In this work, we introduce ESS (Event-based Semantic Segmentation), which tackles this problem by directly transferring the semantic segmentation task from existing labeled image datasets to unlabeled events via unsupervised domain adaptation (UDA). Compared to existing UDA methods, our approach aligns recurrent, motion-invariant event embeddings with image embeddings. For this reason, our method neither requires video data nor per-pixel alignment between images and events and, crucially, does not need to hallucinate motion from still images. Additionally, we introduce DSEC-Semantic, the first large-scale event-based dataset with fine-grained labels. We show that using image labels alone, ESS outperforms existing UDA approaches, and when combined with event labels, it even outperforms state-of-the-art supervised approaches on both DDD17 and DSEC-Semantic. Finally, ESS is general-purpose, which unlocks the vast amount of existing labeled image datasets and paves the way for new and exciting research directions in new fields previously inaccessible for event cameras.
Data-driven Feature Tracking for Event Cameras
Because of their high temporal resolution, increased resilience to motion
blur, and very sparse output, event cameras have been shown to be ideal for
low-latency and low-bandwidth feature tracking, even in challenging scenarios.
Existing feature tracking methods for event cameras are either handcrafted or
derived from first principles but require extensive parameter tuning, are
sensitive to noise, and do not generalize to different scenarios due to
unmodeled effects. To tackle these deficiencies, we introduce the first
data-driven feature tracker for event cameras, which leverages low-latency
events to track features detected in a grayscale frame. We achieve robust
performance via a novel frame attention module, which shares information across
feature tracks. By directly transferring zero-shot from synthetic to real data,
our data-driven tracker outperforms existing approaches in relative feature age
by up to 120% while also achieving the lowest latency. This performance gap is
further increased to 130% by adapting our tracker to real data with a novel
self-supervision strategy.
ESS: Learning Event-based Semantic Segmentation from Still Images
Retrieving accurate semantic information in challenging high dynamic range
(HDR) and high-speed conditions remains an open challenge for image-based
algorithms due to severe image degradations. Event cameras promise to address
these challenges since they feature a much higher dynamic range and are
resilient to motion blur. Nonetheless, semantic segmentation with event cameras
is still in its infancy, chiefly due to the novelty of the sensor and
the lack of high-quality, labeled datasets. In this work, we introduce ESS,
which tackles this problem by directly transferring the semantic segmentation
task from existing labeled image datasets to unlabeled events via unsupervised
domain adaptation (UDA). Compared to existing UDA methods, our approach aligns
recurrent, motion-invariant event embeddings with image embeddings. For this
reason, our method neither requires video data nor per-pixel alignment between
images and events and, crucially, does not need to hallucinate motion from
still images. Additionally, to spur further research in event-based semantic
segmentation, we introduce DSEC-Semantic, the first large-scale event-based
dataset with fine-grained labels. We show that using image labels alone, ESS
outperforms existing UDA approaches, and when combined with event labels, it
even outperforms state-of-the-art supervised approaches on both DDD17 and
DSEC-Semantic. Finally, ESS is general-purpose, which unlocks the vast amount
of existing labeled image datasets and paves the way for new and exciting
research directions in new fields previously inaccessible for event cameras.