Focus Is All You Need: Loss Functions For Event-based Vision
Event cameras are novel vision sensors that output pixel-level brightness
changes ("events") instead of traditional video frames. These asynchronous
sensors offer several advantages over traditional cameras, such as high
temporal resolution, very high dynamic range, and no motion blur. To unlock the
potential of such sensors, motion compensation methods have been recently
proposed. We present a collection and taxonomy of twenty-two objective
functions to analyze event alignment in motion compensation approaches (Fig.
1). We call them Focus Loss Functions since they have strong connections with
functions used in traditional shape-from-focus applications. The proposed loss
functions make it possible to bring mature computer vision tools to the realm of event
cameras. We compare the accuracy and runtime performance of all loss functions
on a publicly available dataset, and conclude that the variance, the gradient
and the Laplacian magnitudes are among the best loss functions. The
applicability of the loss functions is shown on multiple tasks: rotational
motion, depth, and optical flow estimation. The proposed focus loss functions
unlock the outstanding properties of event cameras.
Comment: 29 pages, 19 figures, 4 tables
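For illustration, here is a minimal sketch of the contrast-maximization idea behind such focus losses: warp the events with a candidate motion, accumulate them into an image of warped events (IWE), and score alignment by the variance of that image. The global-translation warp, resolution, and function names below are simplifying assumptions, not the paper's implementation; the paper studies twenty-two such objectives.

```python
import numpy as np

def variance_focus_loss(events, flow, resolution=(180, 240)):
    """Score event alignment by the variance of the image of warped events (IWE).

    events: (N, 4) array with columns (x, y, t, polarity)
    flow:   (vx, vy) candidate global translation in pixels/second
            (a toy warp model, used here only for illustration)
    """
    x, y, t, _ = events.T
    t_ref = t[0]
    # Warp each event to the reference time along the candidate motion.
    xw = np.round(x - flow[0] * (t - t_ref)).astype(int)
    yw = np.round(y - flow[1] * (t - t_ref)).astype(int)
    # Keep only events that still fall inside the image plane.
    mask = (xw >= 0) & (xw < resolution[1]) & (yw >= 0) & (yw < resolution[0])
    # Accumulate event counts into the IWE.
    iwe = np.zeros(resolution)
    np.add.at(iwe, (yw[mask], xw[mask]), 1.0)
    # A higher variance (sharper IWE) indicates better event alignment.
    return iwe.var()
```

Maximizing this value over the motion parameters (e.g., with a standard optimizer) is what aligns the events along their point trajectories.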
MicroPoem: experimental investigation of birch pollen emissions
Diseases due to aeroallergens have increased constantly over the last decades and affect more and more people. Adequate protective and pre-emptive measures require both a reliable assessment of the production and release of various pollen species and the forecasting of their atmospheric dispersion. Pollen forecast models, which may be based either on statistical knowledge or on full physical transport and dispersion modeling, can provide pollen forecasts with full spatial coverage. Such models are currently being developed in many countries. The most important shortcoming in these pollen transport systems is the description of emissions, namely the dependence of the emission rate on physical processes such as turbulent exchange or mean transport and on biological processes such as ripening (temperature) and preparedness for release. Thus the quantification of pollen emissions and the determination of the governing mesoscale and micrometeorological factors are the subject of the present project, MicroPoem, which includes experimental field work as well as numerical modeling. The overall goal of the project is to derive an emission parameterization based on meteorological parameters, eventually leading to enhanced pollen forecasts. In order to have a well-defined source location, an isolated birch stand was chosen for the set-up of a 'natural tracer experiment', which was conducted during the birch pollen season in spring 2009. The site was located in a broad valley, where a mountain-plains wind system usually became effective during clear weather periods. This condition made it possible to assume a rather persistent wind direction and considerable wind speed during day- and nighttime. Several micrometeorological towers were operated up- and downwind of this reference source, and an array of 26 pollen traps was laid out to observe the spatio-temporal variability of pollen concentrations. Additionally, the lower boundary layer was probed by means of a sodar and a tethered balloon system (also yielding a pollen concentration profile). In the present contribution, a project overview is given and first results are presented. An emphasis is put on the relative performance of different sampling technologies and the corresponding relative calibration in the lab and in the field. The concentration distribution downwind of the birch stand exhibits significant spatial (and temporal) variability. Small-scale numerical dispersion modeling will be used to infer the emission characteristics that optimally explain the observed concentration patterns.
E-RAFT: Dense Optical Flow from Event Cameras
We propose to incorporate feature correlation and sequential processing into dense optical flow estimation from event cameras. Modern frame-based optical flow methods rely heavily on matching costs computed from feature correlation. In contrast, there exists no optical flow method for event cameras that explicitly computes matching costs. Instead, learning-based approaches using events usually resort to the U-Net architecture to estimate optical flow sparsely. Our key finding is that the introduction of correlation features significantly improves results compared to previous methods that rely solely on convolution layers. Compared to the state-of-the-art, our proposed approach computes dense optical flow and reduces the end-point error by 23% on MVSEC. Furthermore, we show that all existing optical flow methods developed so far for event cameras have been evaluated on datasets with very small displacement fields, with a maximum flow magnitude of 10 pixels. Based on this observation, we introduce a new real-world dataset that exhibits displacement fields with magnitudes up to 210 pixels and a 3 times higher camera resolution. Our proposed approach reduces the end-point error on this dataset by 66%.
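As a rough illustration of the matching-cost idea, the sketch below computes an all-pairs correlation volume between feature maps extracted from two consecutive event representations, in the spirit of RAFT-style methods. The shapes, normalization, and names are assumptions rather than the exact E-RAFT implementation.

```python
import torch

def all_pairs_correlation(f1, f2):
    """All-pairs correlation volume between two feature maps.

    f1, f2: (B, C, H, W) features from two consecutive event representations.
    Returns: (B, H, W, H, W) matching costs between every pixel pair.
    """
    b, c, h, w = f1.shape
    f1 = f1.view(b, c, h * w)
    f2 = f2.view(b, c, h * w)
    # Dot product between every feature vector in f1 and every one in f2.
    corr = torch.einsum('bci,bcj->bij', f1, f2) / c**0.5
    return corr.view(b, h, w, h, w)
```

A flow estimator can then look up this volume around the current flow estimate and refine it iteratively, which is the mechanism that replaces the plain convolutional regression of earlier event-based methods.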
DSEC: A Stereo Event Camera Dataset for Driving Scenarios
Once an academic venture, autonomous driving has received unparalleled
corporate funding in the last decade. Still, the operating conditions of
current autonomous cars are mostly restricted to ideal scenarios. This means
that driving in challenging illumination conditions such as night, sunrise, and
sunset remains an open problem. In these cases, standard cameras are being
pushed to their limits in terms of low light and high dynamic range
performance. To address these challenges, we propose DSEC, a new dataset that
contains such demanding illumination conditions and provides a rich set of
sensory data. DSEC offers data from a wide-baseline stereo setup of two color
frame cameras and two high-resolution monochrome event cameras. In addition, we
collect lidar data and RTK GPS measurements, both hardware synchronized with
all camera data. One of the distinctive features of this dataset is the
inclusion of high-resolution event cameras. Event cameras have received
increasing attention for their high temporal resolution and high dynamic range
performance. However, due to their novelty, event camera datasets in driving
scenarios are rare. This work presents the first high-resolution, large-scale
stereo dataset with event cameras. The dataset contains 53 sequences collected
by driving in a variety of illumination conditions and provides ground truth
disparity for the development and evaluation of event-based stereo algorithms.
Comment: IEEE Robotics and Automation Letters
Bridging the Gap Between Events and Frames Through Unsupervised Domain Adaptation
Reliable perception during fast motion maneuvers or in high dynamic range environments is crucial for robotic systems. Since event cameras are robust to these challenging conditions, they have great potential to increase the reliability of robot vision. However, event-based vision has been held back by the shortage of labeled datasets due to the novelty of event cameras. To overcome this drawback, we propose a task transfer method to train models directly with labeled images and unlabeled event data. Compared to previous approaches, (i) our method transfers from single images to events instead of from high frame rate videos, and (ii) it does not rely on paired sensor data. To achieve this, we leverage the generative event model to split event features into content and motion features. This split enables efficient matching between latent spaces for events and images, which is crucial for successful task transfer. Thus, our approach unlocks the vast amount of existing image datasets for the training of event-based neural networks. Our task transfer method consistently outperforms methods targeting Unsupervised Domain Adaptation by 0.26 mAP (a 93% increase) on object detection and by 2.7% accuracy on classification.
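The sketch below is a purely conceptual illustration (not the paper's architecture) of the shared content-space idea: an event encoder whose output is split into content and motion features, and an image encoder whose output lives in the same content space, so that a task head trained on labeled images can be reused on unlabeled events. The 5-channel voxel-grid input and all layer choices are assumptions.

```python
import torch.nn as nn

class SharedContentSpace(nn.Module):
    """Conceptual sketch: event features are split into content and motion,
    and the content part shares a latent space with image features."""

    def __init__(self, dim=128, num_classes=10):
        super().__init__()
        # Event branch: assumes a 5-channel voxel-grid event representation.
        self.event_encoder = nn.Sequential(
            nn.Conv2d(5, 32, 3, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 2 * dim))            # content + motion features
        # Image branch: produces content features only.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, dim))
        self.task_head = nn.Linear(dim, num_classes)

    def forward(self, events=None, images=None):
        if images is not None:                 # labeled image branch
            return self.task_head(self.image_encoder(images))
        content, motion = self.event_encoder(events).chunk(2, dim=1)
        return self.task_head(content)         # reuse the head on event content
```

Training would supervise the head on the image branch while aligning the two content spaces on unpaired data, which is what lets image labels transfer to events.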
From Chaos Comes Order: Ordering Event Representations for Object Recognition and Detection
Today, state-of-the-art deep neural networks that process events first
convert them into dense, grid-like input representations before using an
off-the-shelf network. However, selecting the appropriate representation for
the task traditionally requires training a neural network for each
representation and selecting the best one based on the validation score, which
is very time-consuming. This work eliminates this bottleneck by selecting
representations based on the Gromov-Wasserstein Discrepancy (GWD) between raw
events and their representation. It is about 200 times faster to compute than
training a neural network and preserves the task performance ranking of event
representations across multiple representations, network backbones, datasets,
and tasks. Thus, finding representations with high task scores is equivalent to
finding representations with a low GWD. We use this insight to, for the first
time, perform a hyperparameter search on a large family of event
representations, revealing new and powerful representations that exceed the
state-of-the-art. Our optimized representations outperform existing
representations by 1.7 mAP on the 1 Mpx dataset and 0.3 mAP on the Gen1
dataset, two established object detection benchmarks, and reach a 3.8% higher
classification score on the mini N-ImageNet benchmark. Moreover, we outperform
state-of-the-art by 2.1 mAP on Gen1 and state-of-the-art feed-forward methods
by 6.0 mAP on the 1 Mpx dataset. This work opens a new, unexplored field of
explicit representation optimization for event-based learning.
Comment: 15 pages, 11 figures, 2 tables, ICCV 2023 Camera Ready paper
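As a hedged sketch of how such a ranking might be computed, the snippet below uses the POT library to evaluate a Gromov-Wasserstein discrepancy between a set of raw events and points sampled from their dense representation; the cost matrices, sampling, and normalization used in the paper may differ.

```python
import numpy as np
import ot  # Python Optimal Transport (POT)

def gw_discrepancy(raw_events, representation_pixels):
    """Gromov-Wasserstein discrepancy between raw events and a representation.

    raw_events:            (N, 3) points (x, y, t) sampled from the event stream
    representation_pixels: (M, 3) points (x, y, value) sampled from the dense grid
                           the events were converted into
    A lower value suggests the representation preserves the event structure better.
    """
    C1 = ot.dist(raw_events, raw_events)                      # intra-domain distances
    C2 = ot.dist(representation_pixels, representation_pixels)
    p = np.full(len(raw_events), 1.0 / len(raw_events))       # uniform marginals
    q = np.full(len(representation_pixels), 1.0 / len(representation_pixels))
    return ot.gromov.gromov_wasserstein2(C1, C2, p, q, loss_fun='square_loss')
```

Ranking candidate representations by this value avoids training one network per candidate, which is the claimed roughly 200-fold speed-up.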
Recurrent Vision Transformers for Object Detection with Event Cameras
We present Recurrent Vision Transformers (RVTs), a novel backbone for object
detection with event cameras. Event cameras provide visual information with
sub-millisecond latency, high dynamic range, and strong robustness
against motion blur. These unique properties offer great potential for
low-latency object detection and tracking in time-critical scenarios. Prior
work in event-based vision has achieved outstanding detection performance but
at the cost of substantial inference time, typically beyond 40 milliseconds. By
revisiting the high-level design of recurrent vision backbones, we reduce
inference time by a factor of 5 while retaining similar performance. To achieve
this, we explore a multi-stage design that utilizes three key concepts in each
stage: First, a convolutional prior that can be regarded as a conditional
positional embedding. Second, local- and dilated global self-attention for
spatial feature interaction. Third, recurrent temporal feature aggregation to
minimize latency while retaining temporal information. RVTs can be trained from
scratch to reach state-of-the-art performance on event-based object detection -
achieving an mAP of 47.5% on the Gen1 automotive dataset. At the same time,
RVTs offer fast inference (13 ms on a T4 GPU) and favorable parameter
efficiency (5 times fewer than prior art). Our study brings new insights into
effective design choices that could be fruitful for research beyond event-based
vision.
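A simplified, hypothetical sketch of one such stage is shown below: a convolutional prior, self-attention over spatial tokens, and an LSTM-style recurrence for temporal aggregation. Plain global attention stands in for the local and dilated attention pattern, and all dimensions are illustrative.

```python
import torch.nn as nn

class RVTStyleStage(nn.Module):
    """Simplified sketch of one recurrent-vision-transformer-style stage:
    conv prior -> self-attention over spatial tokens -> LSTM-style recurrence."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.conv_prior = nn.Conv2d(dim, dim, 3, padding=1)   # conditional positional prior
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.lstm = nn.LSTMCell(dim, dim)                     # per-pixel temporal memory

    def forward(self, x, state=None):
        b, c, h, w = x.shape
        x = x + self.conv_prior(x)                            # positional embedding via conv
        tokens = x.flatten(2).transpose(1, 2)                 # (B, H*W, C)
        q = self.norm(tokens)
        tokens = tokens + self.attn(q, q, q)[0]               # spatial feature interaction
        hx, cx = self.lstm(tokens.reshape(b * h * w, c), state)
        out = hx.view(b, h, w, c).permute(0, 3, 1, 2)         # back to (B, C, H, W)
        return out, (hx, cx)
```

The returned state tuple would be carried across consecutive event representations, which is how temporal context is retained while keeping per-inference latency low.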
Video to Events: Recycling Video Datasets for Event Cameras
Event cameras are novel sensors that output brightness changes in the form of a stream of asynchronous "events" instead of intensity frames. They offer significant advantages with respect to conventional cameras: high dynamic range (HDR), high temporal resolution, and no motion blur. Recently, novel learning approaches operating on event data have achieved impressive results. Yet, these methods require a large amount of event data for training, which is scarce due to the novelty of event sensors in computer vision research. In this paper, we present a method that addresses this need by converting any existing video dataset recorded with conventional cameras to synthetic event data. This unlocks the use of a virtually unlimited number of existing video datasets for training networks designed for real event data. We evaluate our method on two relevant vision tasks, i.e., object recognition and semantic segmentation, and show that models trained on synthetic events have several benefits: (i) they generalize well to real event data, even in scenarios where standard-camera images are blurry or overexposed, by inheriting the outstanding properties of event cameras; (ii) they can be used for fine-tuning on real data to improve over the state-of-the-art for both classification and semantic segmentation.
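The sketch below illustrates the basic principle behind such a conversion: compare the log intensity of consecutive frames and emit an event whenever a pixel's accumulated log-brightness change crosses a contrast threshold. It is a toy version; actual simulators additionally interpolate frames to very high frame rates and model sensor noise, and the threshold value here is an assumption.

```python
import numpy as np

def frames_to_events(frames, timestamps, threshold=0.2):
    """Convert a sequence of grayscale frames into synthetic events.

    frames:     list of (H, W) float arrays with values in [0, 1]
    timestamps: frame times in seconds, same length as frames
    Emits (x, y, t, polarity) whenever the per-pixel log-brightness change
    since the last emitted event crosses +/- threshold.
    """
    log_ref = np.log(frames[0] + 1e-6)              # reference log intensity per pixel
    events = []
    for frame, t in zip(frames[1:], timestamps[1:]):
        diff = np.log(frame + 1e-6) - log_ref
        ys, xs = np.nonzero(np.abs(diff) >= threshold)
        for x, y in zip(xs, ys):
            polarity = 1 if diff[y, x] > 0 else -1
            events.append((x, y, t, polarity))
            # Move the reference toward the new value, as a real pixel would.
            log_ref[y, x] += polarity * threshold
    return np.array(events)
```

Running this over an existing labeled video dataset yields event streams that inherit the original annotations, which is what makes conventional datasets usable for training event-based networks.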
AlphaPilot: Autonomous Drone Racing
This paper presents a novel system for autonomous, vision-based drone racing
combining learned data abstraction, nonlinear filtering, and time-optimal
trajectory planning. The system has successfully been deployed at the first
autonomous drone racing world championship: the 2019 AlphaPilot Challenge.
Contrary to traditional drone racing systems, which only detect the next gate,
our approach makes use of any visible gate and takes advantage of multiple,
simultaneous gate detections to compensate for drift in the state estimate and
build a global map of the gates. The global map and drift-compensated state
estimate allow the drone to navigate through the race course even when the
gates are not immediately visible and further enable planning a near
time-optimal path through the race course in real time based on approximate
drone dynamics. The proposed system has been demonstrated to successfully guide
the drone through tight race courses reaching speeds up to 8 m/s and ranked
second at the 2019 AlphaPilot Challenge.
Comment: Accepted at Robotics: Science and Systems 2020, associated video at
https://youtu.be/DGjwm5PZQT