Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have a large potential for robotics
and computer vision in challenging scenarios for traditional cameras, such as
low-latency, high speed, and high dynamic range. However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras from their working principle, the actual sensors that are
available and the tasks that they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world.
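To make the event stream described above concrete, here is a minimal sketch of one possible in-memory representation and of the simplest frame-like summary (a signed per-pixel event count). It is written in Python/NumPy; the field names, dtypes, sensor resolution, and the synthetic random events are illustrative assumptions, not the format of any specific camera or driver.

import numpy as np

# Illustrative event layout: each event is a (timestamp, x, y, polarity) tuple.
# Field names and dtypes are assumptions for this sketch, not a camera SDK format.
event_dtype = np.dtype([("t", np.float64),   # timestamp in seconds
                        ("x", np.uint16),    # pixel column
                        ("y", np.uint16),    # pixel row
                        ("p", np.int8)])     # polarity: +1 brighter, -1 darker

rng = np.random.default_rng(0)
H, W, n = 180, 240, 10_000                   # hypothetical sensor resolution
events = np.zeros(n, dtype=event_dtype)
events["t"] = np.sort(rng.uniform(0.0, 0.05, n))   # asynchronous, time-ordered
events["x"] = rng.integers(0, W, n)
events["y"] = rng.integers(0, H, n)
events["p"] = rng.choice([-1, 1], n)

# Simplest frame-like view: signed event counts accumulated per pixel.
count_image = np.zeros((H, W))
np.add.at(count_image, (events["y"], events["x"]), events["p"])
print(count_image.shape, float(events["t"][0]), float(events["t"][-1]))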
Focus Is All You Need: Loss Functions For Event-based Vision
Event cameras are novel vision sensors that output pixel-level brightness
changes ("events") instead of traditional video frames. These asynchronous
sensors offer several advantages over traditional cameras, such as high
temporal resolution, very high dynamic range, and no motion blur. To unlock the
potential of such sensors, motion compensation methods have been recently
proposed. We present a collection and taxonomy of twenty-two objective
functions to analyze event alignment in motion compensation approaches (Fig.
1). We call them Focus Loss Functions since they have strong connections with
functions used in traditional shape-from-focus applications. The proposed loss
functions allow bringing mature computer vision tools to the realm of event
cameras. We compare the accuracy and runtime performance of all loss functions
on a publicly available dataset, and conclude that the variance, the gradient
and the Laplacian magnitudes are among the best loss functions. The
applicability of the loss functions is shown on multiple tasks: rotational
motion, depth and optical flow estimation. The proposed focus loss functions
allow unlocking the outstanding properties of event cameras.
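The abstract singles out the variance, gradient magnitude, and Laplacian magnitude as among the best focus losses. The sketch below shows, under a deliberately simple constant-optical-flow warping model, how such losses can be evaluated on an image of warped events (IWE); it reuses the structured event array from the first sketch above, and the function names and the global-flow assumption are illustrative, not the paper's exact formulation.

import numpy as np

def warp_to_iwe(events, flow, t_ref, shape):
    """Image of Warped Events (IWE): transport each event to time t_ref under a
    candidate constant flow (vx, vy) in pixels/second and accumulate."""
    vx, vy = flow
    x = np.round(events["x"] - vx * (events["t"] - t_ref)).astype(int)
    y = np.round(events["y"] - vy * (events["t"] - t_ref)).astype(int)
    keep = (x >= 0) & (x < shape[1]) & (y >= 0) & (y < shape[0])
    iwe = np.zeros(shape)
    np.add.at(iwe, (y[keep], x[keep]), 1.0)
    return iwe

def variance_focus(iwe):                 # higher = events better aligned
    return iwe.var()

def gradient_magnitude_focus(iwe):
    gy, gx = np.gradient(iwe)
    return np.mean(gx**2 + gy**2)

def laplacian_magnitude_focus(iwe):
    gy, gx = np.gradient(iwe)
    gyy, _ = np.gradient(gy)
    _, gxx = np.gradient(gx)
    return np.mean((gxx + gyy)**2)

# Usage: motion estimation becomes maximization of the chosen focus score over
# the candidate motion parameters, e.g.
#   iwe = warp_to_iwe(events, flow=(30.0, -5.0), t_ref=events["t"][0], shape=(180, 240))
#   print(variance_focus(iwe), gradient_magnitude_focus(iwe), laplacian_magnitude_focus(iwe))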
Towards Anytime Optical Flow Estimation with Event Cameras
Event cameras are capable of responding to log-brightness changes within
microseconds. Their characteristic of producing responses only in changing
regions makes them particularly suitable for optical flow estimation. In
contrast to the ultra-low-latency response of event cameras, however, existing
event-camera datasets provide only limited-frame-rate optical flow ground
truth (e.g., at 10 Hz), greatly restricting the potential of event-driven
optical flow. To address this challenge, we put forward a high-frame-rate,
low-latency event representation, the Unified Voxel Grid, which is fed into
the network sequentially, bin by bin. We then propose EVA-Flow, an EVent-based Anytime Flow
estimation network to produce high-frame-rate event optical flow with only
low-frame-rate optical flow ground truth for supervision. The key component of
our EVA-Flow is the stacked Spatiotemporal Motion Refinement (SMR) module,
which predicts temporally-dense optical flow and enhances the accuracy via
spatiotemporal motion refinement. The time-dense feature warping utilized in
the SMR module provides implicit supervision for the intermediate optical flow.
Additionally, we introduce the Rectified Flow Warp Loss (RFWL) for the
unsupervised evaluation of intermediate optical flow in the absence of ground
truth. This is, to the best of our knowledge, the first work focusing on
anytime optical flow estimation via event cameras. A comprehensive variety of
experiments on MVSEC, DSEC, and our EVA-FlowSet demonstrates that EVA-Flow
achieves competitive performance, super-low latency (5 ms), the fastest inference
(9.2 ms), time-dense motion estimation (200 Hz), and strong generalization. Our
code will be available at https://github.com/Yaozhuwa/EVA-Flow.
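The abstract does not specify how the Unified Voxel Grid is built, so the sketch below shows a generic event voxel grid with bilinear weighting along the time axis, together with the bin-by-bin consumption pattern the abstract describes. The construction, the bin count, and the hypothetical model_step function are assumptions for illustration, not the paper's definition.

import numpy as np

def event_voxel_grid(t, x, y, p, num_bins, shape):
    """Generic voxel grid: each event spreads its polarity over the two nearest
    temporal bins with linear weights. x, y must be integer pixel coordinates."""
    H, W = shape
    grid = np.zeros((num_bins, H, W))
    tn = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (num_bins - 1)
    b0 = np.floor(tn).astype(int)
    b1 = np.clip(b0 + 1, 0, num_bins - 1)
    w1 = tn - b0
    np.add.at(grid, (b0, y, x), p * (1.0 - w1))
    np.add.at(grid, (b1, y, x), p * w1)
    return grid

# Bin-by-bin, low-latency consumption: each temporal slice can be processed as
# soon as it is complete, e.g.
#   for b in range(grid.shape[0]):
#       flow_b = model_step(grid[b])   # model_step is a hypothetical per-bin network update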
Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework which puts in evidence the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypotheses assumed and, thus, the constraints imposed on the type of video
that each technique is able to address. Making these hypotheses and
constraints explicit makes the framework particularly useful for selecting a
method for a given application. Another advantage of the proposed organization is that it
allows categorizing the newest approaches seamlessly with traditional ones, while
providing an insightful perspective of the evolution of the action recognition
task up to now. That perspective is the basis for the discussion at the end of
the paper, where we also present the main open issues in the area.
Event-Based Motion Segmentation by Motion Compensation
In contrast to traditional cameras, whose pixels have a common exposure time,
event-based cameras are novel bio-inspired sensors whose pixels work
independently and asynchronously output intensity changes (called "events"),
with microsecond resolution. Since events are caused by the apparent motion of
objects, event-based cameras sample visual information based on the scene
dynamics and are, therefore, a more natural fit than traditional cameras to
acquire motion, especially at high speeds, where traditional cameras suffer
from motion blur. However, distinguishing between events caused by different
moving objects and by the camera's ego-motion is a challenging task. We present
the first per-event segmentation method for splitting a scene into
independently moving objects. Our method jointly estimates the event-object
associations (i.e., segmentation) and the motion parameters of the objects (or
the background) by maximization of an objective function, which builds upon
recent results on event-based motion-compensation. We provide a thorough
evaluation of our method on a public dataset, outperforming the
state-of-the-art by as much as 10%. We also show the first quantitative
evaluation of a segmentation algorithm for event cameras, yielding around 90%
accuracy at 4 pixels relative displacement.
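The abstract describes jointly estimating event-object associations and per-cluster motion by maximizing a motion-compensation objective. The following is a schematic, EM-style alternation in that spirit, not the paper's actual algorithm: events are softly scored by how sharp each cluster's image of warped events (IWE) is at their warped location, then each cluster's flow is refit by a small local search. It assumes the structured event array from the first sketch and per-cluster constant flows.

import numpy as np

def _iwe(ev, flow, t_ref, shape):
    # Image of warped events under one candidate flow (vx, vy), plus warped coords.
    x = np.clip(np.round(ev["x"] - flow[0] * (ev["t"] - t_ref)), 0, shape[1] - 1).astype(int)
    y = np.clip(np.round(ev["y"] - flow[1] * (ev["t"] - t_ref)), 0, shape[0] - 1).astype(int)
    img = np.zeros(shape)
    np.add.at(img, (y, x), 1.0)
    return img, x, y

def segment_by_motion_compensation(events, flows, shape, t_ref, iters=5):
    """flows: list of np.array([vx, vy]) initial guesses (background + objects).
    Returns soft event-cluster assignments and the refined flows."""
    K, n = len(flows), len(events)
    resp = np.full((K, n), 1.0 / K)
    for _ in range(iters):
        # E-like step: events landing on sharp IWE ridges of a cluster score high.
        scores = np.zeros((K, n))
        for k, fl in enumerate(flows):
            iwe, xk, yk = _iwe(events, fl, t_ref, shape)
            scores[k] = iwe[yk, xk]
        resp = scores / np.maximum(scores.sum(0, keepdims=True), 1e-9)
        # M-like step: refit each flow by a tiny local search maximizing IWE variance.
        for k in range(K):
            subset = events[resp[k] >= resp.max(0)]
            if len(subset) == 0:
                continue
            steps = np.array([[0, 0], [1, 0], [-1, 0], [0, 1], [0, -1]], float)
            flows[k] = max((flows[k] + d for d in steps),
                           key=lambda f: _iwe(subset, f, t_ref, shape)[0].var())
    return resp, flows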
Unsupervised Event-based Learning of Optical Flow, Depth, and Egomotion
In this work, we propose a novel framework for unsupervised learning for
event cameras that learns motion information from only the event stream. In
particular, we propose an input representation of the events in the form of a
discretized volume that maintains the temporal distribution of the events,
which we pass through a neural network to predict the motion of the events.
This motion is used to attempt to remove any motion blur in the event image. We
then propose a loss function applied to the motion compensated event image that
measures the motion blur in this image. We train two networks with this
framework, one to predict optical flow, and one to predict egomotion and
depths, and evaluate these networks on the Multi Vehicle Stereo Event Camera
dataset, along with qualitative results from a variety of different scenes.
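The abstract explains that the predicted motion is used to deblur the event image and that the loss penalizes the remaining blur. The sketch below illustrates that idea with a dense per-pixel flow field and a deliberately simple blur proxy (negative contrast of the compensated image); the paper's actual loss may be formulated differently (for example via per-pixel average timestamps), so treat this only as an assumption-laden stand-in. The structured event array follows the first sketch above.

import numpy as np

def motion_compensated_image(events, flow_field, t_ref, shape):
    """Warp each event by the dense flow field (2, H, W), in pixels/second, sampled
    at the event's own pixel and scaled by its time offset from t_ref, then accumulate."""
    dt = events["t"] - t_ref
    vx = flow_field[0, events["y"], events["x"]]
    vy = flow_field[1, events["y"], events["x"]]
    x = np.clip(np.round(events["x"] - vx * dt), 0, shape[1] - 1).astype(int)
    y = np.clip(np.round(events["y"] - vy * dt), 0, shape[0] - 1).astype(int)
    img = np.zeros(shape)
    np.add.at(img, (y, x), 1.0)
    return img

def blur_loss(img):
    # To be minimized during training: a sharp (well-compensated) image has high
    # variance, so negative contrast serves as a simple motion-blur proxy.
    return -img.var()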
Ego-motion Estimation Based on Fusion of Images and Events
The event camera is a novel bio-inspired vision sensor that outputs an event stream.
In this paper, we propose a novel data fusion algorithm called EAS to fuse
conventional intensity images with the event stream. The fusion result is
applied to some ego-motion estimation frameworks, and is evaluated on a public
dataset acquired in dim scenes. In our 3-DoF rotation estimation framework, EAS
achieves the highest estimation accuracy among intensity images and event
representations, including the event slice, TS, and SITS. Compared with
original images, EAS reduces the average APE by 69%, benefiting from the
inclusion of more features for tracking. The result shows that our algorithm
effectively leverages the high dynamic range of event cameras to improve the
performance of the ego-motion estimation framework based on optical flow
tracking in difficult illumination conditions.
Event transformer FlowNet for optical flow estimation
Event cameras are bio-inspired sensors that produce asynchronous and sparse
streams of events at image locations where an intensity change is detected.
They can detect fast motion with low latency, high dynamic range, and low
power consumption. Over the past decade, efforts have been devoted to
developing solutions with event cameras for robotics applications. In this
work, we address their use for fast and robust computation of optical flow.
We present ET-FlowNet, a hybrid RNN-ViT architecture for optical flow
estimation. Visual transformers (ViTs) are ideal candidates for learning
global context in visual tasks, and we argue that rigid body motion is a
prime case for the use of ViTs, since long-range dependencies in the image
hold during rigid body motion. We perform end-to-end training with a
self-supervised learning method. Our results show performance comparable to,
and in some cases exceeding, state-of-the-art coarse-to-fine event-based
optical flow estimation.
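The abstract describes a hybrid RNN-ViT architecture but gives no layer-level details, so the following PyTorch stub only illustrates the general pattern: a convolutional patch embedding per temporal bin, a GRU along the bin axis, a transformer encoder layer for global spatial context, and a per-token flow head. The class name, layer sizes, and input convention (a bins x H x W voxel grid) are all illustrative assumptions, not the ET-FlowNet design.

import torch
import torch.nn as nn

class TinyEventFlowNet(nn.Module):
    """Illustrative RNN-ViT sketch for event-based flow (not ET-FlowNet)."""

    def __init__(self, dim=64, patch=8, nhead=4):
        super().__init__()
        self.patch = patch
        self.embed = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)   # patch embedding
        self.rnn = nn.GRU(dim, dim, batch_first=True)                     # recurrence over bins
        self.vit = nn.TransformerEncoderLayer(d_model=dim, nhead=nhead,
                                              batch_first=True)           # global spatial context
        self.head = nn.Conv2d(dim, 2, kernel_size=1)                      # 2-channel flow per token

    def forward(self, voxel):                          # voxel: (B, bins, H, W)
        B, T, H, W = voxel.shape
        tokens = [self.embed(voxel[:, t:t + 1]).flatten(2).transpose(1, 2)
                  for t in range(T)]                   # T x (B, N, dim)
        x = torch.stack(tokens, dim=2)                 # (B, N, T, dim)
        B, N, T, D = x.shape
        x, _ = self.rnn(x.reshape(B * N, T, D))        # GRU along the temporal bin axis
        x = self.vit(x[:, -1].reshape(B, N, D))        # self-attention over spatial tokens
        h, w = H // self.patch, W // self.patch
        x = x.transpose(1, 2).reshape(B, D, h, w)
        return self.head(x)                            # (B, 2, H/patch, W/patch) coarse flow

# Example: coarse flow from a 5-bin voxel grid of a 64x64 event crop.
coarse_flow = TinyEventFlowNet()(torch.rand(2, 5, 64, 64))
print(coarse_flow.shape)                               # torch.Size([2, 2, 8, 8])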