Optical Flow Estimation in Ultrasound Images Using a Sparse Representation
This paper introduces a 2D optical flow estimation method for cardiac ultrasound imaging based on a sparse representation. The optical flow problem is regularized using a classical gradient-based smoothness term combined with a sparsity-inducing regularization that uses a learned cardiac flow dictionary. Particular emphasis is put on the influence of the spatial and sparse regularizations on the optical flow estimation problem. A comparison with state-of-the-art methods using realistic simulations shows the competitiveness of the proposed method for cardiac motion estimation in ultrasound images.
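To make the formulation above concrete, here is a minimal numpy sketch of an objective of that shape: a linearized brightness-constancy data term, a gradient-based smoothness term, and an l1 penalty on the codes of a learned flow dictionary, solved with plain ISTA. The operators (A, b, L, D), the step-size rule, and the solver choice are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_flow_ista(A, b, L, D, lam_smooth, lam_sparse, n_iter=200):
    """Estimate a vectorized flow field w = D @ alpha by minimizing
        ||A @ w + b||^2 + lam_smooth * ||L @ w||^2 + lam_sparse * ||alpha||_1,
    where (A, b) encode a linearized brightness-constancy data term,
    L is a discrete gradient operator (smoothness), and D is a learned
    flow dictionary. Plain ISTA with a crude Lipschitz step size."""
    alpha = np.zeros(D.shape[1])
    # Lipschitz constant of the smooth part's gradient (spectral norm).
    M = D.T @ (A.T @ A + lam_smooth * (L.T @ L)) @ D
    step = 1.0 / (2.0 * np.linalg.norm(M, 2) + 1e-12)
    for _ in range(n_iter):
        w = D @ alpha
        grad = 2.0 * (D.T @ (A.T @ (A @ w + b) + lam_smooth * (L.T @ (L @ w))))
        alpha = soft_threshold(alpha - step * grad, step * lam_sparse)
    return D @ alpha
```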
Event-based Temporally Dense Optical Flow Estimation with Sequential Neural Networks
Prior works on event-based optical flow estimation have investigated several
gradient-based learning methods to train neural networks for predicting optical
flow. However, they do not utilize the fast data rate of event data streams and
rely on a spatio-temporal representation constructed from a collection of
events over a fixed period of time (often between two grayscale frames). As a
result, optical flow is only evaluated at a frequency much lower than the rate
at which data is produced by an event-based camera, leading to temporally
sparse optical flow estimates. To predict temporally dense optical flow, we cast the
problem as a sequential learning task and propose a training methodology to
train sequential networks for continuous prediction on an event stream. We
propose two types of networks: one focused on performance and another focused
on compute efficiency. We first train long short-term memory (LSTM) networks
on the DSEC dataset and demonstrate 10x temporally denser optical flow
estimation than existing flow estimation approaches. The additional benefit of
having memory to draw on long-range temporal correlations results in a
19.7% improvement in flow prediction accuracy of LSTMs over similar networks
with no memory elements. We subsequently show that the inherent recurrence of
spiking neural networks (SNNs) enables them to learn and estimate temporally
dense optical flow with 31.8% fewer parameters than the LSTM, but with a
slightly increased error. This demonstrates the potential for energy-efficient
implementation of fast optical flow prediction using SNNs.
Comment: 16 pages, 5 figures, and 2 tables in total.
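As an illustration of the sequential-prediction idea (not the paper's architecture), here is a toy PyTorch model that consumes one feature vector per short event slice and emits a flow estimate at every step, carrying recurrent state across slices; the feature dimension, hidden size, and dense output head are all assumptions.

```python
import torch
import torch.nn as nn

class StreamingFlowLSTM(nn.Module):
    """Emit a dense flow estimate at every incoming event slice,
    carrying recurrent state across slices instead of re-accumulating
    a fixed window of events before each prediction."""
    def __init__(self, feat_dim=1024, hidden=256, flow_dim=2 * 64 * 64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, flow_dim)  # flattened (u, v) map

    def forward(self, x, state=None):
        # x: (batch, time, feat_dim), one feature vector per event slice
        h, state = self.lstm(x, state)
        return self.head(h), state  # a flow estimate at *every* step

# Streaming use: feed slices one at a time and carry the state along.
model = StreamingFlowLSTM()
state = None
for _ in range(10):                      # ten incoming event slices
    feats = torch.randn(1, 1, 1024)      # stand-in for slice features
    flow, state = model(feats, state)    # temporally dense predictions
```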
Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz), resulting in
reduced motion blur. Hence, event cameras have a large potential for robotics
and computer vision in challenging scenarios for traditional cameras, such as
low latency, high speed, and high dynamic range. However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras from their working principle, the actual sensors that are
available and the tasks that they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world.
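The event format described above (time, location, and sign of a brightness change) is often collapsed into a frame-like representation before processing. Below is a small numpy sketch of one common such conversion, a signed event-count image; the (t, x, y, polarity) layout is an assumption, and many methods instead operate on the asynchronous stream directly.

```python
import numpy as np

def events_to_frame(events, H, W):
    """Accumulate a stream of events into a signed count image.
    `events` is an (N, 4) array of (t, x, y, polarity) rows with
    polarity in {-1, +1}, matching the per-pixel brightness-change
    output described above. This representation is lossy: it discards
    the precise timestamps within the accumulation window."""
    frame = np.zeros((H, W), dtype=np.float32)
    xs = events[:, 1].astype(int)
    ys = events[:, 2].astype(int)
    np.add.at(frame, (ys, xs), events[:, 3])  # handles repeated pixels
    return frame
```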
An Efficient Algorithm for Video Super-Resolution Based on a Sequential Model
In this work, we propose a novel procedure for video super-resolution, that
is, the recovery of a sequence of high-resolution images from its
low-resolution counterpart. Our approach is based on a "sequential" model
(i.e., each high-resolution frame is assumed to be a displaced version of the
preceding one) and considers the use of sparsity-enforcing priors. We tackle
both the recovery of the high-resolution images and the estimation of the
motion fields relating them. This
leads to a large-dimensional, non-convex and non-smooth problem. We propose an
algorithmic framework to address the latter. Our approach relies on fast
gradient evaluation methods and modern optimization techniques for
non-differentiable/non-convex problems. Unlike some previous works, we
show that there exists a provably convergent method with complexity linear in
the problem dimensions. We assess the proposed optimization method on several
video benchmarks and emphasize its good performance with respect to the state
of the art.
Comment: 37 pages, SIAM Journal on Imaging Sciences, 201
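A toy numpy sketch of the alternating structure such a sequential model suggests: each high-resolution frame should match its low-resolution observation and be a displaced copy of its predecessor, with an l1 prox step standing in for the sparsity-enforcing prior. The integer-shift warp, brute-force motion search, and fixed step size are simplifying assumptions, not the paper's provably convergent algorithm.

```python
import numpy as np

def shift(img, d):
    """Integer translation as a stand-in for the warping operator."""
    return np.roll(img, d, axis=(0, 1))

def downsample(img, s=2):
    return img[::s, ::s]

def upsample_adj(img, s, shape):
    """Adjoint of the decimation operator."""
    out = np.zeros(shape, dtype=img.dtype)
    out[::s, ::s] = img
    return out

def sequential_sr(ys, shape, s=2, lam=1e-3, mu=0.5, n_outer=5, n_inner=50):
    """Alternate between motion and image updates for the model
    y_t = downsample(x_t), x_t ~ shift(x_{t-1}, d_t), with an l1
    penalty on x_t handled by soft-thresholding."""
    xs = [upsample_adj(y, s, shape) for y in ys]
    ds = [(0, 0)] * len(ys)
    step = 0.2
    for _ in range(n_outer):
        # Motion update: brute-force search over small integer shifts.
        for t in range(1, len(xs)):
            cands = [(a, b) for a in range(-2, 3) for b in range(-2, 3)]
            ds[t] = min(cands, key=lambda d:
                        np.sum((xs[t] - shift(xs[t - 1], d)) ** 2))
        # Image update: proximal gradient on data + coupling terms.
        for _ in range(n_inner):
            for t in range(len(xs)):
                g = upsample_adj(downsample(xs[t], s) - ys[t], s, shape)
                if t > 0:
                    g = g + mu * (xs[t] - shift(xs[t - 1], ds[t]))
                if t + 1 < len(xs):
                    # Adjoint of a circular shift is the inverse shift.
                    g = g + mu * shift(shift(xs[t], ds[t + 1]) - xs[t + 1],
                                       tuple(-c for c in ds[t + 1]))
                z = xs[t] - step * g
                xs[t] = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return xs, ds
```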
Unsupervised Learning of Edges
Data-driven approaches for edge detection have proven effective and achieve
top results on modern benchmarks. However, all current data-driven edge
detectors require manual supervision for training in the form of hand-labeled
region segments or object boundaries. Specifically, human annotators mark
semantically meaningful edges which are subsequently used for training. Is this
form of strong, high-level supervision actually necessary to learn to
accurately detect edges? In this work we present a simple yet effective
approach for training edge detectors without human supervision. To this end we
utilize motion, and more specifically, the only input to our method is noisy
semi-dense matches between frames. We begin with only a rudimentary knowledge
of edges (in the form of image gradients), and alternate between improving
motion estimation and improving edge detection. Using a large corpus of video
data, we show that edge detectors trained using our unsupervised scheme
approach the performance of the same methods trained with full supervision
(within 3-5%). Finally, we show that when using a deep network for the edge
detector, our approach provides a novel pre-training scheme for object
detection.
Comment: Camera-ready version for CVPR 2016.
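A heavily simplified sketch of the alternation described above: image gradients seed a crude edge map, sparse matches are diffused into a dense motion field with diffusion damped across edges, and motion discontinuities become the next round's edge estimate. Every helper here is a hypothetical stand-in; the actual method trains a real edge detector on the pseudo-labels rather than relabeling directly.

```python
import numpy as np

def edge_aware_fill(sparse_flow, known, edges, n_iter=300):
    """Diffuse sparse matches into a dense motion field, damping the
    diffusion across strong edges (a crude stand-in for the edge-aware
    interpolation a real system would use)."""
    w = np.exp(-5.0 * edges / (edges.max() + 1e-8))[..., None]
    f = sparse_flow.copy()
    for _ in range(n_iter):
        avg = (np.roll(f, 1, 0) + np.roll(f, -1, 0)
               + np.roll(f, 1, 1) + np.roll(f, -1, 1)) / 4.0
        f = np.where(known[..., None], sparse_flow, w * avg + (1.0 - w) * f)
    return f

def motion_edges(flow):
    """Pseudo-labels: motion discontinuities, i.e. the gradient
    magnitude of the flow field."""
    gy, gx = np.gradient(flow[..., 0])
    hy, hx = np.gradient(flow[..., 1])
    return np.hypot(gx, gy) + np.hypot(hx, hy)

def alternate(frame, matches, n_rounds=3):
    """One frame pair for brevity: start from image gradients as crude
    edges, then alternate dense motion interpolation and edge
    re-estimation. `matches` is an (N, 4) array of (x, y, u, v)."""
    edges = np.hypot(*np.gradient(frame))
    sparse = np.zeros(frame.shape + (2,), dtype=np.float32)
    known = np.zeros(frame.shape, dtype=bool)
    for x, y, u, v in matches:
        sparse[int(y), int(x)] = (u, v)
        known[int(y), int(x)] = True
    for _ in range(n_rounds):
        flow = edge_aware_fill(sparse, known, edges)
        edges = motion_edges(flow)  # stand-in for re-training a detector
    return edges, flow
```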
EV-FlowNet: Self-Supervised Optical Flow Estimation for Event-based Cameras
Event-based cameras have shown great promise in a variety of situations where
frame-based cameras suffer, such as high-speed motion and high-dynamic-range
scenes. However, event measurements require a new class of hand-crafted
algorithms to process them. Deep learning has shown great success in providing
model-free solutions to many problems in the vision community, but existing
networks have been developed with frame-based images in mind, and the wealth
of labeled data that exists for images for supervised training does not exist
for events. To address these points, we present EV-FlowNet, a novel
self-supervised deep learning pipeline for optical flow estimation for
event-based cameras. In particular, we introduce an image-based representation of a
given event stream, which is fed into a self-supervised neural network as the
sole input. The corresponding grayscale images captured from the same camera at
the same time as the events are then used as a supervisory signal to provide a
loss function at training time, given the estimated flow from the network. We
show that the resulting network is able to accurately predict optical flow from
events only in a variety of different scenes, with performance competitive to
image-based networks. This method not only allows for accurate estimation of
dense optical flow, but also provides a framework for the transfer of other
self-supervised methods to the event-based domain.
Comment: 9 pages, 5 figures, 1 table. Accompanying video:
https://youtu.be/eMHZBSoq0sE. Dataset:
https://daniilidis-group.github.io/mvsec/. Robotics: Science and Systems 2018.
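The supervisory signal described above can be sketched as a photometric warping loss: warp the second grayscale frame back through the predicted flow and penalize the difference to the first. Below is a minimal PyTorch version, omitting the robust penalty and smoothness term a full pipeline would add; the flow sign convention and bilinear sampling choice are assumptions.

```python
import torch
import torch.nn.functional as F

def photometric_loss(flow, img0, img1):
    """Warp img1 back to img0 with the predicted flow and penalize the
    photometric difference. flow: (B, 2, H, W) in pixels, channel 0
    horizontal; img0, img1: (B, 1, H, W) grayscale frames."""
    B, _, H, W = flow.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(flow)   # (2, H, W)
    tgt = grid.unsqueeze(0) + flow                         # follow the flow
    # Normalize to [-1, 1]; grid_sample expects (B, H, W, 2) as (x, y).
    tgt_x = 2.0 * tgt[:, 0] / (W - 1) - 1.0
    tgt_y = 2.0 * tgt[:, 1] / (H - 1) - 1.0
    warped = F.grid_sample(img1, torch.stack((tgt_x, tgt_y), dim=-1),
                           align_corners=True)
    return (warped - img0).abs().mean()
```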
DeMoN: Depth and Motion Network for Learning Monocular Stereo
In this paper we formulate structure from motion as a learning problem. We
train a convolutional network end-to-end to compute depth and camera motion
from successive, unconstrained image pairs. The architecture is composed of
multiple stacked encoder-decoder networks, the core part being an iterative
network that is able to improve its own predictions. The network estimates not
only depth and motion, but additionally surface normals, optical flow between
the images and confidence of the matching. A crucial component of the approach
is a training loss based on spatial relative differences. Compared to
traditional two-frame structure from motion methods, results are more accurate
and more robust. In contrast to the popular depth-from-single-image networks,
DeMoN learns the concept of matching and, thus, better generalizes to
structures not seen during training.
Comment: Camera-ready version for CVPR 2017. Supplementary material included.
Project page:
http://lmb.informatik.uni-freiburg.de/people/ummenhof/depthmotionnet
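A minimal PyTorch sketch of a loss on spatial relative differences of the kind the abstract mentions: compare finite differences of the predicted and ground-truth maps at several strides, so relations between nearby pixels matter more than absolute values. The strides, l1 penalty, and averaging are assumptions rather than DeMoN's exact loss.

```python
import torch

def relative_spatial_loss(pred, gt, scales=(1, 2, 4)):
    """Penalize differences between the discrete gradients of
    prediction and ground truth at several strides.
    pred, gt: (B, 1, H, W) depth (or flow-component) maps."""
    loss = 0.0
    for s in scales:
        dpx = pred[..., :, s:] - pred[..., :, :-s]   # horizontal diffs
        dgx = gt[..., :, s:] - gt[..., :, :-s]
        dpy = pred[..., s:, :] - pred[..., :-s, :]   # vertical diffs
        dgy = gt[..., s:, :] - gt[..., :-s, :]
        loss = loss + (dpx - dgx).abs().mean() + (dpy - dgy).abs().mean()
    return loss / len(scales)
```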