Learning Rank Reduced Interpolation with Principal Component Analysis
In computer vision most iterative optimization algorithms, both sparse and
dense, rely on a coarse and reliable dense initialization to bootstrap their
optimization procedure. For example, dense optical flow algorithms profit
massively in speed and robustness if they are initialized well in the basin of
convergence of the loss function used. The same holds true for methods such as
sparse feature tracking, when initial flow or depth information is needed for
new features at arbitrary positions. This makes it extremely important to have
techniques at hand that can produce, from only very few available
measurements, a dense but still approximate sketch of a desired 2D structure
(e.g. depth maps, optical flow, disparity maps, etc.). The 2D map is regarded
as a sample from a 2D random process. The method presented here exploits the
complete information given by the principal component analysis (PCA) of that
process, the principal basis and its prior distribution. The method is able to
determine a dense reconstruction from sparse measurements. When facing
situations with only very sparse measurements, the number of
principal components is typically reduced further, which results in a loss of
expressiveness of the basis. We overcome this problem by injecting prior
knowledge in a maximum a posteriori (MAP) approach. We test our approach on the
KITTI and the virtual KITTI datasets and focus on the interpolation of depth
maps for driving scenes. The evaluation shows that the results agree well with
the ground truth and are clearly better than those of nearest-neighbor
interpolation, which disregards statistical information.
Comment: Accepted at Intelligent Vehicles Symposium (IV), Los Angeles, USA,
June 201
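The MAP idea in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a Gaussian prior on the PCA coefficients with the PCA eigenvalues as variances, isotropic Gaussian measurement noise, and flattened maps. The function names `fit_pca` and `map_reconstruct` and the parameter `noise_var` are hypothetical.

```python
import numpy as np

def fit_pca(samples, k):
    """Estimate mean, the first k principal directions, and their variances.

    samples: array of shape (n, d), one flattened 2D map per row.
    Assumes k does not exceed the rank of the centered data, so that all
    returned variances are strictly positive.
    """
    mean = samples.mean(axis=0)
    centered = samples - mean
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k].T                           # shape (d, k), orthonormal columns
    variances = s[:k] ** 2 / (len(samples) - 1)
    return mean, basis, variances

def map_reconstruct(mean, basis, variances, idx, values, noise_var=1e-2):
    """Dense MAP estimate from sparse measurements values = x[idx] + noise.

    Assumed model (illustrative): x = mean + basis @ c, c ~ N(0, diag(variances)),
    noise ~ N(0, noise_var * I). The MAP coefficients solve a small k x k system.
    """
    B = basis[idx]                             # rows of the basis at measured positions
    r = values - mean[idx]                     # measured deviation from the mean
    A = B.T @ B / noise_var + np.diag(1.0 / variances)
    coeffs = np.linalg.solve(A, B.T @ r / noise_var)
    return mean + basis @ coeffs
```

Because the prior term `diag(1 / variances)` keeps the linear system well conditioned, the full basis can be kept even when there are fewer measurements than principal components, which is the situation the abstract describes.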
Independent Motion Detection with Event-driven Cameras
Unlike standard cameras that send intensity images at a constant frame rate,
event-driven cameras asynchronously report pixel-level brightness changes,
offering low latency and high temporal resolution (both on the order of
microseconds). As such, they have great potential for fast and low-power
vision algorithms for robots. Visual tracking, for example, is easily achieved
even for very fast stimuli, as only moving objects cause brightness changes.
However, cameras mounted on a moving robot are typically non-stationary and the
same tracking problem becomes confounded by background clutter events due to
the robot ego-motion. In this paper, we propose a method for segmenting the
motion of an independently moving object for event-driven cameras. Our method
detects and tracks corners in the event stream and learns the statistics of
their motion as a function of the robot's joint velocities when no
independently moving objects are present. During robot operation, independently
moving objects are identified by discrepancies between the predicted corner
velocities from ego-motion and the measured corner velocities. We validate the
algorithm on data collected from the neuromorphic iCub robot. We achieve a
precision of ~90% and show that the method is robust to changes in speed of
both the head and the target.
Comment: 7 pages, 6 figures
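The detection principle in this abstract (predict corner velocities from joint velocities, then flag discrepancies) can be sketched as follows. This is only an illustration under strong assumptions: the abstract does not state which model is learned, so a linear least-squares model with a Gaussian residual is swapped in here, and the names `fit_ego_motion_model`, `is_independent`, and `threshold` are hypothetical.

```python
import numpy as np

def fit_ego_motion_model(joint_vel, corner_vel):
    """Fit corner velocity as an affine function of joint velocities.

    joint_vel:  (n, j) joint velocities recorded with no moving objects present.
    corner_vel: (n, 2) measured image-plane corner velocities.
    Returns the weight matrix and the inverse residual covariance.
    """
    X = np.hstack([joint_vel, np.ones((len(joint_vel), 1))])   # add bias column
    W, *_ = np.linalg.lstsq(X, corner_vel, rcond=None)
    residuals = corner_vel - X @ W
    cov = np.cov(residuals, rowvar=False) + 1e-9 * np.eye(2)   # small ridge for stability
    return W, np.linalg.inv(cov)

def is_independent(W, cov_inv, joint_vel, corner_vel, threshold=3.0):
    """Flag corners whose velocity deviates from the ego-motion prediction.

    Uses the Mahalanobis distance between measured and predicted velocity;
    the threshold value is an assumption, not taken from the paper.
    """
    X = np.hstack([joint_vel, np.ones((len(joint_vel), 1))])
    diff = corner_vel - X @ W
    d2 = np.einsum("ni,ij,nj->n", diff, cov_inv, diff)
    return np.sqrt(d2) > threshold
```

In use, the model would be fitted once on data recorded without moving objects, and `is_independent` would then be evaluated per tracked corner during operation.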
Self-Supervised Relative Depth Learning for Urban Scene Understanding
As an agent moves through the world, the apparent motion of scene elements is
(usually) inversely proportional to their depth. It is natural for a learning
agent to associate image patterns with the magnitude of their displacement over
time: as the agent moves, faraway mountains don't move much; nearby trees move
a lot. This natural relationship between the appearance of objects and their
motion is a rich source of information about the world. In this work, we start
by training a deep network, using fully automatic supervision, to predict
relative scene depth from single images. The relative depth training images are
automatically derived from simple videos of cars moving through a scene, using
recent motion segmentation techniques, and no human-provided labels. This proxy
task of predicting relative depth from a single image induces features in the
network that result in large improvements in a set of downstream tasks
including semantic segmentation, joint road segmentation and car detection, and
monocular (absolute) depth estimation, over a network trained from scratch. The
improvement on the semantic segmentation task is greater than that produced by
any other automatically supervised method. Moreover, for monocular depth
estimation, our unsupervised pre-training method even outperforms supervised
pre-training with ImageNet. In addition, we demonstrate benefits from learning
to predict (unsupervised) relative depth in the specific videos associated with
various downstream tasks. We adapt to the specific scenes in those tasks in an
unsupervised manner to improve performance. In summary, for semantic
segmentation, we present state-of-the-art results among methods that do not use
supervised pre-training, and we even exceed the performance of supervised
ImageNet pre-trained models for monocular depth estimation, achieving results
that are comparable with state-of-the-art methods.
DeMoN: Depth and Motion Network for Learning Monocular Stereo
In this paper we formulate structure from motion as a learning problem. We
train a convolutional network end-to-end to compute depth and camera motion
from successive, unconstrained image pairs. The architecture is composed of
multiple stacked encoder-decoder networks, the core part being an iterative
network that is able to improve its own predictions. The network estimates not
only depth and motion, but additionally surface normals, optical flow between
the images and confidence of the matching. A crucial component of the approach
is a training loss based on spatial relative differences. Compared to
traditional two-frame structure from motion methods, results are more accurate
and more robust. In contrast to the popular depth-from-single-image networks,
DeMoN learns the concept of matching and, thus, better generalizes to
structures not seen during training.
Comment: Camera ready version for CVPR 2017. Supplementary material included.
Project page:
http://lmb.informatik.uni-freiburg.de/people/ummenhof/depthmotionnet
PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume
We present a compact but effective CNN model for optical flow, called
PWC-Net. PWC-Net has been designed according to simple and well-established
principles: pyramidal processing, warping, and the use of a cost volume. Cast
in a learnable feature pyramid, PWC-Net uses the current optical flow
estimate to warp the CNN features of the second image. It then uses the warped
features and features of the first image to construct a cost volume, which is
processed by a CNN to estimate the optical flow. PWC-Net is 17 times smaller in
size and easier to train than the recent FlowNet2 model. Moreover, it
outperforms all published optical flow methods on the MPI Sintel final pass and
KITTI 2015 benchmarks, running at about 35 fps on Sintel resolution (1024x436)
images. Our models are available at https://github.com/NVlabs/PWC-Net.
Comment: CVPR 2018 camera ready version (with GitHub link to Caffe and PyTorch
code)
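The cost-volume step mentioned in this abstract can be sketched in NumPy as below. This is a simplified illustration, not the released PWC-Net code: it assumes the second image's features are already warped, uses zero padding at the borders, and stores each matching cost as the mean of channel-wise products over a limited search range. The function name `cost_volume` and the parameter `max_disp` are hypothetical.

```python
import numpy as np

def cost_volume(feat1, feat2_warped, max_disp=2):
    """Correlate features of image 1 with shifted, warped features of image 2.

    feat1, feat2_warped: arrays of shape (c, h, w).
    Returns an array of shape ((2 * max_disp + 1) ** 2, h, w). Channel
    dy * d + dx (with d = 2 * max_disp + 1) holds the matching cost for the
    displacement (dy - max_disp, dx - max_disp) in (row, column) order.
    """
    c, h, w = feat1.shape
    d = 2 * max_disp + 1
    # Zero-pad spatially so every shifted window stays inside the array.
    padded = np.pad(feat2_warped,
                    ((0, 0), (max_disp, max_disp), (max_disp, max_disp)))
    volume = np.empty((d * d, h, w), dtype=feat1.dtype)
    for dy in range(d):
        for dx in range(d):
            shifted = padded[:, dy:dy + h, dx:dx + w]
            # Mean over channels of the element-wise product (a correlation).
            volume[dy * d + dx] = (feat1 * shifted).mean(axis=0)
    return volume
```

Restricting the search to a small range is reasonable here because warping with the current flow estimate should already remove most of the displacement, leaving only a residual to be matched.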
Hallucinating dense optical flow from sparse lidar for autonomous vehicles
In this paper we propose a novel approach to estimate dense optical flow from
sparse lidar data acquired on an autonomous vehicle. This is intended to be
used as a drop-in replacement for any image-based optical flow system when
images are unreliable, e.g. in adverse weather conditions or at night. In
order to infer high-resolution 2D flows from discrete range data, we devise a
three-block architecture of multiscale filters that combines multiple
intermediate objectives, both in the lidar and image domain. To train this
network we introduce a dataset with approximately 20K lidar samples of the
KITTI dataset, which we have augmented with pseudo ground-truth image-based
optical flow computed using FlowNet2. We demonstrate the effectiveness of our
approach on KITTI, and show that despite using the low-resolution and sparse
measurements of the lidar, we can regress dense optical flow maps which are on
par with those estimated with image-based methods.
Peer Reviewed. Postprint (author's final draft)