How to Train a CAT: Learning Canonical Appearance Transformations for Direct Visual Localization Under Illumination Change
Direct visual localization has recently enjoyed a resurgence in popularity
with the increasing availability of cheap mobile computing power. The
competitive accuracy and robustness of these algorithms compared to
state-of-the-art feature-based methods, as well as their natural ability to
yield dense maps, makes them an appealing choice for a variety of mobile
robotics applications. However, direct methods remain brittle in the face of
appearance change due to their underlying assumption of photometric
consistency, which is commonly violated in practice. In this paper, we propose
to mitigate this problem by training deep convolutional encoder-decoder models
to transform images of a scene such that they correspond to a previously-seen
canonical appearance. We validate our method in multiple environments and
illumination conditions using high-fidelity synthetic RGB-D datasets, and
integrate the trained models into a direct visual localization pipeline,
yielding improvements in visual odometry (VO) accuracy through time-varying
illumination conditions, as well as improved metric relocalization performance
under illumination change, where conventional methods normally fail. We further
provide a preliminary investigation of transfer learning from synthetic to real
environments in a localization context. An open-source implementation of our
method using PyTorch is available at https://github.com/utiasSTARS/cat-net.
Comment: In IEEE Robotics and Automation Letters (RA-L) and presented at the
IEEE International Conference on Robotics and Automation (ICRA'18), Brisbane,
Australia, May 21-25, 2018.
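Since the method trains convolutional encoder-decoder models to map images to a previously-seen canonical appearance, the core training loop is compact. Below is a minimal PyTorch sketch of that idea; the layer sizes, the plain L1 reconstruction loss, and the random stand-in batch are illustrative assumptions, not the architecture or loss from the paper.

```python
# Sketch: train an image-to-image encoder-decoder to map images taken
# under varying illumination to a canonical appearance. Layer sizes and
# the L1 loss are illustrative assumptions, not the paper's design.
import torch
import torch.nn as nn

class CanonicalAppearanceNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = CanonicalAppearanceNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()

# One training step on a dummy batch: `inputs` are views under arbitrary
# illumination, `targets` the same views under the canonical illumination.
inputs = torch.rand(4, 3, 64, 64)
targets = torch.rand(4, 3, 64, 64)
loss = loss_fn(model(inputs), targets)
loss.backward()
optimizer.step()
```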
Unsupervised Learning of Depth and Ego-Motion from Video
We present an unsupervised learning framework for the task of monocular depth
and camera motion estimation from unstructured video sequences. We achieve this
by simultaneously training depth and camera pose estimation networks using the
task of view synthesis as the supervisory signal. The networks are thus coupled
via the view synthesis objective during training, but can be applied
independently at test time. Empirical evaluation on the KITTI dataset
demonstrates the effectiveness of our approach: 1) monocular depth performing
comparably with supervised methods that use either ground-truth pose or depth
for training, and 2) pose estimation performing favorably with established SLAM
systems under comparable input settings.
Comment: Accepted to CVPR 2017. Project webpage:
https://people.eecs.berkeley.edu/~tinghuiz/projects/SfMLearner
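The coupling between the two networks is the view-synthesis objective: predicted depth and relative pose warp a source frame into the target view, and the photometric difference supervises both. Below is a hedged sketch of that reprojection-and-sample step; the intrinsics K, the tensor shapes, and the plain L1 photometric penalty are illustrative assumptions rather than the paper's exact formulation.

```python
# Sketch: project target pixels into a source view using predicted depth
# and relative pose, sample source colours, and penalise the difference.
import torch
import torch.nn.functional as F

def photometric_loss(target, source, depth, T, K):
    """target/source: (1,3,H,W); depth: (1,1,H,W); T: (4,4) pose; K: (3,3)."""
    _, _, H, W = target.shape
    # Pixel grid in homogeneous coordinates, shape (3, H*W).
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32),
                            indexing="ij")
    pix = torch.stack([xs.reshape(-1), ys.reshape(-1),
                       torch.ones(H * W)], dim=0)
    # Back-project to 3D in the target frame, transform to the source frame.
    cam = torch.linalg.inv(K) @ pix * depth.reshape(1, -1)
    cam_h = torch.cat([cam, torch.ones(1, H * W)], dim=0)
    src = K @ (T @ cam_h)[:3]
    # Normalise to [-1, 1] sampling coordinates for grid_sample.
    u = (src[0] / src[2].clamp(min=1e-6)) / (W - 1) * 2 - 1
    v = (src[1] / src[2].clamp(min=1e-6)) / (H - 1) * 2 - 1
    grid = torch.stack([u, v], dim=-1).reshape(1, H, W, 2)
    warped = F.grid_sample(source, grid, align_corners=True)
    return (warped - target).abs().mean()

# Toy invocation with made-up intrinsics, identity pose, random frames.
K = torch.tensor([[100., 0., 32.], [0., 100., 32.], [0., 0., 1.]])
loss = photometric_loss(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64),
                        torch.rand(1, 1, 64, 64) + 0.5, torch.eye(4), K)
```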
Parsimonious Labeling
We propose a new family of discrete energy minimization problems, which we
call parsimonious labeling. Specifically, our energy functional consists of
unary potentials and high-order clique potentials. While the unary potentials
are arbitrary, the clique potentials are proportional to the {\em diversity} of
the set of unique labels assigned to the clique. Intuitively, our energy
functional encourages the labeling to be parsimonious, that is, use as few
labels as possible. This in turn allows us to capture useful cues for important
computer vision applications such as stereo correspondence and image denoising.
Furthermore, we propose an efficient graph-cuts based algorithm for the
parsimonious labeling problem that provides strong theoretical guarantees on
the quality of the solution. Our algorithm consists of three steps. First, we
approximate a given diversity using a mixture of a novel hierarchical $P^n$
Potts model. Second, we use a divide-and-conquer approach for each mixture
component, where each subproblem is solved using an efficient
$\alpha$-expansion algorithm. This provides us with a small number of putative
labelings, one for each mixture component. Third, we choose the best putative
labeling in terms of the energy value. Using both synthetic and standard real
datasets, we show that our algorithm significantly outperforms other graph-cuts
based approaches.
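To make the energy concrete, the sketch below evaluates a parsimonious-labeling objective on a toy problem, using the simplest diversity (the count of distinct labels in a clique) so that cliques using fewer labels cost less. The unary costs and clique structure are made-up illustrative data, not the paper's general diversity functions or benchmarks.

```python
# Sketch: evaluate energy = sum of unaries + weight * diversity per clique,
# with diversity taken as the number of distinct labels in the clique.
def parsimonious_energy(labeling, unaries, cliques, clique_weight=1.0):
    """labeling: label id per node; unaries[i][l]: cost of label l at
    node i; cliques: list of node-index tuples."""
    energy = sum(unaries[i][l] for i, l in enumerate(labeling))
    for clique in cliques:
        diversity = len({labeling[i] for i in clique})
        energy += clique_weight * diversity
    return energy

unaries = [[0.0, 2.0], [1.5, 0.5], [2.0, 0.0], [0.2, 1.0]]
cliques = [(0, 1, 2), (1, 2, 3)]
# The uniform labeling pays more in unaries but less in clique diversity.
print(parsimonious_energy([0, 0, 0, 0], unaries, cliques))
print(parsimonious_energy([0, 1, 1, 0], unaries, cliques))
```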
Shapecollage: Occlusion-Aware, Example-Based Shape Interpretation
This paper presents an example-based method to interpret a 3D shape from a single image depicting that shape. A major difficulty in applying an example-based approach to shape interpretation is the combinatorial explosion of shape possibilities that occur at occluding contours. Our key technical contribution is a new shape patch representation and corresponding pairwise compatibility terms that allow for flexible matching of overlapping patches, avoiding the combinatorial explosion by allowing patches to explain only the parts of the image they best fit. We infer the best set of localized shape patches over a graph of keypoints at multiple scales to produce a discontinuous shape representation we term a shape collage. To reconstruct a smooth result, we fit a surface to the collage using the predicted confidence of each shape patch. We demonstrate the method on shapes depicted in line drawing, diffuse and glossy shading, and textured styles.
National Science Foundation (U.S.) (Grant 1111415); United States. Office of Naval Research (Grant N00014-09-1-1051); National Institutes of Health (U.S.) (Grant R01-EY019262)
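The final surface-fitting step lends itself to a short illustration: given local depth estimates and per-patch confidences, a smooth surface can be recovered by confidence-weighted least squares. The low-order polynomial model and synthetic data below are stand-ins for the paper's shape patches, not its actual solver.

```python
# Sketch: fit a smooth surface to noisy local depth estimates, weighting
# each estimate by its confidence (weighted least squares).
import numpy as np

rng = np.random.default_rng(0)
xy = rng.uniform(0, 1, size=(200, 2))          # patch centre locations
depth = 0.5 * xy[:, 0] + 0.2 * xy[:, 1] ** 2   # underlying surface
depth += rng.normal(0, 0.05, size=200)         # noisy patch estimates
conf = rng.uniform(0.1, 1.0, size=200)         # per-patch confidence

# Weighted least squares for z ~ a + b*x + c*y + d*x^2 + e*y^2.
A = np.column_stack([np.ones(200), xy[:, 0], xy[:, 1],
                     xy[:, 0] ** 2, xy[:, 1] ** 2])
W = np.sqrt(conf)[:, None]
coeffs, *_ = np.linalg.lstsq(A * W, depth * W[:, 0], rcond=None)
print("fitted surface coefficients:", np.round(coeffs, 3))
```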
Fast View Synthesis with Deep Stereo Vision
Novel view synthesis is an important problem in computer vision and graphics.
Over the years a large number of solutions have been put forward to solve the
problem. However, the large-baseline novel view synthesis problem is far from
being "solved". Recent works have attempted to use Convolutional Neural
Networks (CNNs) to solve view synthesis tasks. Due to the difficulty of
learning scene geometry and interpreting camera motion, CNNs are often unable
to generate realistic novel views. In this paper, we present a novel view
synthesis approach based on stereo-vision and CNNs that decomposes the problem
into two sub-tasks: view dependent geometry estimation and texture inpainting.
Both tasks are structured prediction problems that can be effectively learned
with CNNs. Experiments on the KITTI Odometry dataset show that our approach is
more accurate and significantly faster than the current state-of-the-art. The
code and supplementary material will be publicly available. Results can be
found at https://youtu.be/5pzS9jc-5t
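The decomposition into view-dependent geometry estimation and texture inpainting can be illustrated with a toy pipeline: warp the input view with per-pixel disparity, then fill the disoccluded holes. In the sketch below, the disparity is assumed known and a nearest-neighbour row fill stands in for the CNN inpainting stage; both are illustrative simplifications.

```python
# Sketch: two-stage novel view synthesis. Stage 1 forward-warps pixels by
# disparity; stage 2 fills the holes left by disocclusion.
import numpy as np

def synthesize_view(image, disparity):
    H, W, _ = image.shape
    out = np.zeros_like(image)
    filled = np.zeros((H, W), dtype=bool)
    # Stage 1: warp each pixel horizontally by its disparity.
    for y in range(H):
        for x in range(W):
            xs = x + int(round(disparity[y, x]))
            if 0 <= xs < W:
                out[y, xs] = image[y, x]
                filled[y, xs] = True
    # Stage 2 (inpainting stand-in): copy the nearest filled pixel per row.
    for y in range(H):
        for x in range(W):
            if not filled[y, x]:
                cands = np.where(filled[y])[0]
                if cands.size:
                    out[y, x] = out[y, cands[np.argmin(np.abs(cands - x))]]
    return out

image = np.random.rand(32, 32, 3)
disparity = np.full((32, 32), 3.0)
novel = synthesize_view(image, disparity)
```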
Fénysugár-rekonstrukció tetszőleges nézőpont kialakításához = Ray Reconstruction to Make Arbitrary Viewpoint
In this paper, we present a method for rendering, in a calibrated camera system, a view of the scene observed by the cameras from an arbitrary viewpoint. The paper also presents a method for creating a synthetic image dataset. We will use this dataset to test our method for generating arbitrary viewpoints.
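The basic operation behind rendering from an arbitrary viewpoint in a calibrated rig is reconstructing the viewing ray of each pixel from the camera's intrinsics and pose. The sketch below back-projects a pixel to a world-space ray; the intrinsics K and pose (R, t) are illustrative assumptions, not values from the paper.

```python
# Sketch: reconstruct the world-space viewing ray through a pixel of a
# calibrated camera with projection x_cam = R @ x_world + t.
import numpy as np

def pixel_ray(u, v, K, R, t):
    """Return (origin, unit direction) of the ray through pixel (u, v)."""
    origin = -R.T @ t                            # camera centre, world frame
    d_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    direction = R.T @ d_cam                      # rotate into world frame
    return origin, direction / np.linalg.norm(direction)

# Toy calibration: focal length 800 px, principal point at (320, 240).
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
origin, direction = pixel_ray(320.0, 240.0, K, R, t)
print(origin, direction)
```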