Learning Direct Optimization for scene understanding
We develop a Learning Direct Optimization (LiDO) method for the refinement of
a latent variable model that describes input image x. Our goal is to explain a
single image x with an interpretable 3D computer graphics model having scene
graph latent variables z (such as object appearance, camera position). Given a
current estimate of z we can render a prediction of the image g(z), which can
be compared to the image x. The standard way to proceed is then to measure the
error E(x, g(z)) between the two, and use an optimizer to minimize the error.
However, it is unknown which error measure E would be most effective for
simultaneously addressing issues such as misaligned objects, occlusions,
textures, etc. In contrast, the LiDO approach trains a Prediction Network to
predict an update directly to correct z, rather than minimizing the error with
respect to z. Experiments show that our LiDO method converges rapidly as it
does not need to perform a search on the error landscape, produces better
solutions than error-based competitors, and is able to handle the mismatch
between the data and the fitted scene model. We apply LiDO to a realistic
synthetic dataset, and show that the method also transfers to work well with
real images.
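The two refinement strategies the abstract contrasts can be sketched in a toy setting. Everything below is a made-up stand-in, not the authors' implementation: the scene latent z is a single scalar, `render` is a toy stand-in for the graphics renderer g(z), and a hand-coded update rule stands in for the learned Prediction Network.

```python
def render(z):
    # Toy renderer g(z): maps the latent scene parameter to an "image" (a scalar).
    return 2.0 * z + 1.0

def error(x, pred):
    # Squared-error measure E(x, g(z)).
    return (x - pred) ** 2

def optimize_error(x, z, lr=0.05, steps=100):
    # Standard approach: iterative gradient descent on E with respect to z.
    for _ in range(steps):
        grad = -2.0 * (x - render(z)) * 2.0   # dE/dz for the toy renderer
        z -= lr * grad
    return z

def lido_refine(x, z, steps=5):
    # LiDO-style approach: a learned network would map (x, g(z), current z)
    # directly to an update dz. A hand-coded rule stands in for that network.
    for _ in range(steps):
        dz = (x - render(z)) / 2.0            # stand-in for the Prediction Network
        z += dz
    return z

x = render(3.0)                               # target "image" from true z = 3.0
print(optimize_error(x, 0.0))                 # converges after many small steps
print(lido_refine(x, 0.0))                    # direct updates converge quickly
```

The point of the contrast is the one the abstract makes: the error-based optimizer needs many small steps on the error landscape, while a direct prediction of the update can reach the solution in far fewer iterations.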
Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age
Simultaneous Localization and Mapping (SLAM) consists of the concurrent
construction of a model of the environment (the map), and the estimation of the
state of the robot moving within it. The SLAM community has made astonishing
progress over the last 30 years, enabling large-scale real-world applications,
and witnessing a steady transition of this technology to industry. We survey
the current state of SLAM. We start by presenting what is now the de-facto
standard formulation for SLAM. We then review related work, covering a broad
set of topics including robustness and scalability in long-term mapping, metric
and semantic representations for mapping, theoretical performance guarantees,
active SLAM and exploration, and other new frontiers. This paper simultaneously
serves as a position paper and tutorial to those who are users of SLAM. By
looking at the published research with a critical eye, we delineate open
challenges and new research issues that still deserve careful scientific
investigation. The paper also contains the authors' take on two questions that
often animate discussions during robotics conferences: Do robots need SLAM? and
Is SLAM solved?
CGIntrinsics: Better Intrinsic Image Decomposition through Physically-Based Rendering
Intrinsic image decomposition is a challenging, long-standing computer vision
problem for which ground truth data is very difficult to acquire. We explore
the use of synthetic data for training CNN-based intrinsic image decomposition
models, and then apply these learned models to real-world images. To that end,
we present CGIntrinsics, a new, large-scale dataset of physically-based rendered images
of scenes with full ground truth decompositions. The rendering process we use
is carefully designed to yield high-quality, realistic images, which we find to
be crucial for this problem domain. We also propose a new end-to-end training
method that learns better decompositions by leveraging CGIntrinsics, and optionally IIW
and SAW, two recent datasets of sparse annotations on real-world images.
Surprisingly, we find that a decomposition network trained solely on our
synthetic data outperforms the state-of-the-art on both IIW and SAW, and
performance improves even further when IIW and SAW data is added during
training. Our work demonstrates the surprising effectiveness of
carefully-rendered synthetic data for the intrinsic images task.
Comment: Paper for 'CGIntrinsics: Better Intrinsic Image Decomposition through
Physically-Based Rendering' published in ECCV, 201
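The intrinsic-image model that decomposition methods like this one build on can be illustrated with a minimal sketch: an image I factors pixelwise into reflectance (albedo) R and shading S, I = R * S, which becomes additive in the log domain. The pixel values below are invented for illustration; this is not the paper's pipeline.

```python
import math

reflectance = [0.8, 0.2, 0.5, 0.9]   # per-pixel material color (toy 4-pixel "image")
shading     = [0.3, 0.3, 0.7, 0.7]   # per-pixel illumination

# Compose the observed image pixelwise: I = R * S.
image = [r * s for r, s in zip(reflectance, shading)]

# In the log domain the decomposition is a sum, log I = log R + log S,
# which is one reason learned methods often predict log-reflectance
# and log-shading rather than the raw quantities.
recomposed = [math.exp(math.log(r) + math.log(s))
              for r, s in zip(reflectance, shading)]

for i, rc in zip(image, recomposed):
    assert abs(i - rc) < 1e-12
print(image)
```

Ground truth for this factorization is what is hard to acquire for real photographs, which is the gap the rendered CGIntrinsics dataset is meant to fill.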