15,729 research outputs found
Calibration by correlation using metric embedding from non-metric similarities
This paper presents a new intrinsic calibration method that allows us to calibrate a generic single-view point camera just
by waving it around. From the video sequence obtained while the camera undergoes random motion, we compute the pairwise time
correlation of the luminance signal for a subset of the pixels. We show that, if the camera undergoes a random uniform motion, then
the pairwise correlation of any pixels pair is a function of the distance between the pixel directions on the visual sphere. This leads to
formalizing calibration as a problem of metric embedding from non-metric measurements: we want to find the disposition of pixels on
the visual sphere from similarities that are an unknown function of the distances. This problem is a generalization of multidimensional
scaling (MDS) that has so far resisted a comprehensive observability analysis (can we reconstruct a metrically accurate embedding?)
and a solid generic solution (how to do so?). We show that the observability depends both on the local geometric properties (curvature)
as well as on the global topological properties (connectedness) of the target manifold. We show that, in contrast to the Euclidean case,
on the sphere we can recover the scale of the points distribution, therefore obtaining a metrically accurate solution from non-metric
measurements. We describe an algorithm that is robust across manifolds and can recover a metrically accurate solution when the metric
information is observable. We demonstrate the performance of the algorithm for several cameras (pin-hole, fish-eye, omnidirectional),
and we obtain results comparable to calibration using classical methods. Additional synthetic benchmarks show that the algorithm
performs as theoretically predicted for all corner cases of the observability analysis
Keyframe-based visualâinertial odometry using nonlinear optimization
Combining visual and inertial measurements has become popular in mobile robotics, since the two sensing modalities offer complementary characteristics that make them the ideal choice for accurate visualâinertial odometry or simultaneous localization and mapping (SLAM). While historically the problem has been addressed with filtering, advancements in visual estimation suggest that nonlinear optimization offers superior accuracy, while still tractable in complexity thanks to the sparsity of the underlying problem. Taking inspiration from these findings, we formulate a rigorously probabilistic cost function that combines reprojection errors of landmarks and inertial terms. The problem is kept tractable and thus ensuring real-time operation by limiting the optimization to a bounded window of keyframes through marginalization. Keyframes may be spaced in time by arbitrary intervals, while still related by linearized inertial terms. We present evaluation results on complementary datasets recorded with our custom-built stereo visualâinertial hardware that accurately synchronizes accelerometer and gyroscope measurements with imagery. A comparison of both a stereo and monocular version of our algorithm with and without online extrinsics estimation is shown with respect to ground truth. Furthermore, we compare the performance to an implementation of a state-of-the-art stochastic cloning sliding-window filter. This competitive reference implementation performs tightly coupled filtering-based visualâinertial odometry. While our approach declaredly demands more computation, we show its superior performance in terms of accuracy
Towards Visual Ego-motion Learning in Robots
Many model-based Visual Odometry (VO) algorithms have been proposed in the
past decade, often restricted to the type of camera optics, or the underlying
motion manifold observed. We envision robots to be able to learn and perform
these tasks, in a minimally supervised setting, as they gain more experience.
To this end, we propose a fully trainable solution to visual ego-motion
estimation for varied camera optics. We propose a visual ego-motion learning
architecture that maps observed optical flow vectors to an ego-motion density
estimate via a Mixture Density Network (MDN). By modeling the architecture as a
Conditional Variational Autoencoder (C-VAE), our model is able to provide
introspective reasoning and prediction for ego-motion induced scene-flow.
Additionally, our proposed model is especially amenable to bootstrapped
ego-motion learning in robots where the supervision in ego-motion estimation
for a particular camera sensor can be obtained from standard navigation-based
sensor fusion strategies (GPS/INS and wheel-odometry fusion). Through
experiments, we show the utility of our proposed approach in enabling the
concept of self-supervised learning for visual ego-motion estimation in
autonomous robots.Comment: Conference paper; Submitted to IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS) 2017, Vancouver CA; 8 pages, 8 figures,
2 table
Keyframe-based monocular SLAM: design, survey, and future directions
Extensive research in the field of monocular SLAM for the past fifteen years
has yielded workable systems that found their way into various applications in
robotics and augmented reality. Although filter-based monocular SLAM systems
were common at some time, the more efficient keyframe-based solutions are
becoming the de facto methodology for building a monocular SLAM system. The
objective of this paper is threefold: first, the paper serves as a guideline
for people seeking to design their own monocular SLAM according to specific
environmental constraints. Second, it presents a survey that covers the various
keyframe-based monocular SLAM systems in the literature, detailing the
components of their implementation, and critically assessing the specific
strategies made in each proposed solution. Third, the paper provides insight
into the direction of future research in this field, to address the major
limitations still facing monocular SLAM; namely, in the issues of illumination
changes, initialization, highly dynamic motion, poorly textured scenes,
repetitive textures, map maintenance, and failure recovery
Optical Flow in Mostly Rigid Scenes
The optical flow of natural scenes is a combination of the motion of the
observer and the independent motion of objects. Existing algorithms typically
focus on either recovering motion and structure under the assumption of a
purely static world or optical flow for general unconstrained scenes. We
combine these approaches in an optical flow algorithm that estimates an
explicit segmentation of moving objects from appearance and physical
constraints. In static regions we take advantage of strong constraints to
jointly estimate the camera motion and the 3D structure of the scene over
multiple frames. This allows us to also regularize the structure instead of the
motion. Our formulation uses a Plane+Parallax framework, which works even under
small baselines, and reduces the motion estimation to a one-dimensional search
problem, resulting in more accurate estimation. In moving regions the flow is
treated as unconstrained, and computed with an existing optical flow method.
The resulting Mostly-Rigid Flow (MR-Flow) method achieves state-of-the-art
results on both the MPI-Sintel and KITTI-2015 benchmarks.Comment: 15 pages, 10 figures; accepted for publication at CVPR 201
- âŠ