Performance Evaluation of Vision-Based Algorithms for MAVs
An important focus of current research in the field of Micro Aerial Vehicles
(MAVs) is to increase the safety of their operation in general unstructured
environments. Especially indoors, where GPS cannot be used for localization,
reliable algorithms for localization and mapping of the environment are
necessary in order to keep an MAV airborne safely. In this paper, we compare
vision-based real-time capable methods for localization and mapping and point
out their strengths and weaknesses. Additionally, we describe algorithms for
state estimation, control and navigation, which use the localization and
mapping results of our vision-based algorithms as input. Comment: Presented at OAGM Workshop, 2015 (arXiv:1505.01065).
Relative Pose Estimation for Multi-Camera Systems from Affine Correspondences
We propose four novel solvers for estimating the relative pose of a
multi-camera system from affine correspondences (ACs). A new constraint is derived that captures the relationship between ACs and the generalized camera model.
Using the constraint, it is shown that a minimum of two ACs are enough for
recovering the 6DOF relative pose, i.e., 3D rotation and translation, of the
system. Considering planar camera motion, we propose a minimal solution using a
single AC and a solver with two ACs to overcome the degenerate case. Also, we
propose a minimal solution using two ACs with known gravity vector, e.g., from
an IMU. Since the proposed methods require significantly fewer correspondences
than state-of-the-art algorithms, they can be efficiently used within RANSAC
for outlier removal and initial motion estimation. The solvers are tested both
on synthetic data and on real-world scenes from the KITTI benchmark. It is
shown that the accuracy of the estimated poses is superior to that of state-of-the-art techniques.
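As a concrete illustration of how a two-AC minimal solver would sit inside RANSAC, the sketch below shows a generic hypothesize-and-verify loop with an adaptive stopping criterion. It is not the authors' solver: solve_pose_from_two_acs and residual_fn are hypothetical placeholders for the paper's minimal solver and its pose residual, and the thresholds are illustrative.

```python
import numpy as np

def ransac_minimal_pose(acs, solve_pose_from_two_acs, residual_fn,
                        threshold=1e-3, confidence=0.999, max_iters=10000):
    """Generic RANSAC loop around a 2-sample minimal solver.

    acs                    : list of affine correspondences (opaque to this loop)
    solve_pose_from_two_acs: callable returning candidate (R, t) hypotheses
                             from two ACs (hypothetical placeholder)
    residual_fn            : callable giving a per-correspondence residual
                             for a candidate pose (hypothetical placeholder)
    """
    rng = np.random.default_rng(0)
    n = len(acs)
    best_pose, best_inliers = None, np.zeros(n, dtype=bool)
    needed_iters, it = max_iters, 0
    while it < min(needed_iters, max_iters):
        it += 1
        i, j = rng.choice(n, size=2, replace=False)           # minimal sample: 2 ACs
        for R, t in solve_pose_from_two_acs(acs[i], acs[j]):   # may return several roots
            residuals = np.array([residual_fn(ac, R, t) for ac in acs])
            inliers = residuals < threshold
            if inliers.sum() > best_inliers.sum():
                best_pose, best_inliers = (R, t), inliers
                # adaptive stopping: enough iterations to draw an all-inlier
                # minimal sample of size 2 with the requested confidence
                w = float(np.clip(inliers.mean(), 1e-6, 1 - 1e-9))
                needed_iters = int(np.ceil(np.log(1 - confidence)
                                           / np.log(1 - w ** 2)))
    return best_pose, best_inliers
```

Because the minimal sample has only two elements, the adaptive iteration count stays small even at moderate inlier ratios, which is what makes the solvers practical inside RANSAC.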
Learning and Matching Multi-View Descriptors for Registration of Point Clouds
Critical to the registration of point clouds is the establishment of a set of
accurate correspondences between points in 3D space. The correspondence problem
is generally addressed by the design of discriminative 3D local descriptors on
the one hand, and the development of robust matching strategies on the other
hand. In this work, we first propose a multi-view local descriptor, which is
learned from the images of multiple views, for the description of 3D keypoints.
Then, we develop a robust matching approach that aims to reject outlier matches through efficient inference via belief propagation on the defined graphical model. We have demonstrated the boost that our approaches bring to registration on public scanning and multi-view stereo datasets. The superior performance has been verified by intensive comparisons against a variety of descriptors and matching methods.
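To make the matching stage concrete, the sketch below runs loopy max-product belief propagation over binary inlier/outlier labels, with unary terms from descriptor distance and pairwise terms that reward pairs of matches preserving rigid inter-point distances. It is a simplified stand-in for the paper's graphical model, not the actual method; the potentials, sigma_d and the fully connected match graph are assumptions.

```python
import numpy as np
from itertools import combinations

def bp_filter_matches(pts_a, pts_b, desc_dist, sigma_d=0.05, iters=10):
    """Reject outlier matches with loopy max-product belief propagation.

    pts_a, pts_b : (N, 3) matched keypoint coordinates in the two clouds
    desc_dist    : (N,) descriptor distances of the putative matches
    Binary label per match: 1 = inlier, 0 = outlier.  Pairwise potentials
    reward pairs of matches that preserve the rigid inter-point distance.
    Illustrative simplification, not the paper's exact model.
    """
    n = len(pts_a)
    # unary log-potentials: low descriptor distance favours the inlier label
    unary = np.stack([np.zeros(n), -desc_dist / (desc_dist.mean() + 1e-9)], axis=1)
    # pairwise log-potentials from rigidity: ||pi-pj|| should equal ||qi-qj||
    edges, pair_pot = [], {}
    for i, j in combinations(range(n), 2):
        gap = abs(np.linalg.norm(pts_a[i] - pts_a[j])
                  - np.linalg.norm(pts_b[i] - pts_b[j]))
        consistent = np.exp(-(gap / sigma_d) ** 2)            # in (0, 1]
        # reward the (inlier, inlier) label pair only when geometrically consistent
        pot = np.array([[0.0, 0.0], [0.0, 2.0 * consistent - 1.0]])
        edges.append((i, j))
        pair_pot[(i, j)] = pot
    msg = {(i, j): np.zeros(2) for i, j in edges}             # message i -> j over x_j
    msg.update({(j, i): np.zeros(2) for i, j in edges})
    neighbours = {i: [] for i in range(n)}
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    for _ in range(iters):
        new_msg = {}
        for (i, j) in msg:
            pot = pair_pot[(i, j)] if (i, j) in pair_pot else pair_pot[(j, i)].T
            incoming = sum(msg[(k, i)] for k in neighbours[i] if k != j)
            if isinstance(incoming, int):                     # no other neighbours
                incoming = np.zeros(2)
            # max over x_i of unary + pairwise + incoming messages
            new = np.max(unary[i][:, None] + pot + incoming[:, None], axis=0)
            new_msg[(i, j)] = new - new.max()                 # normalise for stability
        msg = new_msg
    beliefs = unary.copy()
    for i in range(n):
        for k in neighbours[i]:
            beliefs[i] += msg[(k, i)]
    return beliefs[:, 1] > beliefs[:, 0]                      # True = keep match
```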
Purposive sample consensus: A paradigm for model fitting with application to visual odometry
© Springer International Publishing Switzerland 2015. RANSAC (random sample consensus) is a robust algorithm for model fitting and outlier removal; however, it is neither efficient nor reliable enough to meet the requirements of many applications where time and precision are critical. Various algorithms have been developed to improve its performance for model fitting. A new algorithm named PURSAC (purposive sample consensus) is introduced in this paper, which has three major steps to address the limitations of RANSAC and its variants. Firstly, instead of assuming that all samples have the same probability of being inliers, PURSAC exploits their differences and purposively selects sample sets. Secondly, since sampling noise always exists, the selection also follows a sensitivity analysis of the model against that noise. The final step applies a local optimization to further improve model-fitting performance. Tests show that PURSAC can achieve very high model-fitting certainty with a small number of iterations. Two cases are investigated for the implementation of PURSAC. It is applied to line fitting to explain its principles, and then to feature-based visual odometry, which requires efficient, robust and precise model fitting. Experimental results demonstrate that PURSAC dramatically improves the accuracy and efficiency of fundamental matrix estimation, resulting in precise and fast visual odometry.
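The three steps can be illustrated on the paper's own line-fitting example. The sketch below is not PURSAC itself: the quality scores, the minimum-baseline check standing in for the sensitivity analysis, and all thresholds are assumptions chosen for illustration; only the overall pattern (purposive, score-weighted sampling, a noise-sensitivity check on each sample set, and a final local optimization on the consensus set) follows the abstract.

```python
import numpy as np

def pursac_line_fit(points, scores, threshold=0.02, min_span=0.1,
                    iters=50, rng=None):
    """PURSAC-style line fitting sketch: purposive (score-weighted) sampling
    plus a final local optimisation, instead of uniform random sampling.

    points : (N, 2) observations
    scores : (N,)   positive per-point quality scores (e.g. detector response);
             higher scores are assumed more likely to belong to inliers.
    """
    rng = rng or np.random.default_rng(0)
    n = len(points)
    prob = scores / scores.sum()                  # purposive selection weights
    best_inliers = np.zeros(n, dtype=bool)
    for _ in range(iters):
        i, j = rng.choice(n, size=2, replace=False, p=prob)
        p, q = points[i], points[j]
        if np.linalg.norm(q - p) < min_span:
            # sensitivity stand-in: a line fitted from nearly coincident
            # points is very sensitive to sampling noise, so skip the sample
            continue
        d = (q - p) / np.linalg.norm(q - p)
        normal = np.array([-d[1], d[0]])
        dist = np.abs((points - p) @ normal)      # point-to-line distances
        inliers = dist < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 2:
        return None, None, best_inliers
    # local optimisation: total-least-squares refit on the consensus set
    P = points[best_inliers]
    centroid = P.mean(axis=0)
    _, _, vt = np.linalg.svd(P - centroid)
    direction = vt[0]                             # dominant direction = line
    return centroid, direction, best_inliers
```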
Real-time smart and standalone vision/IMU navigation sensor
In this paper, we present a smart, standalone, multi-platform stereo vision/IMU-based navigation system providing ego-motion estimation. The real-time visual odometry algorithm runs on a nano-ITX single-board computer (SBC) with a 1.9 GHz CPU and a 16-core GPU. High-resolution stereo images of 1.2 megapixels provide high-quality data. Tracking of up to 750 features at 5 fps is made possible by a minimal but efficient feature detection–stereo matching–feature tracking scheme that runs on the GPU. Furthermore, the feature tracking algorithm benefits from the assistance of a 100 Hz IMU, whose accelerometer and gyroscope data provide inertial feature prediction, enhancing execution speed and tracking efficiency. In a space mission context, we demonstrate the robustness and accuracy of the 6-degree-of-freedom trajectories generated in real time by our visual odometry algorithm. The estimated trajectories are comparable to ground-truth measurements from an external motion capture system.
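The IMU-assisted prediction step can be sketched as follows: integrate the gyroscope into an inter-frame rotation, map features through the induced homography H = K R K^-1, and let KLT refine from those seeds. This is a generic illustration rather than the sensor's actual pipeline; it ignores translation in the prediction, and the OpenCV calls, intrinsics K and window sizes are assumptions.

```python
import cv2
import numpy as np

def track_with_gyro_prior(prev_img, next_img, prev_pts, R_gyro, K):
    """KLT tracking seeded with a gyro-based prediction of feature motion.

    prev_pts : (N, 1, 2) float32 pixel positions in prev_img
    R_gyro   : 3x3 camera rotation between the two frames, integrated from
               gyroscope measurements (assumed given)
    K        : 3x3 camera intrinsic matrix
    For a pure rotation the image motion is the homography H = K R K^-1;
    translation is ignored here, so this is only a prediction/seed for KLT.
    """
    H = K @ R_gyro @ np.linalg.inv(K)
    pred = cv2.perspectiveTransform(prev_pts.astype(np.float64), H)
    pred = pred.astype(np.float32)
    # KLT refines the predicted positions instead of starting from prev_pts,
    # so smaller search windows / fewer pyramid levels are sufficient
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_img, next_img, prev_pts.astype(np.float32), pred,
        winSize=(15, 15), maxLevel=2,
        flags=cv2.OPTFLOW_USE_INITIAL_FLOW)
    good = status.ravel() == 1
    return prev_pts[good], next_pts[good]
```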
Relative Pose from Deep Learned Depth and a Single Affine Correspondence
We propose a new approach for combining deep-learned non-metric monocular
depth with affine correspondences (ACs) to estimate the relative pose of two
calibrated cameras from a single correspondence. Considering the depth
information and affine features, two new constraints on the camera pose are
derived. The proposed solver is usable within 1-point RANSAC approaches. Thus,
the processing time of the robust estimation is linear in the number of
correspondences and, therefore, orders of magnitude faster than by using
traditional approaches. The proposed 1AC+D solver is tested both on synthetic
data and on 110,395 publicly available real image pairs, where we used an off-the-shelf monocular depth network to provide up-to-scale depth per pixel. The proposed 1AC+D leads to accuracy similar to that of traditional approaches while
being significantly faster. When solving large-scale problems, e.g., pose-graph
initialization for Structure-from-Motion (SfM) pipelines, the overhead of
obtaining ACs and monocular depth is negligible compared to the speed-up gained
in the pairwise geometric verification, i.e., relative pose estimation. This is
demonstrated on scenes from the 1DSfM dataset using a state-of-the-art global
SfM algorithm. Source code: https://github.com/eivan/one-ac-pos
Using Multi-view Recognition and Meta-data Annotation to Guide a Robot's Attention
In the transition from industrial to service robotics, robots will have to deal with increasingly unpredictable and variable environments. We present a system that is able to recognize objects of a certain class in an image and to identify their parts for potential interactions. The method can recognize objects from arbitrary viewpoints and generalizes to instances that have never been observed during training, even if they are partially occluded and appear against cluttered backgrounds. Our approach builds on the implicit shape model of Leibe et al. We extend it to couple recognition with the provision of meta-data.
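For context, the implicit-shape-model idea the system builds on can be sketched as codebook-based Hough voting for object centres; the sketch below is a minimal illustration with made-up codebook structures and thresholds, not the authors' extended system.

```python
import numpy as np

def ism_vote(features, codebook_desc, codebook_offsets, cell=20, match_thr=0.7):
    """Minimal implicit-shape-model style voting for an object centre.

    features         : list of (descriptor, (x, y)) local features in the image
    codebook_desc    : (M, D) codebook descriptors (learned offline)
    codebook_offsets : list of length M; each entry is a (K_m, 2) array of
                       learned offsets from that codeword to the object centre
    Every matched feature casts votes for candidate centres; the maximum of
    the vote map is a detection hypothesis. All parameters are illustrative.
    """
    votes = {}
    for desc, (x, y) in features:
        # soft-match the feature to codebook entries (cosine similarity)
        sims = codebook_desc @ desc / (
            np.linalg.norm(codebook_desc, axis=1) * np.linalg.norm(desc) + 1e-9)
        for m in np.where(sims > match_thr)[0]:
            for dx, dy in codebook_offsets[m]:
                # each learned offset votes for one candidate object centre
                key = (int((x + dx) // cell), int((y + dy) // cell))
                votes[key] = votes.get(key, 0.0) + sims[m] / len(codebook_offsets[m])
    if not votes:
        return None, 0.0
    best = max(votes, key=votes.get)
    centre = ((best[0] + 0.5) * cell, (best[1] + 0.5) * cell)
    return centre, votes[best]
```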
Using the properties of Primate Motion Sensitive Neurons to extract camera motion and depth from brief 2-D Monocular Image Sequences
Humans and most animals can run/fly and navigate efficiently through cluttered environments while avoiding obstacles in their way. Replicating this advanced skill in autonomous robotic vehicles currently requires a vast array of sensors coupled with computers that are bulky, heavy and power hungry. The human eye and brain have had millions of years to develop an efficient solution to the problem of visual navigation, and we believe that it is the best system to reverse engineer. Our brain and visual system appear to use a very different solution to the visual odometry problem from those of most computer vision approaches. We show how a neural-based architecture is able to extract self-motion information and depth from monocular 2-D video sequences and highlight how this approach differs from standard CV techniques. We previously demonstrated how our system works during pure translation of a camera. Here, we extend this approach to the case of combined translation and rotation.
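For comparison, the standard computer-vision formulation of the same translation/rotation split uses the instantaneous-motion (Longuet-Higgins and Prazdny) model: rotational flow is depth-independent and can be subtracted, and depth follows from the translational residual. The sketch below illustrates that conventional baseline only, not the neural architecture described here; the sign conventions and the assumption of known rotation and translation direction are my own.

```python
import numpy as np

def derotate_flow(x, y, flow_u, flow_v, omega, f):
    """Remove the rotation-induced component of optical flow.

    x, y           : pixel coordinates relative to the principal point
    flow_u, flow_v : measured optical flow at those pixels
    omega          : (wx, wy, wz) camera angular velocity (e.g. from a gyro)
    f              : focal length in pixels
    The rotational flow is independent of depth, so subtracting it leaves
    the translational component. Signs depend on the chosen camera frame.
    """
    wx, wy, wz = omega
    u_rot = (x * y / f) * wx - (f + x ** 2 / f) * wy + y * wz
    v_rot = (f + y ** 2 / f) * wx - (x * y / f) * wy - x * wz
    return flow_u - u_rot, flow_v - v_rot

def relative_depth(x, y, trans_u, trans_v, t, f):
    """Depth up to scale from the translational flow component, given the
    translation direction t = (tx, ty, tz): Z ~ ||A(x, y) t|| / ||flow_t||."""
    tx, ty, tz = t
    a_u = x * tz - f * tx               # depth-scaled translational flow
    a_v = y * tz - f * ty               # template A(x, y) t
    return np.hypot(a_u, a_v) / (np.hypot(trans_u, trans_v) + 1e-12)
```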
Efficient matching of SIFT descriptors for building dense maps of homologous points in remote sensing images
Automatic area-based methods for locating homologous points in digital images, combined with region-growing techniques, are capable of producing a dense and accurate mesh of homologous points. However, the region-growing process can be interrupted in image regions where the horizontal parallax varies abruptly. This situation is usually caused by a discontinuity in the imaged surface or object space, such as a building in an urban scene or a highwall of an open-pit mine. In these cases, new pairs of homologous points (seeds) must be introduced, normally by a human operator, from which the process is restarted. Depending on the type of image used and on the 3D structure of the mapped region, the required human intervention can be considerable. A fully automated alternative combining SIFT (Scale Invariant Feature Transform), least-squares matching and region growing was previously proposed by the authors. The present work presents an extension of that technique. Essentially, modifications to the SIFT matching step are proposed that exploit characteristics of the stereo pairs produced by aerial and orbital sensors. Experimental evaluations show that the proposed modifications bring two kinds of benefits. First, the number of homologous points found increases, with no corresponding increase in the proportion of false matches. Second, the computational load is substantially reduced.
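One plausible reading of the modified matching step, exploiting the near-epipolar alignment of aerial and orbital stereo pairs, is sketched below with OpenCV: a ratio test followed by a vertical-parallax filter to cheaply discard false matches before least-squares matching and region growing. The thresholds and the specific constraint are assumptions for illustration, not the paper's exact modifications.

```python
import cv2

def match_sift_aerial(img_left, img_right, ratio=0.8, max_dy=5.0):
    """SIFT matching sketch with a ratio test plus a vertical-parallax filter.

    Aerial/orbital stereo pairs are close to epipolar-aligned, so homologous
    points should have a small y-disparity; rejecting matches with a large
    vertical offset removes many false correspondences cheaply. The ratio
    and max_dy thresholds are illustrative, not the paper's values.
    """
    sift = cv2.SIFT_create()
    kp_l, des_l = sift.detectAndCompute(img_left, None)
    kp_r, des_r = sift.detectAndCompute(img_right, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    pairs = []
    for m, n in matcher.knnMatch(des_l, des_r, k=2):
        if m.distance < ratio * n.distance:                # Lowe ratio test
            (xl, yl) = kp_l[m.queryIdx].pt
            (xr, yr) = kp_r[m.trainIdx].pt
            if abs(yl - yr) < max_dy:                      # small vertical parallax
                pairs.append(((xl, yl), (xr, yr)))         # seeds for region growing
    return pairs
```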