Large Scale SfM with the Distributed Camera Model
We introduce the distributed camera model, a novel model for
Structure-from-Motion (SfM). This model describes image observations in terms
of light rays with ray origins and directions rather than pixels. As such, the
proposed model is capable of describing a single camera or multiple cameras
simultaneously as the collection of all light rays observed. We show how the
distributed camera model is a generalization of the standard camera model and
describe a general formulation and solution to the absolute camera pose problem
that works for standard or distributed cameras. The proposed method computes a
solution that is up to 8 times more efficient than gDLS and is robust to
rotation singularities. Finally, this method is used in a novel
large-scale incremental SfM pipeline where distributed cameras are accurately
and robustly merged together. This pipeline is a direct generalization of
traditional incremental SfM; however, instead of incrementally adding one
camera at a time, the reconstruction is grown by
adding a distributed camera. Our pipeline produces highly accurate
reconstructions efficiently by avoiding the need for many bundle adjustment
iterations and is capable of computing a 3D model of Rome from over 15,000
images in just 22 minutes. Comment: Published at the 2016 3DV Conference.
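To make the ray-based formulation concrete, below is a minimal Python sketch (hypothetical names, not the paper's gDLS-style solver) of a distributed camera as a collection of rays, together with a point-to-ray residual that a generic absolute pose refinement could minimize. A central pinhole camera is recovered as the special case where all ray origins coincide.

    import numpy as np

    class DistributedCamera:
        """A camera, or rig of cameras, represented as a bag of light rays."""
        def __init__(self, origins, directions):
            # origins: (N, 3) ray origins; directions: (N, 3) ray directions.
            self.origins = np.asarray(origins, dtype=float)
            d = np.asarray(directions, dtype=float)
            self.directions = d / np.linalg.norm(d, axis=1, keepdims=True)

    def pose_residuals(camera, points, R, t):
        # Express the (N, 3) world points in the rig frame under pose (R, t).
        p = points @ R.T + t
        v = p - camera.origins                     # ray origin -> point
        along = np.sum(v * camera.directions, axis=1, keepdims=True)
        # Residual: component of v orthogonal to the ray (zero iff on the ray).
        return v - along * camera.directions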
Relative localization for aerial manipulation with PL-SLAM
The final publication is available at link.springer.com. This chapter explains a precise SLAM technique, PL-SLAM, that simultaneously processes points and lines and tackles situations where point-only methods are prone to fail, such as poorly textured scenes or motion-blurred images in which feature points vanish. The method is remarkably robust against image noise, outperforms state-of-the-art methods for point-based contour alignment, and runs in real time on low-cost hardware. Peer reviewed; postprint (author's final draft).
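As a rough illustration of the joint point-and-line idea (hypothetical names, not the authors' implementation), a combined cost can stack point reprojection errors with signed distances of projected line-segment endpoints to the observed image line:

    import numpy as np

    def point_error(K, R, t, X, uv_obs):
        # Pixel reprojection error of a 3D point X observed at uv_obs.
        x = K @ (R @ X + t)
        return x[:2] / x[2] - uv_obs

    def line_error(K, R, t, P, Q, line_obs):
        # line_obs is a homogeneous image line (a, b, c) with a^2 + b^2 = 1,
        # so its dot product with a homogeneous pixel is a signed distance.
        errs = []
        for X in (P, Q):                  # 3D endpoints of the segment
            x = K @ (R @ X + t)
            u = np.array([x[0] / x[2], x[1] / x[2], 1.0])
            errs.append(line_obs @ u)
        return np.array(errs)

Both residual types are then minimized jointly over poses and landmarks, which is what keeps tracking alive when one feature type is scarce.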
Geometry-Aware Learning of Maps for Camera Localization
Maps are a key component in image-based camera localization and visual SLAM
systems: they are used to establish geometric constraints between images,
correct drift in relative pose estimation, and relocalize cameras after lost
tracking. The exact definitions of maps, however, are often
application-specific and hand-crafted for different scenarios (e.g. 3D
landmarks, lines, planes, bags of visual words). We propose to represent maps
as a deep neural net called MapNet, which enables learning a data-driven map
representation. Unlike prior work on learning maps, MapNet exploits cheap and
ubiquitous sensory inputs like visual odometry and GPS in addition to images
and fuses them together for camera localization. Geometric constraints
expressed by these inputs, which have traditionally been used in bundle
adjustment or pose-graph optimization, are formulated as loss terms in MapNet
training and also used during inference. In addition to directly improving
localization accuracy, this allows us to update the MapNet (i.e., maps) in a
self-supervised manner using additional unlabeled video sequences from the
scene. We also propose a novel parameterization for camera rotation which is
better suited for deep-learning based camera pose regression. Experimental
results on both the indoor 7-Scenes dataset and the outdoor Oxford RobotCar
dataset show significant performance improvement over prior work. The MapNet
project webpage is https://goo.gl/mRB3Au. Comment: CVPR 2018 camera-ready paper plus supplementary material.
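Below is a hedged PyTorch sketch of the two ingredients highlighted above: a log-quaternion rotation parameterization and relative-pose constraints between frames added as loss terms. The function names, the learned weights s_t and s_q (in the spirit of Kendall et al.), and the plain differences in log-quaternion space are simplifying assumptions, not the exact published formulation.

    import torch

    def log_q(q):
        # Map unit quaternions (w, x, y, z) to their 3-vector logarithms.
        w, v = q[..., :1], q[..., 1:]
        n = v.norm(dim=-1, keepdim=True).clamp(min=1e-8)
        return v / n * torch.acos(w.clamp(-1.0, 1.0))

    def pose_term(t, lq, t_gt, lq_gt, s_t, s_q):
        # Weighted L1 translation + log-quaternion loss for a batch of poses.
        return ((t - t_gt).abs().sum(-1) * torch.exp(-s_t) + s_t +
                (lq - lq_gt).abs().sum(-1) * torch.exp(-s_q) + s_q)

    def mapnet_style_loss(t, lq, t_gt, lq_gt, s_t, s_q):
        # Absolute-pose terms plus geometric constraints between consecutive
        # frames, here approximated as differences of translations and log-q.
        abs_loss = pose_term(t, lq, t_gt, lq_gt, s_t, s_q).mean()
        rel_loss = pose_term(t[1:] - t[:-1], lq[1:] - lq[:-1],
                             t_gt[1:] - t_gt[:-1], lq_gt[1:] - lq_gt[:-1],
                             s_t, s_q).mean()
        return abs_loss + rel_loss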
LDSO: Direct Sparse Odometry with Loop Closure
In this paper we present an extension of Direct Sparse Odometry (DSO) to a
monocular visual SLAM system with loop closure detection and pose-graph
optimization (LDSO). As a direct technique, DSO can utilize any image pixel
with sufficient intensity gradient, which makes it robust even in featureless
areas. LDSO retains this robustness, while at the same time ensuring
repeatability of some of these points by favoring corner features in the
tracking frontend. This repeatability makes it possible to reliably detect loop closure
candidates with a conventional feature-based bag-of-words (BoW) approach. Loop
closure candidates are verified geometrically and Sim(3) relative pose
constraints are estimated by jointly minimizing 2D and 3D geometric error
terms. These constraints are fused with a co-visibility graph of relative poses
extracted from DSO's sliding window optimization. Our evaluation on publicly
available datasets demonstrates that the modified point selection strategy
retains the tracking accuracy and robustness, and the integrated pose-graph
optimization significantly reduces the accumulated rotation, translation, and
scale drift, resulting in overall performance comparable to state-of-the-art
feature-based systems, even without global bundle adjustment.
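To illustrate what a Sim(3) relative-pose constraint contributes to the pose graph, here is a simplified stand-in (in practice a library such as g2o handles this): each loop-closure edge is scored by how far the current estimates deviate from the measured relative similarity transform.

    import numpy as np

    class Sim3:
        # A similarity transform acting on points as x -> s * R @ x + t.
        def __init__(self, s, R, t):
            self.s, self.R, self.t = s, R, t

        def inverse(self):
            Rt = self.R.T
            return Sim3(1.0 / self.s, Rt, -(Rt @ self.t) / self.s)

        def __mul__(self, other):
            return Sim3(self.s * other.s, self.R @ other.R,
                        self.s * (self.R @ other.t) + self.t)

    def edge_residual(S_i, S_j, S_ij_meas):
        # Identity transform iff S_i^-1 * S_j agrees with the measurement.
        E = S_ij_meas.inverse() * (S_i.inverse() * S_j)
        # Crude 7-DoF residual (skew part, translation, log-scale); a real
        # implementation would use the proper Sim(3) logarithm map.
        rot = 0.5 * np.array([E.R[2, 1] - E.R[1, 2],
                              E.R[0, 2] - E.R[2, 0],
                              E.R[1, 0] - E.R[0, 1]])
        return np.concatenate([rot, E.t, [np.log(E.s)]])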
Structure from Motion with Higher-level Environment Representations
Computer vision is an important area focused on understanding, extracting,
and using information from vision-based sensors. It has many applications,
such as vision-based 3D reconstruction, simultaneous localization and mapping
(SLAM), and data-driven understanding of the real world. Vision is a
fundamental sensing modality in many different fields of application.
While traditional structure from motion mostly uses sparse
point-based features, this thesis explores the possibility of
using higher-order feature representations. It starts with
joint work that uses straight lines as features and performs
bundle adjustment with a straight-line parameterization. We
then move to an even higher-order representation based on
Bézier splines. We start with a simple case in which all
contours lie on a plane, use Bézier splines to parameterize
the curves in the background, and optimize over both the
camera poses and the splines. As an application, we present a
complete end-to-end pipeline that produces meaningful dense 3D
models from natural data of a 3D object: the target object is
placed on a structured but unknown planar background that is
modeled with splines, and the data is captured using only a
hand-held monocular camera. Since this application is limited
to planar scenes, we then push the parameterization into full
3D. Building on the potential of this idea, we introduce a
more flexible higher-order extension of points that provides a
general model for structural edges in the environment, whether
straight or curved. Our model relies on linked Bézier curves,
whose geometric intuition proves highly beneficial during
parameter initialization and regularization. We present the
first fully automatic pipeline able to generate spline-based
representations without any human supervision. Besides a full
graphical formulation of the problem, we introduce both
geometric and photometric cues as well as higher-level
concepts such as overall curve visibility and viewing-angle
restrictions to automatically manage the correspondences in
the graph. Results show that curve-based structure from motion
with splines outperforms state-of-the-art sparse feature-based
methods and can model curved edges in the environment.
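As a toy illustration of the core residual in such curve-based bundle adjustment (hypothetical names; the full pipeline additionally manages correspondences, visibility, and regularization), a 3D cubic Bézier curve is sampled, projected into a camera, and compared against a matched image edge point; optimization then runs jointly over control points, camera poses, and curve parameters.

    import numpy as np

    def bezier3(ctrl, u):
        # Evaluate a cubic Bézier curve with (4, 3) control points at u in [0, 1].
        b = np.array([(1 - u) ** 3,
                      3 * u * (1 - u) ** 2,
                      3 * u ** 2 * (1 - u),
                      u ** 3])
        return b @ ctrl

    def curve_residual(ctrl, u, K, R, t, edge_uv):
        # Reprojection error between a curve sample and a detected edge pixel.
        X = bezier3(ctrl, u)
        x = K @ (R @ X + t)
        return x[:2] / x[2] - edge_uv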
Bayesian Pose Graph Optimization via Bingham Distributions and Tempered Geodesic MCMC
We introduce the Tempered Geodesic Markov Chain Monte Carlo (TG-MCMC)
algorithm for initializing pose graph optimization problems arising in various
scenarios such as SfM (structure from motion) or SLAM (simultaneous
localization and mapping). TG-MCMC is the first of its kind, uniting
asymptotically global
non-convex optimization on the spherical manifold of quaternions with posterior
sampling, in order to provide both reliable initial poses and uncertainty
estimates that are informative about the quality of individual solutions. We
devise rigorous theoretical convergence guarantees for our method and
extensively evaluate it on synthetic and real benchmark datasets. Besides its
elegance in formulation and theory, we show that our method is robust to
missing data and noise, and that the estimated uncertainties capture intuitive
properties of the data. Comment: Published at NeurIPS 2018, 25 pages with
supplement.
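A rough sketch of the geodesic ingredient, as a toy Metropolis variant rather than the authors' full sampler: proposals move along great circles of the unit-quaternion sphere, so iterates always remain valid rotations, and the target is tempered by an inverse temperature beta.

    import numpy as np

    def geodesic_step(q, v, eps):
        # Follow the great circle through unit quaternion q with velocity v.
        v = v - (v @ q) * q              # project v onto the tangent space at q
        n = np.linalg.norm(v)
        if n < 1e-12:
            return q
        return np.cos(n * eps) * q + np.sin(n * eps) * v / n

    def tempered_accept(log_post, q_new, q_old, beta, rng):
        # Metropolis test on the tempered posterior p(q)^beta; small beta
        # flattens the target and eases escape from poor local modes.
        return np.log(rng.uniform()) < beta * (log_post(q_new) - log_post(q_old))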