Matching neural paths: transfer from recognition to correspondence search
Many machine learning tasks require finding per-part correspondences between
objects. In this work we focus on low-level correspondences - a highly
ambiguous matching problem. We propose to use a hierarchical semantic
representation of the objects, coming from a convolutional neural network, to
solve this ambiguity. Training such a network directly for low-level
correspondence prediction may not be an option in domains where ground-truth
correspondences are hard to obtain. We show how transfer from recognition can
be used to avoid such training. Our idea is to mark parts as "matching" if
their features are close to each other at all the levels of convolutional
feature hierarchy (neural paths). Although the overall number of such paths is
exponential in the number of layers, we propose a polynomial algorithm for
aggregating all of them in a single backward pass. The empirical validation is
done on the task of stereo correspondence and demonstrates that we achieve
competitive results among the methods which do not use labeled target domain
data.
Comment: Accepted at NIPS 2017
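The single-backward-pass aggregation can be made concrete with a small dynamic program. Below is a minimal sketch assuming a toy layered graph, where `match[l][i]` flags whether the candidate pair's features are close at unit i of layer l and `parents[l][i]` lists the units at layer l+1 that unit i feeds into; the names and the connectivity structure are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def count_matching_paths(match, parents):
    """Count bottom-to-top 'neural paths' on which every visited unit matches.
    match[l][i]   : 1 if unit i at layer l has close features, else 0.
    parents[l][i] : units at layer l+1 that unit i at layer l feeds into.
    The number of paths is exponential in the depth, but one backward
    (top-down) pass aggregates them all in polynomial time."""
    L = len(match)
    score = [None] * L
    score[-1] = np.asarray(match[-1], dtype=np.int64)  # top layer
    for l in range(L - 2, -1, -1):
        score[l] = np.array(
            [match[l][i] * sum(score[l + 1][p] for p in parents[l][i])
             for i in range(len(match[l]))],
            dtype=np.int64)
    return score[0]  # per low-level unit: number of fully matching paths

# Toy usage: 3 layers, low-level units feeding into overlapping parents.
match = [[1, 1, 0], [1, 0], [1]]
parents = [[[0, 1], [0, 1], [1]], [[0], [0]]]
print(count_matching_paths(match, parents))  # -> [1 1 0]
```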
Incremental Visual-Inertial 3D Mesh Generation with Structural Regularities
Visual-Inertial Odometry (VIO) algorithms typically rely on a point cloud
representation of the scene that does not model the topology of the
environment. A 3D mesh instead offers a richer, yet lightweight, model.
Nevertheless, building a 3D mesh out of the sparse and noisy 3D landmarks
triangulated by a VIO algorithm often results in a mesh that does not fit the
real scene. In order to regularize the mesh, previous approaches decouple state
estimation from the 3D mesh regularization step, and either limit the 3D mesh
to the current frame or let the mesh grow indefinitely. We propose instead to
tightly couple mesh regularization and state estimation by detecting and
enforcing structural regularities in a novel factor-graph formulation. We also
propose to incrementally build the mesh by restricting its extent to the
time-horizon of the VIO optimization; the resulting 3D mesh covers a larger
portion of the scene than a per-frame approach while its memory usage and
computational complexity remain bounded. We show that our approach successfully
regularizes the mesh, while improving localization accuracy, when structural
regularities are present, and remains operational in scenes without
regularities.
Comment: 7 pages, 5 figures, accepted at ICRA
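As a toy illustration of what a structural-regularity constraint can look like, the sketch below evaluates the residuals of a landmark-on-plane factor of the kind that could sit in such a factor graph next to the usual VIO factors; the function name and plane parametrization are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def plane_landmark_residuals(landmarks, n, d):
    """Residuals tying triangulated landmarks to a detected plane n.x = d.
    landmarks: (N, 3) array of 3D points; n: unit plane normal (3,);
    d: scalar plane offset. In a tightly coupled system these residuals
    would be minimized jointly with the VIO factors; here they are only
    evaluated in isolation."""
    return landmarks @ n - d

# Toy usage: noisy landmarks near the plane z = 1.
pts = np.array([[0.0, 0.0, 1.02],
                [1.0, 2.0, 0.98],
                [3.0, 1.0, 1.01]])
print(plane_landmark_residuals(pts, np.array([0.0, 0.0, 1.0]), 1.0))
# -> [ 0.02 -0.02  0.01]
```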
SurfelMeshing: Online Surfel-Based Mesh Reconstruction
We address the problem of mesh reconstruction from live RGB-D video, assuming
a calibrated camera and poses provided externally (e.g., by a SLAM system). In
contrast to most existing approaches, we do not fuse depth measurements in a
volume but in a dense surfel cloud. We asynchronously (re)triangulate the
smoothed surfels to reconstruct a surface mesh. This novel approach makes it
possible to maintain a dense surface representation of the scene during SLAM
that can quickly adapt to loop closures, by deforming the surfel cloud and
asynchronously remeshing the surface where necessary. The surfel-based
representation also naturally supports strongly varying scan resolution. In
particular, it reconstructs colors at the input camera's resolution. Moreover,
in contrast to many volumetric approaches, ours can reconstruct thin objects
since objects do not need to enclose a volume. We demonstrate our approach in a
number of experiments, showing that it produces reconstructions that are
competitive with the state-of-the-art, and we discuss its advantages and
limitations. The algorithm (excluding loop closure functionality) is available
as open source at https://github.com/puzzlepaint/surfelmeshing.
Comment: Version accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence
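For intuition, a confidence-weighted running average is the canonical way depth measurements are folded into a surfel cloud. The sketch below shows that generic update; the field names and weighting scheme are assumptions, not SurfelMeshing's actual code.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Surfel:
    position: np.ndarray  # (3,) point on the surface
    normal: np.ndarray    # (3,) unit surface normal
    color: np.ndarray     # (3,) RGB
    weight: float         # accumulated measurement confidence

def fuse_measurement(s, p, n, c, w=1.0):
    """Fold one depth measurement (point p, normal n, color c, confidence w)
    into surfel s via a confidence-weighted running average."""
    t = s.weight + w
    s.position = (s.weight * s.position + w * np.asarray(p)) / t
    s.color = (s.weight * s.color + w * np.asarray(c)) / t
    n_acc = s.weight * s.normal + w * np.asarray(n)
    s.normal = n_acc / np.linalg.norm(n_acc)  # renormalize averaged normal
    s.weight = t
    return s
```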
Robust Dense Mapping for Large-Scale Dynamic Environments
We present a stereo-based dense mapping algorithm for large-scale dynamic
urban environments. In contrast to other existing methods, we simultaneously
reconstruct the static background, the moving objects, and the potentially
moving but currently stationary objects separately, which is desirable for
high-level mobile robotic tasks such as path planning in crowded environments.
We use both instance-aware semantic segmentation and sparse scene flow to
classify objects as either background, moving, or potentially moving, thereby
ensuring that the system is able to model objects with the potential to
transition from static to dynamic, such as parked cars. Given camera poses
estimated from visual odometry, both the background and the (potentially)
moving objects are reconstructed separately by fusing the depth maps computed
from the stereo input. In addition to visual odometry, sparse scene flow is
also used to estimate the 3D motions of the detected moving objects, in order
to reconstruct them accurately. A map pruning technique is further developed to
improve reconstruction accuracy and reduce memory consumption, leading to
increased scalability. We evaluate our system thoroughly on the well-known
KITTI dataset. Our system is capable of running on a PC at approximately 2.5 Hz,
with the primary bottleneck being the instance-aware semantic segmentation,
which is a limitation we hope to address in future work. The source code is
available from the project website (http://andreibarsan.github.io/dynslam).
Comment: Presented at the IEEE International Conference on Robotics and Automation (ICRA), 2018
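The three-way split the abstract describes combines a semantic prior (which classes can move at all) with evidence of actual motion from sparse scene flow. Here is a hedged sketch of that decision rule; the class list, threshold, and function name are illustrative assumptions, not the paper's exact pipeline.

```python
# Hypothetical sketch of the background / moving / potentially-moving split.
POTENTIALLY_MOVING_CLASSES = {"car", "truck", "bus", "person", "bicycle"}

def classify_instance(semantic_class, mean_scene_flow, flow_threshold=0.05):
    """Label a segmented instance as 'background', 'moving', or
    'potentially_moving' (e.g., a parked car).
    mean_scene_flow: average residual 3D motion of the instance's sparse
    scene-flow vectors after compensating for camera egomotion."""
    if semantic_class not in POTENTIALLY_MOVING_CLASSES:
        return "background"          # buildings, road, vegetation, ...
    if mean_scene_flow > flow_threshold:
        return "moving"              # reconstructed with its own motion
    return "potentially_moving"      # static now, but may start moving
```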
Efficient 2D-3D Matching for Multi-Camera Visual Localization
Visual localization, i.e., determining the position and orientation of a
vehicle with respect to a map, is a key problem in autonomous driving. We
present a multi-camera visual-inertial localization algorithm for large-scale
environments. To efficiently and effectively match features against a pre-built
global 3D map, we propose a prioritized feature matching scheme for
multi-camera systems. In contrast to existing works, designed for monocular
cameras, we (1) tailor the prioritization function to the multi-camera setup
and (2) run feature matching and pose estimation in parallel. This
significantly accelerates the matching and pose estimation stages and allows us
to dynamically adapt the matching efforts based on the surrounding environment.
In addition, we show how pose priors can be integrated into the localization
system to increase efficiency and robustness. Finally, we extend our algorithm
by fusing the absolute pose estimates with motion estimates from a multi-camera
visual-inertial odometry (VIO) pipeline. This results in a system that provides
reliable and drift-free pose estimation. Extensive experiments show that our
localization runs fast and robustly under varying conditions, and that our
extended algorithm enables reliable real-time pose estimation.
Comment: 7 pages, 5 figures
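The generic idea behind prioritized 2D-3D matching is to process features in order of expected usefulness and stop early once enough matches are found. The sketch below shows that skeleton with the multi-camera prioritization function, the paper's actual contribution, passed in as a black box; all names are illustrative assumptions.

```python
import heapq

def prioritized_match(features, priority, match_fn, max_matches=100):
    """Match 2D features against a 3D map in descending priority order,
    terminating early once max_matches 2D-3D matches are found.
    priority(f) -> float score; match_fn(f) -> 3D map point or None."""
    heap = [(-priority(f), i, f) for i, f in enumerate(features)]
    heapq.heapify(heap)  # highest-priority feature pops first
    matches = []
    while heap and len(matches) < max_matches:
        _, _, f = heapq.heappop(heap)
        point3d = match_fn(f)
        if point3d is not None:
            matches.append((f, point3d))
    return matches
```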
Multilinear Factorizations for Multi-Camera Rigid Structure from Motion Problems
Camera networks have gained increased importance in recent years. Existing
approaches mostly use point correspondences between different camera views to
calibrate such systems. However, it is often difficult or even impossible to
establish such correspondences. But even without feature point correspondences
between different camera views, if the cameras are temporally synchronized then
the data from the cameras are strongly linked together by the motion
correspondence: all the cameras observe the same motion. The present article
therefore develops the necessary theory to use this motion correspondence for
general rigid as well as planar rigid motions. Given multiple static affine
cameras which observe a rigidly moving object and track feature points located
on this object, what can be said about the resulting point trajectories? Are
there any useful algebraic constraints hidden in the data? Is a 3D
reconstruction of the scene possible even if there are no point correspondences
between the different cameras? And if so, how many points are sufficient? Is
there an algorithm which guarantees finding the correct solution to this highly
non-convex problem? This article addresses these questions and thereby
introduces the concept of low-dimensional motion subspaces. The constraints
provided by these motion subspaces enable an algorithm which ensures finding
the correct solution to this non-convex reconstruction problem. The algorithm
is based on multilinear analysis, matrix and tensor factorizations. Our new
approach can handle extreme configurations, e.g., a camera in a camera network
tracking only a single point. Results on synthetic as well as on real data
sequences act as a proof of concept for the presented insights.
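A low-dimensional motion subspace can be checked empirically: stack the tracked 2D trajectories into a matrix and inspect its numerical rank via the SVD. A minimal sketch of that diagnostic follows, assuming trajectories from all temporally synchronized cameras are concatenated along the point axis; the tolerance and function name are illustrative, and this is not the paper's full factorization algorithm.

```python
import numpy as np

def motion_subspace_rank(W, tol=1e-6):
    """Numerical rank of the trajectory matrix W (2F x N): F frames of
    2D feature tracks for N points. Under affine cameras and a shared
    rigid motion, W has low rank because its columns lie in a
    low-dimensional motion subspace."""
    s = np.linalg.svd(W, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

# Toy usage: a random rank-4 trajectory matrix is detected as rank 4.
A = np.random.randn(20, 4) @ np.random.randn(4, 30)
print(motion_subspace_rank(A))  # -> 4
```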
