Fine-To-Coarse Global Registration of RGB-D Scans
RGB-D scanning of indoor environments is important for many applications,
including real estate, interior design, and virtual reality. However, it is
still challenging to register RGB-D images from a hand-held camera over a long
video sequence into a globally consistent 3D model. Current methods often
lose tracking or drift and thus fail to reconstruct salient structures in large
environments (e.g., parallel walls in different rooms). To address this
problem, we propose a "fine-to-coarse" global registration algorithm that
leverages robust registrations at finer scales to seed detection and
enforcement of new correspondence and structural constraints at coarser scales.
To test global registration algorithms, we provide a benchmark with 10,401
manually-clicked point correspondences in 25 scenes from the SUN3D dataset.
During experiments with this benchmark, we find that our fine-to-coarse
algorithm registers long RGB-D sequences better than previous methods.
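
One way to score a registration against manually clicked point correspondences is the root-mean-square distance between corresponding points after both are mapped into world coordinates with the estimated camera poses. The sketch below assumes that metric; the abstract does not state the exact error measure, and the function names, the pose layout (4x4 camera-to-world matrices), and the correspondence tuple format are hypothetical choices made for illustration.

```python
import math


def transform_point(pose, point):
    """Apply a 4x4 camera-to-world pose (row-major nested lists) to a 3D point."""
    x, y, z = point
    return tuple(
        pose[r][0] * x + pose[r][1] * y + pose[r][2] * z + pose[r][3]
        for r in range(3)
    )


def correspondence_rmse(poses, correspondences):
    """RMSE of world-space distances between corresponding points.

    poses: dict mapping frame id -> estimated 4x4 camera-to-world matrix.
    correspondences: list of (frame_a, point_a, frame_b, point_b), with
        points given in the camera coordinates of their own frame.
    """
    if not correspondences:
        raise ValueError("need at least one correspondence")
    total = 0.0
    for frame_a, point_a, frame_b, point_b in correspondences:
        world_a = transform_point(poses[frame_a], point_a)
        world_b = transform_point(poses[frame_b], point_b)
        total += sum((a - b) ** 2 for a, b in zip(world_a, world_b))
    return math.sqrt(total / len(correspondences))


def translation(tx, ty, tz):
    """Helper for the example: a pose with identity rotation."""
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


# The same physical point seen from two frames; frame 1 sits 1 m along x.
clicks = [(0, (0.0, 0.0, 0.0), 1, (-1.0, 0.0, 0.0))]
exact = {0: translation(0, 0, 0), 1: translation(1, 0, 0)}
drifted = {0: translation(0, 0, 0), 1: translation(1, 0.3, 0.4)}
```

With the exact poses the two clicked points coincide in world space, so the error is zero; the drifted pose moves the second point by 0.5 m, and the error grows accordingly. A lower value over all correspondences would indicate a more globally consistent registration under this assumed metric.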
Proposal Flow
Finding image correspondences remains a challenging problem in the presence
of intra-class variations and large changes in scene layout. Semantic flow
methods are designed to handle images depicting different instances of the same
object or scene category. We introduce a novel approach to semantic flow,
dubbed proposal flow, that establishes reliable correspondences using object
proposals. Unlike prevailing semantic flow approaches that operate on pixels or
regularly sampled local regions, proposal flow benefits from the
characteristics of modern object proposals, which exhibit high repeatability at
multiple scales, and can take advantage of both local and geometric consistency
constraints among proposals. We also show that proposal flow can effectively be
transformed into a conventional dense flow field. We introduce a new dataset
that can be used to evaluate both general semantic flow techniques and
region-based approaches such as proposal flow. We use this benchmark to compare
different matching algorithms, object proposals, and region features within
proposal flow, to the state of the art in semantic flow. This comparison, along
with experiments on standard datasets, demonstrates that proposal flow
significantly outperforms existing semantic flow methods in various settings.
Proposal Flow: Semantic Correspondences from Object Proposals
Finding image correspondences remains a challenging problem in the presence
of intra-class variations and large changes in scene layout. Semantic flow
methods are designed to handle images depicting different instances of the same
object or scene category. We introduce a novel approach to semantic flow,
dubbed proposal flow, that establishes reliable correspondences using object
proposals. Unlike prevailing semantic flow approaches that operate on pixels or
regularly sampled local regions, proposal flow benefits from the
characteristics of modern object proposals, which exhibit high repeatability at
multiple scales, and can take advantage of both local and geometric consistency
constraints among proposals. We also show that the corresponding sparse
proposal flow can effectively be transformed into a conventional dense flow
field. We introduce two new challenging datasets that can be used to evaluate
both general semantic flow techniques and region-based approaches such as
proposal flow. We use these benchmarks to compare different matching
algorithms, object proposals, and region features within proposal flow, to the
state of the art in semantic flow. This comparison, along with experiments on
standard datasets, demonstrates that proposal flow significantly outperforms
existing semantic flow methods in various settings.
Comment: arXiv admin note: text overlap with arXiv:1511.0506
SceneFlowFields: Dense Interpolation of Sparse Scene Flow Correspondences
While most scene flow methods use either variational optimization or a strong
rigid motion assumption, we show for the first time that scene flow can also be
estimated by dense interpolation of sparse matches. To this end, we find sparse
matches across two stereo image pairs that are detected without any prior
regularization and perform dense interpolation preserving geometric and motion
boundaries by using edge information. A few iterations of variational energy
minimization are performed to refine our results, which are thoroughly
evaluated on the KITTI benchmark and additionally compared to the state of the art
on MPI Sintel. For application in an automotive context, we further show that
an optional ego-motion model helps to boost performance and blends smoothly
into our approach to produce a segmentation of the scene into static and
dynamic parts.
Comment: IEEE Winter Conference on Applications of Computer Vision (WACV),
201
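
The idea of densifying sparse matches while respecting edges can be illustrated with a much simpler stand-in: each pixel copies the motion of its geodesically nearest seed, where crossing a strong edge is expensive. This is a minimal sketch under that assumption, not the interpolation scheme of the paper; the function name, the 4-connected grid, and the cost `1 + edge strength` per step are all hypothetical choices.

```python
import heapq


def interpolate_sparse_matches(seeds, edge_cost):
    """Spread sparse motion vectors to every pixel of a grid.

    seeds: dict mapping (row, col) -> motion vector (any tuple).
    edge_cost: H x W nested list of non-negative edge strengths.
    Returns an H x W nested list holding, for each pixel, the motion of
    the seed with the smallest geodesic distance (multi-source Dijkstra).
    """
    height, width = len(edge_cost), len(edge_cost[0])
    dist = [[float("inf")] * width for _ in range(height)]
    dense = [[None] * width for _ in range(height)]
    heap = []
    for (r, c), motion in seeds.items():
        dist[r][c] = 0.0
        dense[r][c] = motion
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale queue entry, a shorter path was found already
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < height and 0 <= nc < width:
                # Entering a pixel costs one step plus its edge strength.
                nd = d + 1.0 + edge_cost[nr][nc]
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    dense[nr][nc] = dense[r][c]
                    heapq.heappush(heap, (nd, nr, nc))
    return dense


# One image row with a strong edge at column 1 and a seed at each end.
left, right = (1.0, 0.0), (-1.0, 0.0)
row = interpolate_sparse_matches(
    {(0, 0): left, (0, 5): right},
    [[0.0, 5.0, 0.0, 0.0, 0.0, 0.0]],
)[0]
```

In this toy row, plain nearest-seed interpolation would assign column 2 to the left seed. With the edge cost, the left motion stops at the edge and the right seed fills the region beyond it, which is the boundary-preserving behaviour the abstract describes.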
InLoc: Indoor Visual Localization with Dense Matching and View Synthesis
We seek to predict the 6 degree-of-freedom (6DoF) pose of a query photograph
with respect to a large indoor 3D map. The contributions of this work are
three-fold. First, we develop a new large-scale visual localization method
targeted for indoor environments. The method proceeds along three steps: (i)
efficient retrieval of candidate poses that ensures scalability to large-scale
environments, (ii) pose estimation using dense matching rather than local
features to deal with textureless indoor scenes, and (iii) pose verification by
virtual view synthesis to cope with significant changes in viewpoint, scene
layout, and occluders. Second, we collect a new dataset with reference 6DoF
poses for large-scale indoor localization. Query photographs are captured by
mobile phones at a different time than the reference 3D map, thus presenting a
realistic indoor localization scenario. Third, we demonstrate that our method
significantly outperforms current state-of-the-art indoor localization
approaches on this new challenging data.
D2-Net: A Trainable CNN for Joint Detection and Description of Local Features
In this work we address the problem of finding reliable pixel-level
correspondences under difficult imaging conditions. We propose an approach
where a single convolutional neural network plays a dual role: It is
simultaneously a dense feature descriptor and a feature detector. By postponing
the detection to a later stage, the obtained keypoints are more stable than
their traditional counterparts based on early detection of low-level
structures. We show that this model can be trained using pixel correspondences
extracted from readily available large-scale SfM reconstructions, without any
further annotations. The proposed method obtains state-of-the-art performance
on both the difficult Aachen Day-Night localization dataset and the InLoc
indoor localization benchmark, as well as competitive performance on other
benchmarks for image matching and 3D reconstruction.
Comment: Accepted at CVPR 201
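
To make the "describe first, detect later" idea concrete, the sketch below picks keypoints directly from a dense feature map: a location is kept when its strongest channel is also a strict local maximum of that channel in the 3x3 neighbourhood, and the descriptor is the L2-normalised feature vector at that location. The selection rule, the function name, and the nested-list feature layout are simplifying assumptions for illustration; the abstract itself does not spell out the detection criterion, and a real feature map would come from a trained CNN rather than a hand-written list.

```python
import math


def detect_and_describe(features):
    """Select keypoints and descriptors from one dense feature map.

    features: H x W x C nested lists of floats.
    Returns a list of ((row, col), descriptor) pairs.
    """
    height, width = len(features), len(features[0])
    results = []
    for i in range(height):
        for j in range(width):
            vector = features[i][j]
            # Channel with the strongest response at this location.
            k = max(range(len(vector)), key=lambda c: vector[c])
            peak = vector[k]
            # Responses of the same channel in the 3x3 neighbourhood.
            neighbours = [
                features[y][x][k]
                for y in range(max(0, i - 1), min(height, i + 2))
                for x in range(max(0, j - 1), min(width, j + 2))
                if (y, x) != (i, j)
            ]
            if all(peak > n for n in neighbours):
                norm = math.sqrt(sum(v * v for v in vector)) or 1.0
                results.append(((i, j), [v / norm for v in vector]))
    return results


# Toy 3x3 map with two channels and a single strong response in the centre.
toy = [[[0.1, 0.0] for _ in range(3)] for _ in range(3)]
toy[1][1] = [3.0, 4.0]
keypoints = detect_and_describe(toy)
```

On the toy map only the centre location survives, and its descriptor is the unit-length version of `[3.0, 4.0]`. Because detection reads the same tensor that supplies the descriptors, no separate low-level detector is needed in this sketch.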
SuperPoint: Self-Supervised Interest Point Detection and Description
This paper presents a self-supervised framework for training interest point
detectors and descriptors suitable for a large number of multiple-view geometry
problems in computer vision. As opposed to patch-based neural networks, our
fully-convolutional model operates on full-sized images and jointly computes
pixel-level interest point locations and associated descriptors in one forward
pass. We introduce Homographic Adaptation, a multi-scale, multi-homography
approach for boosting interest point detection repeatability and performing
cross-domain adaptation (e.g., synthetic-to-real). Our model, when trained on
the MS-COCO generic image dataset using Homographic Adaptation, is able to
repeatedly detect a much richer set of interest points than the initial
pre-adapted deep model and any other traditional corner detector. The final
system gives rise to state-of-the-art homography estimation results on HPatches
when compared to LIFT, SIFT and ORB.
Comment: Camera-ready version for CVPR 2018 Deep Learning for Visual SLAM
Workshop (DL4VSLAM2018)
AnchorNet: A Weakly Supervised Network to Learn Geometry-sensitive Features For Semantic Matching
Despite significant progress of deep learning in recent years,
state-of-the-art semantic matching methods still rely on legacy features such
as SIFT or HoG. We argue that the strong invariance properties that are key to
the success of recent deep architectures on the classification task make them
unfit for dense correspondence tasks, unless a large amount of supervision is
used. In this work, we propose a deep network, termed AnchorNet, that produces
image representations that are well-suited for semantic matching. It relies on
a set of filters whose response is geometrically consistent across different
object instances, even in the presence of strong intra-class, scale, or
viewpoint variations. Trained only with weak image-level labels, the final
representation successfully captures information about the object structure and
improves results of state-of-the-art semantic matching methods such as the
deformable spatial pyramid or the proposal flow methods. We show positive
results on the cross-instance matching task where different instances of the
same object category are matched as well as on a new cross-category semantic
matching task aligning pairs of instances each from a different object class.
Comment: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition. 201