Leveraging 3D City Models for Rotation Invariant Place-of-Interest Recognition
Given a cell phone image of a building, we address the problem of place-of-interest recognition in urban scenarios. We go beyond earlier approaches by exploiting the nowadays often available 3D building information (e.g. from extruded floor plans) together with massive street-level image data for database creation. Exploiting vanishing points in query images to fully remove 3D rotation from the recognition problem allows us to reduce the required feature invariance to a purely homothetic problem, which we show enables more discriminative power in feature descriptors than classical SIFT. We rerank visual-word-based document queries using a fast stratified homothetic verification that in most cases boosts the correct document to a top position if it was in the short list. Since we exploit 3D building information, the approach finally outputs the camera pose in real-world coordinates, ready for augmenting the cell phone image with virtual 3D information. The whole system is demonstrated to outperform traditional approaches in city-scale experiments on different sources of street-level image data and a challenging set of cell phone images.
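To make the rotation-removal step concrete, below is a minimal sketch (not the authors' implementation) of how two orthogonal facade vanishing points, together with a known camera calibration K, yield a rectifying homography; after warping, matching against a fronto-parallel database view is indeed homothetic (scale and translation only). The function names and the orthogonalization details are illustrative assumptions.

```python
import numpy as np

def facade_rotation(K, vp_x, vp_y):
    """Recover the camera-to-facade rotation from the pixel positions of
    two orthogonal vanishing points (hypothetical helper)."""
    Kinv = np.linalg.inv(K)
    dx = Kinv @ np.array([vp_x[0], vp_x[1], 1.0])  # horizontal facade axis
    dy = Kinv @ np.array([vp_y[0], vp_y[1], 1.0])  # vertical facade axis
    dx /= np.linalg.norm(dx)
    dy -= dx * (dx @ dy)                           # enforce orthogonality
    dy /= np.linalg.norm(dy)
    dz = np.cross(dx, dy)                          # facade normal
    return np.stack([dx, dy, dz], axis=1)

def rectifying_homography(K, R):
    # Warping the query with H = K R^T K^{-1} simulates a camera whose
    # axes align with the facade, removing the 3D rotation entirely.
    return K @ R.T @ np.linalg.inv(K)
```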
Find your Way by Observing the Sun and Other Semantic Cues
In this paper we present a robust, efficient, and affordable approach to self-localization which requires neither GPS nor knowledge about the appearance of the world. Towards this goal, we utilize freely available cartographic maps and derive a probabilistic model that exploits semantic cues in the form of sun direction, the presence of an intersection, road type, and speed limit, as well as the ego-car trajectory, in order to produce very reliable localization results. Our experimental evaluation shows that our approach localizes much faster (in terms of driving time), with less computation, and more robustly than competing approaches that ignore semantic information.
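As a rough illustration of how such independent semantic cues can be fused, here is a minimal discrete Bayes-update sketch under an (assumed) conditional independence of cues given the pose; the cue names and the likelihood interface are placeholders, not the paper's exact model.

```python
import numpy as np

def semantic_update(belief, poses, observations, likelihoods):
    """belief: (N,) prior over N candidate map poses.
    observations: dict cue_name -> observed value (e.g. sun azimuth).
    likelihoods: dict cue_name -> f(pose, value) -> probability."""
    posterior = belief.copy()
    for cue, value in observations.items():
        for i, pose in enumerate(poses):
            posterior[i] *= likelihoods[cue](pose, value)  # independent cues multiply
    total = posterior.sum()
    return posterior / total if total > 0 else np.full(len(poses), 1.0 / len(poses))
```

Each driving step would interleave such an update with a motion step driven by the ego-car trajectory, so that individually ambiguous cues (road type, speed limit) still prune the pose hypotheses over time.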
Automatic Registration of RGBD Scans via Salient Directions
We address the problem of wide-baseline registration of RGB-D data, such as photo-textured laser scans, without any artificial targets or prior assumptions about the relative motion. Our approach makes it possible to fully automatically register scans taken in GPS-denied environments such as urban canyons, industrial facilities, or even indoors. We build upon image features, which are plentiful, well localized, and much more discriminative than geometry features; however, they suffer from viewpoint distortions and require normalization. We utilize the principle of salient directions present in the geometry and propose to extract (several) directions from the distribution of surface normals or from other cues such as observable symmetries. Compared to previous work, we impose no requirements on the scanned scene (such as containing large textured planes) and can handle arbitrary surface shapes. Rendering the whole scene from these repeatable directions using an orthographic camera generates textures that are identical up to 2D similarity transformations. This ambiguity is naturally handled by 2D features and allows us to find stable correspondences among scans. For geometric pose estimation from tentative matches, we propose a fast and robust 2-point sample consensus scheme that integrates an early rejection phase. We evaluate our approach on several challenging real-world scenes.
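The 2-point scheme is natural because a 2D similarity has four degrees of freedom, which two point correspondences fix exactly. Below is a generic RANSAC sketch in that spirit; the complex-number parameterization and the scale bounds used for early rejection are illustrative assumptions, not the paper's exact test.

```python
import numpy as np

def ransac_similarity(p, q, iters=500, thresh=3.0, seed=0):
    """Estimate a 2D similarity q ~ a*p + b (a, b complex; a = s*e^{i*theta})
    from tentative matches p, q: (N, 2) arrays of 2D points."""
    rng = np.random.default_rng(seed)
    pz = p[:, 0] + 1j * p[:, 1]
    qz = q[:, 0] + 1j * q[:, 1]
    best, best_inliers = None, 0
    for _ in range(iters):
        i, j = rng.choice(len(p), size=2, replace=False)
        if pz[i] == pz[j]:
            continue
        a = (qz[j] - qz[i]) / (pz[j] - pz[i])   # rotation + scale
        if not 0.5 < abs(a) < 2.0:              # early rejection: implausible scale
            continue
        b = qz[i] - a * pz[i]                   # translation
        inliers = int(np.sum(np.abs(a * pz + b - qz) < thresh))
        if inliers > best_inliers:
            best, best_inliers = (a, b), inliers
    return best, best_inliers
```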
Real-World Normal Map Capture for Nearly Flat Reflective Surfaces
Although specular objects have gained interest in recent years, virtually no approaches exist for markerless reconstruction of reflective scenes in the wild. In this work, we present a practical approach to capturing normal maps in real-world scenes using video only. We focus on nearly planar surfaces such as windows, facades made of glass or metal, and frames, screens, and other indoor objects, and show how normal maps of these can be obtained without an artificial calibration object. Rather, we track the reflections of real-world straight lines while moving with a hand-held or vehicle-mounted camera in front of the object. In contrast to error-prone local edge tracking, we obtain the reflections by a robust, global segmentation technique applied to an ortho-rectified 3D video cube, which also naturally allows efficient user interaction. Then, at each point of the reflective surface, the resulting 2D-curve to 3D-line correspondence provides a novel quadratic constraint on the local surface normal. This allows the shape to be solved for globally using integrability and smoothness constraints, and it easily supports the use of multiple lines. We demonstrate the technique on several objects and facades.
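For the final step, a standard way to recover a surface from per-point normal constraints is to integrate the implied gradient field while enforcing integrability. The sketch below uses the classic Frankot-Chellappa Fourier-domain projection as a generic stand-in for the paper's integrability-and-smoothness solve; it is not the authors' formulation.

```python
import numpy as np

def integrate_normals(nx, ny, nz):
    """Integrate a normal map (H x W arrays) into a height field by
    projecting the gradient field onto integrable solutions in Fourier space."""
    nz = np.clip(nz, 1e-6, None)
    p, q = -nx / nz, -ny / nz                  # dz/dx and dz/dy
    h, w = p.shape
    u, v = np.meshgrid(np.fft.fftfreq(w) * 2 * np.pi,
                       np.fft.fftfreq(h) * 2 * np.pi)
    P, Q = np.fft.fft2(p), np.fft.fft2(q)
    denom = u ** 2 + v ** 2
    denom[0, 0] = 1.0                          # avoid division by zero at DC
    Z = (-1j * u * P - 1j * v * Q) / denom
    Z[0, 0] = 0.0                              # height is recovered up to an offset
    return np.real(np.fft.ifft2(Z))
```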
Exploiting Sparse Semantic HD Maps for Self-Driving Vehicle Localization
In this paper we propose a novel semantic localization algorithm that exploits multiple sensors and has precision on the order of a few centimeters. Our approach does not require detailed knowledge about the appearance of the world, and our maps require orders of magnitude less storage than the maps utilized by traditional geometry- and LiDAR-intensity-based localizers. This is important, as self-driving cars need to operate in large environments. Towards this goal, we formulate the problem in a Bayesian filtering framework and exploit lanes, traffic signs, and vehicle dynamics to localize robustly with respect to a sparse semantic map. We validate the effectiveness of our method on a new highway dataset consisting of 312 km of roads. Our experiments show that the proposed approach achieves 0.05 m lateral and 1.12 m longitudinal accuracy on average while taking up only 0.3% of the storage required by previous LiDAR-intensity-based approaches.
Comment: 8 pages, 4 figures, 4 tables; 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2019).
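As a rough illustration of the Bayesian filtering formulation, here is a minimal histogram-filter sketch over a grid of (longitudinal, lateral) offsets along a mapped lane, with a dynamics-driven predict step and a lane-observation update; the grid layout and noise values are illustrative assumptions, not the paper's model.

```python
import numpy as np

def predict(belief, shift_cells, blur_steps=1):
    """Advance the (L, W) belief grid along the lane by the odometry-derived
    cell shift, then diffuse to account for motion noise."""
    b = np.roll(belief, shift_cells, axis=0)
    for _ in range(blur_steps):
        b = 0.5 * b + 0.25 * (np.roll(b, 1, axis=0) + np.roll(b, -1, axis=0))
    return b / b.sum()

def update_lateral(belief, lat_grid, observed_offset, sigma=0.05):
    """Weight each lateral cell by a Gaussian likelihood of the observed
    lane offset (meters); lat_grid is the (W,) lateral coordinate axis."""
    like = np.exp(-0.5 * ((lat_grid - observed_offset) / sigma) ** 2)
    b = belief * like[None, :]
    return b / b.sum()
```

A traffic-sign detection would enter analogously as a likelihood over the longitudinal axis, which is what pins down the along-road position.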
RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization
We study an important, yet largely unexplored, problem of large-scale cross-modal visual localization: matching ground RGB images to a geo-referenced aerial LIDAR 3D point cloud (rendered as depth images). Prior works were demonstrated on small datasets and did not lend themselves to scaling up for large-scale applications. To enable large-scale evaluation, we introduce a new dataset containing over 550K pairs (covering a 143 km^2 area) of RGB and aerial LIDAR depth images. We propose a novel joint-embedding-based method that effectively combines appearance and semantic cues from both modalities to handle drastic cross-modal variations. Experiments on the proposed dataset show that our model achieves a strong result of a median rank of 5 in matching across a large test set of 50K location pairs collected from a 14 km^2 area. This represents a significant advancement over prior works in performance and scale. We conclude with qualitative results that highlight the challenging nature of this task and the benefits of the proposed model. Our work provides a foundation for further research in cross-modal visual localization.
Comment: ACM Multimedia 2020.
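To illustrate the joint-embedding idea, here is a minimal two-branch sketch in PyTorch that maps RGB images and LIDAR depth renderings into a shared metric space with a triplet objective; the tiny backbones, embedding size, and margin are placeholders, and the paper's semantic-cue fusion is omitted.

```python
import torch.nn as nn
import torch.nn.functional as F

class Branch(nn.Module):
    """One modality-specific encoder producing an L2-normalized embedding."""
    def __init__(self, in_ch, dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=1)

rgb_enc, depth_enc = Branch(in_ch=3), Branch(in_ch=1)

def triplet_loss(rgb, depth_pos, depth_neg, margin=0.3):
    """Pull the matching depth render toward the RGB embedding, push a
    non-matching one away; retrieval is then nearest-neighbor search."""
    a, p, n = rgb_enc(rgb), depth_enc(depth_pos), depth_enc(depth_neg)
    return F.relu((a - p).norm(dim=1) - (a - n).norm(dim=1) + margin).mean()
```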
MAV Urban Localization from Google Street View Data
We tackle the problem of globally localizing a camera-equipped micro aerial vehicle flying in urban environments for which a Google Street View image database exists. To avoid the pitfalls of current image-search algorithms under severe viewpoint changes between the query and database images, we propose to generate virtual views of the scene that exploit the air-ground geometry of the system. To limit the computational complexity of the algorithm, we rely on a histogram-voting scheme to select the best putative image correspondences. The proposed approach is tested on a 2 km image dataset captured with a small quadrocopter flying in the streets of Zurich. The success of our approach shows that our new air-ground matching algorithm can robustly handle extreme changes in viewpoint, illumination, perceptual aliasing, and over-season variations, thus outperforming conventional visual place-recognition approaches.
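The histogram-voting step can be pictured as follows: each putative match votes for the global transformation it implies (here, the feature-orientation difference), and only matches consistent with the dominant vote survive. The binning and keypoint layout below are illustrative assumptions.

```python
import numpy as np

def histogram_vote(kp_query, kp_db, matches, n_bins=36):
    """kp_*: (N, 4) arrays with columns (x, y, scale, angle_deg);
    matches: (M, 2) integer index pairs. Keep only matches that fall in
    the dominant orientation-difference bin."""
    dtheta = (kp_query[matches[:, 0], 3] - kp_db[matches[:, 1], 3]) % 360.0
    hist, edges = np.histogram(dtheta, bins=n_bins, range=(0.0, 360.0))
    peak = int(np.argmax(hist))
    keep = (dtheta >= edges[peak]) & (dtheta < edges[peak + 1])
    return matches[keep]
```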