Real-Time RGB-D Camera Pose Estimation in Novel Scenes using a Relocalisation Cascade
Camera pose estimation is an important problem in computer vision. Common
techniques either match the current image against keyframes with known poses,
directly regress the pose, or establish correspondences between keypoints in
the image and points in the scene to estimate the pose. In recent years,
regression forests have become a popular alternative to establish such
correspondences. They achieve accurate results, but have traditionally needed
to be trained offline on the target scene, preventing relocalisation in new
environments. Recently, we showed how to circumvent this limitation by adapting
a pre-trained forest to a new scene on the fly. The adapted forests achieved
relocalisation performance that was on par with that of offline forests, and
our approach was able to estimate the camera pose in close to real time. In
this paper, we present an extension of this work that achieves significantly
better relocalisation performance whilst running fully in real time. To achieve
this, we make several changes to the original approach: (i) instead of
accepting the camera pose hypothesis without question, we make it possible to
score the final few hypotheses using a geometric approach and select the most
promising; (ii) we chain several instantiations of our relocaliser together in
a cascade, allowing us to try faster but less accurate relocalisation first,
only falling back to slower, more accurate relocalisation as necessary; and
(iii) we tune the parameters of our cascade to achieve effective overall
performance. These changes allow us to significantly improve upon the
performance our original state-of-the-art method was able to achieve on the
well-known 7-Scenes and Stanford 4 Scenes benchmarks. As additional
contributions, we present a way of visualising the internal behaviour of our
forests and show how to entirely circumvent the need to pre-train a forest on a
generic scene.
Comment: Tommaso Cavallari, Stuart Golodetz, Nicholas Lord and Julien Valentin
assert joint first authorship
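The cascade idea described in this abstract (try a fast relocaliser first, score its final hypotheses, and fall back to a slower relocaliser only if needed) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `relocalise_with_cascade`, the `stages`, `score_fn` and `thresholds` parameters, and the convention that a lower score is better are all assumptions made for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple

Pose = Tuple[float, ...]  # hypothetical stand-in for a real camera pose type


def relocalise_with_cascade(
    frame,
    stages: Sequence[Callable[[object], Sequence[Pose]]],
    score_fn: Callable[[object, Pose], float],
    thresholds: Sequence[float],
) -> Optional[Tuple[Pose, float]]:
    """Run relocalisers from fastest to slowest (assumed ordering).

    Each stage returns a few candidate poses. The candidates are scored
    with score_fn (assumed: lower is better) and the best one is accepted
    if it meets that stage's threshold. Otherwise the next stage is tried.
    If no stage is accepted, the best pose seen overall is returned.
    """
    best: Optional[Tuple[Pose, float]] = None
    for stage, threshold in zip(stages, thresholds):
        hypotheses = stage(frame)
        if not hypotheses:
            continue
        # Score every hypothesis from this stage and keep the most promising.
        scored = [(score_fn(frame, pose), pose) for pose in hypotheses]
        score, pose = min(scored, key=lambda sp: sp[0])
        if best is None or score < best[1]:
            best = (pose, score)
        if score <= threshold:
            return pose, score  # good enough: skip the slower stages
    return best
```

In this sketch the slower stages cost nothing when the fast stage already yields an acceptable pose, which is the trade-off the abstract describes.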
Beyond Controlled Environments: 3D Camera Re-Localization in Changing Indoor Scenes
Long-term camera re-localization is an important task with numerous computer
vision and robotics applications. Whilst various outdoor benchmarks exist that
target lighting, weather and seasonal changes, far less attention has been paid
to appearance changes that occur indoors. This has led to a mismatch between
popular indoor benchmarks, which focus on static scenes, and indoor
environments that are of interest for many real-world applications. In this
paper, we adapt 3RScan - a recently introduced indoor RGB-D dataset designed
for object instance re-localization - to create RIO10, a new long-term camera
re-localization benchmark focused on indoor scenes. We propose new metrics for
evaluating camera re-localization and explore how state-of-the-art camera
re-localizers perform according to these metrics. We also examine in detail how
different types of scene change affect the performance of different methods,
based on novel ways of detecting such changes in a given RGB-D frame. Our
results clearly show that long-term indoor re-localization is an unsolved
problem. Our benchmark and tools are publicly available at
waldjohannau.github.io/RIO10
Comment: ECCV 2020, project website https://waldjohannau.github.io/RIO1
Towards CNN map representation and compression for camera relocalisation
This paper presents a study on the use of Convolutional Neural Networks for
camera relocalisation and its application to map compression. We follow state
of the art visual relocalisation results and evaluate the response to different
data inputs. We use a CNN map representation and introduce the notion of map
compression under this paradigm by using smaller CNN architectures without
sacrificing relocalisation performance. We evaluate this approach in a series
of publicly available datasets over a number of CNN architectures with
different sizes, both in complexity and number of layers. This formulation
allows us to improve relocalisation accuracy by increasing the number of
training trajectories while maintaining a constant-size CNN.
Comment: Submitted to the 1st International Workshop on Deep Learning for
Visual SLAM, at the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR
Learning to Navigate the Energy Landscape
In this paper, we present a novel and efficient architecture for addressing
computer vision problems that use 'Analysis by Synthesis'. Analysis by
synthesis involves the minimization of the reconstruction error which is
typically a non-convex function of the latent target variables.
State-of-the-art methods adopt a hybrid scheme where discriminatively trained
predictors like Random Forests or Convolutional Neural Networks are used to
initialize local search algorithms. While these methods have been shown to
produce promising results, they often get stuck in local optima. Our method
goes beyond the conventional hybrid architecture by not only proposing multiple
accurate initial solutions but by also defining a navigational structure over
the solution space that can be used for extremely efficient gradient-free local
search. We demonstrate the efficacy of our approach on the challenging problem
of RGB Camera Relocalization. To make the RGB camera relocalization problem
particularly challenging, we introduce a new dataset of 3D environments which
are significantly larger than those found in other publicly-available datasets.
Our experiments reveal that the proposed method is able to achieve
state-of-the-art camera relocalization results. We also demonstrate the
generalizability of our approach on Hand Pose Estimation and Image Retrieval
tasks.
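The hybrid scheme mentioned in this abstract (several initial solutions followed by gradient-free local search on a non-convex error) can be illustrated with a small Python sketch. It uses a plain one-dimensional pattern search, which is a generic stand-in and not the navigational structure the paper proposes. The names `pattern_search`, `multi_start` and `toy_error` are hypothetical, and the toy error function is an assumption chosen only because it has two local minima.

```python
def pattern_search(error, x0, step=0.5, tol=1e-6, max_iter=10_000):
    """Gradient-free 1D local search: probe x - step and x + step,
    move to whichever improves the error, and halve the step otherwise."""
    x, fx = x0, error(x0)
    for _ in range(max_iter):
        if step < tol:
            break
        moved = False
        for cand in (x - step, x + step):
            fc = error(cand)
            if fc < fx:
                x, fx, moved = cand, fc, True
        if not moved:
            step /= 2  # no improvement at this scale, so refine the search
    return x, fx


def multi_start(error, inits, **kwargs):
    """Run the local search from several initial guesses and keep the best."""
    return min((pattern_search(error, x0, **kwargs) for x0 in inits),
               key=lambda result: result[1])


def toy_error(x):
    """Toy non-convex 'reconstruction error' with minima near x = -1 and x = 1."""
    return (x * x - 1.0) ** 2 + 0.3 * x
```

Starting only from `x0 = 1.5`, the search settles in the shallower minimum near `x = 1`. Calling `multi_start(toy_error, [-1.5, 1.5])` also explores the basin near `x = -1`, which has the lower error. This is the sense in which multiple good initial solutions help a local search avoid poor local optima.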
An intelligent robotic vision system with environment perception
Ever since the dawn of computer vision [1, 2], 3D environment reconstruction and object 6D pose estimation have been core problems. This thesis attempts to develop a novel intelligent 3D robotic vision system that integrates environment reconstruction and object detection techniques to solve practical problems. Chapter 2 reviews the current state of the art in 3D vision, covering environment reconstruction and 6D pose estimation. Chapter 3 proposes a novel environment reconstruction system based on coloured point clouds. The evaluation experiments indicate that the proposed algorithm (Algorithm 2) is effective for small-scale, large-scale and textureless scenes. Chapter 4 presents Image-6D (Section 4.2), a learning-based algorithm that estimates object pose from a single RGB image. Contour-alignment is introduced as an efficient algorithm for pose refinement in an RGB image. The method is evaluated on two widely used benchmark image databases, LINEMOD and Occlusion-LINEMOD. Experiments show that it surpasses other state-of-the-art RGB-based prediction approaches. Chapter 5 describes Point-6D (defined in Section 5.2), a novel 6D pose estimation method that takes coloured point clouds as input. Its performance is demonstrated on the LineMOD [3] and YCB-Video [4] datasets. Chapter 6 summarizes the contributions and discusses potential future research directions. In addition, Appendix B presents an intelligent 3D robotic vision system deployed in a simulated/laboratory nuclear waste disposal scenario. To verify the results, a simulated nuclear waste handling experiment was successfully completed with the proposed robotic system.
CPO: Change Robust Panorama to Point Cloud Localization
We present CPO, a fast and robust algorithm that localizes a 2D panorama with
respect to a 3D point cloud of a scene possibly containing changes. To robustly
handle scene changes, our approach deviates from conventional feature point
matching, and focuses on the spatial context provided from panorama images.
Specifically, we propose efficient color histogram generation and subsequent
robust localization using score maps. By utilizing the unique equivariance of
spherical projections, we propose very fast color histogram generation for a
large number of camera poses without explicitly rendering images for all
candidate poses. We accumulate the regional consistency of the panorama and
point cloud as 2D/3D score maps, and use them to weigh the input color values
to further increase robustness. The weighted color distribution quickly finds
good initial poses and achieves stable convergence for gradient-based
optimization. CPO is lightweight and achieves effective localization in all
tested scenarios, showing stable performance despite scene changes, repetitive
structures, or featureless regions, which are typical challenges for visual
localization with perspective cameras.
Comment: Accepted to ECCV 202
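The abstract does not specify how CPO builds or compares its colour histograms, so the following Python sketch shows only the general idea of weighting colour values before comparing distributions. The functions `weighted_histogram` and `histogram_intersection`, the single-channel 0 to 255 value range, and the use of histogram intersection as the similarity measure are all assumptions made for illustration.

```python
def weighted_histogram(values, weights, bins=8, max_value=256):
    """Build a normalised histogram in which each colour value contributes
    its weight (for example, a per-pixel score) instead of a count of 1."""
    hist = [0.0] * bins
    for value, weight in zip(values, weights):
        index = min(int(value * bins / max_value), bins - 1)
        hist[index] += weight
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalised histograms (1 means identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

For example, with two bins, values `[0, 255]` and weights `[1, 3]` give the histogram `[0.25, 0.75]`, so down-weighting a value reduces its influence on the comparison.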