This thesis is concerned with the problem of Simultaneous Localisation and
Mapping (SLAM) using visual data only. Given the video stream of a moving
camera, we wish to estimate the structure of the environment and the motion
of the device as accurately as possible and in real time.
Two effective approaches have been presented in the past. Filtering methods
marginalise out past poses and summarise the information gained over time
with a probability distribution. Keyframe methods rely on the optimisation
approach of bundle adjustment but, for computational reasons, must select only a small
number of past frames to process. We perform a rigorous comparison between
the two approaches for visual SLAM. In particular, we show that accuracy comes
from a large number of points, while the number of intermediate frames only
has a minor impact. We conclude that keyframe bundle adjustment is superior
to filtering due to its lower computational cost.
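To make the keyframe strategy concrete, here is a minimal sketch of the bundle adjustment objective (the symbols $T_i$, $\mathbf{x}_j$, $\mathbf{z}_{ij}$, $\pi$ and $\Sigma_{ij}$ are introduced here for illustration only): keyframe poses and map points are refined jointly by minimising the reprojection error,
\[
\min_{\{T_i\},\,\{\mathbf{x}_j\}} \; \sum_{(i,j)\in\mathcal{O}} \big\lVert \mathbf{z}_{ij} - \pi\!\left(T_i\,\mathbf{x}_j\right) \big\rVert^2_{\Sigma_{ij}},
\]
where $T_i$ is the pose of keyframe $i$, $\mathbf{x}_j$ a 3D map point, $\mathbf{z}_{ij}$ the corresponding image observation, $\pi$ the camera projection function, $\Sigma_{ij}$ the measurement covariance, and $\mathcal{O}$ the set of observations. Adding points enlarges this sum directly, which is consistent with the finding above that accuracy is driven by the number of points rather than by the number of intermediate frames.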
Based on these experimental results, we develop an efficient framework for
large-scale visual SLAM using the keyframe strategy. We demonstrate that
SLAM using a single camera drifts not only in rotation and translation, but also in scale. In particular, we perform large-scale loop closure correction using a novel variant of pose-graph optimisation which also takes scale drift into account.
Starting from this two-stage approach, which tackles local motion estimation and loop closures separately, we develop a unified framework for real-time visual SLAM. By employing a novel double window scheme, we present a constant-time approach which achieves the local accuracy of bundle adjustment while ensuring global consistency.
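A minimal sketch of such a double window objective, reusing the illustrative notation above (the split into windows and the weights are assumptions, not the exact formulation): an inner window of point-pose constraints is optimised as bundle adjustment, while an outer window of pose-pose constraints anchors it to the rest of the map,
\[
\min \;\; \underbrace{\sum_{(i,j)\in\mathcal{O}_{\mathrm{in}}} \big\lVert \mathbf{z}_{ij} - \pi\!\left(T_i\,\mathbf{x}_j\right) \big\rVert^2_{\Sigma_{ij}}}_{\text{inner window: bundle adjustment}} \;+\; \underbrace{\sum_{(k,l)\in\mathcal{E}_{\mathrm{out}}} \big\lVert \log\!\big( T_{kl}\, T_l\, T_k^{-1} \big)^{\vee} \big\rVert^2_{\Lambda_{kl}}}_{\text{outer window: pose-graph constraints}}.
\]
Because both windows are of bounded size, each optimisation touches a fixed number of variables regardless of the overall map size.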
Furthermore, we suggest a new scheme for local registration using metric loop closures, and present several improvements to the visual front-end of SLAM. Our contributions are evaluated exhaustively in a number of synthetic experiments and on real-image datasets from single cameras and range imaging devices.