Visual-inertial odometry (VIO) systems traditionally rely on filtering or
optimization-based techniques for egomotion estimation. While these methods are
accurate under nominal conditions, they are prone to failure during severe
illumination changes, rapid camera motions, or on low-texture image sequences.
Learning-based systems have the potential to outperform classical
implementations in challenging environments, but currently do not perform as
well as classical methods in nominal settings. Herein, we introduce a framework
for training a hybrid VIO system that leverages the advantages of learning and
standard filtering-based state estimation. Our approach is built upon a
differentiable Kalman filter, with an IMU-driven process model and a robust,
neural network-derived relative pose measurement model. The use of the Kalman
filter framework enables the principled treatment of uncertainty at training
time and at test time. We show that our self-supervised loss formulation
outperforms a similar, supervised method, while also enabling online
retraining. We evaluate our system on a visually degraded version of the EuRoC
dataset and find that our estimator operates without a significant reduction in
accuracy in cases where classical estimators consistently diverge. Finally, by
properly utilizing the metric information contained in the IMU measurements,
our system is able to recover metric scene scale, while other self-supervised
monocular VIO approaches cannot.

Comment: Accepted to the 2022 IEEE/ASME International Conference on Advanced
Intelligent Mechatronics (AIM'22).
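As a point of reference for the architecture the abstract describes, the sketch below shows one predict/update cycle of a linear Kalman filter, with an IMU-derived input driving the process model and a pose measurement standing in for the network-derived relative pose. This is a generic NumPy illustration, not the paper's implementation: the function name `kf_step`, the linear models, and all dimensions are illustrative assumptions, and the paper's differentiable filter would instead be written in an autodiff framework so that gradients flow through the update into the measurement network.

```python
import numpy as np

def kf_step(x, P, u, z, F, B, H, Q, R):
    """One predict/update cycle of a linear Kalman filter (illustrative).

    x, P : prior state mean and covariance
    u    : control input (e.g. IMU-derived, driving the process model)
    z    : measurement (e.g. a network-predicted relative pose)
    F, B : state-transition and control matrices
    H    : measurement matrix
    Q, R : process and measurement noise covariances
    """
    # Predict: propagate the state with the (IMU-driven) process model.
    x_pred = F @ x + B @ u
    P_pred = F @ P @ F.T + Q

    # Update: fuse the measurement, weighting it by its uncertainty.
    y = z - H @ x_pred                       # innovation
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

The covariances Q and R are what make the treatment of uncertainty principled: a noisy measurement (large R) is down-weighted automatically, which is the mechanism a learned measurement model with predicted uncertainty plugs into.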