We introduce MIPS-Fusion, a robust and scalable online RGB-D reconstruction
method based on a novel neural implicit representation, the
multi-implicit-submap. Unlike existing neural RGB-D reconstruction
methods, which lack either flexibility because they rely on a single neural map
or scalability because they store extra feature grids, we propose a purely
neural representation that tackles both difficulties with a divide-and-conquer
design. In our method,
neural submaps are incrementally allocated along the scanning trajectory
and efficiently learned with local neural bundle adjustments. The submaps can
be refined individually in a back-end optimization and optimized jointly to
realize submap-level loop closure. We also propose a hybrid tracking
approach that combines randomized and gradient-based pose optimization. For the
first time, randomized optimization is made possible in neural tracking through
several key design choices in the learning process, enabling efficient and robust
tracking even under fast camera motion. Extensive evaluation demonstrates
that our method attains higher reconstruction quality than the state of the
art for large-scale scenes and under fast camera motion.