Visual simultaneous localization and mapping (SLAM) systems face challenges
in detecting loop closure under the circumstance of large viewpoint changes. In
this paper, we present an object-based loop closure detection method based on
the spatial layout and semanic consistency of the 3D scene graph. Firstly, we
propose an object-level data association approach based on the semantic
information from semantic labels, intersection over union (IoU), object color,
and object embedding. Subsequently, multi-view bundle adjustment with the
associated objects is utilized to jointly optimize the poses of objects and
cameras. We represent the refined objects as a 3D spatial graph with semantics
and topology. Then, we propose a graph matching approach to select
correspondence objects based on the structure layout and semantic property
similarity of vertices' neighbors. Finally, we jointly optimize camera
trajectories and object poses in an object-level pose graph optimization, which
results in a globally consistent map. Experimental results demonstrate that our
proposed data association approach can construct more accurate 3D semantic
maps, and our loop closure method is more robust than point-based and
object-based methods in circumstances with large viewpoint changes