GeoGraph: Learning graph-based multi-view object detection with geometric cues end-to-end
In this paper we propose an end-to-end learnable approach that detects static
urban objects from multiple views, re-identifies instances, and finally assigns
a geographic position per object. Our method relies on a Graph Neural Network
(GNN) to detect all objects and output their geographic positions given images
and approximate camera poses as input. Our GNN simultaneously models relative
pose and image evidence, and is further able to deal with an arbitrary number
of input views. Our method is robust to occlusion, similar appearance of
neighboring objects, and severe changes in viewpoint by jointly reasoning
about visual image appearance and relative pose. Experimental evaluation on two
challenging, large-scale datasets and comparison with state-of-the-art methods
show significant and systematic improvements both in accuracy and efficiency,
with a 2-6% gain in detection and re-ID average precision as well as an 8x reduction
of training time.
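
To make the idea of jointly modeling image evidence and relative pose concrete, one round of message passing over a fully connected graph of detections can be sketched as follows. This is a minimal illustration and not the architecture from the paper: the function name, the feature shapes, the ReLU layers, and the mean aggregation are all assumptions chosen for brevity. Averaging over neighbors is one simple way to stay independent of the number of input views.

```python
import numpy as np


def message_passing_step(node_feats, rel_pose, w_msg, w_upd):
    """One mean-aggregation message-passing round (illustrative only).

    node_feats: (n, d) appearance embedding per detection (assumed input).
    rel_pose:   (n, n, p) relative-pose feature per ordered pair (assumed input).
    w_msg:      (2 * d + p, h) message weights (hypothetical parameters).
    w_upd:      (d + h, d) update weights (hypothetical parameters).
    """
    n, _ = node_feats.shape
    # Per-edge input: [receiver appearance, sender appearance, relative pose].
    recv = np.repeat(node_feats[:, None, :], n, axis=1)  # (n, n, d)
    send = np.repeat(node_feats[None, :, :], n, axis=0)  # (n, n, d)
    edge_in = np.concatenate([recv, send, rel_pose], axis=-1)
    msgs = np.maximum(edge_in @ w_msg, 0.0)  # ReLU, shape (n, n, h)
    # Drop self-messages and average over the remaining nodes, so the
    # same weights work for any number of detections or views.
    mask = 1.0 - np.eye(n)
    agg = (msgs * mask[:, :, None]).sum(axis=1) / max(n - 1, 1)  # (n, h)
    # Update each node from its own features plus the aggregated messages.
    return np.maximum(np.concatenate([node_feats, agg], axis=-1) @ w_upd, 0.0)


# Usage with random stand-in data: 4 detections, 8-dim appearance, 3-dim pose.
rng = np.random.default_rng(0)
d, p, h = 8, 3, 16
w_msg = rng.normal(size=(2 * d + p, h))
w_upd = rng.normal(size=(d + h, d))
feats = rng.normal(size=(4, d))
pose = rng.normal(size=(4, 4, p))
updated = message_passing_step(feats, pose, w_msg, w_upd)
print(updated.shape)
```

Because the aggregation is a masked mean, reordering the detections only reorders the output rows, which is the property that lets a graph formulation accept an arbitrary set of views.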