Is Geometry Enough for Matching in Visual Localization?
In this paper, we propose to go beyond the well-established approach to
vision-based localization that relies on visual descriptor matching between a
query image and a 3D point cloud. While matching keypoints via visual
descriptors makes localization highly accurate, it comes with significant storage
demands, raises privacy concerns, and requires updating the descriptors over the
long term. To elegantly address those practical challenges for large-scale
localization, we present GoMatch, an alternative to visual-based matching that
solely relies on geometric information for matching image keypoints to maps,
represented as sets of bearing vectors. Our novel bearing-vector
representation of 3D points significantly relieves the cross-modal challenge
in geometry-based matching that prevented prior work from tackling localization
in realistic environments. With additional careful architecture design, GoMatch
improves over prior geometry-based matching work with reductions of
(10.67 m, 95.7 deg) and (1.43 m, 34.7 deg) in average median pose errors on
Cambridge Landmarks and 7-Scenes, while requiring as little as 1.5/1.7% of
storage capacity in comparison to the best visual-based matching methods. This
confirms its potential and feasibility for real-world localization and opens
the door to future efforts in advancing city-scale visual localization methods
that do not require storing visual descriptors.
Comment: ECCV 2022 Camera Ready
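The bearing-vector map representation at the heart of GoMatch can be illustrated with a minimal sketch. This is not the paper's implementation (which also involves a learned matching architecture); it only shows, under the usual definition, how a 3D map point reduces to a unit direction relative to a camera center, so no visual descriptor needs to be stored per point:

```python
import math

def bearing_vector(point_3d, camera_center):
    """Unit vector pointing from the camera center toward a 3D point.

    A map stored as sets of such directions is far more compact than one
    that keeps a high-dimensional visual descriptor for every point.
    """
    diff = [p - c for p, c in zip(point_3d, camera_center)]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]

# A 3D map point seen from a camera at the origin.
b = bearing_vector((3.0, 0.0, 4.0), (0.0, 0.0, 0.0))
# b is the unit direction [0.6, 0.0, 0.8]
```

Matching then amounts to associating query-keypoint bearings with these map directions using geometry alone.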
How To Train Your Deep Multi-Object Tracker
The recent trend in vision-based multi-object tracking (MOT) is heading towards leveraging the representational power of deep learning to jointly learn to detect and track objects. However, existing methods train only certain sub-modules using loss functions that often do not correlate with established tracking evaluation measures such as Multi-Object Tracking Accuracy (MOTA) and Precision (MOTP). As these measures are not differentiable, the choice of appropriate loss functions for end-to-end training of multi-object tracking methods is still an open research problem. In this paper, we bridge this gap by proposing a differentiable proxy of MOTA and MOTP, which we combine in a loss function suitable for end-to-end training of deep multi-object trackers. As a key ingredient, we propose a Deep Hungarian Net (DHN) module that approximates the Hungarian matching algorithm. DHN allows estimating the correspondence between object tracks and ground-truth objects to compute differentiable proxies of MOTA and MOTP, which are in turn used to optimize deep trackers directly. We experimentally demonstrate that the proposed differentiable framework improves the performance of existing multi-object trackers, and we establish a new state of the art on the MOTChallenge benchmark. Our code is publicly available from https://github.com/yihongXU/deepMOT
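The core idea of replacing hard Hungarian matching with a differentiable stand-in can be sketched as follows. The paper's DHN is a learned network; here a simple row-wise softmax over negative distances (a common differentiable approximation, not the authors' method) produces soft track-to-ground-truth correspondences, from which an MOTP-like proxy can be computed:

```python
import math

def soft_assignment(dist, temperature=0.1):
    """Row-wise softmax over negative distances.

    A differentiable stand-in for hard Hungarian matching: each track
    distributes its assignment mass over ground-truth objects, with low
    temperature concentrating mass on the nearest one.
    """
    soft = []
    for row in dist:
        exps = [math.exp(-d / temperature) for d in row]
        z = sum(exps)
        soft.append([e / z for e in exps])
    return soft

def soft_motp(dist, assignment):
    """Assignment-weighted average distance, mirroring MOTP's average
    matched-pair distance (lower is better). Differentiable in `dist`."""
    total = sum(a * d
                for a_row, d_row in zip(assignment, dist)
                for a, d in zip(a_row, d_row))
    return total / len(dist)

# Two tracks vs. two ground-truth boxes (pairwise distances).
D = [[0.1, 2.0],
     [2.0, 0.2]]
A = soft_assignment(D)
# A concentrates mass on the diagonal, approximating the hard matching,
# and soft_motp(D, A) is close to the hard-matched average (0.1 + 0.2) / 2.
```

Because every step is smooth, gradients of such a proxy can flow back into the tracker, which is what makes end-to-end training against tracking-style measures possible.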