Local Descriptors Optimized for Average Precision
Extraction of local feature descriptors is a vital stage in the solution
pipelines for numerous computer vision tasks. Learning-based approaches improve
performance in certain tasks, but still cannot replace handcrafted features in
general. In this paper, we improve the learning of local feature descriptors by
optimizing the performance of descriptor matching, which is a common stage that
follows descriptor extraction in local feature based pipelines, and can be
formulated as nearest neighbor retrieval. Specifically, we directly optimize a
ranking-based retrieval performance metric, Average Precision, using deep
neural networks. This general-purpose solution can also be viewed as a listwise
learning to rank approach, which is advantageous compared to recent local
ranking approaches. On standard benchmarks, descriptors learned with our
formulation achieve state-of-the-art results in patch verification, patch
retrieval, and image matching.
Comment: 13 pages, 8 figures. IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 201
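As a rough illustration of the metric named in this abstract, the sketch below computes Average Precision for one query descriptor when matching is treated as nearest neighbor retrieval. It only evaluates the metric and does not show how the paper makes it trainable with a deep network. The function names, the Euclidean distance, and the toy 2-D descriptors are assumptions made for this example.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two descriptors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_precision(query, database, is_match):
    """Average Precision of the ranking induced by distance to `query`.

    `is_match[i]` is True when database[i] is a true match for the query.
    """
    # Rank database descriptors by increasing distance to the query.
    order = sorted(range(len(database)),
                   key=lambda i: l2_distance(query, database[i]))
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if is_match[idx]:
            hits += 1
            # Precision at the rank of each true match.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Toy data: hypothetical 2-D descriptors, not taken from the paper.
query = [0.0, 0.0]
database = [[0.1, 0.0], [1.0, 1.0], [0.2, 0.1], [3.0, 3.0]]

ap_perfect = average_precision(query, database, [True, False, True, False])
ap_mixed = average_precision(query, database, [False, True, True, False])
```

In the first call both true matches are ranked ahead of every non-match, so the score is at its maximum; in the second a non-match is ranked first, which lowers the score.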
Learned Multi-Patch Similarity
Estimating a depth map from multiple views of a scene is a fundamental task
in computer vision. As soon as more than two viewpoints are available, one
faces the basic question of how to measure similarity across more than two image
patches. Surprisingly, no direct solution exists; instead, it is common to fall
back to more or less robust averaging of two-view similarities. Encouraged by
the success of machine learning, and in particular convolutional neural
networks, we propose to learn a matching function which directly maps multiple
image patches to a scalar similarity score. Experiments on several multi-view
datasets demonstrate that this approach has advantages over methods based on
pairwise patch similarity.
Comment: 10 pages, 7 figures, Accepted at ICCV 201
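The fallback that this abstract contrasts itself with, averaging two-view similarities over all pairs of patches, can be sketched as follows. This is the baseline only, not the learned multi-patch function the paper proposes. Cosine similarity on flattened patch vectors and the function names are assumptions chosen for illustration.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two flattened patches (assumed two-view measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(patches):
    """Average the two-view similarity over every unordered pair of patches."""
    pairs = list(combinations(patches, 2))
    if not pairs:
        raise ValueError("need at least two patches")
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Three hypothetical patches: two identical views and one unrelated view.
patches = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
score = mean_pairwise_similarity(patches)
```

A learned multi-patch function would instead take all patches at once and return a single score, which is the design the abstract argues for.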
Siamese Instance Search for Tracking
In this paper we present a tracker that is radically different from
state-of-the-art trackers: we apply no model updating, no occlusion detection,
no combination of trackers, no geometric matching, and still deliver
state-of-the-art tracking performance, as demonstrated on the popular online
tracking benchmark (OTB) and six very challenging YouTube videos. The presented
tracker simply matches the initial patch of the target in the first frame with
candidates in a new frame and returns the most similar patch by a learned
matching function. The strength of the matching function comes from being
extensively trained generically, i.e., without any data of the target, using a
Siamese deep neural network, which we design for tracking. Once learned, the
matching function is used as is, without any adaptation, to track previously
unseen targets. It turns out that the learned matching function is so powerful
that a simple tracker built upon it, coined Siamese INstance search Tracker,
SINT, which only uses the original observation of the target from the first
frame, suffices to reach state-of-the-art performance. Further, we show the
proposed tracker even allows for target re-identification after the target was
absent for a complete video shot.
Comment: This paper is accepted to the IEEE Conference on Computer Vision and
Pattern Recognition, 201
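The per-frame step described in this abstract, comparing the target's first-frame patch with candidates in a new frame and returning the most similar one, can be sketched as below. The feature vectors stand in for the output of the learned Siamese network, which is not reproduced here. Cosine similarity, the function names, and the toy features are assumptions for this example.

```python
import math


def cosine_similarity(a, b):
    """Stand-in for the learned matching function (an assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_candidate(target_feature, candidate_features):
    """Return (index, score) of the candidate most similar to the target.

    The target feature comes from the first frame only and is never updated,
    mirroring the "no model updating" setting described in the abstract.
    """
    if not candidate_features:
        raise ValueError("no candidates to match")
    scores = [cosine_similarity(target_feature, c) for c in candidate_features]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical features for the initial target and three candidate boxes.
target = [1.0, 0.0]
candidates = [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
index, score = best_candidate(target, candidates)
```

Running this once per frame with the same `target` gives a tracker loop in the spirit of the abstract; how candidates are sampled in each frame is left out because the abstract does not describe it.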
HPatches: A benchmark and evaluation of handcrafted and learned local descriptors
In this paper, we propose a novel benchmark for evaluating local image
descriptors. We demonstrate that the existing datasets and evaluation protocols
do not specify unambiguously all aspects of evaluation, leading to ambiguities
and inconsistencies in results reported in the literature. Furthermore, these
datasets are nearly saturated due to the recent improvements in local
descriptors obtained by learning them from large annotated datasets. Therefore,
we introduce a new large dataset suitable for training and testing modern
descriptors, together with strictly defined evaluation protocols in several
tasks such as matching, retrieval and classification. This allows for more
realistic, and thus more reliable comparisons in different application
scenarios. We evaluate the performance of several state-of-the-art descriptors
and analyse their properties. We show that a simple normalisation of
traditional hand-crafted descriptors can boost their performance to the level
of deep-learning-based descriptors in a realistic benchmark evaluation.
Learning and Matching Multi-View Descriptors for Registration of Point Clouds
Critical to the registration of point clouds is the establishment of a set of
accurate correspondences between points in 3D space. The correspondence problem
is generally addressed by the design of discriminative 3D local descriptors on
the one hand, and the development of robust matching strategies on the other
hand. In this work, we first propose a multi-view local descriptor, which is
learned from the images of multiple views, for the description of 3D keypoints.
Then, we develop a robust matching approach that rejects outlier
matches through efficient inference via belief propagation on a defined
graphical model. We demonstrate that our approaches improve
registration on public scanning and multi-view stereo datasets. Their
superior performance is verified by extensive comparisons against a
variety of descriptors and matching methods.