Class-Weighted Convolutional Features for Visual Instance Search
Image retrieval in realistic scenarios targets large dynamic datasets of
unlabeled images. In these cases, training or fine-tuning a model every time
new images are added to the database is neither efficient nor scalable.
Convolutional neural networks trained for image classification over large
datasets have proven to be effective feature extractors for image retrieval. The
most successful approaches are based on encoding the activations of
convolutional layers, as they convey the spatial information of the image. In this
paper, we go beyond this spatial information and propose a local-aware encoding
of convolutional features based on semantic information predicted in the target
image. To this end, we obtain the most discriminative regions of an image using
Class Activation Maps (CAMs). CAMs are based on the knowledge contained in the
network, so our approach has the additional advantage of not
requiring external information. In addition, we use CAMs to generate object
proposals during an unsupervised re-ranking stage after a first fast search.
Our experiments on two publicly available datasets for instance retrieval,
Oxford5k and Paris6k, demonstrate the competitiveness of our approach, which
outperforms the current state of the art when using off-the-shelf models
trained on ImageNet. The source code and model used in this paper are publicly
available at http://imatge-upc.github.io/retrieval-2017-cam/.
Comment: To appear in the British Machine Vision Conference (BMVC), September 201
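The encoding described in this abstract can be illustrated with a minimal sketch. It assumes the convolutional activations and one CAM per predicted class are already available as nested lists; the function names and the aggregation by summing per-class vectors are illustrative assumptions, not the authors' released implementation, and any further steps of the published pipeline are omitted.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def cam_weighted_descriptor(features, cams):
    """Build one image descriptor from CAM-weighted convolutional features.

    features: activations indexed as [channel][row][col]
    cams:     class activation maps indexed as [class][row][col],
              with the same spatial size as the activations
    """
    descriptor = [0.0] * len(features)
    for cam in cams:
        # Weight every spatial location by the CAM, then sum-pool per channel.
        pooled = [
            sum(
                channel[y][x] * cam[y][x]
                for y in range(len(cam))
                for x in range(len(cam[0]))
            )
            for channel in features
        ]
        # Normalize each per-class vector before aggregating (an assumption).
        class_vec = l2_normalize(pooled)
        descriptor = [d + c for d, c in zip(descriptor, class_vec)]
    return l2_normalize(descriptor)
```

Images would then be compared by the dot product of their descriptors, which equals cosine similarity because the descriptors have unit norm.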
Predicting ConceptNet Path Quality Using Crowdsourced Assessments of Naturalness
In many applications, it is important to characterize the way in which two
concepts are semantically related. Knowledge graphs such as ConceptNet provide
a rich source of information for such characterizations by encoding relations
between concepts as edges in a graph. When two concepts are not directly
connected by an edge, their relationship can still be described in terms of the
paths that connect them. Unfortunately, many of these paths are uninformative
and noisy, which means that the success of applications that use such path
features crucially relies on their ability to select high-quality paths. In
existing applications, this path selection process is based on relatively
simple heuristics. In this paper we instead propose to learn to predict path
quality from crowdsourced human assessments. Since we are interested in a
generic task-independent notion of quality, we simply ask human participants to
rank paths according to their subjective assessment of the paths' naturalness,
without attempting to define naturalness or steering the participants towards
particular indicators of quality. We show that a neural network model trained
on these assessments is able to predict human judgments on unseen paths with
near-optimal performance. Most notably, we find that the resulting path
selection method is substantially better than the current heuristic approaches
at identifying meaningful paths.
Comment: In Proceedings of the Web Conference (WWW) 201
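To make concrete what "paths that connect two concepts" means here, the following sketch enumerates cycle-free paths in a small labeled graph. The graph format, the function name `simple_paths`, and the `max_edges` cutoff are assumptions made for illustration; the learned quality model from the abstract is not shown, only the candidate generation that such a model would rank.

```python
def simple_paths(graph, start, goal, max_edges=3):
    """Yield cycle-free paths from start to goal using at most max_edges edges.

    graph maps a concept to a list of (relation, neighbor) pairs.
    Each path is a tuple alternating concepts and relations:
    (concept, relation, concept, relation, concept, ...).
    """
    stack = [(start, (start,))]
    while stack:
        node, path = stack.pop()
        if node == goal:
            yield path
            continue
        # A path with k edges has 2k + 1 entries.
        if len(path) // 2 >= max_edges:
            continue
        for relation, neighbor in graph.get(node, []):
            # Concepts sit at the even positions of the path tuple.
            if neighbor not in path[::2]:
                stack.append((neighbor, path + (relation, neighbor)))


# Toy edges invented for illustration; they are not taken from ConceptNet.
toy_graph = {
    "dog": [("IsA", "pet"), ("AtLocation", "house")],
    "pet": [("AtLocation", "house")],
    "house": [("HasA", "roof")],
}
candidates = sorted(simple_paths(toy_graph, "dog", "roof"), key=len)
```

A path-quality predictor would take each entry of `candidates` as input and assign it a score, replacing heuristics such as "prefer the shortest path".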
Stochastic Attraction-Repulsion Embedding for Large Scale Image Localization
This paper tackles the problem of large-scale image-based localization (IBL),
where the spatial location of a query image is determined by finding the
most similar reference images in a large database. A critical task in solving
this problem is to learn a discriminative image representation that captures
information relevant for localization. We propose a novel representation
learning method with higher location-discriminating power. We make the
following contributions: 1) we represent a place (location) as a
set of exemplar images depicting the same landmarks and aim to maximize
similarities among intra-place images while minimizing similarities among
inter-place images; 2) we model a similarity measure as a probability
distribution on L_2-metric distances between intra-place and inter-place image
representations; 3) we propose a new Stochastic Attraction and Repulsion
Embedding (SARE) loss function minimizing the KL divergence between the learned
and the actual probability distributions; 4) we give theoretical comparisons
between the SARE, triplet ranking, and contrastive losses, analyzing their
gradients to explain why SARE performs better. Our SARE loss is easy to implement
and can be plugged into any CNN. Experiments show that our proposed method improves
the localization performance on standard benchmarks by a large margin.
Demonstrating the broad applicability of our method, we obtained third
place out of 209 teams in the 2018 Google Landmark Retrieval Challenge. Our
code and model are available at https://github.com/Liumouliu/deepIBL.
Comment: ICC
- …
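Contributions 2) and 3) of the preceding abstract can be sketched as follows. The sketch assumes the matching probability is a softmax over negative squared L_2 distances and that the target distribution puts all its mass on the positive image, in which case the KL divergence reduces to the negative log-probability of the positive. The function names are hypothetical and the exact kernel used in the paper may differ.

```python
import math


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def attraction_repulsion_loss(query, positive, negatives):
    """Negative log-probability that the query matches its positive.

    Probabilities come from a softmax over negative squared L2 distances
    (an assumed kernel). With a one-hot target on the positive, this value
    equals the KL divergence between the target and learned distributions.
    """
    logits = [-squared_l2(query, positive)]
    logits += [-squared_l2(query, n) for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_norm - logits[0]
```

Minimizing this value pulls the positive toward the query (attraction) and pushes the negatives away (repulsion); when the positive and a single negative are equally distant, the loss is log 2.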