1,134 research outputs found
A location-aware embedding technique for accurate landmark recognition
The current state of the research in landmark recognition highlights the good
accuracy which can be achieved by embedding techniques, such as Fisher vector
and VLAD. None of these techniques exploits spatial information, i.e., they
consider all the features and the corresponding descriptors without embedding
their location in the image. This paper presents a new variant of the
well-known VLAD (Vector of Locally Aggregated Descriptors) embedding technique
which accounts, to a certain degree, for the location of features. The driving
motivation comes from the observation that, usually, the most interesting part
of an image (e.g., the landmark to be recognized) is almost at the center of
the image, while the features at the borders are irrelevant features which do
not depend on the landmark. The proposed variant, called locVLAD (location-aware
VLAD), computes the mean of two global descriptors: the VLAD executed on
the entire original image, and the one computed on a cropped image which
removes a certain percentage of the image borders. This simple variant shows an
accuracy greater than the existing state-of-the-art approach. Experiments are
conducted on two public datasets (ZuBuD and Holidays) which are used both for
training and testing. Moreover, a more balanced version of ZuBuD is proposed.
Comment: 6 pages, 5 figures, ICDSC 201
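The averaging step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a precomputed codebook (`centroids`), approximates "cropping the image" by discarding features whose keypoints fall inside the border strip instead of re-extracting features from a cropped image, and uses a hypothetical `border` fraction of 0.1. The function names `vlad` and `loc_vlad` are illustrative.

```python
import math

def vlad(descriptors, centroids):
    """Plain VLAD: sum residuals to the nearest centroid, then L2-normalize."""
    k, d = len(centroids), len(centroids[0])
    acc = [[0.0] * d for _ in range(k)]
    for desc in descriptors:
        # Hard-assign the descriptor to its nearest codebook centroid.
        nearest = min(range(k), key=lambda i: math.dist(desc, centroids[i]))
        for j in range(d):
            acc[nearest][j] += desc[j] - centroids[nearest][j]
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

def loc_vlad(features, image_size, centroids, border=0.1):
    """Mean of the VLAD over all features and the VLAD over central features.

    features: list of ((x, y), descriptor) pairs.
    border: assumed fraction of each image side treated as border (hypothetical).
    """
    width, height = image_size
    x0, x1 = border * width, (1 - border) * width
    y0, y1 = border * height, (1 - border) * height
    all_desc = [desc for _, desc in features]
    central = [desc for (x, y), desc in features
               if x0 <= x <= x1 and y0 <= y <= y1]
    full = vlad(all_desc, centroids)
    cropped = vlad(central, centroids)
    return [(a + b) / 2 for a, b in zip(full, cropped)]
```

With `border=0.0` every feature counts as central, so the result reduces to the plain VLAD of the whole image.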
Orientation covariant aggregation of local descriptors with embeddings
Image search systems based on local descriptors typically achieve orientation
invariance by aligning the patches on their dominant orientations. Albeit
successful, this choice introduces too much invariance because it does not
guarantee that the patches are rotated consistently. This paper introduces an
aggregation strategy of local descriptors that achieves this covariance
property by jointly encoding the angle in the aggregation stage in a continuous
manner. It is combined with an efficient monomial embedding to provide a
codebook-free method to aggregate local descriptors into a single vector
representation. Our strategy is also compatible with, and is applied to, several
popular encoding methods, in particular bag-of-words, VLAD and the Fisher
vector. Our geometric-aware aggregation strategy is effective for image search,
as shown by experiments performed on standard benchmarks for image and
particular object retrieval, namely Holidays and Oxford buildings.
Comment: European Conference on Computer Vision (2014
Using Apache Lucene to Search Vector of Locally Aggregated Descriptors
Surrogate Text Representation (STR) is a profitable solution for efficient
similarity search in metric spaces using conventional text search engines, such
as Apache Lucene. This technique is based on comparing the permutations of some
reference objects in place of the original metric distance. However, the
Achilles heel of the STR approach is the need to reorder the result set of the
search according to the metric distance. This forces the use of a support database
to store the original objects, which requires efficient random I/O on fast
secondary memory (such as flash-based storage). In this paper, we propose to
extend the Surrogate Text Representation to specifically address a class of
visual metric objects known as Vector of Locally Aggregated Descriptors (VLAD).
This approach is based on representing the individual sub-vectors forming the
VLAD vector with the STR, providing a finer representation of the vector and
enabling us to get rid of the reordering phase. The experiments on a publicly
available dataset show that the extended STR outperforms the baseline STR
achieving satisfactory performance close to that obtained with the original
VLAD vectors.
Comment: In Proceedings of the 11th Joint Conference on Computer Vision,
Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2016) -
Volume 4: VISAPP, p. 383-39
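The permutation-based encoding mentioned in the abstract above can be illustrated with a small sketch. The details here are assumptions, not taken from the paper: each vector is encoded by its `k` nearest reference objects, a reference at rank `r` is repeated `k - r` times so that a term-frequency-based text engine favors objects with similar rankings, all sub-vectors share one set of reference objects, and the token names (`r3`, `s0_r3`) are hypothetical.

```python
import math

def surrogate_text(vector, references, k):
    """Encode a vector as text from its k nearest reference objects.

    Closer references are repeated more often (assumed weighting scheme).
    """
    ranked = sorted(range(len(references)),
                    key=lambda i: math.dist(vector, references[i]))
    tokens = []
    for rank, ref_id in enumerate(ranked[:k]):
        tokens.extend([f"r{ref_id}"] * (k - rank))
    return " ".join(tokens)

def vlad_surrogate_text(vlad_vector, sub_dim, references, k):
    """Apply surrogate_text to each sub-vector of a VLAD vector.

    Tokens are prefixed with the sub-vector index so that terms from
    different sub-vectors never match each other.
    """
    parts = []
    for start in range(0, len(vlad_vector), sub_dim):
        idx = start // sub_dim
        sub = vlad_vector[start:start + sub_dim]
        for tok in surrogate_text(sub, references, k).split():
            parts.append(f"s{idx}_{tok}")
    return " ".join(parts)
```

The resulting string could be indexed as an ordinary text field by a text search engine; how the paper configures that index is not described in the abstract.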
Embedding based on function approximation for large scale image search
The objective of this paper is to design an embedding method that maps local
features describing an image (e.g. SIFT) to a higher dimensional representation
useful for the image retrieval problem. First, motivated by the relationship
between the linear approximation of a nonlinear function in high dimensional
space and the state-of-the-art feature representation used in image retrieval,
i.e., VLAD, we propose a new approach for the approximation. The embedded
vectors resulting from the function approximation process are then aggregated to
form a single representation for image retrieval. Second, in order to make the
proposed embedding method applicable to large-scale problems, we further derive
a fast version in which the embedded vectors can be efficiently computed,
i.e., in closed form. We compare the proposed embedding methods with the
state of the art in the context of image search under various settings: when
the images are represented by medium length vectors, short vectors, or binary
vectors. The experimental results show that the proposed embedding methods
outperform the existing state of the art on the standard public image retrieval
benchmarks.
Comment: Accepted to TPAMI 2017. The implementation and precomputed features
of the proposed F-FAemb are released at the following link:
http://tinyurl.com/F-FAem
Selective Deep Convolutional Features for Image Retrieval
Convolutional Neural Network (CNN) is a very powerful approach to extract
discriminative local descriptors for effective image search. Recent work adopts
fine-tuned strategies to further improve the discriminative power of the
descriptors. Taking a different approach, in this paper, we propose a novel
framework to achieve competitive retrieval performance. Firstly, we propose
various masking schemes, namely SIFT-mask, SUM-mask, and MAX-mask, to select a
representative subset of local convolutional features and remove a large number
of redundant features. We demonstrate that this can effectively address the
burstiness issue and improve retrieval accuracy. Secondly, we propose to employ
recent embedding and aggregating methods to further enhance feature
discriminability. Extensive experiments demonstrate that our proposed framework
achieves state-of-the-art retrieval accuracy.
Comment: Accepted to ACM MM 201
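The abstract above names the masking schemes without defining them. The sketch below shows one plausible reading of two of them, purely as an assumption: a SUM-mask that keeps spatial locations whose activation summed over channels exceeds the median of those sums, and a MAX-mask that keeps, for each channel, the location of its strongest activation. The function names and the input layout (a list of channels, each a 2D list) are hypothetical.

```python
from statistics import median

def sum_mask(feature_maps):
    """Keep locations whose channel-summed activation exceeds the median sum.

    This selection rule is an assumed reading of "SUM-mask".
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    sums = [[sum(ch[y][x] for ch in feature_maps) for x in range(width)]
            for y in range(height)]
    threshold = median([v for row in sums for v in row])
    return {(y, x) for y in range(height) for x in range(width)
            if sums[y][x] > threshold}

def max_mask(feature_maps):
    """Keep, for each channel, the location of its strongest activation.

    This selection rule is an assumed reading of "MAX-mask".
    """
    selected = set()
    for ch in feature_maps:
        coords = [(y, x) for y in range(len(ch)) for x in range(len(ch[0]))]
        selected.add(max(coords, key=lambda p: ch[p[0]][p[1]]))
    return selected
```

Either function returns a set of `(row, column)` positions; the local descriptors at those positions would then be passed on to the embedding and aggregation stage.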