2,189 research outputs found
SVS-JOIN : efficient spatial visual similarity join for geo-multimedia
In the big data era, massive amounts of geo-tagged multimedia data are generated and collected by smart devices equipped with mobile communication and position sensor modules. This trend places higher demands on large-scale geo-multimedia retrieval. Spatial similarity join is one of the significant problems in the area of spatial databases. Previous work focused on spatial textual document search rather than geo-multimedia retrieval. In this paper, we investigate a novel geo-multimedia retrieval paradigm named spatial visual similarity join (SVS-JOIN for short), which aims to find geo-image pairs that are similar in both geo-location and visual content. First, we define SVS-JOIN and present the geographical and visual similarity measures. Inspired by approaches to textual similarity join, we develop an algorithm named SVS-JOIN B by combining the PPJOIN algorithm with visual similarity. In addition, we develop an extension named SVS-JOIN G, which uses a spatial grid strategy to improve search efficiency. To further speed up the search, we design a novel approach called SVS-JOIN Q, which employs a quadtree and a global inverted index. Comprehensive experiments on two geo-image datasets demonstrate that our solution addresses the SVS-JOIN problem effectively and efficiently.
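To make the join condition concrete, here is a minimal brute-force sketch of a spatial visual similarity join. It is not the SVS-JOIN B, G, or Q algorithm from the paper. The abstract does not give the similarity measures, so both are assumptions here: geographical similarity is taken as one minus the normalized Euclidean distance, and visual similarity as the Jaccard similarity of visual-word sets. The names `geo_sim`, `visual_sim`, `naive_svs_join`, and `max_dist` are hypothetical.

```python
from itertools import combinations
from math import hypot


def geo_sim(p, q, max_dist):
    """Assumed measure: 1 - Euclidean distance / max_dist, clipped at 0."""
    return max(0.0, 1.0 - hypot(p[0] - q[0], p[1] - q[1]) / max_dist)


def visual_sim(a, b):
    """Assumed measure: Jaccard similarity of two visual-word sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def naive_svs_join(images, geo_threshold, vis_threshold, max_dist):
    """Return index pairs (i, j), i < j, that pass both thresholds.

    `images` is a list of (location, visual_words) tuples. Every pair is
    compared, so the cost is quadratic; the grid, quadtree, and inverted
    index mentioned in the abstract exist to avoid exactly this.
    """
    result = []
    for (i, (loc_i, words_i)), (j, (loc_j, words_j)) in combinations(
        enumerate(images), 2
    ):
        if (
            geo_sim(loc_i, loc_j, max_dist) >= geo_threshold
            and visual_sim(words_i, words_j) >= vis_threshold
        ):
            result.append((i, j))
    return result
```

A pair is reported only when it is close in both location and visual content, which is the defining property of the join described above.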
Efficient image copy detection using multi-scale fingerprints
Inspired by multi-resolution histograms, we propose
a multi-scale SIFT descriptor to improve discriminability.
A series of SIFT descriptions at different scales is first
acquired by varying the actual size of each spatial bin. Then
principal component analysis (PCA) is employed to reduce them
to low-dimensional vectors, which are further combined into one
128-dimensional multi-scale SIFT description. Next, an entropy
maximization based binarization is employed to encode the
descriptions into binary codes called fingerprints for indexing
the local features. Furthermore, an efficient search architecture
consisting of lookup tables and inverted image ID list is designed
to improve the query speed. Since fingerprint building has
low complexity, this method is very efficient and scalable to
very large databases. In addition, the multi-scale fingerprints
are very discriminative such that the copies can be effectively
distinguished from similar objects, which leads to an improved
performance in the detection of copies. The experimental evaluation shows that our approach outperforms the state of the art
methods.
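The fingerprint and lookup steps can be illustrated with a minimal sketch. It is not the paper's implementation: the multi-scale SIFT and PCA stages are omitted, and per-dimension median thresholding is used as an assumed stand-in for entropy-maximizing binarization, since a median split makes each bit as balanced as possible. The function names (`fit_thresholds`, `fingerprint`, `build_index`, `query`) and the exact-match voting scheme are hypothetical.

```python
from collections import defaultdict
from statistics import median


def fit_thresholds(descriptors):
    """Per-dimension median, so each bit splits the data roughly in half."""
    dims = len(descriptors[0])
    return [median(d[k] for d in descriptors) for k in range(dims)]


def fingerprint(descriptor, thresholds):
    """Pack one bit per dimension into an integer code."""
    code = 0
    for value, t in zip(descriptor, thresholds):
        code = (code << 1) | (1 if value > t else 0)
    return code


def build_index(image_descriptors, thresholds):
    """Lookup table: fingerprint -> set of image IDs (inverted list)."""
    index = defaultdict(set)
    for image_id, descs in image_descriptors.items():
        for d in descs:
            index[fingerprint(d, thresholds)].add(image_id)
    return index


def query(index, descriptors, thresholds):
    """Vote for images sharing exact fingerprints with the query."""
    votes = defaultdict(int)
    for d in descriptors:
        for image_id in index.get(fingerprint(d, thresholds), ()):
            votes[image_id] += 1
    # Most votes first; ties broken by image ID for determinism.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
```

Because each query descriptor costs one binarization and one table lookup, query time depends on the number of local features rather than on the database size, which is the intuition behind the scalability claim.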
Orientation covariant aggregation of local descriptors with embeddings
Image search systems based on local descriptors typically achieve orientation
invariance by aligning the patches on their dominant orientations. Albeit
successful, this choice introduces too much invariance because it does not
guarantee that the patches are rotated consistently. This paper introduces an
aggregation strategy of local descriptors that achieves this covariance
property by jointly encoding the angle in the aggregation stage in a continuous
manner. It is combined with an efficient monomial embedding to provide a
codebook-free method to aggregate local descriptors into a single vector
representation. Our strategy is also compatible and employed with several
popular encoding methods, in particular bag-of-words, VLAD and the Fisher
vector. Our geometric-aware aggregation strategy is effective for image search,
as shown by experiments performed on standard benchmarks for image and
particular object retrieval, namely Holidays and Oxford buildings. Comment: European Conference on Computer Vision (2014)
Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework that highlights the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypotheses assumed and thus the constraints imposed on the type of video
that each technique is able to address. Making the hypotheses and
constraints explicit makes the framework particularly useful for selecting a method for a given
application. Another advantage of the proposed organization is that it
allows the newest approaches to be categorized seamlessly with traditional ones, while
providing an insightful perspective of the evolution of the action recognition
task up to now. That perspective is the basis for the discussion at the end of
the paper, where we also present the main open issues in the area. Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4
tables
Asymmetric Feature Maps with Application to Sketch Based Retrieval
We propose a novel concept of asymmetric feature maps (AFM), which allows
evaluating multiple kernels between a query and database entries without
increasing the memory requirements. To demonstrate the advantages of the AFM
method, we derive a short vector image representation that, due to asymmetric
feature maps, supports efficient scale and translation invariant sketch-based
image retrieval. Unlike most of the short-code based retrieval systems, the
proposed method provides the query localization in the retrieved image. The
efficiency of the search is boosted by approximating a 2D translation search
via a trigonometric polynomial of scores by 1D projections. The projections are a
special case of AFM. An order of magnitude speed-up is achieved compared to
traditional trigonometric polynomials. The results are boosted by an
image-based average query expansion, significantly exceeding the state of the
art on standard benchmarks. Comment: CVPR 201