SVS-JOIN: efficient spatial visual similarity join for geo-multimedia
In the big data era, massive amounts of geo-tagged multimedia data have been generated and collected by smart devices equipped with mobile communication modules and position sensors. This trend places higher demands on large-scale geo-multimedia retrieval. Spatial similarity join is one of the significant problems in the area of spatial databases. Previous work focused on the spatial textual document search problem rather than on geo-multimedia retrieval. In this paper, we investigate a novel geo-multimedia retrieval paradigm named spatial visual similarity join (SVS-JOIN for short), which aims to find pairs of geo-images that are similar in both geo-location and visual content. First, we define SVS-JOIN and present the geographical similarity and visual similarity measures. Inspired by approaches to textual similarity join, we develop an algorithm named SVS-JOIN_B by combining the PPJOIN algorithm with visual similarity. In addition, we develop an extension named SVS-JOIN_G, which uses a spatial grid strategy to improve search efficiency. To further speed up the search, we design a novel approach called SVS-JOIN_Q, which employs a quadtree and a global inverted index. Comprehensive experiments on two geo-image datasets demonstrate that our solution addresses the SVS-JOIN problem effectively and efficiently.
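To make the grid idea behind SVS-JOIN_G concrete, here is a minimal filter-and-verify sketch. It is not the paper's algorithm and omits PPJOIN's prefix and positional filtering. It assumes each image is a tuple (id, x, y, set of quantized visual words), that a pair matches when its Euclidean distance is at most `eps` and the Jaccard similarity of its visual-word sets is at least `tau`, and that the function name `svs_join_grid` is hypothetical. Cells of side `eps` guarantee that any matching pair lies in the same or an adjacent cell, so only a 3x3 neighbourhood has to be scanned.

```python
import math
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two visual-word sets (assumed visual measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def svs_join_grid(images, eps, tau):
    """Hypothetical grid-filtered spatial visual similarity join.

    images: list of (image_id, x, y, visual_word_set)
    Returns sorted (id1, id2) pairs with distance <= eps and Jaccard >= tau.
    """
    # Spatial grid with cell side eps: points within eps of each other
    # differ by at most one cell index along each axis.
    grid = defaultdict(list)
    for idx, (_, x, y, _) in enumerate(images):
        grid[(math.floor(x / eps), math.floor(y / eps))].append(idx)

    results = []
    for idx, (img_id, x, y, words) in enumerate(images):
        cx, cy = math.floor(x / eps), math.floor(y / eps)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                # .get avoids creating empty cells in the defaultdict
                for jdx in grid.get((cx + dx, cy + dy), ()):
                    if jdx <= idx:
                        continue  # report each unordered pair only once
                    other_id, ox, oy, other_words = images[jdx]
                    # Verification step 1: exact spatial distance
                    if math.hypot(x - ox, y - oy) > eps:
                        continue
                    # Verification step 2: visual similarity threshold
                    if jaccard(words, other_words) >= tau:
                        results.append((img_id, other_id))
    return sorted(results)
```

A quadtree, as in SVS-JOIN_Q, would replace the fixed-size grid with adaptive cells, which tends to help when image locations are highly skewed.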
Coherent Keyphrase Extraction via Web Mining
Keyphrases are useful for a variety of purposes, including summarizing,
indexing, labeling, categorizing, clustering, highlighting, browsing, and
searching. The task of automatic keyphrase extraction is to select keyphrases
from within the text of a given document. Automatic keyphrase extraction makes
it feasible to generate keyphrases for the huge number of documents that do not
have manually assigned keyphrases. A limitation of previous keyphrase
extraction algorithms is that the selected keyphrases are occasionally
incoherent. That is, the majority of the output keyphrases may fit together
well, but there may be a minority that appear to be outliers, with no clear
semantic relation to the majority or to each other. This paper presents
enhancements to the Kea keyphrase extraction algorithm that are designed to
increase the coherence of the extracted keyphrases. The approach is to use the
degree of statistical association among candidate keyphrases as evidence that
they may be semantically related. The statistical association is measured using
web mining. Experiments demonstrate that the enhancements improve the quality
of the extracted keyphrases. Furthermore, the enhancements are not
domain-specific: the algorithm generalizes well when it is trained on one
domain (computer science documents) and tested on another (physics documents).
Comment: 6 pages, related work available at http://purl.org/peter.turney
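One common way to turn web hit counts into a degree of statistical association is pointwise mutual information (PMI). The sketch below assumes that measure, adds add-one smoothing so that pairs with zero co-occurrences stay finite, and scores each candidate keyphrase by its mean PMI with the other candidates, so an incoherent outlier receives a low score. The function names, the smoothing, and the averaging are illustrative assumptions, not the paper's exact Kea enhancement; the hit counts would come from some search engine and are not specified here.

```python
import math


def smoothed_pmi(hits_a, hits_b, hits_ab, total):
    """PMI in bits from hit counts, with add-one smoothing (assumption)."""
    return math.log2(total * (hits_ab + 1) / ((hits_a + 1) * (hits_b + 1)))


def coherence_scores(candidates, hits, pair_hits, total):
    """Hypothetical coherence score: mean PMI of each candidate with the rest.

    candidates: list of candidate keyphrases
    hits: dict phrase -> number of pages containing the phrase
    pair_hits: dict frozenset({p, q}) -> pages containing both phrases
    total: assumed total number of indexed pages
    """
    scores = {}
    for c in candidates:
        others = [o for o in candidates if o != c]
        if not others:
            scores[c] = 0.0
            continue
        values = [
            smoothed_pmi(hits[c], hits[o], pair_hits.get(frozenset((c, o)), 0), total)
            for o in others
        ]
        scores[c] = sum(values) / len(values)
    return scores
```

A score like this could serve as one additional feature alongside Kea's existing ones, rather than as a hard filter.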