Search CORE

6,121 research outputs found

Keyword Search on RDF Graphs - A Query Graph Assembly Approach

Author: Han Shuo
Yu Jeffrey Xu
Zhao Dongyan
Zou Lei
Publication venue
Publication date: 25/08/2017
Field of study

Keyword search provides ordinary users an easy-to-use interface for querying RDF data. Given the input keywords, in this paper, we study how to assemble a query graph that is to represent user's query intention accurately and efficiently. Based on the input keywords, we first obtain the elementary query graph building blocks, such as entity/class vertices and predicate edges. Then, we formally define the query graph assembly (QGA) problem. Unfortunately, we prove theoretically that QGA is a NP-complete problem. In order to solve that, we design some heuristic lower bounds and propose a bipartite graph matching-based best-first search algorithm. The algorithm's time complexity is

O(k^{2l} \cdot l^{3l})

, where

l

is the number of the keywords and

k

is a tunable parameter, i.e., the maximum number of candidate entity/class vertices and predicate edges allowed to match each keyword. Although QGA is intractable, both

l

and

k

are small in practice. Furthermore, the algorithm's time complexity does not depend on the RDF graph size, which guarantees the good scalability of our system in large RDF graphs. Experiments on DBpedia and Freebase confirm the superiority of our system on both effectiveness and efficiency

arXiv.org e-Print Archive

Crossref

OPUS - University of Technology Sydney

A fine-grained approach to scene text script identification

Author: Gomez Lluis
Karatzas Dimosthenis
Publication venue
Publication date: 24/02/2016
Field of study

This paper focuses on the problem of script identification in unconstrained scenarios. Script identification is an important prerequisite to recognition, and an indispensable condition for automatic text understanding systems designed for multi-language environments. Although widely studied for document images and handwritten documents, it remains an almost unexplored territory for scene text images. We detail a novel method for script identification in natural images that combines convolutional features and the Naive-Bayes Nearest Neighbor classifier. The proposed framework efficiently exploits the discriminative power of small stroke-parts, in a fine-grained classification framework. In addition, we propose a new public benchmark dataset for the evaluation of joint text detection and script identification in natural scenes. Experiments done in this new dataset demonstrate that the proposed method yields state of the art results, while it generalizes well to different datasets and variable number of scripts. The evidence provided shows that multi-lingual scene text recognition in the wild is a viable proposition. Source code of the proposed method is made available online

arXiv.org e-Print Archive

Crossref

Orientation covariant aggregation of local descriptors with embeddings

Author: Furon Teddy
Jégou Hervé
Tolias Giorgos
Publication venue
Publication date: 06/09/2014
Field of study

Image search systems based on local descriptors typically achieve orientation invariance by aligning the patches on their dominant orientations. Albeit successful, this choice introduces too much invariance because it does not guarantee that the patches are rotated consistently. This paper introduces an aggregation strategy of local descriptors that achieves this covariance property by jointly encoding the angle in the aggregation stage in a continuous manner. It is combined with an efficient monomial embedding to provide a codebook-free method to aggregate local descriptors into a single vector representation. Our strategy is also compatible and employed with several popular encoding methods, in particular bag-of-words, VLAD and the Fisher vector. Our geometric-aware aggregation strategy is effective for image search, as shown by experiments performed on standard benchmarks for image and particular object retrieval, namely Holidays and Oxford buildings.Comment: European Conference on Computer Vision (2014

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

United we fall, divided we stand: A study of query segmentation and PRF for patent prior art search

Author: Ganguly Debasis
Jones Gareth J.F.
Leveling Johannes
Publication venue
Publication date: 01/10/2011
Field of study

Previous research in patent search has shown that reducing queries by extracting a few key terms is ineffective primarily because of the vocabulary mismatch between patent applications used as queries and existing patent documents. This ﬁnding has led to the use of full patent applications as queries in patent prior art search. In addition, standard information retrieval (IR) techniques such as query expansion (QE) do not work effectively with patent queries, principally because of the presence of noise terms in the massive queries. In this study, we take a new approach to QE for patent search. Text segmentation is used to decompose a patent query into selfcoherent sub-topic blocks. Each of these much shorted sub-topic blocks which is representative of a speciﬁc aspect or facet of the invention, is then used as a query to retrieve documents. Documents retrieved using the different resulting sub-queries or query streams are interleaved to construct a ﬁnal ranked list. This technique can exploit the potential beneﬁt of QE since the segmented queries are generally more focused and less ambiguous than the full patent query. Experiments on the CLEF-2010 IP prior-art search task show that the proposed method outperforms the retrieval effectiveness achieved when using a single full patent application text as the query, and also demonstrates the potential beneﬁts of QE to alleviate the vocabulary mismatch problem in patent search

CiteSeerX

Irish Universities

DCU Online Research Access Service