6,121 research outputs found

    Keyword Search on RDF Graphs - A Query Graph Assembly Approach

    Full text link
    Keyword search provides ordinary users an easy-to-use interface for querying RDF data. Given the input keywords, in this paper, we study how to assemble a query graph that is to represent user's query intention accurately and efficiently. Based on the input keywords, we first obtain the elementary query graph building blocks, such as entity/class vertices and predicate edges. Then, we formally define the query graph assembly (QGA) problem. Unfortunately, we prove theoretically that QGA is a NP-complete problem. In order to solve that, we design some heuristic lower bounds and propose a bipartite graph matching-based best-first search algorithm. The algorithm's time complexity is O(k2ll3l)O(k^{2l} \cdot l^{3l}), where ll is the number of the keywords and kk is a tunable parameter, i.e., the maximum number of candidate entity/class vertices and predicate edges allowed to match each keyword. Although QGA is intractable, both ll and kk are small in practice. Furthermore, the algorithm's time complexity does not depend on the RDF graph size, which guarantees the good scalability of our system in large RDF graphs. Experiments on DBpedia and Freebase confirm the superiority of our system on both effectiveness and efficiency

    A fine-grained approach to scene text script identification

    Full text link
    This paper focuses on the problem of script identification in unconstrained scenarios. Script identification is an important prerequisite to recognition, and an indispensable condition for automatic text understanding systems designed for multi-language environments. Although widely studied for document images and handwritten documents, it remains an almost unexplored territory for scene text images. We detail a novel method for script identification in natural images that combines convolutional features and the Naive-Bayes Nearest Neighbor classifier. The proposed framework efficiently exploits the discriminative power of small stroke-parts, in a fine-grained classification framework. In addition, we propose a new public benchmark dataset for the evaluation of joint text detection and script identification in natural scenes. Experiments done in this new dataset demonstrate that the proposed method yields state of the art results, while it generalizes well to different datasets and variable number of scripts. The evidence provided shows that multi-lingual scene text recognition in the wild is a viable proposition. Source code of the proposed method is made available online

    Orientation covariant aggregation of local descriptors with embeddings

    Get PDF
    Image search systems based on local descriptors typically achieve orientation invariance by aligning the patches on their dominant orientations. Albeit successful, this choice introduces too much invariance because it does not guarantee that the patches are rotated consistently. This paper introduces an aggregation strategy of local descriptors that achieves this covariance property by jointly encoding the angle in the aggregation stage in a continuous manner. It is combined with an efficient monomial embedding to provide a codebook-free method to aggregate local descriptors into a single vector representation. Our strategy is also compatible and employed with several popular encoding methods, in particular bag-of-words, VLAD and the Fisher vector. Our geometric-aware aggregation strategy is effective for image search, as shown by experiments performed on standard benchmarks for image and particular object retrieval, namely Holidays and Oxford buildings.Comment: European Conference on Computer Vision (2014

    United we fall, divided we stand: A study of query segmentation and PRF for patent prior art search

    Get PDF
    Previous research in patent search has shown that reducing queries by extracting a few key terms is ineffective primarily because of the vocabulary mismatch between patent applications used as queries and existing patent documents. This finding has led to the use of full patent applications as queries in patent prior art search. In addition, standard information retrieval (IR) techniques such as query expansion (QE) do not work effectively with patent queries, principally because of the presence of noise terms in the massive queries. In this study, we take a new approach to QE for patent search. Text segmentation is used to decompose a patent query into selfcoherent sub-topic blocks. Each of these much shorted sub-topic blocks which is representative of a specific aspect or facet of the invention, is then used as a query to retrieve documents. Documents retrieved using the different resulting sub-queries or query streams are interleaved to construct a final ranked list. This technique can exploit the potential benefit of QE since the segmented queries are generally more focused and less ambiguous than the full patent query. Experiments on the CLEF-2010 IP prior-art search task show that the proposed method outperforms the retrieval effectiveness achieved when using a single full patent application text as the query, and also demonstrates the potential benefits of QE to alleviate the vocabulary mismatch problem in patent search
    corecore