2,118 research outputs found

    Query by String word spotting based on character bi-gram indexing

    In this paper we propose a segmentation-free query by string word spotting method. Both the documents and the query strings are encoded using a recently proposed word representation that projects images and strings into a common attribute space based on a pyramidal histogram of characters (PHOC). These attribute models are learned using linear SVMs over the Fisher Vector representation of the images along with the PHOC labels of the corresponding strings. In order to search through the whole page, document regions are indexed per character bi-gram using a similar attribute representation. On top of that, we propose an integral image representation of the document using a simplified version of the attribute model for efficient computation. Finally, we introduce a re-ranking step in order to boost retrieval performance. We show state-of-the-art results for segmentation-free query by string word spotting on standard single-writer and multi-writer datasets. Comment: To be published in ICDAR201
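
    As a rough illustration of the PHOC string encoding mentioned in the abstract, the sketch below builds a binary pyramidal histogram of characters for a query string. The alphabet, the pyramid levels, the 50% overlap rule, and the function name `phoc` are assumptions chosen for illustration; the abstract does not specify them, and the paper's actual configuration may differ.

```python
import string

# Assumed alphabet: lowercase letters plus digits (not stated in the abstract).
ALPHABET = string.ascii_lowercase + string.digits


def phoc(word, levels=(1, 2, 3)):
    """Return a binary PHOC vector for `word` as a flat list of 0/1 values.

    At pyramid level L the word is split into L equal regions. A character
    is assigned to a region when at least half of the character's span
    falls inside that region (assumed rule).
    """
    word = word.lower()
    n = len(word)
    vec = []
    for level in levels:
        for region in range(level):
            # Work in integer units of 1/(n*level) to avoid rounding issues.
            r_lo, r_hi = region * n, (region + 1) * n
            bits = [0] * len(ALPHABET)
            for k, ch in enumerate(word):
                idx = ALPHABET.find(ch)
                if idx < 0:
                    continue  # skip characters outside the assumed alphabet
                c_lo, c_hi = k * level, (k + 1) * level
                overlap = min(r_hi, c_hi) - max(r_lo, c_lo)
                # Character span has length `level` in these units.
                if 2 * overlap >= level:
                    bits[idx] = 1
            vec.extend(bits)
    return vec
```

    Two strings can then be compared through their PHOC vectors, which is what allows a text query and an image region (whose attributes are predicted from visual features) to share one space.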

    The impact of the image processing in the indexation system

    This paper presents an efficient word spotting system applied to handwritten Arabic documents, where images are represented with bags of visual SIFT descriptors and a sliding window approach is used to locate the regions that are most similar to the query, following the query-by-example paradigm. First, a pre-processing step is used to produce a better representation of the most informative features. Second, a region-based framework is deployed to represent each local region by a bag of visual SIFT descriptors. Afterward, experiments are carried out to demonstrate the influence of the codebook size on the efficiency of the system, by analyzing the curse of dimensionality curve. Finally, to measure the similarity score, a floating distance based on the number of descriptors for each query is adopted. The experimental results demonstrate the efficiency of the proposed processing steps in the word spotting system.
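
    The bag-of-visual-words representation described above can be sketched as follows, assuming local descriptors (for example SIFT vectors) have already been extracted for a region and a codebook has already been learned. The function name `bovw_histogram`, the nearest-centroid assignment, and the L1 normalization are illustrative assumptions, not details taken from the abstract.

```python
import numpy as np


def bovw_histogram(descriptors, codebook):
    """Quantize local descriptors against a codebook and return a histogram.

    descriptors: array of shape (N, D), one local descriptor per row.
    codebook:    array of shape (K, D), one visual word per row.
    Returns an L1-normalized histogram of length K (all zeros if N == 0).
    """
    descriptors = np.asarray(descriptors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Squared Euclidean distance from every descriptor to every visual word.
    diff = descriptors[:, None, :] - codebook[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    # Hard assignment: each descriptor votes for its nearest visual word.
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

    In a sliding window setting, one such histogram would be computed per window and compared with the histogram of the query image; the codebook size K is the parameter whose influence the abstract says is studied.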