3 research outputs found

    Handwritten Document Image Retrieval

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Learning-Based Arabic Word Spotting Using a Hierarchical Classifier

    Get PDF
    The effective retrieval of information from scanned and written documents is becoming essential with the increasing amounts of digitized documents, and therefore developing efficient means of analyzing and recognizing these documents is of significant interest. Among these methods is word spotting, which has recently become an active research area. Such systems have been implemented for Latin-based and Chinese languages, while few of them have been implemented for Arabic handwriting. The fact that Arabic writing is cursive by nature and unconstrained, with no clear white space between words, makes the processing of Arabic handwritten documents a more challenging problem. In this thesis, the design and implementation of a learning-based Arabic handwritten word spotting system is presented. This incorporates the aspects of text line extraction, handwritten word recognition, partial segmentation of words, word spotting and finally validation of the spotted words. The Arabic text line is more unconstrained than that of other scripts, essentially since it also includes small connected components such as dots and diacritics that are usually located between lines. Thus, a robust method to extract text lines that takes into consideration the challenges in the Arabic handwriting is proposed. The method is evaluated on two Arabic handwritten documents databases, and the results are compared with those of two other methods for text line extraction. The results show that the proposed method is effective, and compares favorably with the other methods. Word spotting is an automatic process to search for words within a document. Applying this process to handwritten Arabic documents is challenging due to the absence of a clear space between handwritten words. To address this problem, an effective learning-based method for Arabic handwritten word spotting is proposed and presented in this thesis. For this process, sub-words or pieces of Arabic words form the basic components of the search process, and a hierarchical classifier is implemented to integrate statistical language models with the segmentation of an Arabic text line into sub-words. The holistic and analytical paradigms (for word recognition and spotting) are studied, and verification models based on combining these two paradigms have been proposed and implemented to refine the outcomes of the analytical classifier that spots words. Finally, a series of evaluation and testing experiments have been conducted to evaluate the effectiveness of the proposed systems, and these show that promising results have been obtained

    Omnilingual Segmentation-freeWord Spotting for Ancient Manuscripts Indexation

    No full text
    International audienceThis article introduces a new word spotting method designed for ancient manuscripts. We take advantage of the robustness of the gradient feature and propose a new segmentation-free matching algorithm that tolerates spatial variations. We test our algorithm on ancient Latin manuscripts and on George Washington's manuscripts
    corecore