5,598 research outputs found

    Turkish handwritten text recognition: a case of agglutinative languages

    Get PDF
    We describe a system for recognizing unconstrained Turkish handwritten text. Turkish has agglutinative morphology and theoretically an infinite number of words that can be generated by adding more suffixes to the word. This makes lexicon-based recognition approaches, where the most likely word is selected among all the alternatives in a lexicon, unsuitable for Turkish. We describe our approach to the problem using a Turkish prefix recognizer. First results of the system demonstrates the promise of this approach, with top-10 word recognition rate of about 40% for a small test data of mixed handprint and cursive writing. The lexicon-based approach with a 17,000 word-lexicon (with test words added) achieves 56% top-10 word recognition rate

    Component-based Segmentation of words from handwritten Arabic text

    Get PDF
    Efficient preprocessing is very essential for automatic recognition of handwritten documents. In this paper, techniques on segmenting words in handwritten Arabic text are presented. Firstly, connected components (ccs) are extracted, and distances among different components are analyzed. The statistical distribution of this distance is then obtained to determine an optimal threshold for words segmentation. Meanwhile, an improved projection based method is also employed for baseline detection. The proposed method has been successfully tested on IFN/ENIT database consisting of 26459 Arabic words handwritten by 411 different writers, and the results were promising and very encouraging in more accurate detection of the baseline and segmentation of words for further recognition

    ANN-based Innovative Segmentation Method for Handwritten text in Assamese

    Get PDF
    Artificial Neural Network (ANN) s has widely been used for recognition of optically scanned character, which partially emulates human thinking in the domain of the Artificial Intelligence. But prior to recognition, it is necessary to segment the character from the text to sentences, words etc. Segmentation of words into individual letters has been one of the major problems in handwriting recognition. Despite several successful works all over the work, development of such tools in specific languages is still an ongoing process especially in the Indian context. This work explores the application of ANN as an aid to segmentation of handwritten characters in Assamese- an important language in the North Eastern part of India. The work explores the performance difference obtained in applying an ANN-based dynamic segmentation algorithm compared to projection- based static segmentation. The algorithm involves, first training of an ANN with individual handwritten characters recorded from different individuals. Handwritten sentences are separated out from text using a static segmentation method. From the segmented line, individual characters are separated out by first over segmenting the entire line. Each of the segments thus obtained, next, is fed to the trained ANN. The point of segmentation at which the ANN recognizes a segment or a combination of several segments to be similar to a handwritten character, a segmentation boundary for the character is assumed to exist and segmentation performed. The segmented character is next compared to the best available match and the segmentation boundary confirmed

    Automatic Palaeographic Exploration of Genizah Manuscripts

    Get PDF
    The Cairo Genizah is a collection of hand-written documents containing approximately 350,000 fragments of mainly Jewish texts discovered in the late 19th century. The fragments are today spread out in some 75 libraries and private collections worldwide, but there is an ongoing effort to document and catalogue all extant fragments. Palaeographic information plays a key role in the study of the Genizah collection. Script style, and–more specifically–handwriting, can be used to identify fragments that might originate from the same original work. Such matched fragments, commonly referred to as “joins”, are currently identified manually by experts, and presumably only a small fraction of existing joins have been discovered to date. In this work, we show that automatic handwriting matching functions, obtained from non-specific features using a corpus of writing samples, can perform this task quite reliably. In addition, we explore the problem of grouping various Genizah documents by script style, without being provided any prior information about the relevant styles. The automatically obtained grouping agrees, for the most part, with the palaeographic taxonomy. In cases where the method fails, it is due to apparent similarities between related scripts

    An Efficient Hidden Markov Model for Offline Handwritten Numeral Recognition

    Full text link
    Traditionally, the performance of ocr algorithms and systems is based on the recognition of isolated characters. When a system classifies an individual character, its output is typically a character label or a reject marker that corresponds to an unrecognized character. By comparing output labels with the correct labels, the number of correct recognition, substitution errors misrecognized characters, and rejects unrecognized characters are determined. Nowadays, although recognition of printed isolated characters is performed with high accuracy, recognition of handwritten characters still remains an open problem in the research arena. The ability to identify machine printed characters in an automated or a semi automated manner has obvious applications in numerous fields. Since creating an algorithm with a one hundred percent correct recognition rate is quite probably impossible in our world of noise and different font styles, it is important to design character recognition algorithms with these failures in mind so that when mistakes are inevitably made, they will at least be understandable and predictable to the person working with theComment: 6pages, 5 figure
    • …
    corecore