1,360 research outputs found

    Text Line Segmentation of Historical Documents: a Survey

    Full text link
    There is a huge amount of historical documents in libraries and in various National Archives that have not been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such as word spotting, text/image alignment, authentication and extraction of specific fields are in use today. For all these tasks, a major step is document segmentation into text lines. Because of the low quality and the complexity of these documents (background noise, artifacts due to aging, interfering lines),automatic text line segmentation remains an open research field. The objective of this paper is to present a survey of existing methods, developed during the last decade, and dedicated to documents of historical interest.Comment: 25 pages, submitted version, To appear in International Journal on Document Analysis and Recognition, On line version available at http://www.springerlink.com/content/k2813176280456k3

    Character Recognition

    Get PDF
    Character recognition is one of the pattern recognition technologies that are most widely used in practical applications. This book presents recent advances that are relevant to character recognition, from technical topics such as image processing, feature extraction or classification, to new applications including human-computer interfaces. The goal of this book is to provide a reference source for academic research and for professionals working in the character recognition field

    Recognition of off-line handwritten cursive text

    Get PDF
    The author presents novel algorithms to design unconstrained handwriting recognition systems organized in three parts: In Part One, novel algorithms are presented for processing of Arabic text prior to recognition. Algorithms are described to convert a thinned image of a stroke to a straight line approximation. Novel heuristic algorithms and novel theorems are presented to determine start and end vertices of an off-line image of a stroke. A straight line approximation of an off-line stroke is converted to a one-dimensional representation by a novel algorithm which aims to recover the original sequence of writing. The resulting ordering of the stroke segments is a suitable preprocessed representation for subsequent handwriting recognition algorithms as it helps to segment the stroke. The algorithm was tested against one data set of isolated handwritten characters and another data set of cursive handwriting, each provided by 20 subjects, and has been 91.9% and 91.8% successful for these two data sets, respectively. In Part Two, an entirely novel fuzzy set-sequential machine character recognition system is presented. Fuzzy sequential machines are defined to work as recognizers of handwritten strokes. An algorithm to obtain a deterministic fuzzy sequential machine from a stroke representation, that is capable of recognizing that stroke and its variants, is presented. An algorithm is developed to merge two fuzzy machines into one machine. The learning algorithm is a combination of many described algorithms. The system was tested against isolated handwritten characters provided by 20 subjects resulting in 95.8% recognition rate which is encouraging and shows that the system is highly flexible in dealing with shape and size variations. In Part Three, also an entirely novel text recognition system, capable of recognizing off-line handwritten Arabic cursive text having a high variability is presented. This system is an extension of the above recognition system. Tokens are extracted from a onedimensional representation of a stroke. Fuzzy sequential machines are defined to work as recognizers of tokens. It is shown how to obtain a deterministic fuzzy sequential machine from a token representation that is capable'of recognizing that token and its variants. An algorithm for token learning is presented. The tokens of a stroke are re-combined to meaningful strings of tokens. Algorithms to recognize and learn token strings are described. The. recognition stage uses algorithms of the learning stage. The process of extracting the best set of basic shapes which represent the best set of token strings that constitute an unknown stroke is described. A method is developed to extract lines from pages of handwritten text, arrange main strokes of extracted lines in the same order as they were written, and present secondary strokes to main strokes. Presented secondary strokes are combined with basic shapes to obtain the final characters by formulating and solving assignment problems for this purpose. Some secondary strokes which remain unassigned are individually manipulated. The system was tested against the handwritings of 20 subjects yielding overall subword and character recognition rates of 55.4% and 51.1%, respectively

    Decision Fusion and Contextual Information for Arabic Words Recognition for Computing and Informatics

    Get PDF
    The study of multiple classifier systems has become recently an area of intensive research in pattern recognition. Also in handwriting recognition, systems combining several classifiers have been investigated. An approach for recognizing the legal amount for handwritten Arabic bank check is described in this article. The solution uses multiple information sources to recognize words. The recognition step is preformed with a parallel combination of three kinds of classifiers using holistic word structural features. The classification stage results are first normalized, and the sum combination is performed as a decision fusion scheme, after which a syntactic analyzer makes final decision on the candidate words. Using this approach, the obtained results are very interesting and promising
    • …
    corecore