2,431 research outputs found

    An Efficient Hidden Markov Model for Offline Handwritten Numeral Recognition

    Full text link
    Traditionally, the performance of ocr algorithms and systems is based on the recognition of isolated characters. When a system classifies an individual character, its output is typically a character label or a reject marker that corresponds to an unrecognized character. By comparing output labels with the correct labels, the number of correct recognition, substitution errors misrecognized characters, and rejects unrecognized characters are determined. Nowadays, although recognition of printed isolated characters is performed with high accuracy, recognition of handwritten characters still remains an open problem in the research arena. The ability to identify machine printed characters in an automated or a semi automated manner has obvious applications in numerous fields. Since creating an algorithm with a one hundred percent correct recognition rate is quite probably impossible in our world of noise and different font styles, it is important to design character recognition algorithms with these failures in mind so that when mistakes are inevitably made, they will at least be understandable and predictable to the person working with theComment: 6pages, 5 figure

    Downs and Acrosses: Textual Markup on a Stroke Based Level

    Get PDF
    Textual encoding is one of the main focuses of Humanities Computing. However, existing encoding schemes and initiatives focus on 'text' from the character level upwards, and are of little use to scholars, such as papyrologists and palaeographers, who study the constituent strokes of individual characters. This paper discusses the development of a markup system used to annotate a corpus of images of Roman texts, resulting in an XML representation of each character on a stroke by stroke basis. The XML data generated allows further interrogation of the palaeographic data, increasing the knowledge available regarding the palaeography of the documentation produced by the Roman Army. Additionally, the corpus was used to train an Artificial Intelligence system to effectively 'read' in stroke data of unknown text and output possible, reliable, interpretations of that text: the next step in aiding historians in the reading of ancient texts. The development and implementation of the markup scheme is introduced, the results of our initial encoding effort are presented, and it is demonstrated that textual markup on a stroke level can extend the remit of marked-up digital texts in the humanities

    Image and interpretation using artificial intelligence to read ancient Roman texts

    Get PDF
    The ink and stylus tablets discovered at the Roman Fort of Vindolanda are a unique resource for scholars of ancient history. However, the stylus tablets have proved particularly difficult to read. This paper describes a system that assists expert papyrologists in the interpretation of the Vindolanda writing tablets. A model-based approach is taken that relies on models of the written form of characters, and statistical modelling of language, to produce plausible interpretations of the documents. Fusion of the contributions from the language, character, and image feature models is achieved by utilizing the GRAVA agent architecture that uses Minimum Description Length as the basis for information fusion across semantic levels. A system is developed that reads in image data and outputs plausible interpretations of the Vindolanda tablets

    Analysis by synthesis in handwriting recognition

    Get PDF
    Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1997.Includes bibliographical references (p. 34-38).by Boris L. Elbert.M.S

    A Computational Theory of Contextual Knowledge in Machine Reading

    Get PDF
    Machine recognition of off–line handwriting can be achieved by either recognising words as individual symbols (word level recognition) or by segmenting a word into parts, usually letters, and classifying those parts (letter level recognition). Whichever method is used, current handwriting recognition systems cannot overcome the inherent ambiguity in writingwithout recourse to contextual information. This thesis presents a set of experiments that use Hidden Markov Models of language to resolve ambiguity in the classification process. It goes on to describe an algorithm designed to recognise a document written by a single–author and to improve recognition by adaptingto the writing style and learning new words. Learning and adaptation is achieved by reading the document over several iterations. The algorithm is designed to incorporate contextual processing, adaptation to modify the shape of known words and learning of new words within a constrained dictionary. Adaptation occurs when a word that has previously been trained in the classifier is recognised at either the word or letter level and the word image is used to modify the classifier. Learning occurs when a new word that has not been in the training set is recognised at the letter level and is subsequently added to the classifier. Words and letters are recognised using a nearest neighbour classifier and used features based on the two–dimensional Fourier transform. By incorporating a measure of confidence based on the distribution of training points around an exemplar, adaptation and learning is constrained to only occur when a word is confidently classified. The algorithm was implemented and tested with a dictionary of 1000 words. Results show that adaptation of the letter classifier improved recognition on average by 3.9% with only 1.6% at the whole word level. Two experiments were carried out to evaluate the learning in the system. It was found that learning accounted for little improvement in the classification results and also that learning new words was prone to misclassifications being propagated
    • …
    corecore