242 research outputs found

    A Knowledge based segmentation algorithm for enhanced recognition of handwritten courtesy amounts

    "March 1994." Includes bibliographical references (p. [23]-[24]). Supported by the Productivity From Information Technology (PROFIT) Research Initiative at MIT. Karim Hussein ... [et al.]

    Using generative models for handwritten digit recognition

    We describe a method of recognizing handwritten digits by fitting generative models that are built from deformable B-splines with Gaussian "ink generators" spaced along the length of the spline. The splines are adjusted using a novel elastic matching procedure based on the Expectation Maximization (EM) algorithm that maximizes the likelihood of the model generating the data. This approach has many advantages. (1) After identifying the model most likely to have generated the data, the system not only produces a classification of the digit but also a rich description of the instantiation parameters, which can yield information such as the writing style. (2) During the process of explaining the image, generative models can perform recognition-driven segmentation. (3) The method involves a relatively small number of parameters, and hence training is relatively easy and fast. (4) Unlike many other recognition schemes, it does not rely on some form of pre-normalization of input images, but can handle arbitrary scalings, translations and a limited degree of image rotation. We have demonstrated that our method of fitting models to images does not get trapped in poor local minima. The main disadvantage of the method is that it requires much more computation than more standard OCR techniques.
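
    The classification step in this abstract (pick the model most likely to have generated the ink) can be illustrated with a minimal sketch. It assumes fixed bead positions in place of a deformable B-spline and omits the EM-based elastic matching entirely; the function names, the uniform noise term, and the two toy models are hypothetical and not taken from the paper.

```python
import math

def log_likelihood(ink_pixels, bead_centres, sigma=1.0,
                   noise_density=1e-3, noise_weight=0.1):
    """Log-likelihood of inked pixel coordinates under a mixture of equally
    weighted circular Gaussian "ink generators" plus a uniform noise term.
    (The noise term and all default values are illustrative assumptions.)"""
    n = len(bead_centres)
    norm = 2.0 * math.pi * sigma ** 2
    total = 0.0
    for x, y in ink_pixels:
        mix = 0.0
        for cx, cy in bead_centres:
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            mix += math.exp(-d2 / (2.0 * sigma ** 2)) / norm
        p = noise_weight * noise_density + (1.0 - noise_weight) * mix / n
        total += math.log(p)
    return total

def classify(ink_pixels, models):
    """Return the label of the model with the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(ink_pixels, models[label]))

# Toy models with fixed bead positions (hypothetical, not fitted splines).
models = {
    "1": [(1, 0), (1, 1), (1, 2), (1, 3)],                      # vertical stroke
    "0": [(0, 0), (2, 0), (0, 1.5), (2, 1.5), (0, 3), (2, 3)],  # rough ring
}
stroke = [(1, 0), (1, 1), (1, 2), (1, 3)]  # ink lying on a vertical line
label = classify(stroke, models)
```

    In the method described above, each model's spline would first be deformed to fit the image before the likelihoods are compared; the sketch only shows the final comparison.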

    Recognition-based Approach of Numeral Extraction in Handwritten Chemistry Documents using Contextual Knowledge

    This paper presents a complete procedure that uses contextual and syntactic information to identify and recognize amount fields in the table regions of chemistry documents. The proposed method is composed of two main modules. First, a structural analysis based on connected component (CC) dimensions and positions identifies some special symbols and clusters the other CCs into three groups: fragments of characters, isolated characters, and connected characters. Then, specific processing is performed on each group of CCs. Fragments of characters are merged with the nearest character or string using rules based on geometric relationships. Isolated characters are sent to a recognition module to identify the numeral components. For connected characters, the final decision on the nature of the string (numeric or non-numeric) is based on a global score computed over the full string using the height regularity property and the recognition probabilities of its segmented fragments. Finally, a simple syntactic verification at the table row level is conducted in order to correct possible errors. The experimental tests are carried out on real-world chemistry documents provided by our industrial partner eNovalys. The results show the effectiveness of the proposed system in extracting amount fields.

    Adding feedback to improve segmentation and recognition of handwritten numerals

    Thesis (S.B. and M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999. Includes bibliographical references (leaves 68-69). By Susan A. Dey. S.B. and M.Eng.

    Applying Genetic Algorithm in Multi Language's Characters Recognition

    Off-line Thai handwriting recognition in legal amount

    Thai handwriting in legal amounts is a challenging problem and a new field in handwriting recognition research. The focus of this thesis is to implement a Thai handwriting recognition system. A preliminary data set of Thai handwriting in legal amounts is designed. The samples in the data set are characters and words of Thai legal amounts and a set of legal amount phrases collected from a number of native Thai volunteers. In the preprocessing and recognition stages, techniques are introduced to improve the character recognition rates. The characters are divided into two smaller subgroups by their writing levels, named the body and high groups. The recognition rates of both groups are increased based on their distinguishing features. The writing level separation algorithms are implemented using the size and position of characters. Empirical experiments are set up to find the combination of features that best increases the recognition rates. Traditional recognition systems are modified to give the cumulative top-3 ranked answers to cover the possible character classes. At the postprocessing level, lexicon matching algorithms are implemented to match the ranked characters with the legal amount words. These matched words are joined together to form possible choices of amounts. These amounts have their syntax checked in the last stage. Several syntax violations are a consequence of faulty character segmentation and recognition resulting from connected or broken characters. The anomalies in handwriting caused by these characters are mainly detected by their size and shape. During the recovery process, the possible word boundary patterns can be pre-defined and used to segment the hypothesis words. These words are identified by the word recognition and the results are joined with previously matched words to form the full amounts, which are checked by the syntax rules again.
From 154 amounts written by 10 writers, the rejection rate is 14.9 percent with the recovery processes. The recognition rate for the accepted amounts is 100 percent.
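
The postprocessing idea in this abstract, matching top-3 ranked character candidates against a lexicon of legal amount words, can be illustrated with a minimal sketch. It assumes one candidate list per character position and exact length matching; the function name is hypothetical, and Latin letters and English number words stand in for Thai characters and Thai legal amount words.

```python
def match_lexicon(ranked_candidates, lexicon):
    """Return the lexicon words whose i-th character appears among the
    ranked candidates for position i (an illustrative matching rule,
    not necessarily the one used in the thesis)."""
    matches = []
    for word in lexicon:
        if len(word) != len(ranked_candidates):
            continue  # this sketch only compares words of equal length
        if all(ch in cands for ch, cands in zip(word, ranked_candidates)):
            matches.append(word)
    return matches

# Top-3 classifier outputs for three character positions (placeholder data).
ranked = [["t", "f", "l"], ["e", "c", "o"], ["n", "m", "h"]]
lexicon = ["ten", "one", "two", "six"]
matched = match_lexicon(ranked, lexicon)
```

Keeping three candidates per position lets the correct word survive even when the top-ranked character is wrong, at the cost of more candidate words for the later syntax check to filter.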

    A System for the Off-Line Recognition of Handwritten Text

    A new system for the recognition of handwritten text is described. The system goes from raw, binary scanned images of census forms to ASCII transcriptions of the fields contained within the forms. The first step is to locate and extract the handwritten input from the forms. Then, a large number of character subimages are extracted and individually classified using an MLP (Multi-Layer Perceptron). A Viterbi-like algorithm is used to assemble the individual classified character subimages into optimal interpretations of an input string, taking into account both the quality of the overall segmentation and the degree to which each character subimage of the segmentation matches a character model. The system uses two different statistical language models, one based on a phrase dictionary and the other based on a simple word grammar. Hypotheses from recognition based on each language model are integrated using a decision tree classifier. Results from the application of the system to the recognition of handwritten responses on U.S. census forms are reported.
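
    The Viterbi-like assembly step in this abstract can be illustrated with a minimal dynamic-programming sketch. It assumes each candidate subimage spans a contiguous run of primitive segments and already carries one best character and a log score from the classifier; the language models and segmentation-quality terms are omitted, and the function name and scores are hypothetical.

```python
def best_interpretation(n_segments, scores):
    """Find the highest-scoring way to cover segments 0..n_segments with
    candidate subimages. `scores` maps (start, end) boundary pairs to a
    (character, log_score) tuple. Returns (total_log_score, string), or
    None if no complete covering exists."""
    best = {0: (0.0, "")}  # best partial result ending at each boundary
    for end in range(1, n_segments + 1):
        for start in range(end):
            if start in best and (start, end) in scores:
                char, log_score = scores[(start, end)]
                total = best[start][0] + log_score
                if end not in best or total > best[end][0]:
                    best[end] = (total, best[start][1] + char)
    return best.get(n_segments)

# Placeholder classifier outputs for three primitive segments.
scores = {
    (0, 1): ("1", -0.5),
    (1, 2): ("0", -0.4),
    (2, 3): ("1", -2.0),
    (1, 3): ("6", -0.3),  # segments 1 and 2 read together as one character
    (0, 2): ("4", -3.0),
}
result = best_interpretation(3, scores)
```

    Here the path that merges the last two segments into a single character outscores the path that reads all three segments separately.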

    Novel word recognition and word spotting systems for offline Urdu handwriting

    Word recognition for offline Arabic, Farsi and Urdu handwriting is a subject which has attracted much attention in the OCR field. This thesis presents the implementations of offline Urdu Handwritten Word Recognition (HWR) and an Urdu word spotting technique. This thesis first introduces the creation of several offline CENPARMI Urdu databases. These databases were necessary for offline Urdu HWR experiments. The holistic-based recognition approach was followed for the Urdu HWR system. In this system, the basic pre-processing of images was performed. In the feature extraction phase, the gradient and structural features were extracted from greyscale and binary word images, respectively. This recognition system extracted 592 feature sets and these features helped in improving the recognition results. The system was trained and tested on 57 words. Overall, we achieved a 97% accuracy rate for handwritten word recognition by using the SVM classifier. Our word spotting technique used the holistic HWR system for recognition purposes. This word spotting system consisted of two processes: the segmentation of handwritten connected components and diacritics from Urdu text lines, and the word spotting algorithm. A small database of handwritten text pages was created for testing the word spotting system. This database consisted of texts from ten native Urdu speakers. The rule-based segmentation system was applied for segmenting (or extracting) handwritten Urdu subwords or connected components from text lines. We achieved a 92% correct segmentation rate for 372 text lines. In the word spotting algorithm, the candidate words were generated from the segmented connected components. These candidate words were sent to the holistic HWR system, which extracted the features and tried to recognize each image as one of the 57 words. After classification, each image was sent to the verification/rejection phase, which helped in rejecting the maximum number of unseen (raw data) images.
Overall, we achieved a 50% word spotting precision at a 70% recall rate.

    Advances in Character Recognition

    This book presents advances in character recognition, and it consists of 12 chapters that cover a wide range of topics on different aspects of character recognition. Hopefully, this book will serve as a reference source for academic research, for professionals working in the character recognition field and for all interested in the subject.