On-line recognition of English and numerical characters.
by Cheung Wai-Hung Wellis.
Thesis (M.Sc.)--Chinese University of Hong Kong, 1992.
Includes bibliographical references (leaves 52-54).
ACKNOWLEDGEMENTS
ABSTRACT
Chapter 1 --- INTRODUCTION
Chapter 1.1 --- CLASSIFICATION OF CHARACTER RECOGNITION
Chapter 1.2 --- HISTORICAL DEVELOPMENT
Chapter 1.3 --- RECOGNITION METHODOLOGY
Chapter 2 --- ORGANIZATION OF THIS REPORT
Chapter 3 --- DATA SAMPLING
Chapter 3.1 --- GENERAL CONSIDERATION
Chapter 3.2 --- IMPLEMENTATION
Chapter 4 --- PREPROCESSING
Chapter 4.1 --- GENERAL CONSIDERATION
Chapter 4.2 --- IMPLEMENTATION
Chapter 4.2.1 --- Stroke connection
Chapter 4.2.2 --- Rotation
Chapter 4.2.3 --- Scaling
Chapter 4.2.4 --- De-skewing
Chapter 5 --- STROKE SEGMENTATION
Chapter 5.1 --- CONSIDERATION
Chapter 5.2 --- IMPLEMENTATION
Chapter 6 --- LEARNING
Chapter 7 --- PROTOTYPE MANAGEMENT
Chapter 8 --- RECOGNITION
Chapter 8.1 --- CONSIDERATION
Chapter 8.1.1 --- Delayed Stroke Tagging
Chapter 8.1.2 --- Bi-gram
Chapter 8.1.3 --- Character Scoring
Chapter 8.1.4 --- Ligature Handling
Chapter 8.1.5 --- Word Scoring
Chapter 8.2 --- IMPLEMENTATION
Chapter 8.2.1 --- Simple Matching
Chapter 8.2.2 --- Best First Search Matching
Chapter 8.2.3 --- Multiple Track Method
Chapter 8.3 --- SYSTEM PERFORMANCE TUNING
Chapter 9 --- POST-PROCESSING
Chapter 9.1 --- PROBABILITY MODEL
Chapter 9.2 --- WORD DICTIONARY APPROACH
Chapter 10 --- SYSTEM IMPLEMENTATION AND PERFORMANCE
Chapter 11 --- DISCUSSION
Chapter 12 --- EPILOG
APPENDIX I --- PROBLEMS ENCOUNTERED AND SUGGESTED ENHANCEMENTS ON THE SYSTEM
APPENDIX II --- GLOSSARIES
REFERENCES
Development of Features for Recognition of Handwritten Odia Characters
In this thesis, we propose four schemes for recognizing handwritten atomic Odia characters, comprising forty-seven letters and ten numerals. Odia is the language spoken natively in the state of Odisha in the Republic of India. Optical character recognition (OCR) is mature for many languages, and industry-standard OCR systems are already available, but OCR for the Odia language remains a challenging task. Further, features designed for other languages cannot be applied directly to Odia character recognition, for either printed or handwritten text. The main thrust of this work is therefore to propose features and use a classifier to achieve significant recognition accuracy. Because no handwritten Odia database was available for validating the proposed schemes, we collected samples from individuals with a digital note maker to build a large database. The database contains 17,100 samples (150 × 2 × 57), collected from 150 individuals at two different times for 57 characters. It is named Odia handwritten character set version 1.0 (OHCS v1.0) and is available at http://nitrkl.ac.in/Academic/Academic_Centers/Centre_For_Computer_Vision.aspx for use by researchers. The first scheme divides the contour of each character into thirty segments. Taking the centroid of the character as the base point, three primary features (length, angle, and chord-to-arc ratio) are extracted from each segment. This gives 30 values for each primary feature, for a total of 90 feature values. A back-propagation neural network is employed for recognition, and its performance is compared with competing schemes. The second contribution reduces the dimensionality of the primary features derived in the first contribution.
A fuzzy inference system is employed to aggregate the 90 feature values into a feature vector of size 30 that represents the most significant features of each character. For recognition, a six-state hidden Markov model (HMM) is employed for each character, giving fifty-seven ergodic HMMs with six states each. An accuracy of 84.5% is achieved on our dataset. The third contribution involves selecting evidence, namely the most informative local shape contour features. A dedicated distance metric, far_count, is used to compute information gain values for candidate segments of different lengths extracted from the whole shape contour of a character. The segment with the highest information gain is treated as the evidence and mapped to the corresponding class. An evidence dictionary is built from the evidence of all character classes and is used for testing. An overall testing accuracy of 88% is obtained.
The final contribution develops a hybrid feature derived from the discrete wavelet transform (DWT) and the discrete cosine transform (DCT). Experiments show that a 3-level DWT decomposition with 72 DCT coefficients from each high-frequency component as features gives a testing accuracy of 86% with a neural classifier. The suggested features are studied in isolation, and extensive simulations have been carried out alongside other existing schemes on the same dataset. Further, to study the generalization behavior of the proposed schemes, they are applied to English and Bangla handwritten datasets. Performance parameters such as recognition rate and misclassification rate are computed and compared. As we progress from one contribution to the next, each proposed scheme is compared with the earlier ones.
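The first scheme in the abstract above (thirty contour segments, three features per segment, 90 values in total) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not define the features precisely, so the sketch assumes the contour is an ordered list of (x, y) points on a closed curve, takes "length" and "angle" as the polar coordinates of each segment's midpoint relative to the centroid, and takes the chord-to-arc ratio as the endpoint distance divided by the path length. The function name `contour_features` is hypothetical.

```python
import math


def contour_features(contour, n_segments=30):
    """Return n_segments lengths, then angles, then chord-to-arc ratios.

    Assumed definitions (not taken from the thesis): length and angle are
    measured from the contour centroid to each segment's midpoint, and the
    ratio is endpoint distance over path length along the segment.
    """
    n = len(contour)
    if n < n_segments:
        raise ValueError("contour needs at least one point per segment")

    # Centroid of the contour points, used as the base point.
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n

    # Close the curve so the last segment ends at the starting point.
    points = list(contour) + [contour[0]]

    lengths, angles, ratios = [], [], []
    for k in range(n_segments):
        start = k * n // n_segments
        end = (k + 1) * n // n_segments
        seg = points[start:end + 1]  # neighbouring segments share an endpoint

        mx, my = seg[len(seg) // 2]
        lengths.append(math.hypot(mx - cx, my - cy))
        angles.append(math.atan2(my - cy, mx - cx))

        arc = sum(math.hypot(bx - ax, by - ay)
                  for (ax, ay), (bx, by) in zip(seg, seg[1:]))
        chord = math.hypot(seg[-1][0] - seg[0][0], seg[-1][1] - seg[0][1])
        ratios.append(chord / arc if arc > 0 else 1.0)

    return lengths + angles + ratios


# Usage: a circle of radius 10 sampled at 300 points.
circle = [(5 + 10 * math.cos(2 * math.pi * i / 300),
           5 + 10 * math.sin(2 * math.pi * i / 300)) for i in range(300)]
features = contour_features(circle)
```

For a circle every segment is equally far from the centroid and only slightly curved, so the length values are all close to the radius and the chord-to-arc ratios sit just below 1. Strongly curved segments would give smaller ratios.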
Word based off-line handwritten Arabic classification and recognition. Design of automatic recognition system for large vocabulary offline handwritten Arabic words using machine learning approaches.
The design of a machine that reads unconstrained words remains an unsolved problem. For example, automatic interpretation of handwritten documents by a computer is still under research. Most systems attempt to segment words into letters and read words one character at a time. However, segmenting handwritten words is very difficult, so to avoid segmentation, words are treated as a whole. This research investigates a number of features computed from whole words for the recognition of handwritten words. Arabic text classification and recognition is more complicated than Latin or Chinese text recognition because of the cursive nature of Arabic text.
The work presented in this thesis addresses word-based recognition of handwritten Arabic script. It is divided into three main stages. The first stage is pre-processing, which applies efficient methods that are essential for automatic recognition of handwritten documents. In this stage, techniques for detecting the baseline and segmenting words in handwritten Arabic text are presented. Connected components are then extracted, and the distances between components are analyzed. The statistical distribution of these distances is used to determine an optimal threshold for word segmentation. The second stage is feature extraction, which uses the normalized images to extract features that are essential for recognizing them. Various methods of feature extraction are implemented and examined. The third and final stage is classification. Various classifiers are used, such as the k-nearest neighbour classifier (k-NN), the neural network classifier (NN), hidden Markov models (HMMs), and the dynamic Bayesian network (DBN). To test this concept, the particular pattern recognition problem studied is the classification of 32492 words using the IFN/ENIT database. The results were promising in terms of improved baseline detection and word segmentation for further recognition. Moreover, several feature subsets were examined, and a best recognition performance of 81.5% was achieved.
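The gap-based word segmentation step described above might look roughly like the following sketch. The abstract does not say how the threshold is derived from the distance distribution, so this example assumes a simple two-class split of the gaps that minimises within-class squared error (an Otsu-style criterion). It also assumes each connected component is summarised by its horizontal extent (x_min, x_max) on a single text line. The names `gap_threshold` and `group_into_words` are hypothetical.

```python
def _sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def gap_threshold(gaps):
    """Split the gaps into 'small' and 'large' groups and return the cut.

    Assumed criterion (not from the thesis): choose the split of the sorted
    gaps with the lowest total within-group squared error.
    """
    values = sorted(gaps)
    if len(values) < 2:
        raise ValueError("need at least two gaps to estimate a threshold")
    best_cost, best_cut = float("inf"), None
    for i in range(1, len(values)):
        cost = _sse(values[:i]) + _sse(values[i:])
        if cost < best_cost:
            best_cost = cost
            best_cut = (values[i - 1] + values[i]) / 2
    return best_cut


def group_into_words(boxes):
    """Group component extents (x_min, x_max) into words by gap size."""
    boxes = sorted(boxes)
    # Overlapping components get a gap of zero.
    gaps = [max(0, nxt[0] - cur[1]) for cur, nxt in zip(boxes, boxes[1:])]
    cut = gap_threshold(gaps)
    words = [[boxes[0]]]
    for gap, box in zip(gaps, boxes[1:]):
        if gap > cut:
            words.append([box])   # large gap: start a new word
        else:
            words[-1].append(box)  # small gap: same word
    return words


# Usage: five components whose gaps are 2, 20, 1 and 30 pixels.
line = [(0, 10), (12, 20), (40, 50), (51, 60), (90, 100)]
words = group_into_words(line)
```

The components are sorted by their left edge here. Because only the gap sizes matter, the right-to-left reading order of Arabic does not change the grouping, although the order of the returned words would need reversing for reading.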