Handwritten Character Recognition of South Indian Scripts: A Review
Handwritten character recognition has always been a frontier area of research in
the field of pattern recognition and image processing, and there is a large
demand for OCR on handwritten documents. Even though sufficient studies have
been performed on foreign scripts such as Chinese, Japanese and Arabic, only
very few works can be traced for handwritten character recognition of Indian
scripts, especially the South Indian scripts. This paper provides an
overview of offline handwritten character recognition in South Indian scripts,
namely Malayalam, Tamil, Kannada and Telugu.
Comment: Paper presented at the "National Conference on Indian Language
Computing", Kochi, February 19-20, 2011. 6 pages, 5 figures.
Off-line hand-printed Chinese character recognition based on stroke matching
The specific purpose of this thesis is the automated off-line recognition of Chinese characters hand-printed with a blue ball-point pen. Through mask processing, the main components of a Chinese character, such as vertical, horizontal, and slant strokes, can be extracted. Then the connected components are found, together with the coordinates of the top, bottom, leftmost, and rightmost ends of each extracted stroke. From these coordinates, the length and position of each stroke can be computed.
Based on the number, relative length, and relative position of the strokes, both coarse and fine rule-based classification can be performed, which achieves the goal of this thesis.
Excluding the loading and segmentation of the original image, the computing time for feature extraction and classification depends on the image size and the number of strokes. It is about 0.3 seconds per Chinese character on an IBM PC 80486 DX33.
The advantages of the proposed method include efficient time complexity, a strong ability to distinguish very similar Chinese characters, tolerance of stroke slope, and a recognition rate of 96% or higher.
The disadvantage is inflexibility with respect to user-driven learning, since the matching rules are at present open only to the manufacturers.
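The stroke measurements described in this abstract (connected components, their extreme coordinates, and the length and position derived from them) can be illustrated with a minimal sketch. It is not the thesis's implementation: the directional masks are not reproduced, the input is assumed to be a small binary grid in which the strokes are already separated, and the names `stroke_boxes`, `describe`, and `GRID` are hypothetical.

```python
from collections import deque

def stroke_boxes(grid):
    """Return one bounding box (top, bottom, left, right) per 8-connected component."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != "#" or seen[r][c]:
                continue
            top = bottom = r
            left = right = c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == "#" and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, bottom, left, right))
    return boxes

def describe(box):
    """Derive a crude stroke type, pixel extent, and centre from a bounding box."""
    top, bottom, left, right = box
    height, width = bottom - top + 1, right - left + 1
    if height == 1:
        kind = "horizontal"
    elif width == 1:
        kind = "vertical"
    else:
        kind = "slant"  # assumption: anything with 2-D extent counts as slanted
    length = max(height, width)  # extent in pixels, not Euclidean length
    center = ((top + bottom) / 2, (left + right) / 2)
    return kind, length, center

# Hypothetical toy image: one horizontal, one vertical, one slanted stroke.
GRID = [
    "#####..",
    ".......",
    "..#....",
    "..#...#",
    "..#..#.",
    "....#..",
]

for box in stroke_boxes(GRID):
    print(box, describe(box))
```

A rule-based classifier of the kind the abstract mentions would then compare the count, relative lengths, and relative centres of these strokes against per-character rules; those rules are not given in the source and are not guessed at here.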
Arabic/Latin and Machine-printed/Handwritten Word Discrimination using HOG-based Shape Descriptor
In this paper, we present an approach for Arabic and Latin script and type identification based on Histogram of Oriented Gradients (HOG) descriptors. HOGs are first applied at word level based on writing orientation analysis. Then, they are extended to word image partitions to capture fine and discriminative details. Pyramid HOGs are also used to study their effects on different observation levels of the image. Finally, co-occurrence matrices of HOG are computed to consider spatial information between pairs of pixels, which is not taken into account in basic HOG. A genetic algorithm is applied to select the potentially informative feature combinations that maximize the classification accuracy. The output is a relatively short descriptor that provides an effective input to a Bayes-based classifier. Experimental results on a set of words, extracted from standard databases, show that our identification system is robust and provides good word script and type identification: 99.07% of words are correctly classified.
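For readers unfamiliar with the basic HOG descriptor that this abstract builds on, the following is a minimal single-cell sketch. It is an illustration under stated assumptions, not the paper's pipeline: it uses central differences, 9 unsigned orientation bins, hard bin assignment, and plain L2 normalisation, and it omits the word partitions, pyramid levels, co-occurrence matrices, and genetic feature selection. The function name `hog_cell` and the test image are hypothetical.

```python
import math

def hog_cell(image, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    rows, cols = len(image), len(image[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # horizontal central difference
            gy = image[y + 1][x] - image[y - 1][x]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Fold the direction into [0, 180): opposite gradients share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm else hist

# Hypothetical cell containing a single vertical edge (dark left, bright right).
edge = [[0, 0, 0, 1, 1, 1] for _ in range(5)]
print(hog_cell(edge))
```

A vertical edge produces purely horizontal gradients, so all of the weight falls into the first orientation bin. A word-level descriptor would concatenate such histograms over many cells.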
Off-line text-independent writer recognition for Chinese handwriting: a review
This paper provides a comprehensive review of existing work on off-line text-independent writer identification for Chinese handwriting, which is still a largely unexplored subject, including the characteristics of Chinese characters, such as complex stroke crossings, and the challenges involved.
Recognition of off-line handwritten cursive text
The author presents novel algorithms to design unconstrained handwriting
recognition systems organized in three parts:
In Part One, novel algorithms are presented for processing of Arabic text prior to
recognition. Algorithms are described to convert a thinned image of a stroke to a straight
line approximation. Novel heuristic algorithms and novel theorems are presented to
determine start and end vertices of an off-line image of a stroke. A straight line
approximation of an off-line stroke is converted to a one-dimensional representation by
a novel algorithm which aims to recover the original sequence of writing. The resulting
ordering of the stroke segments is a suitable preprocessed representation for subsequent
handwriting recognition algorithms as it helps to segment the stroke. The algorithm was
tested against one data set of isolated handwritten characters and another data set of
cursive handwriting, each provided by 20 subjects, and has been 91.9% and 91.8%
successful for these two data sets, respectively.
In Part Two, an entirely novel fuzzy set-sequential machine character recognition
system is presented. Fuzzy sequential machines are defined to work as recognizers of
handwritten strokes. An algorithm to obtain a deterministic fuzzy sequential machine from
a stroke representation, that is capable of recognizing that stroke and its variants, is
presented. An algorithm is developed to merge two fuzzy machines into one machine. The
learning algorithm is a combination of many described algorithms. The system was tested
against isolated handwritten characters provided by 20 subjects resulting in 95.8%
recognition rate which is encouraging and shows that the system is highly flexible in
dealing with shape and size variations.
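The idea of a deterministic fuzzy sequential machine that accepts a stroke and its variants with graded confidence can be sketched as follows. This is a generic max-min style fuzzy automaton written for illustration, not the thesis's construction or learning algorithm; the direction symbols ("D", "DR", "R", "UR"), the membership values, and the names `recognition_degree` and `L_SHAPE` are all hypothetical.

```python
def recognition_degree(transitions, start, finals, symbols):
    """Degree in [0, 1] to which a deterministic fuzzy machine accepts a sequence.

    transitions maps (state, symbol) -> (next_state, membership). The degree of
    a path is the minimum membership along it, and 0.0 if the path is undefined
    or does not end in a final state.
    """
    state, degree = start, 1.0
    for symbol in symbols:
        if (state, symbol) not in transitions:
            return 0.0
        state, membership = transitions[(state, symbol)]
        degree = min(degree, membership)
    return degree if state in finals else 0.0

# Hypothetical machine for an "L"-shaped stroke: a downstroke, then a base to the right.
L_SHAPE = {
    ("q0", "D"): ("q1", 1.0),
    ("q0", "DR"): ("q1", 0.6),  # slanted downstroke, tolerated as a variant
    ("q1", "R"): ("q2", 1.0),
    ("q1", "UR"): ("q2", 0.7),  # slightly rising base, tolerated as a variant
}

print(recognition_degree(L_SHAPE, "q0", {"q2"}, ["D", "R"]))    # ideal shape
print(recognition_degree(L_SHAPE, "q0", {"q2"}, ["DR", "UR"]))  # distorted variant
print(recognition_degree(L_SHAPE, "q0", {"q2"}, ["R", "D"]))    # different stroke
```

With one such machine per known stroke, an unknown stroke could be assigned to the machine returning the highest degree. How the thesis merges machines and learns memberships is not specified in the abstract and is not modelled here.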
In Part Three, an entirely novel text recognition system is also presented, capable of
recognizing off-line handwritten Arabic cursive text with high variability. This system
is an extension of the above recognition system. Tokens are extracted from a one-dimensional
representation of a stroke. Fuzzy sequential machines are defined to work as
recognizers of tokens. It is shown how to obtain a deterministic fuzzy sequential machine
from a token representation that is capable of recognizing that token and its variants. An
algorithm for token learning is presented. The tokens of a stroke are re-combined to
meaningful strings of tokens. Algorithms to recognize and learn token strings are
described. The recognition stage uses algorithms of the learning stage. The process of
extracting the best set of basic shapes which represent the best set of token strings that
constitute an unknown stroke is described. A method is developed to extract lines from
pages of handwritten text, arrange main strokes of extracted lines in the same order as
they were written, and present secondary strokes to main strokes. Presented secondary
strokes are combined with basic shapes to obtain the final characters by formulating and
solving assignment problems for this purpose. Some secondary strokes which remain
unassigned are individually manipulated. The system was tested against the handwritings
of 20 subjects yielding overall subword and character recognition rates of 55.4% and
51.1%, respectively.
Kannada Character Recognition System A Review
Intensive research has been done on optical character recognition (OCR), and a
large number of articles have been published on this topic during the last few
decades. Many commercial OCR systems are now available in the market, but most
of these systems work for Roman, Chinese, Japanese and Arabic characters. There
is an insufficient number of works on Indian language character recognition,
especially for the Kannada script, one of the 12 major scripts in India. This paper presents
a review of existing work on printed Kannada script and its results. The
characteristics of Kannada script and the Kannada Character Recognition System (KCR)
are discussed in detail. Finally, fusion at the classifier level is proposed to
increase the recognition accuracy.
Comment: 12 pages, 8 figures.
Character Recognition
Character recognition is one of the pattern recognition technologies that are most widely used in practical applications. This book presents recent advances that are relevant to character recognition, from technical topics such as image processing, feature extraction or classification, to new applications including human-computer interfaces. The goal of this book is to provide a reference source for academic research and for professionals working in the character recognition field.