Recognition of off-line arabic handwritten dates and numeral strings
In this thesis, we present an automatic recognition system for CENPARMI off-line Arabic handwritten dates collected from Arabic nationalities. The system consists of modules that segment and recognize an Arabic handwritten date image. First, in the segmentation module, the system explicitly segments a date image into a sequence of basic constituents, or segments. As part of this module, a special sub-module was developed to over-segment any constituent that is a candidate for a touching pair. The proposed touching pair segmentation sub-module has been tested on three different datasets of handwritten numeral touching pairs: the CENPARMI Arabic [6], Urdu, and Dari [24] datasets. Final recognition rates of 92.22%, 90.43%, and 86.10% were achieved for Arabic, Urdu, and Dari, respectively. Afterwards, the segments are preprocessed and sent to the classification module. In this stage, feature vectors are extracted and then recognized by an isolated numeral classifier. This recognition system has been tested on five different isolated numeral databases: the CENPARMI Arabic [6], Urdu, Dari [24], Farsi, and Pashto databases, with overall recognition rates of 97.29%, 97.75%, 97.75%, 97.95%, and 98.36%, respectively. Finally, a date post-processing module was developed to improve the recognition results. This post-processing module is used in two different stages. First, in the date stage, it verifies that the segmentation/recognition output represents a valid date image and chooses the best date format to assign to this image. Second, in the sub-field stage, it evaluates the values of the three parts of the date: day, month, and year. Experiments on two different databases of Arabic handwritten dates, the CENPARMI Arabic database [6] and the CENPARMI Arabic Bank Cheques database [7], show encouraging results, with overall recognition rates of 85.05% and 66.49%, respectively.
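The date-stage check described above can be illustrated with a minimal sketch. The abstract does not say which date formats the thesis considers, so the three 8-digit layouts below, the `FORMATS` table, and the function name `valid_interpretations` are all assumptions made for illustration, not the author's method.

```python
from datetime import date

# Hypothetical layouts: slices locating (day, month, year) in an 8-digit string.
FORMATS = {
    "DDMMYYYY": (slice(0, 2), slice(2, 4), slice(4, 8)),
    "MMDDYYYY": (slice(2, 4), slice(0, 2), slice(4, 8)),
    "YYYYMMDD": (slice(6, 8), slice(4, 6), slice(0, 4)),
}

def valid_interpretations(digits):
    """Return {layout name: date} for every layout under which the
    recognized digit string is a real calendar date."""
    results = {}
    if len(digits) != 8 or not digits.isdigit():
        return results
    for name, (d, m, y) in FORMATS.items():
        try:
            # date() raises ValueError for impossible day/month/year values.
            results[name] = date(int(digits[y]), int(digits[m]), int(digits[d]))
        except ValueError:
            continue
    return results
```

A recognizer output that yields an empty dictionary would be rejected as not being a valid date; one that yields several layouts would still need a rule for picking the best format, which this sketch leaves open.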
Steganoscription : exploring techniques for privacy-preserving crowdsourced transcription of handwritten documents
The focus of my research is the historical document format represented by the Central State Hospital (CSH) dataset of handwritten medical records. The specific problem posed by the CSH dataset is how to transcribe sensitive, cursive-handwritten documents by manual means, such as crowdsourcing. Manual methods are necessary, no matter how sophisticated the optical character recognition system used, because of the inconsistencies within cursive script. To address this problem, I have developed an application that enables users to transcribe sensitive handwritten document images while preserving the privacy of the context around the transcribed text, via random word selection and visual manipulation of the displayed text. This is made possible by several algorithms that process documents top-down. These operations detect and segment lines of text in images, reverse the slant common to cursive script, detect and segment words, and finally manipulate word images before they are displayed to users; combinations of color, noise, and geometric manipulations are currently supported and applied randomly. This system, called Steganoscription, combines the concepts of steganography and transcription.
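The random word selection idea can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not describe how words are distributed, so the function name `assign_words`, the round-robin dealing, and the notion of fixed worker batches are hypothetical.

```python
import random

def assign_words(word_ids, n_workers, seed=None):
    """Shuffle segmented word identifiers and deal them round-robin,
    so each worker receives words in an order unrelated to the
    document's reading order."""
    rng = random.Random(seed)
    shuffled = list(word_ids)
    rng.shuffle(shuffled)
    # Worker i receives every n_workers-th word of the shuffled list.
    return [shuffled[i::n_workers] for i in range(n_workers)]
```

Every word is assigned exactly once, and no batch preserves the original sequence by construction, which is the property that hides the surrounding context.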
Ensemble learning using multi-objective optimisation for arabic handwritten words
Arabic handwriting recognition is a dynamic and stimulating field of study within
pattern recognition, and it plays a significant part in today's global
environment. It is a widely needed but computationally costly task because of cursive
writing, the massive number of words, and variation in writing style. According to the
literature, existing features lack data-supportive techniques and geometric features.
Most ensemble learning approaches are based on the assumption of linear
combination, which is not valid because of differences in data types. Also, the existing
approaches to classifier generation do not support decision-making for selecting the
most suitable classifier, which requires multi-objective optimisation to handle
these differences in data types. This thesis proposes a new type of handwriting feature
that uses Segments Interpolation (SI) to find the best-fitting line in each window, with a
model for finding the best operating-point window size for SI features. A Multi-Objective
Ensemble Oriented (MOEO) approach is formulated to control the classifier topology and
provide feedback support for changing the classifiers' topology and weights, based on an
extension of the Non-dominated Sorting Genetic Algorithm (NSGA-II). The extension is
designated Random Subset based Parents Selection (RSPS-NSGA-II) and handles neurons
and accuracy. Evaluation metrics are taken from two perspectives: classification and
multi-objective optimisation. The experimental design is based on two subsets of the
IFN/ENIT database. The first consists of 10 classes (C10) and the second of 22 classes (C22).
The features were tested with a Support Vector Machine (SVM) and an Extreme Learning
Machine (ELM). The results improved because of the SI feature: SI with SVM reached
88.53% for C22. RSPS for C10 at k=2 achieved 91% accuracy
with fewer neurons than NSGA-II, and for C22 at k=10, accuracy increased
to 81% compared with 78% for NSGA-II. Future work may consider introducing more features
to the system, applying them to other languages, and integrating the system with sequence
learning for higher accuracy.
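The idea of finding a best-fitting line in each window can be sketched with ordinary least squares. The abstract does not define the SI feature precisely, so this is an assumption-laden illustration: the input is taken to be a one-dimensional profile (for example a contour height per column), windows are non-overlapping, and the names `fit_line` and `window_line_features` are hypothetical.

```python
def fit_line(ys):
    """Least-squares (slope, intercept) for points (0, ys[0]), (1, ys[1]), ...
    Requires at least two points."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def window_line_features(profile, window):
    """Fit one line per non-overlapping window of `window` samples
    (window >= 2); a trailing partial window is ignored."""
    feats = []
    for start in range(0, len(profile) - window + 1, window):
        feats.append(fit_line(profile[start:start + window]))
    return feats
```

The window size is the free parameter here; the thesis describes a model for choosing it, which this sketch does not attempt to reproduce.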
A character-recognition system for Hangeul
This work presents a rule-based character-recognition system for the Korean script, Hangeul. An input raster image representing one Korean character (Hangeul syllable) is thinned down to a skeleton, and the individual lines are extracted. The lines, along with information on how they are interconnected, are translated into a set of hierarchical graphs, which can be easily traversed and compared with a set of reference structures represented in the same way. Hangeul consists of consonant and vowel graphemes, which are combined into blocks representing syllables. Each reference structure describes one possible variant of such a grapheme. The reference structures that best match the structures found in the input are combined to form a full Hangeul syllable. A test on all 11 172 possible characters, each rendered as a 200-pixel-squared raster image in the gothic font AppleGothic Regular, gave a recognition accuracy of 80.6 percent. The system has no separation logic for characters whose graphemes overlap or are conjoined; with such characters removed from the set, reducing the total number of characters to 9 352, an accuracy of 96.3 percent was reached. Hand-written characters were also recognised, to a certain degree. The work shows that it is possible to create a workable character-recognition system with reasonably simple means.
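The final step, combining recognised graphemes into a syllable, can be illustrated with the standard Unicode composition arithmetic for precomposed Hangul syllables: 19 initial consonants, 21 vowels, and 28 final positions (including "no final") give the 11 172 characters mentioned above. Whether the thesis composes syllables this way is not stated, so the function name `compose_syllable` and the use of Unicode index order are assumptions.

```python
SYLLABLE_BASE = 0xAC00          # code point of the first precomposed syllable
N_INITIAL, N_VOWEL, N_FINAL = 19, 21, 28

def compose_syllable(initial, vowel, final=0):
    """Combine grapheme indices (in Unicode order) into one Hangul syllable.
    final=0 means the syllable has no final consonant."""
    if not (0 <= initial < N_INITIAL and 0 <= vowel < N_VOWEL
            and 0 <= final < N_FINAL):
        raise ValueError("grapheme index out of range")
    return chr(SYLLABLE_BASE + (initial * N_VOWEL + vowel) * N_FINAL + final)
```

For instance, `compose_syllable(18, 0, 4)` combines the initial ㅎ, the vowel ㅏ, and the final ㄴ.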
Sistema de processamento automático de cheques portugueses
Internship carried out at INESC-Porto and supervised by Eng. Pedro Miguel Carvalho. Integrated master's thesis. Electrical and Computer Engineering - Major in Telecommunications. Faculdade de Engenharia. Universidade do Porto. 200