502 research outputs found

Estimation of the Handwritten Text Skew Based on Binary Moments

Authors: D. Brodić, Z. Milivojević
Publication venue: Společnost pro radioelektronické inženýrství
Publication date: 01/04/2012

Binary moments are one method for estimating text skew in binary images. They have been widely used to identify the skew of printed text. Handwritten text, however, consists of text objects with differing skews, so the method must be adapted. This is achieved by splitting the image into separate text objects delimited by bounding boxes. The resulting text objects are isolated binary objects. Applying the moment-based method to each binary object yields its local text skew. Because of their accuracy, the estimated skew data can be used as input to text line segmentation algorithms.
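As a rough illustration of the moment-based idea, the sketch below estimates the orientation of one isolated binary object from its second-order central moments. It is not the authors' implementation: the function name `skew_from_moments` and the input format (a list of foreground pixel coordinates) are assumptions, and the formula is the standard orientation-from-moments expression rather than anything quoted from the paper.

```python
import math


def skew_from_moments(pixels):
    """Estimate the orientation (radians) of one binary object.

    `pixels` is an iterable of (x, y) foreground coordinates; this input
    format is an assumption made for the sketch. In image coordinates the
    y axis usually points down, so the sign of the angle follows that
    convention.
    """
    pts = list(pixels)
    n = len(pts)
    if n == 0:
        raise ValueError("object has no foreground pixels")

    # Centroid of the object.
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n

    # Second-order central moments.
    mu20 = sum((x - cx) ** 2 for x, _ in pts)
    mu02 = sum((y - cy) ** 2 for _, y in pts)
    mu11 = sum((x - cx) * (y - cy) for x, y in pts)

    # Standard orientation formula: theta = 0.5 * atan2(2*mu11, mu20 - mu02).
    return 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)


# A diagonal stroke and a horizontal stroke as toy objects.
diagonal = [(i, i) for i in range(10)]
horizontal = [(i, 0) for i in range(10)]
print(math.degrees(skew_from_moments(diagonal)))
print(math.degrees(skew_from_moments(horizontal)))
```

Running the function once per bounding-box object would give one local skew per object, which matches the per-object evaluation the abstract describes.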

Directory of Open Access Journals

Digital library of Brno University of Technology

Finding Similarities between Structured Documents as a Crucial Stage for Generic Structured Document Classifier

Authors: Mohamed Azlinah Hj., Mokayed Hamam
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 28/05/2013

One problem in classifying structured documents is defining a similarity measure that works in real situations, where query documents may differ from the database templates. Furthermore, test sets may contain rotated [1], noise-corrupted [2], or manually edited forms and documents produced under different schemes, which makes direct comparison a crucial issue [3]. Another problem is that a huge number of forms may be written in different languages; in Malaysia, for example, forms may be written in Malay, Chinese, English, and other languages. In that case text recognition (such as OCR) cannot be applied to classify the requested documents, even though OCR is considered easier and more accurate than layout detection. Keywords: Feature Extraction, Document processing, Document Classification

International Institute for Science, Technology and Education (IISTE): E-Journals

Statistics Oriented Preprocessing of Document Image

Authors: Brodić Darko, Maluckov Čedomir A., Peng Liangrui
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 19/10/2015

Old printed documents represent an important part of our cultural heritage. Their digitization plays an important role in creating data and metadata. The paper proposes an algorithm for estimating the global text skew. First, the document image is binarized, reducing the impact of noise and uneven illumination. The binary image is then statistically analyzed and processed, and redundant data are excluded. Next, convex hulls are established around each text object and joined to form connected components. The connected components in the complementary image are then enlarged by morphological dilation. Finally, the biggest connected component is extracted. Its orientation is similar to the global orientation of the text document, which is calculated from the moments. The efficiency and correctness of the algorithm are verified by testing on a custom dataset.
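One step of this pipeline, extracting the biggest connected component, can be sketched in plain Python. This is only an illustrative fragment under stated assumptions: the image is a list of rows of 0/1 values, connectivity is taken as 8-neighbour (the abstract does not say which is used), and the name `largest_component` is hypothetical. Binarization, convex hulls and dilation are not shown.

```python
from collections import deque


def largest_component(grid):
    """Return the (x, y) pixels of the largest 8-connected foreground blob.

    `grid` is a list of rows holding 0/1 values. The 8-connectivity choice
    is an assumption; the abstract does not specify it.
    """
    height = len(grid)
    width = len(grid[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    best = []

    for y0 in range(height):
        for x0 in range(width):
            if not grid[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill starting from an unvisited pixel.
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            component = []
            while queue:
                x, y = queue.popleft()
                component.append((x, y))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        inside = 0 <= nx < width and 0 <= ny < height
                        if inside and grid[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            if len(component) > len(best):
                best = component
    return best


image = [
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
print(sorted(largest_component(image)))
```

The returned pixel list is the kind of input a moment-based orientation estimate would then consume.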

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Preprocessing Techniques in Character Recognition

Author: Yasser Alginahi
Publication venue: IntechOpen
Publication date: 17/08/2010

IntechOpen

Off-line Arabic Handwriting Recognition System Using Fast Wavelet Transform

Author: E. GUMAH MOHAMED
Publication date: 01/01/2010

In this research, an off-line handwriting recognition system for the Arabic alphabet is introduced. The system contains three main stages: preprocessing, segmentation and recognition. In the preprocessing stage, the Radon transform was used in the design of algorithms for page, line and word skew correction as well as for word slant correction. In the segmentation stage, a Hough transform approach was used for line extraction. For line-to-word and word-to-character segmentation, a statistical method based on a mathematical representation of the binary images of lines and words was used. Unlike most current handwriting recognition systems, this system simulates the human mechanism for image recognition, in which images are encoded and saved in memory as groups according to their similarity to each other. Characters are decomposed into coefficient vectors using the fast wavelet transform; the vectors that represent a character in its different possible shapes are then saved as groups, with one representative for each group. Recognition is achieved by comparing the vector of the character to be recognized with the group representatives. Experiments showed that the proposed system achieves the recognition task with 90.26% accuracy. The system needs at most 3.41 seconds to recognize a single character in a text of 15 lines, where each line has 10 words on average.
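The final matching step, comparing a character's coefficient vector with one representative per group, might look like the sketch below. The Euclidean distance, the function name `nearest_group` and the toy labels and vectors are all assumptions for illustration; the abstract does not state which distance measure or data layout the system uses, and the wavelet decomposition itself is not shown.

```python
import math


def nearest_group(vector, representatives):
    """Return (label, distance) of the closest group representative.

    `representatives` maps a group label to its representative vector.
    Euclidean distance is an assumption; the source does not name the
    comparison measure.
    """
    if not representatives:
        raise ValueError("no group representatives given")

    best_label, best_dist = None, math.inf
    for label, rep in representatives.items():
        if len(rep) != len(vector):
            raise ValueError("vector lengths differ for group %r" % (label,))
        dist = math.dist(vector, rep)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


# Toy two-dimensional vectors with hypothetical labels.
groups = {"group_a": [1.0, 0.0], "group_b": [0.0, 1.0]}
label, distance = nearest_group([0.9, 0.1], groups)
print(label, round(distance, 3))
```

Keeping one representative per group means each lookup costs one comparison per group rather than one per stored sample.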

UTPedia

Recognition of Tifinaghe Characters Using Dynamic Programming & Neural Network

Authors: Belaid Bouikhalene, Mohamed Fakir, Rachid El Ayachi
Publication venue: IntechOpen
Publication date: 21/10/2011

IntechOpen

Handwritten Devanagari numeral recognition

Author: Bhargav S
Publication date: 31/05/2014

Optical character recognition (OCR) plays a vital role in today’s world. OCR can help solve many complex problems and thus make human work easier. In OCR, a scanned digital image or handwritten text is given as input to the system. OCR can be used in postal departments for sorting mail and in other offices. Much work has been done on the English alphabet, but nowadays Indian scripts are an active area of interest for researchers. Devanagari is one such Indian script. Research is ongoing on the recognition of its alphabet, but much less attention has been given to numerals. Here an attempt was made to recognize Devanagari numerals. The main part of any OCR system is feature extraction, because the more features are extracted, the higher the accuracy. Two methods were used for feature extraction. One was a moment-based method. There are many moment-based methods, but the Tchebichef moment was preferred because of its better image representation capability. The second method was based on contour curvature. The contour is an important boundary feature used for finding similarity between shapes. After feature extraction, the extracted features have to be classified, and an Artificial Neural Network (ANN) was used for this. There are many classifiers, but ANN was preferred because it is easy to handle and less error prone, and its accuracy is much higher than that of other classifiers. Classification was done individually with each of the two extracted feature sets, and finally the features were cascaded to increase the accuracy.

ethesis@nitr

A Novel Approach to Printed Arabic Optical Character Recognition

Author: Alghamdi Mansoor
Publication date: 25/09/2019

Bangor University Research Portal

Recommended from our members

Word based off-line handwritten Arabic classification and recognition. Design of automatic recognition system for large vocabulary offline handwritten Arabic words using machine learning approaches.

Author: AlKhateeb Jawad H.Y.
Publication venue: Department of Electronic Imaging and Media Communications
Publication date: 01/01/2010

The design of a machine that reads unconstrained words remains an unsolved problem. For example, automatic interpretation of handwritten documents by a computer is still under research. Most systems attempt to segment words into letters and read words one character at a time. However, segmenting handwritten words is very difficult, so to avoid this, words are treated as a whole. This research investigates a number of features computed from whole words for the recognition of handwritten words in particular. Arabic text classification and recognition is a complicated process compared with Latin and Chinese text recognition, owing to the cursive nature of Arabic text. The work presented in this thesis is proposed for word-based recognition of handwritten Arabic scripts. It is divided into three main stages that together provide a recognition system. The first stage is pre-processing, which applies efficient pre-processing methods that are essential for automatic recognition of handwritten documents. In this stage, techniques for detecting the baseline and segmenting words in handwritten Arabic text are presented. Connected components are then extracted, and the distances between different components are analyzed. The statistical distribution of these distances is then obtained to determine an optimal threshold for word segmentation. The second stage is feature extraction. This stage uses the normalized images to extract features that are essential for recognizing the images. Various methods of feature extraction are implemented and examined. The third and final stage is classification. Various classifiers are used, such as the K nearest neighbour classifier (k-NN), the neural network classifier (NN), Hidden Markov models (HMMs), and the Dynamic Bayesian Network (DBN). To test this concept, the particular pattern recognition problem studied is the classification of 32492 words using the IFN/ENIT database. The results were promising and very encouraging in terms of improved baseline detection and word segmentation for further recognition. Moreover, several feature subsets were examined, and a best recognition performance of 81.5% was achieved.
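The gap-based word segmentation mentioned in the first stage can be illustrated with a small sketch. It assumes each connected component is reduced to a horizontal span `(start, end)` on one text line and that a gap threshold has already been chosen; how the thesis derives the optimal threshold from the distance distribution is not stated in the abstract and is not reproduced here. The name `group_into_words` is hypothetical.

```python
def group_into_words(spans, gap_threshold):
    """Group component spans on one line into words.

    `spans` holds (start, end) horizontal extents of connected components.
    Neighbouring components whose gap is at most `gap_threshold` are put
    in the same word. The threshold is supplied by the caller; choosing
    it from the gap statistics is outside this sketch.
    """
    words = []
    current_end = None
    for start, end in sorted(spans):
        if current_end is not None and start - current_end <= gap_threshold:
            # Small gap: the component continues the current word.
            words[-1].append((start, end))
            current_end = max(current_end, end)
        else:
            # Large gap (or first component): start a new word.
            words.append([(start, end)])
            current_end = end
    return words


components = [(0, 5), (7, 12), (30, 40), (42, 50)]
for word in group_into_words(components, gap_threshold=5):
    print(word)
```

Because the spans are sorted by position and only gap sizes matter, the same grouping applies regardless of whether the script is read right to left or left to right.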

Bradford Scholars