
    A Novel Method to Detect Segmentation points of Arabic Words using Peaks and Neural Network

    Many segmentation methods detect segmentation points, or predict where they are expected, before the segmentation process, and then verify the validity of those points using ANNs. This paper applies a novel method that locates segmentation points in Arabic words by detecting peaks with neural networks. The method segments text in two steps: peak identification, which is applied at the subword segment level to frame the minimum and maximum peaks, and baseline detection. These two steps gave the best results with a model that relies on the minimum peaks, obtained with a stroke operator, to extract potential segmentation points; a baseline-determination procedure was developed to approximate the parameters. The method yielded highly accurate results for Arabic character segmentation on four handwritten datasets: AHDB, IFN-ENIT, AHDB-FTR and ACDAR. Earlier results showed that EDMS features with an MLP_ANN give better results than GLCM and MOMENT features across different groups, and that EDMS features on an MNN classifier reach an accuracy of 95.09% on the IFN-ENIT dataset
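
The abstract does not specify how the baseline and the peaks are computed. As a rough illustration only, the sketch below assumes a binary subword image and uses plain projection profiles: the row with the most ink stands in for the baseline, and local minima of the column profile stand in for candidate segmentation points. The function names and the toy image are hypothetical; the paper's stroke operator and neural-network verification are not reproduced here.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = ink, 0 = background (assumed encoding)


def horizontal_projection(img: Image) -> List[int]:
    """Ink count per row."""
    return [sum(row) for row in img]


def vertical_projection(img: Image) -> List[int]:
    """Ink count per column."""
    return [sum(col) for col in zip(*img)]


def estimate_baseline(img: Image) -> int:
    """Index of the row with the most ink (first one on ties)."""
    profile = horizontal_projection(img)
    return max(range(len(profile)), key=lambda r: profile[r])


def candidate_cut_columns(img: Image) -> List[int]:
    """Interior columns where the column profile drops to a local minimum."""
    profile = vertical_projection(img)
    return [
        c
        for c in range(1, len(profile) - 1)
        if profile[c] < profile[c - 1] and profile[c] <= profile[c + 1]
    ]


# Toy subword: two vertical strokes joined by a horizontal stroke on row 2.
word = [
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
]

baseline_row = estimate_baseline(word)
cut_columns = candidate_cut_columns(word)
```

In a pipeline like the one the abstract describes, each candidate column would then be accepted or rejected by a trained classifier rather than used directly.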

    Text Line Segmentation of Historical Documents: a Survey

    There is a huge amount of historical documents in libraries and in various National Archives that have not been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such as word spotting, text/image alignment, authentication and extraction of specific fields are in use today. For all these tasks, a major step is document segmentation into text lines. Because of the low quality and the complexity of these documents (background noise, artifacts due to aging, interfering lines), automatic text line segmentation remains an open research field. The objective of this paper is to present a survey of existing methods, developed during the last decade, and dedicated to documents of historical interest. Comment: 25 pages, submitted version, To appear in International Journal on Document Analysis and Recognition, On line version available at http://www.springerlink.com/content/k2813176280456k3

    Off-line Arabic Handwriting Recognition System Using Fast Wavelet Transform

    In this research, an off-line handwriting recognition system for the Arabic alphabet is introduced. The system contains three main stages: preprocessing, segmentation and recognition. In the preprocessing stage, the Radon transform was used in the design of algorithms for page, line and word skew correction as well as for word slant correction. In the segmentation stage, a Hough transform approach was used for line extraction. For line-to-word and word-to-character segmentation, a statistical method using a mathematical representation of the binary images of lines and words was used. Unlike most current handwriting recognition systems, our system simulates the human mechanism for image recognition, where images are encoded and saved in memory as groups according to their similarity to each other. Characters are decomposed into coefficient vectors using the fast wavelet transform; the vectors that represent a character in its different possible shapes are then saved as groups, with one representative for each group. Recognition is achieved by comparing the vector of the character to be recognized with the group representatives. Experiments showed that the proposed system is able to achieve the recognition task with 90.26% accuracy. The system needs at most 3.41 seconds to recognize a single character in a text of 15 lines where each line has 10 words on average
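
The recognition step described above (wavelet coefficient vectors compared against one representative per group) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single level of the Haar wavelet on a 1-D signal, Euclidean distance for the comparison, and made-up group labels and signals. The abstract does not say which wavelet, how many levels, or which distance the system uses.

```python
import math
from typing import Dict, List, Sequence


def haar_step(signal: Sequence[float]) -> List[float]:
    """One level of the orthonormal Haar transform: approximations, then details."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / s for a, b in pairs]
    detail = [(a - b) / s for a, b in pairs]
    return approx + detail


def nearest_group(vec: Sequence[float], representatives: Dict[str, List[float]]) -> str:
    """Label of the group whose representative is closest in Euclidean distance."""
    return min(representatives, key=lambda label: math.dist(vec, representatives[label]))


# Hypothetical group representatives built from toy 1-D "character" signals.
representatives = {
    "alif": haar_step([1.0, 1.0, 0.0, 0.0]),
    "ba": haar_step([0.0, 0.0, 1.0, 1.0]),
}

# A slightly distorted sample of the first shape.
query = haar_step([1.0, 0.9, 0.0, 0.1])
label = nearest_group(query, representatives)
```

Comparing against one representative per group, rather than against every stored sample, keeps the number of distance computations proportional to the number of groups.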

    Feature Extraction Comparison in Handwriting Recognition of Batak Toba Alphabet

    Offline handwriting recognition is one of the most prominent research topics due to its tremendous range of applications and its high variability. This paper covers offline Batak Toba handwritten text recognition, from noise removal and feature extraction through to recognition using several classifiers. Experiments show that the elliptic Fourier descriptor (EFD) is the most discriminative feature and that Mahalanobis distance (MD) outperforms the two other classifiers
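
For readers unfamiliar with the Mahalanobis distance classifier mentioned above, the sketch below shows the idea for two-dimensional feature vectors: each class is summarized by a mean and a covariance matrix, and a sample is assigned to the class with the smallest distance d(x) = sqrt((x - m)^T S^-1 (x - m)). The 2-D restriction, the class names and the statistics are assumptions made to keep the example self-contained; the paper's EFD features and its actual class models are not reproduced.

```python
import math
from typing import Dict, Tuple

Vec2 = Tuple[float, float]
Mat2 = Tuple[Tuple[float, float], Tuple[float, float]]


def mahalanobis_2d(x: Vec2, mean: Vec2, cov: Mat2) -> float:
    """Mahalanobis distance for 2-D features, using the closed-form 2x2 inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]]
    q = (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det
    return math.sqrt(q)


def classify(x: Vec2, classes: Dict[str, Tuple[Vec2, Mat2]]) -> str:
    """Assign x to the class with the smallest Mahalanobis distance."""
    return min(classes, key=lambda name: mahalanobis_2d(x, *classes[name]))


# Hypothetical per-class statistics (mean, covariance).
classes = {
    "A": ((0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))),
    "B": ((5.0, 5.0), ((4.0, 0.0), (0.0, 4.0))),
}

predicted = classify((1.0, 1.0), classes)
```

With an identity covariance the measure reduces to Euclidean distance; a larger variance along a feature makes deviations along that feature count for less.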

    The Segmentation of Printed Arabic Characters Based on Interest Point

    Arabic characters differ from other characters both in their forms and in the way they are read. Before recognition can take place, each word must be segmented, that is, divided into its individual Arabic characters. The main difficulty in segmenting connected Arabic characters is that each character varies in position, form, and size. We therefore propose a segmentation method based on interest points, which achieves an average accuracy of 86.5%

    Recognition of Arabic handwritten words

    Recognizing Arabic handwritten words is a difficult problem due to the deformations introduced by different writing styles. Moreover, the cursive nature of Arabic writing makes correct segmentation of characters an almost impossible task. While there are many subsystems in an Arabic word recognition system, in this work we develop a subsystem to recognize Part of Arabic Words (PAW). We try to solve this problem using three different approaches: implicit segmentation and two variants of the holistic approach. Under the first approach, we report the difficulty of locating characters in PAW using Scale Invariant Feature Transforms; Rothacker reached similar conclusions while this work was being prepared. In the second and third approaches, we use a holistic approach to recognize PAW using Support Vector Machine (SVM) and Active Shape Models (ASM). Although a few works use SVM to recognize PAW, they use a small dataset; we use a large dataset and a different set of features. We also explain the errors SVM and ASM make and propose some remedies for these errors as future work

    Novel geometric features for off-line writer identification

    Writer identification is an important field in forensic document examination. Typically, a writer identification system consists of two main steps, feature extraction and matching, and the performance depends significantly on the feature extraction step. In this paper, we propose a set of novel geometrical features that are able to characterize different writers. These features include direction, curvature, and tortuosity. We also propose an improvement of the edge-based directional and chain code-based features. The proposed methods are applicable to Arabic and English handwriting. We have also studied several methods for computing the distance between feature vectors when comparing two writers. Evaluation of the methods is performed using both the IAM handwriting database and the QUWI database for each individual feature, reaching Top-1 identification rates of 82% and 87% on those two datasets, respectively. The accuracies achieved by Kernel Discriminant Analysis (KDA) are significantly higher than those observed before feature-level writer identification was implemented. The results demonstrate the effectiveness of the improved versions of both chain-code features and edge-based directional features
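
To make the chain-code and distance ideas above concrete, here is a small sketch of a basic 8-direction Freeman chain code, its normalized direction histogram, and a chi-square distance between two histograms. It illustrates the baseline technique only, not the improved features proposed in the paper. The direction numbering (0 = east, counter-clockwise, image y axis pointing down), the choice of chi-square, and the toy contours are all assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[int, int]

# Freeman codes for unit steps (dx, dy); y grows downward as in image coordinates.
DIRECTIONS: Dict[Point, int] = {
    (1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
    (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7,
}


def chain_code(points: Sequence[Point]) -> List[int]:
    """Freeman code for each step between consecutive 8-connected contour points."""
    return [
        DIRECTIONS[(x2 - x1, y2 - y1)]
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]


def direction_histogram(codes: Sequence[int]) -> List[float]:
    """Relative frequency of each of the 8 directions."""
    if not codes:
        raise ValueError("empty chain code")
    return [codes.count(k) / len(codes) for k in range(8)]


def chi_square(p: Sequence[float], q: Sequence[float]) -> float:
    """Chi-square distance between two histograms (bins empty in both are skipped)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


# Two toy contours standing in for strokes from two writers.
stroke_a = [(0, 0), (1, 0), (2, 0), (2, 1)]   # right, right, down
stroke_b = [(0, 0), (0, 1), (0, 2), (1, 2)]   # down, down, right

codes_a = chain_code(stroke_a)
hist_a = direction_histogram(codes_a)
hist_b = direction_histogram(chain_code(stroke_b))
distance = chi_square(hist_a, hist_b)
```

A smaller distance indicates more similar direction statistics; a writer identification system would rank the enrolled writers by such a distance computed over whole documents.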