3,685 research outputs found

    Junction Point Detection And Identification Of Broken Character In Touching Arabic Handwritten Text Using Overlapping Set Theory

    Touching characters are formed when two or more characters share the same space. Segmenting these touching characters is a very challenging research topic, especially for degraded handwritten Arabic documents, and it is one of the key issues in the recognition of handwritten Arabic text. To make a recognition system more effective, segmentation of touching handwritten Arabic characters is therefore considered a very important research area. In this research, a new method is proposed that identifies the junction, or common point, of a touching Arabic word image by applying the overlapping (intersection) operation of set theory. This helps to trace the correct boundary of the touching characters, identify broken characters, and segment the touching handwritten text efficiently. The proposed method has been evaluated on touching Arabic handwritten characters taken from handwritten datasets. The results show the efficiency of the proposed method. The proposed method is applicable to both degraded handwritten documents and printed documents.
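
    The abstract gives no implementation details, so the following is only a minimal sketch of the set-intersection idea it names. It assumes, hypothetically, that the two touching strokes are already available as sets of (row, column) pixel coordinates; the helper names `pixels` and `junction_points` are illustrative and not taken from the paper.

```python
def pixels(rows):
    """Convert rows of '#'/'.' text into a set of (row, col) foreground pixels."""
    return {
        (r, c)
        for r, line in enumerate(rows)
        for c, ch in enumerate(line)
        if ch == "#"
    }


def junction_points(stroke_a, stroke_b):
    """Return the pixels shared by both strokes (their set intersection)."""
    return stroke_a & stroke_b


# Assumed toy input: a vertical stroke and a horizontal stroke that cross.
stroke_a = pixels([
    "..#..",
    "..#..",
    "..#..",
])
stroke_b = pixels([
    ".....",
    "#####",
    ".....",
])

shared = junction_points(stroke_a, stroke_b)
print(sorted(shared))
```

    In this toy case the only shared pixel is where the two strokes cross; on real handwriting the hard part is obtaining the two stroke sets in the first place, which this sketch does not attempt.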

    Offline arabic character recognition using genetic approach

    Many optical character recognition (OCR) techniques and tools have been developed for a plurality of languages. A successful OCR system improves interactivity between humans and computers in many applications such as digitising and recognising written content. With regard to Arabic OCR, the problem of handwriting recognition is challenging because Arabic letters are cursive and change shape depending on their positions. OCR systems have reached nearly perfect recognition of printed Arabic text, yet recognition of handwritten text is still in its infancy and needs to be greatly improved. Therefore, in this study, an approach to recognising Arabic characters based on genetic algorithms (GA) is proposed. The approach comprises two separate stages: feature extraction and GA-based character recognition. In the feature extraction stage, six features are detected for each character and represented as a feature vector of six integers. The feature vectors are then used in the next stage. Three genetic operators, namely selection, crossover and mutation, are implemented to search for similar vectors with the best fitness value in order to recognise the character. The data used in this study were collected from different sources and stored in a database. They consist of 12,500 printed words in 50 paragraphs and 15,000 words written by 100 different writers, male and female, aged 5 to 60 years. Pre-processing operations include segmenting paragraphs into lines, lines into words, and words into characters, detecting the skeleton, and determining the baseline and other horizontal zones. The experimental results show that the proposed method achieves a promising recognition accuracy of 90.46% for printed text and handwritten characters.

    Adaptive dissection based subword segmentation of printed Arabic text

    Numerous segmentation and recognition techniques have been proposed in the literature for Arabic OCR systems. Correct and efficient segmentation of Arabic text into characters is considered to be a fundamental problem. While OCR systems for other languages do not need segmentation of printed text for successful recognition, printed Arabic text requires either robust and powerful segmentation algorithms or segmentation-free recognition schemes. Moreover, in the recognition of handwritten characters, segmentation is considered to be indispensable. Most current segmentation techniques suffer from over-segmentation and under-segmentation, in addition to not being adaptive in nature. In this paper, we propose a new sub-word segmentation scheme that is independent of font size and font type.

    Component-based Segmentation of words from handwritten Arabic text

    Efficient preprocessing is essential for automatic recognition of handwritten documents. In this paper, techniques for segmenting words in handwritten Arabic text are presented. First, connected components (CCs) are extracted, and the distances between components are analyzed. The statistical distribution of these distances is then used to determine an optimal threshold for word segmentation. Meanwhile, an improved projection-based method is employed for baseline detection. The proposed method has been successfully tested on the IFN/ENIT database, which consists of 26,459 Arabic words handwritten by 411 different writers. The results are promising and encouraging, showing more accurate baseline detection and word segmentation for further recognition.
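
    As a rough illustration of the two steps described above, the sketch below groups components into words by thresholding the gaps between them and picks a baseline from a horizontal projection profile. It makes several assumptions not stated in the abstract: components are reduced to hypothetical (x_start, x_end) extents, the mean gap stands in for the statistically derived threshold, and the baseline is a plain (not the paper's improved) projection maximum.

```python
def segment_words(extents, threshold=None):
    """Group component extents (x_start, x_end) into words by gap size."""
    if not extents:
        return []
    extents = sorted(extents)
    # Gap between each component and the right edge of everything before it.
    gaps = []
    right = extents[0][1]
    for start, end in extents[1:]:
        gaps.append(max(0, start - right))
        right = max(right, end)
    if threshold is None:
        # Assumption: the mean gap is a stand-in for the paper's threshold.
        threshold = sum(gaps) / len(gaps) if gaps else 0
    words = [[extents[0]]]
    for gap, extent in zip(gaps, extents[1:]):
        if gap > threshold:
            words.append([extent])      # large gap: start a new word
        else:
            words[-1].append(extent)    # small gap: same word
    return words


def baseline_row(image):
    """Return the index of the row with the most foreground pixels."""
    profile = [sum(row) for row in image]
    return profile.index(max(profile))


# Assumed toy data: five components with two clearly larger gaps.
components = [(0, 10), (12, 20), (35, 42), (44, 50), (70, 80)]
words = segment_words(components)

# Assumed toy binary image (1 = ink); row 2 is the densest.
image = [
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
]
baseline = baseline_row(image)
print(words, baseline)
```

    The groups come out in left-to-right order of x coordinate; for Arabic, which is written right to left, the list would be reversed to obtain reading order.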

    Handwritten Arabic character recognition: which feature extraction method?

    Recognition of Arabic handwritten characters is a difficult task because some different characters look similar. Selecting the feature extraction method remains the most important step for achieving high recognition accuracy. The purpose of this paper is to compare the effectiveness of the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT) in capturing discriminative features of Arabic handwritten characters. A new database containing 5,600 characters covering all shapes of Arabic handwritten characters has also been developed for the analysis. The coefficients of both techniques have been used for classification with an Artificial Neural Network implementation. The results have been analysed, and the findings demonstrate that DCT-based feature extraction yields better recognition than DWT-based feature extraction.
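
    To make the idea of DCT coefficients as features concrete, here is a minimal pure-Python sketch. It uses an unnormalised 2D DCT-II and keeps the top-left (low-frequency) block of coefficients as the feature vector. The choice of block and the function names are assumptions for illustration; the abstract does not say which coefficients the authors selected or how they were fed to the neural network.

```python
import math


def dct_1d(values):
    """Unnormalised 1D DCT-II of a sequence of numbers."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def dct_2d(image):
    """2D DCT-II: transform every row, then every column of the result."""
    rows = [dct_1d(row) for row in image]
    cols = [dct_1d(col) for col in zip(*rows)]
    # cols is indexed [column][frequency]; transpose back to [u][v].
    return [list(r) for r in zip(*cols)]


def dct_features(image, k=2):
    """Assumed feature choice: the k x k low-frequency block, row by row."""
    coeffs = dct_2d(image)
    return [coeffs[u][v] for u in range(k) for v in range(k)]


# Assumed toy input: a uniform 4x4 image, whose energy is all in the DC term.
flat = [[1, 1, 1, 1] for _ in range(4)]
features = dct_features(flat, k=2)
print(features)
```

    For a uniform image only the first (DC) coefficient is non-zero, up to floating-point error, which is a quick sanity check that the transform behaves as expected.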

    Text Line Segmentation of Historical Documents: a Survey

    There is a huge number of historical documents in libraries and in various National Archives that have not been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such as word spotting, text/image alignment, authentication and extraction of specific fields are in use today. For all these tasks, a major step is document segmentation into text lines. Because of the low quality and the complexity of these documents (background noise, artifacts due to aging, interfering lines), automatic text line segmentation remains an open research field. The objective of this paper is to present a survey of existing methods, developed during the last decade, and dedicated to documents of historical interest.
    Comment: 25 pages, submitted version. To appear in International Journal on Document Analysis and Recognition. Online version available at http://www.springerlink.com/content/k2813176280456k3