3,781 research outputs found

    Adaptive dissection based subword segmentation of printed Arabic text

    Numerous segmentation and recognition techniques have been proposed in the literature for Arabic OCR systems. Correct and efficient segmentation of Arabic text into characters is considered a fundamental problem. While OCR systems for other languages do not need to segment printed text for successful recognition, printed Arabic text requires either robust and powerful segmentation algorithms or segmentation-free recognition schemes. Moreover, segmentation is considered indispensable for recognizing handwritten characters. Most current segmentation techniques suffer from over-segmentation and under-segmentation, and they are not adaptive in nature. In this paper, we propose a new sub-word segmentation scheme that is independent of font size and font type.
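The abstract does not spell out the adaptive dissection rule. As a point of reference only, the sketch below shows the simplest generic form of dissection: cutting a binary word image at columns that contain no ink (a vertical projection profile). It is not the paper's method; the function names and the 0/1 list-of-rows image format are assumptions made for illustration.

```python
from typing import List, Tuple


def column_profile(image: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in each column of a binary image."""
    if not image:
        return []
    width = len(image[0])
    return [sum(row[c] for row in image) for c in range(width)]


def dissect_columns(image: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges (end exclusive) of runs containing ink.

    Columns whose profile value is zero are treated as cut points.
    """
    segments = []
    start = None
    for c, count in enumerate(column_profile(image)):
        if count > 0 and start is None:
            start = c  # entering an inked run
        elif count == 0 and start is not None:
            segments.append((start, c))  # leaving an inked run
            start = None
    if start is not None:
        segments.append((start, len(image[0])))  # run reaches the right edge
    return segments
```

For example, `dissect_columns([[1, 1, 0, 0, 1], [0, 1, 0, 1, 1]])` returns `[(0, 2), (3, 5)]`. Fixed column cuts like these cannot separate sub-words that overlap horizontally, which is one reason adaptive and component-based methods are studied.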

    ISSUES AND PROBLEMS IN THE RECOGNITION OF ARABIC PRINTED TEXTS

    Nowadays, Arabic text recognition is witnessing a wave of interest after a long period of moderate activity. The reason for the slow progress is the complexity of the problem, manifested both in the cursive shapes and in the close similarity of Arabic characters. Optical character recognition is usually performed by detecting and quantifying isolated characters, which implies that the text must first be meaningfully segmented into simpler shapes. In this paper, we study the properties of Arabic script and review the problems encountered in its segmentation. To bypass the need for segmentation, a new technique called N-markers is proposed. It unifies the advantages of both global and structural recognition methods and is intuitively close to the human recognition process. The technique is tailored to single-font printed texts rich in ligatures, a problem encountered in good-quality books and journals. It can be extended in a straightforward way to other fonts and to degraded texts. Preliminary experiments show encouraging results.

    Junction Point Detection And Identification Of Broken Character In Touching Arabic Handwritten Text Using Overlapping Set Theory

    Touching characters are formed when two or more characters share the same space. Segmenting these touching characters is therefore a very challenging research topic, especially for degraded handwritten Arabic documents, and it is one of the key issues in recognizing handwritten Arabic text. To make recognition systems more effective, segmenting touching handwritten Arabic characters is considered a very important research area. In this research, a new method is proposed that identifies the junction, or common point, of a touching Arabic word image by applying the overlapping (intersection) set operation. This helps trace the correct boundary of the touching characters, identify broken characters, and segment the touching handwritten text efficiently. The proposed method has been evaluated on touching handwritten Arabic characters taken from handwritten datasets, and the results show its efficiency. The method is applicable to both degraded handwritten documents and printed documents.
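One possible reading of the intersection-set idea, sketched under the assumption that the two touching strokes are already available as separate sets of (row, col) pixel coordinates: the junction is where one stroke overlaps, or is 8-adjacent to, the other. The helpers `dilate` and `junction_points` are hypothetical names; the paper's own procedure for obtaining the strokes and tracing boundaries is not reproduced here.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]  # (row, col) coordinate of an ink pixel (assumed representation)


def dilate(pixels: Set[Pixel]) -> Set[Pixel]:
    """Grow a pixel set by one step of its 8-connected neighbourhood."""
    return {
        (r + dr, c + dc)
        for r, c in pixels
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
    }


def junction_points(stroke_a: Set[Pixel], stroke_b: Set[Pixel]) -> Set[Pixel]:
    """Pixels of either stroke that overlap or 8-touch the other stroke.

    The dilation includes the original pixels, so shared pixels are caught too.
    """
    return (stroke_a & dilate(stroke_b)) | (stroke_b & dilate(stroke_a))
```

For a horizontal stroke `{(0, 0), (0, 1), (0, 2)}` touching a vertical stroke `{(0, 3), (1, 3)}`, the junction set is `{(0, 2), (0, 3), (1, 3)}`: the last pixel of the first stroke plus the two pixels of the second stroke adjacent to it.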

    New Distance Measures for Arabic Handwritten Text Recognition

    In recent years, optical character recognition has attracted scientists and researchers. Latin, Chinese, Korean and Thai characters have been researched more thoroughly than Arabic characters. Research first concentrated on printed and typeset characters until acceptable recognition accuracy was achieved; nowadays, most research has moved towards handwritten character recognition. Arabic text is cursive, since the characters within a sub-word are connected to each other. This makes recognition more complex, and a segmentation procedure is required to separate the connected characters before they can be recognized. Features must be chosen carefully, since they play a very important role in the segmentation and recognition process. Recognition accuracy depends mostly on the classifier applied and on the segmentation procedure. In this research work, a framework for recognizing Arabic handwriting is presented, and two approaches are proposed. The first approach recognizes each word as a whole, to suit applications such as sorting postal mail and processing bank checks, where the number of words or digits to be recognized is limited. The words may include country and city names written on postal mail, or reserved words and amounts used on bank checks. The second approach covers the general case, in which any type of document or handwritten text can be recognized. Both approaches include a preprocessing stage consisting of image enhancement and normalization. The most significant features are extracted using Principal Components Analysis. A new segmentation-based approach is designed and implemented for the second approach to segment the text into characters, while no segmentation, or only a simple procedure, is performed in the first approach. Recognition is performed with the nearest neighbor algorithm.
    Four different distance measures are used with the nearest neighbor classifier: the first norm, the second norm (Euclidean), and two newly proposed norms called ENorm and EEuclidean, which are derived from the first and second norms respectively. Using the two new norms improves recognition accuracy. Both approaches have been tested, and a number of experiments are discussed in detail. The first approach was evaluated on four datasets: sub-words containing two characters, sub-words containing three characters, Latin letters, and the Hindi digits used with the Arabic language today. Recognition accuracy is the evaluation measure, estimated with 8-fold cross-validation. The average recognition accuracy is 94.8% for the digits, 78% for the three-character sub-words, 77% for the two-character sub-words and 67% for Latin letters. The second approach achieved a recognition accuracy of 73% without dot detection and 77% with dot detection.
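A minimal nearest-neighbor sketch using the first-norm (Manhattan) and second-norm (Euclidean) distances named in the abstract. The ENorm and EEuclidean variants are not defined in the abstract, so they are not implemented here. Feature vectors are assumed to be already extracted (for example by PCA), and all function names are illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def l1(a: Vector, b: Vector) -> float:
    """First norm (Manhattan / city-block) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2(a: Vector, b: Vector) -> float:
    """Second norm (Euclidean) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor(
    query: Vector,
    train: List[Tuple[Vector, str]],
    dist: Callable[[Vector, Vector], float],
) -> str:
    """Label of the training vector closest to `query` under `dist` (1-NN)."""
    _, label = min(train, key=lambda item: dist(query, item[0]))
    return label
```

With training vectors `(3, 0)` labelled `a` and `(2, 2)` labelled `b`, the query `(0, 0)` is assigned `a` under `l1` (distances 3 vs 4) but `b` under `l2` (3 vs about 2.83), which illustrates how the choice of norm alone can change the decision.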

    Component-based Segmentation of words from handwritten Arabic text

    Efficient preprocessing is essential for the automatic recognition of handwritten documents. In this paper, techniques for segmenting words in handwritten Arabic text are presented. First, connected components (CCs) are extracted and the distances between components are analyzed. The statistical distribution of these distances is then used to determine an optimal threshold for word segmentation. In addition, an improved projection-based method is employed for baseline detection. The proposed method has been successfully tested on the IFN/ENIT database, which consists of 26,459 Arabic words handwritten by 411 different writers, and the results are promising, showing more accurate baseline detection and word segmentation for subsequent recognition.
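The abstract does not give the exact threshold rule, so the sketch below substitutes a named technique: Otsu-style splitting of the inter-component gaps into two classes (within-word and between-word) by minimising the weighted within-class variance. It assumes each connected component has already been reduced to its horizontal extent `(x_min, x_max)`. The baseline helper is the plain horizontal-projection method, not the paper's improved variant. All function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int]  # (x_min, x_max): horizontal extent of one connected component (assumed)


def _variance(xs: List[float]) -> float:
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def otsu_threshold(values: List[float]) -> float:
    """Two-class split of 1-D values minimising weighted within-class variance.

    Returns the midpoint between the two classes; if all values are equal,
    returns that value itself, so nothing lies above the threshold.
    """
    data = sorted(values)
    best_t, best_score = float(data[-1]), float("inf")
    for k in range(1, len(data)):
        if data[k] == data[k - 1]:
            continue  # no boundary between identical values
        low, high = data[:k], data[k:]
        score = len(low) * _variance(low) + len(high) * _variance(high)
        if score < best_score:
            best_score, best_t = score, (data[k - 1] + data[k]) / 2
    return best_t


def segment_words(boxes: List[Box]) -> List[List[Box]]:
    """Group components into words; gaps above the threshold separate words.

    Words are returned in left-to-right image order (Arabic reading order is
    the reverse of this list).
    """
    ordered = sorted(boxes)
    if len(ordered) < 2:
        return [ordered] if ordered else []
    # Horizontally overlapping components are clamped to a gap of 0.
    gaps = [max(0, nxt[0] - cur[1]) for cur, nxt in zip(ordered, ordered[1:])]
    threshold = otsu_threshold(gaps)
    words = [[ordered[0]]]
    for gap, box in zip(gaps, ordered[1:]):
        if gap > threshold:
            words.append([box])  # large gap: start a new word
        else:
            words[-1].append(box)  # small gap: same word
    return words


def baseline_row(image: List[List[int]]) -> int:
    """Basic horizontal projection: index of the row with the most ink pixels."""
    counts = [sum(row) for row in image]
    return max(range(len(counts)), key=counts.__getitem__)
```

Because a two-class split is forced whenever the gaps differ, a single word whose internal gaps vary could be over-split; a practical system would also check that the two classes are well separated before accepting the threshold.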

    Text Line Segmentation of Historical Documents: a Survey

    There are huge numbers of historical documents in libraries and various national archives that have not been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such as word spotting, text/image alignment, authentication and extraction of specific fields are in use today. For all these tasks, a major step is segmenting the document into text lines. Because of the low quality and complexity of these documents (background noise, artifacts due to aging, interfering lines), automatic text line segmentation remains an open research field. The objective of this paper is to present a survey of existing methods, developed during the last decade, that are dedicated to documents of historical interest. Comment: 25 pages, submitted version. To appear in the International Journal on Document Analysis and Recognition. Online version available at http://www.springerlink.com/content/k2813176280456k3