
    Zone Segmentation and Thinning based Algorithm for Segmentation of Devnagari Text

    Character segmentation of handwritten documents is a challenging research topic because of its diverse application environments. OCR can be used for the automated processing and handling of forms, old corrupted reports, bank cheques, postal codes and structures. Segmenting a word into characters is one of the major challenges in optical character recognition. It is even more challenging for offline handwritten documents, and a further hurdle is the presence of broken, touching and overlapped characters in Devnagari script. In this paper we introduce an algorithm that segments both broken and touching characters in Devnagari script, using both zone segmentation and thinning-based techniques. We used 85 words each for isolated, broken, touching, and both broken and touching characters. The segmentation accuracy achieved for broken and touching characters is 96.2% on average.

    Segmentation Of Touching Arabic Characters In Handwritten Documents By Overlapping Set Theory And Contour Tracing

    Segmentation of handwritten words into characters is one of the challenging problems in the field of OCR, and the presence of touching characters makes it more difficult. There are many obstacles to segmenting touching Arabic handwritten text. Although researchers are working on the segmentation of these touching characters, unsolved problems remain for touching offline Arabic handwritten characters, owing to the large variety of characters and their shapes. In this research, a new method for segmenting touching Arabic handwritten characters has been developed. The main idea of the proposed method is to identify the touching point using overlapping set theory and the ending points of the Arabic word using standard morphological operations. After all the points are identified, a segmentation method traces the boundaries of the characters to separate the touching characters. Experiments were conducted on touching characters taken from different data sets. The results show the accuracy of the proposed method.

    SEGMENTATION OF TOUCHING CHARACTER PRINTED LANNA SCRIPT USING JUNCTION POINT

    In the northern part of Thailand since 1802, Lanna characters were popular as ancient characters. Segmenting printed documents in Lanna characters is challenging because of problems such as partially overlapping characters and touching characters. This paper focuses only on touching characters, such as a consonant touching a vowel. The segmentation method begins with the horizontal histogram and then the vertical histogram to segment text lines and characters, respectively. The resulting segments consist of correct clear characters, partially overlapping characters, and touching characters. The proposed method computes the left-edge and right-edge junction points, finds their maximum numbers, and uses the corresponding row value to separate the touching consonant and vowel. In a trial on text documents printed in Lanna characters, the method achieved an accuracy of 95.81%.
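The first stage described in this abstract (a horizontal histogram for text lines, then a vertical histogram for characters) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a binarized image stored as a list of rows with 1 for ink and 0 for background, and the function names `projection_runs` and `segment_lines_and_chars` are hypothetical. The junction-point step that separates touching consonants and vowels is not shown.

```python
def projection_runs(profile):
    """Return (start, end) pairs of consecutive non-zero entries; end is exclusive."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # a run of ink begins
        elif value == 0 and start is not None:
            runs.append((start, i))        # the run ended at the first empty slot
            start = None
    if start is not None:                  # run reaches the end of the profile
        runs.append((start, len(profile)))
    return runs


def segment_lines_and_chars(img):
    """Split a binary image (1 = ink) into boxes (top, bottom, left, right).

    Rows are split with the horizontal histogram (ink count per row),
    then each text line is split with the vertical histogram
    (ink count per column inside that line).
    """
    boxes = []
    row_profile = [sum(row) for row in img]
    for top, bottom in projection_runs(row_profile):
        band = img[top:bottom]
        col_profile = [sum(col) for col in zip(*band)]
        for left, right in projection_runs(col_profile):
            boxes.append((top, bottom, left, right))
    return boxes


# Tiny hypothetical page: two components on the first line, one on the second.
page = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
boxes = segment_lines_and_chars(page)
```

Touching characters share at least one inked column, so a profile-based split like this returns them as a single box; that is the case the junction-point analysis in the abstract is meant to handle.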

    Junction Point Detection And Identification Of Broken Character In Touching Arabic Handwritten Text Using Overlapping Set Theory

    Touching characters are formed when two or more characters share the same space. Segmenting these touching characters is therefore a very challenging research topic, especially for degraded handwritten Arabic documents, and it is one of the key issues in recognizing handwritten Arabic text. Segmenting touching handwritten Arabic characters is considered an important research area for making recognition systems more effective. In this research, a new method is proposed that identifies the junction or common point of an Arabic touching word image by applying the overlapping (intersection) set theory operation. This helps to trace the correct boundary of the touching characters, identify the broken characters, and segment the touching handwritten text efficiently. The proposed method has been evaluated on Arabic touching handwritten characters taken from handwritten datasets. The results show the efficiency of the proposed method. The proposed method is applicable to both degraded handwritten documents and printed documents.

    Segmentation Of Two Touching Handwritten Arabic Characters Using Overlapping Set Theory And Gradient Orientation

    Image segmentation of offline Arabic handwritten documents is an active research area, but segmenting an image into regions in a way comparable to human vision requires effort, especially for degraded handwritten historical documents. These valuable degraded documents attract researchers from all around the world, who face problems in segmenting Arabic text because of overlapping and touching characters. Overlapping and touching occur when the standard rules of writing are not followed, so that two or more characters share the same space and are treated as one sub-word. At present, many techniques for segmenting touching handwritten characters use the concept of connected components. These methods are easy to implement and provide high accuracy in some cases, but they fail in many others because a manual decision value is required to determine the correct segmentation path near the junction point, which produces an unstable character boundary. These methods are also unstable when both touching characters contain loops or circular paths: the cut-point is then located in the wrong place, which can lead to an incorrect dividing path along the character boundary. Selecting the path near the junction point is one of the main challenges in segmenting connected components. These methods also have many disadvantages; because of variation in writing, they are usually implemented for only one layout and font type. Apart from connected-component methods, template-based segmentation is another available approach, in which several studies have created templates for touching characters. Its disadvantage is that many templates must be created for all possible touching types. Therefore, because of variation in writing, connected-component methods remain unexplored, especially for cursive handwriting such as Arabic and Jawi.
This work has three objectives: first, to identify the junction point of a touching image; second, to formulate the direction near the junction point; and third, to segment the touching characters. The research methodology consists of three proposed stages: junction point detection, direction formulation, and segmentation. In the junction point identification stage, overlapping set theory is used to identify the segmentation point of the two touching characters. In the direction formulation stage, a gradient technique is used to formulate the right direction near the junction point. In the segmentation stage, a contour tracing technique is used to segment the two touching characters into isolated characters. The three proposed methods were tested on the IFN/ENIT, AHDB and IAM datasets. The success rate is 93.3% for junction point detection, 98% for the second proposed method, and 97.27% for the proposed segmentation method. In conclusion, the proposed segmentation method outperforms existing research in terms of accuracy. The proposed methods do not use any recognizer or template to control segmentation accuracy. Finally, the proposed segmentation method was compared with state-of-the-art methods and achieved a better accuracy rate for degraded and non-degraded document images; the accuracy of the overall process is about 97.45% for AHDB and 85.03% for the IAM dataset.

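    SIMULASI DAN ANALISIS SEGMENTASI CITRA TULISAN TANGAN ANGKA YANG SALING BERSENTUHAN MENGGUNAKAN METODE ZHANG SUEN [Simulation and Analysis of the Segmentation of Touching Handwritten Digit Images Using the Zhang Suen Method]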

    ABSTRACT: Segmentation is a very important part of image analysis, not only for images of objects but also for images of handwriting. However, segmentation of handwriting images often stops at sentence and word segmentation. It still needs to be developed down to the segmentation of each individual letter or digit so that its use in image processing becomes more accurate. A problem arises when the handwritten characters touch each other: separating two touching handwritten characters requires a segmentation process able to split them. In this final project (Tugas Akhir), a simulation of the segmentation of touching handwriting images was carried out using the Zhang Suen algorithm, which is a thinning algorithm. The stages consist of thinning, extraction of feature points to determine the cutting points, and cutting. The project focuses on analysing the segmentation of touching handwritten digits, covering not only single touching but also multi touching. The simulation yielded a segmentation accuracy for touching digits of 79.599% for single touching with Sigma = 1, and 37.943% for multi touching at the same Sigma value. The Sigma value strongly affects the thickness of the handwriting in the image. Zhang Suen thinning still needs improvement in order to extract the correct feature points. The average time required for segmentation is 2.70316 seconds per character for single touching and 2.51648 seconds per character for multi touching.
    Keywords: segmentation, handwriting, thinning, Zhang Suen algorithm, single touching, multi touching
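For readers unfamiliar with the thinning stage named in this abstract, the following is a minimal sketch of the standard Zhang Suen two-sub-iteration thinning rule. It is not the project's own code: it assumes a binary image given as a list of rows (1 = ink, 0 = background) with at least a one-pixel background margin, and the function name `zhang_suen_thin` is hypothetical. The Sigma-dependent preprocessing, the feature-point extraction and the cutting step mentioned in the abstract are not shown.

```python
def zhang_suen_thin(img):
    """Return a thinned copy of a binary image (1 = ink, 0 = background).

    Border pixels are never examined, so the ink is assumed to be
    surrounded by at least one row/column of background.
    """
    img = [row[:] for row in img]          # work on a copy
    h, w = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9: clockwise, starting from the pixel directly above.
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
                img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):                # the two sub-iterations
            to_clear = []
            for r in range(1, h - 1):
                for c in range(1, w - 1):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    b = sum(p)             # number of ink neighbours
                    # number of 0 -> 1 transitions around the pixel
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_clear.append((r, c))
            # Pixels are cleared only after the whole pass, as the rule requires.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img


# Hypothetical input: a bar three pixels thick inside a background margin.
bar = [[0] * 9] + [[0] + [1] * 7 + [0] for _ in range(3)] + [[0] * 9]
skeleton = zhang_suen_thin(bar)
```

On the skeleton, candidate feature points are commonly taken to be pixels with one ink neighbour (end points) or three or more (branch points); whether the project uses exactly that definition is not stated in the abstract.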

    A hybrid 2-D HMM and MLP OCR system for processing multi-font and low-quality English documents

    This thesis presents a Hybrid 2-Direction (D) Hidden Markov Model (2-D HMM) and Multi-Layer Perceptron (MLP) OCR system for the recognition of multi-font printed documents of varying quality. It emphasizes the new methods proposed. First, a statistical analysis of the frequency of touching characters has been conducted, and some statistics of touching characters have been generated from real documents. Based on these statistical results, which could be the first formal statistics on touching characters, a new classifier has been designed to recognize some frequent touching characters without segmentation. Second, a new hierarchical character classifier is presented to enhance character recognition accuracy. We group all characters into several categories according to character layout contextual information (Ascender, Descender and Center). Consequently, we implement several independent classifiers to recognize the characters in each group. In addition, a 2-D HMM is included in the hierarchical classifier to improve the character recognition rate, and an automatic builder of special touching-character HMMs is also described in this thesis. (Abstract shortened by UMI.)

    Text Line Segmentation of Historical Documents: a Survey

    There is a huge number of historical documents in libraries and in various National Archives that have not been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such as word spotting, text/image alignment, authentication and extraction of specific fields are in use today. For all these tasks, a major step is document segmentation into text lines. Because of the low quality and the complexity of these documents (background noise, artifacts due to aging, interfering lines), automatic text line segmentation remains an open research field. The objective of this paper is to present a survey of existing methods, developed during the last decade, and dedicated to documents of historical interest. Comment: 25 pages, submitted version. To appear in International Journal on Document Analysis and Recognition. Online version available at http://www.springerlink.com/content/k2813176280456k3

    ANN-based Innovative Segmentation Method for Handwritten text in Assamese

    Artificial Neural Networks (ANNs) have been widely used for the recognition of optically scanned characters, a task that partially emulates human thinking in the domain of Artificial Intelligence. Prior to recognition, however, it is necessary to segment the text into sentences, words, etc. Segmentation of words into individual letters has been one of the major problems in handwriting recognition. Despite several successful works all over the world, the development of such tools for specific languages is still an ongoing process, especially in the Indian context. This work explores the application of ANNs as an aid to the segmentation of handwritten characters in Assamese, an important language in the north-eastern part of India. The work explores the performance difference obtained by applying an ANN-based dynamic segmentation algorithm compared to projection-based static segmentation. The algorithm first trains an ANN with individual handwritten characters recorded from different individuals. Handwritten sentences are separated from the text using a static segmentation method. From each segmented line, individual characters are separated by first over-segmenting the entire line. Each of the segments thus obtained is then fed to the trained ANN. At the point of segmentation where the ANN recognizes a segment, or a combination of several segments, as similar to a handwritten character, a segmentation boundary for the character is assumed to exist and the segmentation is performed. The segmented character is then compared to the best available match and the segmentation boundary is confirmed.