705 research outputs found

    A Comparative study of Arabic handwritten characters invariant feature

    This paper is particularly interested in the invariant features of Arabic handwritten characters. It presents the results of a comparative study of several feature extraction techniques for handwritten characters, based on the Hough transform, the Fourier transform, the wavelet transform, and the Gabor filter. The results show that the Hough transform and the Gabor filter are insensitive to rotation and translation, while the Fourier transform is sensitive to rotation but insensitive to translation. In contrast to the Hough transform and the Gabor filter, the wavelet transform is sensitive to both rotation and translation.
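
    The translation and rotation behaviour reported for the Fourier transform can be illustrated with a small sketch. This is not the paper's feature extractor: it assumes a toy 4x4 binary glyph, uses the magnitude of a naive 2D DFT as the feature, and models translation as a circular shift (a real image shifted against an empty background behaves only approximately this way). All function names are hypothetical.

```python
import cmath

def dft2_magnitude(img):
    """Magnitude of the naive 2D DFT of a small image given as a list of rows."""
    rows, cols = len(img), len(img[0])
    mags = []
    for u in range(rows):
        row = []
        for v in range(cols):
            total = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = -2 * cmath.pi * (u * y / rows + v * x / cols)
                    total += img[y][x] * cmath.exp(1j * angle)
            row.append(abs(total))
        mags.append(row)
    return mags

def shift(img, dy, dx):
    """Circularly translate the image by dy rows and dx columns."""
    rows, cols = len(img), len(img[0])
    return [[img[(y - dy) % rows][(x - dx) % cols] for x in range(cols)]
            for y in range(rows)]

def rotate90(img):
    """Rotate the image clockwise by 90 degrees."""
    return [list(row) for row in zip(*img[::-1])]

def max_abs_diff(a, b):
    """Largest element-wise difference between two feature maps."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

# Toy glyph (an assumption for illustration): a short horizontal bar.
glyph = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

base = dft2_magnitude(glyph)
shift_diff = max_abs_diff(base, dft2_magnitude(shift(glyph, 1, 1)))
rot_diff = max_abs_diff(base, dft2_magnitude(rotate90(glyph)))
```

    Here `shift_diff` is zero up to floating-point error, because a shift only changes the phase of each coefficient, whereas `rot_diff` is clearly non-zero, because rotating the glyph rotates its spectrum as well.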

    Off-line Thai handwriting recognition in legal amount

    Thai handwriting in legal amounts is a challenging problem and a new field in handwriting recognition research. The focus of this thesis is to implement a Thai handwriting recognition system. A preliminary data set of Thai handwriting in legal amounts is designed. The samples in the data set are characters and words of Thai legal amounts, plus a set of legal amount phrases collected from a number of native Thai volunteers. In the preprocessing and recognition stages, techniques are introduced to improve the character recognition rates. The characters are divided into two smaller subgroups by their writing levels, named the body and high groups. The recognition rates of both groups are increased by exploiting their distinguishing features. The writing level separation algorithms are implemented using the size and position of characters. Empirical experiments are set up to find the combination of features that best increases the recognition rates. Traditional recognition systems are modified to give the cumulative top-3 ranked answers to cover the possible character classes. In the postprocessing stage, lexicon matching algorithms are implemented to match the ranked characters with the legal amount words. These matched words are joined together to form possible choices of amounts. These amounts have their syntax checked in the last stage. Several syntax violations are a consequence of faulty character segmentation and recognition resulting from connected or broken characters. The anomalies in handwriting caused by these characters are mainly detected by their size and shape. During the recovery process, the possible word boundary patterns can be pre-defined and used to segment the hypothesis words. These words are identified by word recognition, and the results are joined with previously matched words to form the full amounts, which are checked against the syntax rules again.
    From 154 amounts written by 10 writers, the rejection rate is 14.9 percent with the recovery processes. The recognition rate for the accepted amounts is 100 percent.
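
    The step of matching top-3 ranked characters against a lexicon can be sketched roughly as follows. This is a minimal illustration, not the thesis's algorithm: it assumes the word is already segmented into a fixed number of character positions, and it uses English number words as a stand-in for the Thai legal amount lexicon. The function name and data are hypothetical.

```python
def match_lexicon(ranked_chars, lexicon):
    """Return lexicon words in which every character appears among the
    ranked candidates for its position (lexicon order is preserved)."""
    matches = []
    for word in lexicon:
        if len(word) != len(ranked_chars):
            continue  # only compare words with the same number of characters
        if all(ch in candidates for ch, candidates in zip(word, ranked_chars)):
            matches.append(word)
    return matches

# Stand-in lexicon (assumption): English number words instead of Thai ones.
lexicon = ["one", "two", "ten", "six"]

# Top-3 candidates per character position, as a recognizer might rank them.
ranked = [
    ["f", "t", "l"],
    ["w", "e", "u"],
    ["o", "n", "a"],
]

candidates = match_lexicon(ranked, lexicon)
```

    In this toy case `candidates` holds both "two" and "ten", which shows why a later stage, such as the syntax check described in the abstract, is needed to choose among the matched words.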

    Recognition of Arabic handwritten words

    Recognizing Arabic handwritten words is a difficult problem due to the deformations of different writing styles. Moreover, the cursive nature of Arabic writing makes correct segmentation of characters an almost impossible task. While there are many subsystems in an Arabic word recognition system, in this work we develop a subsystem to recognize Part of Arabic Words (PAW). We try to solve this problem using three different approaches: implicit segmentation and two variants of the holistic approach. Under the first approach, we report the difficulty of locating characters in PAW using the Scale Invariant Feature Transform; Rothacker reached similar conclusions while this work was being prepared. In the second and third approaches, we use a holistic approach to recognize PAW using a Support Vector Machine (SVM) and Active Shape Models (ASM). The few works that use SVM to recognize PAW rely on a small dataset; we use a large dataset and a different set of features. We also explain the errors SVM and ASM make and propose some remedies for these errors as future work.

    Segmentation of Touching Component in Arabic Manuscripts

    Touching components are connection zones occurring between text lines or between words of the same line, and are one of the problems that make unconstrained handwritten text segmentation very hard. In this paper, we propose a recognition-based method to separate these components once they have been localized in Arabic manuscript images. For a given touching component, it first identifies a similar model stored in a dictionary together with its correct segmentation, using the shape context descriptor and an interpolation function. Then, it segments the touching component based on the distance from the midpoints of the identified model's parts. Tests are performed using a database of touching components and two metrics: Manhattan and Euclidean distances. Experimental results show the effectiveness of the proposed segmentation method.
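
    One possible reading of the midpoint-based segmentation step is to assign each pixel of the touching component to the nearest part midpoint. The sketch below follows that reading under stated assumptions: the midpoints and pixel coordinates are made-up values, the shape context matching that selects the model is omitted, and the function names are hypothetical. It mainly shows that the two metrics named in the abstract can label the same pixel differently.

```python
import math

def manhattan(p, q):
    """Manhattan (city-block) distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def euclidean(p, q):
    """Euclidean (straight-line) distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def assign_to_parts(pixels, midpoints, metric):
    """Label each pixel with the index of the closest part midpoint."""
    return [
        min(range(len(midpoints)), key=lambda i: metric(p, midpoints[i]))
        for p in pixels
    ]

# Hypothetical midpoints of the two parts of a matched dictionary model.
midpoints = [(0, 0), (8, 3)]

# Hypothetical foreground pixels of a touching component.
pixels = [(1, 0), (3, 3), (7, 3)]

labels_manhattan = assign_to_parts(pixels, midpoints, manhattan)
labels_euclidean = assign_to_parts(pixels, midpoints, euclidean)
```

    The pixel (3, 3) lies diagonally from the first midpoint and horizontally from the second, so the Manhattan metric assigns it to the second part while the Euclidean metric assigns it to the first.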

    A multi-scale, multi-wavelength source extraction method: getsources

    We present a multi-scale, multi-wavelength source extraction algorithm called getsources. Although it has been designed primarily for use in the far-infrared surveys of Galactic star-forming regions with Herschel, the method can be applied to many other astronomical images. Instead of the traditional approach of extracting sources in the observed images, the new method analyzes fine spatial decompositions of original images across a wide range of scales and across all wavebands. It cleans those single-scale images of noise and background, and constructs wavelength-independent single-scale detection images that preserve information in both spatial and wavelength dimensions. Sources are detected in the combined detection images by following the evolution of their segmentation masks across all spatial scales. Measurements of the source properties are done in the original background-subtracted images at each wavelength; the background is estimated by interpolation under the source footprints and overlapping sources are deblended in an iterative procedure. In addition to the main catalog of sources, various catalogs and images are produced that aid scientific exploitation of the extraction results. We illustrate the performance of getsources on Herschel images by extracting sources in sub-fields of the Aquila and Rosette star-forming regions. The source extraction code and validation images with a reference extraction catalog are freely available. Comment: 31 pages, 27 figures, to be published in Astronomy & Astrophysics

    Offline Recognition of Malayalam and Kannada Handwritten Documents Using Deep Learning

    Handwritten text can be digitized for a variety of reasons. Digitization is used in a variety of government entities, including banks, post offices, and archaeological departments. Handwriting recognition, on the other hand, is a difficult task, as everyone has a different writing style. There are essentially two methods for handwriting recognition: a holistic and an analytic approach. The previous methods of handwriting recognition are time-consuming. However, as deep neural networks have progressed, the approach has become more straightforward than previous methods. Furthermore, the bulk of existing solutions are limited to a single language. To recognise multilanguage handwritten manuscripts offline, this work employs an analytic approach. It describes how to convert Malayalam and Kannada handwritten manuscripts into editable text. Lines are separated from the input document first. After that, word segmentation is performed. Finally, each word is broken down into individual characters. An artificial neural network is utilised for feature extraction and classification. After that, the result is converted to a word document.
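
    The line-then-word-then-character segmentation order can be sketched with projection profiles. The abstract does not say which segmentation technique the work uses, so projection profiles are an assumption here, as are the toy binary page and the function names. The sketch splits a page into text lines and then splits each line at empty columns; telling word gaps from character gaps would need an extra gap-width threshold that is left out.

```python
def split_runs(profile):
    """Return (start, end) pairs of consecutive non-zero entries; end is exclusive."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs

def segment_lines(page):
    """Split a binary page (list of rows, 1 = ink) into text lines by row sums."""
    return split_runs([sum(row) for row in page])

def segment_columns(page, top, bottom):
    """Split one text line (rows top..bottom-1) into ink blocks by column sums."""
    cols = len(page[0])
    profile = [sum(page[y][x] for y in range(top, bottom)) for x in range(cols)]
    return split_runs(profile)

# Toy binary page (assumption): two text lines separated by one blank row.
page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
]

lines = segment_lines(page)
blocks = [segment_columns(page, top, bottom) for top, bottom in lines]
```

    Each entry of `blocks` lists the column ranges of the ink blocks found in the corresponding line; in a full pipeline these crops would then be passed to the classifier.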

    A Printed PAW Image Database of Arabic Language for Document Analysis and Recognition

    Document image analysis and recognition are important topics in the field of artificial intelligence. In this context, the availability of a database with good script samples is an important requirement for machine-learning processes. For Latin and Asian languages many suitable databases exist. However, there is a shortage of databases with Arabic samples. In this work, a new database of printed Arabic text is introduced. The new concept of collecting sub-words (PAWs) instead of words or individual character samples was adopted. These PAWs constitute all words in the Arabic language. The collected database consists of 83,056 images of PAWs extracted from approximately 550,000 different words. Each sample is presented in the database in five font types: Thuluth, Naskh, Andalusi, Typing Machine, and Kufi. In total, the database consists of 415,280 images. Moreover, ground truth information is included with each PAW image to describe its occurrence number, its occurrence frequency, and the positions and shapes of its characters. This paper presents a statistical analysis of the frequency of each PAW in the Arabic language.