112 research outputs found

    Human Reading Based Strategies for off-line Arabic Word Recognition

    This paper summarizes some techniques proposed for off-line Arabic word recognition. The point of view developed here concerns human reading, which favors an interactive mechanism between global memorization and local checking and so makes the recognition of complex scripts such as Arabic easier. In line with this view, some specific papers are analyzed and their strategies commented.

    Arabic Handwritten Words Off-line Recognition based on HMMs and DBNs

    In this work, we investigate the combination of PGM (Probabilistic Graphical Model) classifiers, either independent or coupled, for the recognition of Arabic handwritten words. The independent classifiers are vertical and horizontal HMMs (Hidden Markov Models) whose observable outputs are features extracted from the image columns and the image rows respectively. The coupled classifiers associate the vertical and horizontal observation streams into a single DBN (Dynamic Bayesian Network). A novel method to extract the word baseline is proposed, together with simple, easily extractable features used to construct feature vectors for the words in the vocabulary. Some of these features are statistical, based on pixel distributions and local pixel configurations. Others are structural, based on the presence of ascenders, descenders, loops and diacritic points. Experiments on handwritten Arabic words from IFN/ENIT strongly support the feasibility of the proposed approach. The recognition rates reach 90.42% with the vertical and horizontal HMMs, and 85.03% and 85.21% with a first and a second DBN respectively, which outperform the results of some works based on PGMs.
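    As a rough illustration of the column-wise idea described in this abstract, the sketch below counts ink pixels per image column and scores the resulting sequence with the forward algorithm of a discrete HMM. The feature (a raw ink count), the function names and the toy models are assumptions made for illustration only; they are not the features or models used by the authors.

```python
import math

def column_features(image):
    """Count ink pixels in each column of a binary image (a list of rows)."""
    return [sum(row[c] for row in image) for c in range(len(image[0]))]

def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    start[s]    : initial probability of state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability that state s emits symbol o
    Assumes every observation has nonzero probability under the model.
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        # Rescale at each step to avoid numerical underflow on long sequences.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik

# Hypothetical usage: pick the word model that best explains the columns.
obs = column_features([[1, 0, 1], [0, 0, 1]])  # symbols are ink counts 0..2
```

    A word would then be assigned to the vocabulary entry whose model gives the highest log-likelihood; a horizontal model would be scored the same way on row features.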

    Use of Markov processes in writing recognition

    In this paper, we present a brief survey of the use of different types of Markov models in writing recognition. Recognition is performed by computing the a posteriori probability of the pattern class. This computation involves several terms which, depending on the dependency hypotheses tied to the application considered, can be decomposed into elementary conditional probabilities. Under the assumption that the pattern can be modeled as a one- or two-dimensional stochastic process (random field) with Markovian properties, local maximization of these probabilities yields a maximum of the pattern likelihood. Throughout the article we study several cases of subpattern probability conditioning. Each case is accompanied by practical illustrations from the field of printed and/or handwritten writing recognition.
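    For orientation, the decision rule described in this abstract can be written out for the simplest case. The notation below (class $c$, pattern $X = (x_1, \dots, x_T)$, a first-order one-dimensional chain) is an assumed illustration and is not taken from the paper, which also covers two-dimensional fields and other conditioning schemes.

```latex
\[
\hat{c} = \arg\max_{c} P(c \mid X)
        = \arg\max_{c} \frac{P(X \mid c)\, P(c)}{P(X)}
        = \arg\max_{c} P(X \mid c)\, P(c)
\]
\[
P(X \mid c) = P(x_1 \mid c) \prod_{t=2}^{T} P(x_t \mid x_{t-1}, c)
\]
```

    The denominator $P(X)$ does not depend on the class, so it drops out of the maximization; the Markov assumption is what reduces the likelihood to a product of elementary conditional probabilities.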

    Arabic natural language processing: handwriting recognition

    The automatic recognition of Arabic writing is a very young research discipline with challenging and significant problems. Indeed, in the era of the Internet and multimedia, Arabic recognition can contribute, like the closely related disciplines of Latin writing recognition, speech recognition and vision processing, to current applications around digital libraries, document security and digital data processing in general. Arabic is a Semitic language spoken and understood in various forms by millions of people throughout the Middle East and in Africa, and it is used by 234 million people worldwide. Furthermore, the Arabic script gave rise to several other alphabets, such as those of Farsi and Urdu, which greatly increases the interest of this script. Farsi is the main language used in Iran and Afghanistan; it is spoken by more than 110 million people, including some in Tajikistan and Pakistan. Urdu is an Indo-Aryan language with about 104 million speakers. It is the national language of Pakistan and is closely related to Hindi, though a lot of Urdu vocabulary comes from Persian and Arabic, which is not the case for Hindi. Urdu has been written with a version of the Perso-Arabic script since the 12th century and is normally written in the Nastaliq style.

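    Dynamic learning of the number of states of a hidden Markov model with continuous observations: application to form sorting (Apprentissage dynamique du nombre d'états d'un modèle de markov caché à observations continues : Application au tri de formulaires)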

    In the context of the automatic recognition of form types with handwritten fields and without any reference mark, based on a description of the physical structure of the form, we represent a form by a pseudo-2D hidden Markov model (PHMM). This model consists of a graph of super-states. Each super-state is associated with a secondary hidden Markov model (HMM) whose observations are continuous. We explain why the classical k-means method is poorly suited to our problem, then detail a new general method that better accounts for the physical reality of the states by placing them in the feature representation space and building them dynamically through progressive aggregation of the observation sequences. Only at the end of the aggregation process is the number of states of the initial stochastic model known.

    Structural Features Extraction for Handwritten Arabic Personal Names Recognition

    Because handwriting is highly variable and imprecise, obtaining features that represent words is a difficult task. In this research, a feature extraction method for handwritten Arabic word recognition is investigated. Its major goal is to maximize the recognition rate with the smallest number of elements. The method incorporates many characteristics of handwritten characters based on structural information (loops, stems, legs, diacritics). Experiments are performed on Arabic personal names extracted from registers of the national Tunisian archive and on some Tunisian city names from the IFN/ENIT database. The results obtained are encouraging and open further perspectives on feature and classifier selection for Arabic handwritten word recognition.

    Off-line Arabic Handwriting Recognition System Using Fast Wavelet Transform

    In this research, an off-line handwriting recognition system for the Arabic alphabet is introduced. The system contains three main stages: preprocessing, segmentation and recognition. In the preprocessing stage, the Radon transform was used in the design of algorithms for page, line and word skew correction as well as for word slant correction. In the segmentation stage, a Hough transform approach was used for line extraction. For line-to-word and word-to-character segmentation, a statistical method using a mathematical representation of the binary images of lines and words was used. Unlike most current handwriting recognition systems, our system simulates the human mechanism for image recognition, where images are encoded and saved in memory as groups according to their similarity to each other. Characters are decomposed into coefficient vectors using the fast wavelet transform; the vectors that represent a character in its different possible shapes are then saved as groups, with one representative for each group. Recognition is achieved by comparing the vector of the character to be recognized with the group representatives. Experiments showed that the proposed system achieves the recognition task with 90.26% accuracy. The system needs at most 3.41 seconds to recognize a single character in a text of 15 lines where each line has 10 words on average.
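    The recognition step described in this abstract (wavelet coefficient vectors compared against group representatives) can be sketched as follows. The abstract does not say which wavelet family or distance measure is used, so the Haar wavelet, the Euclidean distance and all function names here are assumptions chosen to keep the example short.

```python
import math

def haar_fwt(signal):
    """Full Haar decomposition of a sequence whose length is a power of two.

    Returns the final approximation coefficient followed by the detail
    coefficients, coarsest level first.
    """
    approx = [float(v) for v in signal]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        details = detail + details  # keep coarser levels in front
    return approx + details

def nearest_representative(vector, representatives):
    """Return the label whose representative vector is closest (Euclidean)."""
    return min(
        representatives,
        key=lambda label: math.dist(vector, representatives[label]),
    )

# Hypothetical usage with two toy "character" profiles of length 4.
representatives = {
    "a": haar_fwt([1, 1, 0, 0]),
    "b": haar_fwt([0, 0, 1, 1]),
}
label = nearest_representative(haar_fwt([1, 0.9, 0, 0.1]), representatives)
```

    Comparing against one representative per group, rather than against every stored shape, is what keeps the number of distance computations small.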

    Predictive modeling using sparse logistic regression with applications

    In this thesis, sparse logistic regression models are applied in a set of real-world machine learning applications. The studied cases include supervised image segmentation, cancer diagnosis, and MEG data classification. Image segmentation is applied both to component detection in inkjet-printed electronics manufacturing and to cell detection in microscope images. The results indicate that a simple linear classification method such as logistic regression often outperforms more sophisticated methods. Further, it is shown that the interpretability of the linear model offers a great advantage in many applications. Model validation and automatic feature selection by means of L1-regularized parameter estimation play a significant role in this thesis. It is shown that the combination of a careful model assessment scheme and automatic feature selection by means of a logistic regression model with coefficient regularization creates a powerful, yet simple and practical, tool chain for applications of supervised learning and classification.
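    To make the link between L1 regularization and automatic feature selection concrete, here is a minimal sketch that fits a logistic regression by proximal gradient descent with soft-thresholding. The solver, the hyperparameters and the toy data are assumptions for illustration; the thesis does not state which optimization method it uses.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit_l1_logistic(X, y, lam=0.2, lr=0.5, steps=200):
    """Minimize mean log-loss + lam * ||w||_1 (the bias is not penalized)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        p = [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]
        grad_w = [
            sum((p[i] - y[i]) * X[i][j] for i in range(n)) / n
            for j in range(d)
        ]
        grad_b = sum(p[i] - y[i] for i in range(n)) / n
        # Gradient step on the smooth loss, then shrink each weight.
        w = [soft_threshold(w[j] - lr * grad_w[j], lr * lam) for j in range(d)]
        b -= lr * grad_b
    return w, b

# Toy data: feature 0 determines the label, feature 1 is only weakly related.
X = [(1, 1), (1, 1), (1, -1), (-1, 1), (-1, -1), (-1, -1)]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_l1_logistic(X, y)
# w[1] is driven exactly to zero, so feature 1 is dropped from the model.
```

    The weights that remain nonzero are the selected features, which is also what makes the fitted linear model easy to interpret.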
