6 research outputs found

    A Novel Approach for the Recognition of a wide Arabic Handwritten Word Lexicon

    This paper introduces a novel approach for recognizing a wide vocabulary of Arabic handwritten words. There is an essential difference between the global and analytic approaches in pattern recognition: while the global approach is limited to a reduced vocabulary, the analytic approach can recognize a wide vocabulary but runs into the problem of word segmentation, especially for Arabic. By combining the neural approach with some linguistic characteristics of Arabic, we expect to improve recognition and to handle a large vocabulary of Arabic handwritten words. The proposed approach uses two transparent neural networks, TNN_1 and TNN_2, to respectively recognize roots, schemes and the elements of conjugation from the structural primitives of the words. The approach was evaluated using real examples from a database established for this purpose. The results are promising, and suggestions for improvements are proposed.

    A new System for offline Printed Arabic Recognition for Large Vocabulary : SPARLV

    This paper presents a contribution to printed Arabic recognition, focusing on the recognition of printed decomposable Arabic words. The proposed system follows the analytical approach: it segments words into characters to generate letter hypotheses and then word hypotheses, which are checked by lexical verification against a pre-established dictionary of the language. Thanks to this lexical verification, the proposed system, SPARLV, is able to produce valid word hypotheses.
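
    The lexical verification step described in this abstract can be illustrated with a minimal sketch. It is not SPARLV's implementation: the function name `verify_hypotheses`, the score-product ranking and the toy lexicon are all assumptions made for illustration. Each character position carries a few scored letter hypotheses, and only the combinations found in the dictionary are kept as word hypotheses.

```python
from itertools import product


def verify_hypotheses(letter_hyps, lexicon):
    """Keep only the letter combinations that form a word in the lexicon.

    letter_hyps: one list per character position, each holding
                 (letter, score) pairs (hypothetical recognizer output).
    lexicon:     set of valid words (the pre-established dictionary).
    Returns (word, score) pairs sorted by decreasing score.
    """
    valid = []
    for combo in product(*letter_hyps):
        word = "".join(letter for letter, _ in combo)
        if word in lexicon:
            # Assumption: scores are independent and combined by product.
            score = 1.0
            for _, s in combo:
                score *= s
            valid.append((word, score))
    return sorted(valid, key=lambda pair: pair[1], reverse=True)


# Toy example: three segmented characters, two candidates each.
lexicon = {"كتب", "كتم"}
letter_hyps = [
    [("ك", 0.9), ("ل", 0.1)],
    [("ت", 0.8), ("ث", 0.2)],
    [("ب", 0.6), ("م", 0.4)],
]
ranked = verify_hypotheses(letter_hyps, lexicon)
ranked_words = [word for word, _ in ranked]
```

    Enumerating every combination grows exponentially with word length, so a practical system would more likely prune candidates with a trie or a beam search; the exhaustive loop is used here only to keep the idea visible.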

    Arabic natural language processing: handwriting recognition

    The automatic recognition of Arabic writing is a very young research discipline with very challenging and significant problems. Indeed, in the era of the Internet and multimedia, Arabic recognition can contribute, like the closely related disciplines of Latin writing recognition, speech recognition and vision processing, to current applications around digital libraries, document security and numerical data processing in general. Arabic is a Semitic language spoken and understood in various forms by millions of people throughout the Middle East and in Africa, and it is used by 234 million people worldwide. Furthermore, Arabic gave rise to several other alphabets, such as Farsi and Urdu, which greatly increases interest in this script. Farsi is the main language used in Iran and Afghanistan, and it is spoken by more than 110 million people, including some people in Tajikistan and Pakistan. Urdu is an Indo-Aryan language with about 104 million speakers. It is the national language of Pakistan and is closely related to Hindi, though a lot of Urdu vocabulary comes from Persian and Arabic, which is not the case for Hindi. Urdu has been written with a version of the Perso-Arabic script since the 12th century and is normally written in Nastaliq style.

    A Neural-Linguistic Approach for the Recognition of a Wide Arabic Word Lexicon

    Recently, we have investigated the use of Arabic linguistic knowledge to improve the recognition of a wide Arabic word lexicon. A neural-linguistic approach was proposed, dealing mainly with the canonical vocabulary of decomposable words derived from tri-consonant healthy roots. The basic idea is to factorize words by their roots and schemes. To this end, we designed two neural networks, TNN_R and TNN_S, to recognize roots and schemes respectively from the structural primitives of words. The proposed approach achieved promising results. In this paper, we focus on how to reach better results in terms of accuracy and recognition rate. The current improvements mainly concern the training stage: 1) benefiting from the order of word letters, 2) considering "sister letters" (letters having the same features), 3) supervising the networks' behaviour, 4) splitting up neurons to preserve letter occurrences and 5) resolving observed ambiguities. With these improvements, experiments carried out on a 1500-word vocabulary show a significant enhancement: the top-4 rate of TNN_R (resp. TNN_S) has gone up from 77% to 85.8% (resp. from 65% to 97.9%). Enlarging the vocabulary from 1000 to 1700 words in steps of 100 again confirmed the results without altering the networks' stability.
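
    The idea of factorizing a word into a root and a scheme can be shown with a small string-level sketch. It is an illustration only, not the authors' method: the abstract says roots and schemes are recognized by the neural networks TNN_R and TNN_S, whereas this sketch assumes the word is already available as text and that schemes are written with the conventional placeholder letters ف, ع and ل for the three root consonants. The names `apply_scheme` and `factorize` are hypothetical.

```python
PLACEHOLDERS = "فعل"  # conventional slots for the three root consonants


def apply_scheme(root, scheme):
    """Build a word by filling the scheme's placeholder slots with the root."""
    slots = dict(zip(PLACEHOLDERS, root))
    return "".join(slots.get(ch, ch) for ch in scheme)


def factorize(word, schemes):
    """Return every (root, scheme) pair that could have produced the word."""
    results = []
    for scheme in schemes:
        if len(scheme) != len(word):
            continue
        root = {}
        matches = True
        for w_ch, s_ch in zip(word, scheme):
            if s_ch in PLACEHOLDERS:
                root[s_ch] = w_ch      # root consonant found in this slot
            elif w_ch != s_ch:
                matches = False        # fixed scheme letter does not match
                break
        if matches and len(root) == 3:
            results.append(("".join(root[p] for p in PLACEHOLDERS), scheme))
    return results


# Toy scheme inventory (assumption): active and passive participle patterns.
schemes = ["فاعل", "مفعول"]
built = apply_scheme("كتب", "فاعل")
decomposed = factorize("مكتوب", schemes)
```

    With this factorization, a lexicon of R roots and S schemes covers up to R × S words while only R + S classes need to be recognized, which is one plausible reading of why the approach scales to a wide vocabulary.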