13 research outputs found

    Kannada Character Recognition System: A Review

    Intensive research has been done on optical character recognition (OCR), and a large number of articles on this topic have been published during the last few decades. Many commercial OCR systems are now available on the market, but most of them work only for Roman, Chinese, Japanese and Arabic characters. Comparatively little work exists on Indian-language character recognition, especially for Kannada, one of the 12 major scripts in India. This paper reviews existing work on printed Kannada script and the results obtained. The characteristics of Kannada script and of Kannada Character Recognition (KCR) systems are discussed in detail. Finally, fusion at the classifier level is proposed to increase recognition accuracy. Comment: 12 pages, 8 figures
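
    The review proposes fusion at the classifier level but does not say which fusion rule is used. As a minimal sketch, assuming simple majority voting (one common rule, not necessarily the one the authors propose), the labels predicted by several classifiers for each character can be combined as follows. The function names and the transliterated labels in the usage line are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[str]) -> str:
    """Fuse the labels several classifiers predicted for one character.

    Ties go to the label that occurs first in `predictions`, so the most
    trusted classifier should be listed first.
    """
    if not predictions:
        raise ValueError("need at least one classifier output")
    counts = Counter(predictions)
    top = max(counts.values())
    # First label (in classifier order) that reaches the top vote count.
    return next(label for label in predictions if counts[label] == top)


def fuse(per_classifier: Sequence[Sequence[str]]) -> List[str]:
    """per_classifier[k][i] is classifier k's label for character i."""
    # zip(*...) regroups the outputs so each tuple holds all votes for one character.
    return [majority_vote(votes) for votes in zip(*per_classifier)]


print(fuse([["ka", "ga", "ka"], ["ka", "ka", "ga"], ["ga", "ga", "ka"]]))
```

    Weighted voting or score-level fusion (averaging classifier confidences) are drop-in alternatives if the individual classifiers expose scores.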

    A Lexicon of Connected Components for Arabic Optical Text Recognition

    Arabic is a cursive script that does not lend itself to easy character segmentation. Hence, we suggest a unit that is discrete in nature, viz. the connected component, for Arabic text recognition. A lexicon listing valid Arabic connected components is necessary for any system that uses such a unit. Here, we produce and analyze a comprehensive lexicon of connected components. A lexicon can be extracted from corpora or synthesized from morphemes; we follow both approaches and merge their results. In addition, generating a lexicon of connected components requires extra tokenization and point-normalization steps to keep the size of the lexicon tractable. We produce a lexicon of surface words, reduce it to a lexicon of connected components, and finally to a lexicon of point-normalized connected components. The lexicon of point-normalized connected components contains 684,743 entries, a decrease of 97.17% from the word lexicon.
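
    The abstract does not spell out the tokenization step. As a sketch, assuming a "connected component" here corresponds to what is often called a piece of an Arabic word (a run of letters that ends at a letter which cannot join to the following one), a surface word can be split as below. The letter sets cover basic Arabic letters only, the input is assumed to be a single undiacritized word, and dots are not removed (point normalization would be a separate step).

```python
from typing import List

# Letters that join only to the preceding letter (on their right), so a
# connected component always ends after them. Basic Arabic letters only.
RIGHT_JOINING_ONLY = set(
    "\u0622\u0623\u0625\u0627\u0671"  # alef variants
    "\u0624\u0648"                    # waw with hamza, waw
    "\u0629"                          # teh marbuta
    "\u062f\u0630"                    # dal, thal
    "\u0631\u0632"                    # reh, zain
)
# Isolated hamza joins neither neighbour.
NON_JOINING = {"\u0621"}


def connected_components(word: str) -> List[str]:
    """Split one undiacritized Arabic word into its connected pieces."""
    pieces: List[str] = []
    current = ""
    for ch in word:
        if ch in NON_JOINING and current:
            # The preceding letter cannot connect to an isolated hamza.
            pieces.append(current)
            current = ""
        current += ch
        if ch in RIGHT_JOINING_ONLY or ch in NON_JOINING:
            # Nothing can attach after this letter: close the piece.
            pieces.append(current)
            current = ""
    if current:
        pieces.append(current)
    return pieces
```

    For example, the word for "school" (U+0645 U+062F U+0631 U+0633 U+0629) splits into three pieces: meem+dal, reh, and seen+teh marbuta.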

    A New Feature Extraction Method for TMNN-Based Arabic Character Classification

    This paper describes a hybrid method for typewritten Arabic character recognition that combines Toeplitz Matrices and Neural Networks (TMNN) with a new technique for feature selection and data mining. The proposed algorithm reduces the NN input data to only the points that are most significant and essential for classification. Four items are computed to represent the percentage distribution of the essential feature points across the parts of the extracted character image. Feature points are detected by an algorithm designed for this purpose; it performs well and identifies the most significant points that satisfy the sufficient conditions for recognizing almost all written fonts of Arabic characters. The number of essential feature points is reduced by at least 88%, so computation and data size decrease substantially as well. The authors achieved a recognition rate of 97.61%, and the results show high accuracy, high speed and strong classification performance.
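
    The abstract says four items describe how the essential feature points are distributed over parts of the character image, but not how the image is divided. A minimal sketch, assuming the four parts are the image quadrants and that feature points have already been detected (the detection algorithm itself is not reproduced), computes those four percentages:

```python
from typing import Iterable, List, Tuple


def quadrant_distribution(
    points: Iterable[Tuple[int, int]], width: int, height: int
) -> List[float]:
    """Percentage of feature points falling in each quadrant.

    Order: top-left, top-right, bottom-left, bottom-right.
    Points are (x, y) pixel coordinates with y growing downwards.
    """
    counts = [0, 0, 0, 0]
    total = 0
    for x, y in points:
        col = 1 if x >= width / 2 else 0   # 0 = left half, 1 = right half
        row = 1 if y >= height / 2 else 0  # 0 = top half, 1 = bottom half
        counts[2 * row + col] += 1
        total += 1
    if total == 0:
        return [0.0, 0.0, 0.0, 0.0]
    return [100.0 * c / total for c in counts]
```

    The resulting four-element vector is compact enough to serve as (part of) a neural-network input, which is consistent with the abstract's aim of reducing the NN input data.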

    Applying Genetic Algorithm in Multi Language's Characters Recognition

    Optical character recognition of printed Odia documents

    Optical Character Recognition (OCR) is a document image analysis method that involves the mechanical or electronic conversion of scanned or photographed images of typewritten or printed text into machine-readable text. OCR has become a widespread area of interest and research because it narrows the gap between the reading abilities of computers and humans and improves human-machine interaction in many applications. Example applications include cheque verification and a large variety of banking, business and data entry applications. The project involved skew correction of Odia documents, line segmentation and, finally, segmentation of Odia characters. The document is first segmented into its constituent lines; each line is then treated as a single entity and segmented into words; once the words are segmented, the characters are extracted one by one. The algorithms used here hold for all Devanagari scripts; hence, Telugu word segmentation is also shown as proof of the applied algorithm.
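
    The abstract does not name the segmentation technique. A common approach, assumed here, is projection profiles: rows with no ink separate text lines, and within a line, wide runs of empty columns separate words while narrow ones separate characters. This sketch takes an already deskewed, binarized image (1 = ink) as nested lists; the function names and the `min_gap` threshold are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Image = Sequence[Sequence[int]]  # binary image: 1 = ink, 0 = background


def runs(profile: Sequence[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return [start, end) ranges where profile > 0.

    Runs separated by fewer than `min_gap` empty positions are merged, so a
    large gap yields words and min_gap=1 yields individual ink blobs.
    """
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    merged: List[Tuple[int, int]] = []
    for s, e in spans:
        # s - previous end = number of empty positions between the runs.
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged


def segment_lines(img: Image) -> List[Tuple[int, int]]:
    """Row ranges of text lines, from the horizontal projection profile."""
    return runs([sum(row) for row in img])


def segment_words(img: Image, top: int, bottom: int,
                  min_gap: int) -> List[Tuple[int, int]]:
    """Column ranges of words inside rows [top, bottom) of one text line."""
    width = len(img[0])
    profile = [sum(img[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs(profile, min_gap)
```

    Characters whose strokes touch or overlap vertically would not be separated by this simple profile and would need additional handling.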

    Kurdish Optical Character Recognition

    Currently, no offline tool is available for Optical Character Recognition (OCR) in Kurdish. Kurdish is spoken in different dialects and is written in several scripts, of which the Persian/Arabic script is widely used across these dialects. The Persian/Arabic script is written from right to left (RTL), is cursive, and uses unique diacritics. These features, particularly the last two, complicate the segmentation stage of a Kurdish OCR system. In this article, we introduce an enhanced segmentation-based method that addresses these characteristics. We applied the method to text-only images and tested the Kurdish OCR on documents with different fonts, font sizes and image resolutions. In the experiments, the proposed method achieved an average character recognition accuracy of 90.82%.
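
    The abstract names right-to-left order and diacritics as the main obstacles but does not describe the authors' enhanced method. One simple way to handle both at the connected-component level, sketched here under stated assumptions, is to treat small components as diacritics, order the remaining base components right to left, and attach each diacritic to the base whose horizontal centre is nearest. The `Component` record and the `min_base_area` threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    """Bounding box of one connected component (hypothetical input format)."""
    x0: int    # leftmost column
    x1: int    # rightmost column (exclusive)
    area: int  # number of ink pixels


def group_rtl(components: List[Component],
              min_base_area: int) -> List[List[Component]]:
    """Group components into base + diacritics, in right-to-left order."""
    bases = [c for c in components if c.area >= min_base_area]
    marks = [c for c in components if c.area < min_base_area]
    if not bases:
        # Nothing to attach the marks to: keep each one as its own group.
        return [[m] for m in marks]
    # Right-to-left reading order: the component with the larger right edge comes first.
    bases.sort(key=lambda c: c.x1, reverse=True)
    groups: List[List[Component]] = [[b] for b in bases]
    for mark in marks:
        mark_mid = mark.x0 + mark.x1  # twice the centre, avoids division
        nearest = min(range(len(bases)),
                      key=lambda i: abs(bases[i].x0 + bases[i].x1 - mark_mid))
        groups[nearest].append(mark)
    return groups
```

    A nearest-centre rule is a deliberately crude choice; vertical position and the overlap of bounding boxes are natural refinements when diacritics sit between two letters.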