32 research outputs found

    A Technique for Character Segmentation in Middle zone of Handwritten Hindi words using Hybrid Approach

    India is a multilingual, multi-script country. Devanagari is one of the most popular scripts in India and is used to write Hindi, Sanskrit, Sindhi, Marathi and Nepali. This research work focuses on Hindi. A large number of valuable and essential documents exist in handwritten form and need to be converted into editable form. Optical Character Recognition (OCR) makes it easier to convert handwritten text into editable form. Character segmentation, which separates the characters in handwritten words, is an important phase of OCR and improves the accuracy of the OCR system. In this paper, a hybrid approach is used to segment characters in words that contain single and multiple touching characters. The proposed system is tested on a dataset of more than 300 handwritten Hindi words written by different writers. The accuracy of the proposed hybrid system is evaluated at 96%, which is better than that of existing techniques.

    Segmentation of touching characters in upper zone in printed Gurmukhi script

    A new technique for segmenting touching characters in the upper zone of printed Gurmukhi script is presented in this paper. The technique is based on the structural properties of Gurmukhi characters. The concavity and convexity of the characters have been studied, and the touching characters in the upper zone have been segmented using top profile projections. A recognition rate of 91% has been achieved for segmenting the touching characters in the upper zone.
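
    The abstract names top profile projections but gives no further detail. The following is a minimal sketch of the general idea, not the authors' method: it assumes a binary image stored as a list of rows (1 = ink), and the function names `top_profile` and `candidate_cuts` are hypothetical. Valleys in the top profile are reported as candidate cut columns between touching characters.

```python
def top_profile(image):
    """For each column, return the row index of the topmost ink pixel.

    `image` is a list of rows of 0/1 values (1 = ink). A column with no
    ink gets the image height, so the profile drops to the bottom there.
    """
    height = len(image)
    width = len(image[0]) if image else 0
    profile = []
    for x in range(width):
        top = height
        for y in range(height):
            if image[y][x]:
                top = y
                break
        profile.append(top)
    return profile


def candidate_cuts(profile):
    """Return the middle column of every valley in the top profile.

    A valley is a run of equal values that lies lower in the image
    (larger row index) than the columns immediately left and right of it.
    """
    cuts = []
    n = len(profile)
    x = 1
    while x < n - 1:
        if profile[x] > profile[x - 1]:
            start = x
            # Extend over a flat valley floor.
            while x + 1 < n and profile[x + 1] == profile[start]:
                x += 1
            # Only a valley if the profile rises again on the right.
            if x + 1 < n and profile[x + 1] < profile[start]:
                cuts.append((start + x) // 2)
        x += 1
    return cuts


# Two tall strokes joined by a low bridge in columns 2 and 3.
image = [
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
    [1, 1, 1, 1, 1, 1],
]
profile = top_profile(image)
cuts = candidate_cuts(profile)
```

    A real system would presumably combine such candidates with the concavity and convexity analysis the abstract mentions before committing to a cut.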

    A Framework for Devanagari Script-based Captcha

    Human Interactive Proofs (HIPs) are automatic reverse Turing tests designed to distinguish between various groups of users. The Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is a HIP system that distinguishes between humans and malicious computer programs. Many CAPTCHAs have been proposed in the literature, including text- and graphics-based, audio-based, puzzle-based and mathematical question-based ones. The design and implementation of CAPTCHAs fall in the realm of Artificial Intelligence. We aim to utilize CAPTCHAs as a tool to improve the security of Internet-based applications. In this paper we present a framework for a text-based CAPTCHA based on Devanagari script, which can exploit the difference in reading proficiency between humans and computer programs. Our selection of a Devanagari script-based CAPTCHA is based on the fact that the script is used by a large number of Indian languages, including Hindi, which is the third most spoken language. There is potential for an exponential rise in the applications that are likely to be developed in that script, which makes it important to be able to secure Indian language-based applications easily. Comment: 10 pages, 8 Figures, CCSEA 2011 - First International Conference, Chennai, July 15-17, 201

    Deep Learning Based Real Time Devanagari Character Recognition

    Advances in the technology behind optical character recognition (OCR) have made it one of the technologies with plenty of uses across industry. Today, OCR is available for several languages and can recognize characters in real time, but for some languages the technology is not yet well developed. These advancements have been made possible by the introduction of concepts such as artificial intelligence and deep learning. Deep neural networks have proven to be the best choice for recognition tasks, and many algorithms and models can be used for this purpose. This project tries to implement and optimize a deep learning-based model that can recognize Devanagari characters in real time by analyzing hand movements.

    Network Approach based Hindi Numeral Recognition

    Handwriting has persisted as a means of communication and of recording information in everyday life, even with the introduction of new technologies. The steady improvement of computer tools creates the need for an easier interface between humans and computers. Handwritten character recognition may, for example, be applied to postal code recognition, automatic acquisition of printed forms, or check reading. The importance of these applications has led to intense research over a long period in the field of offline handwritten character recognition. Hindi, the national language of India (written in Devanagari script), is the world's third most popular language after Chinese and English. Hindi handwritten character recognition has many applications in fields such as postal address reading and electronic check reading. Recognition of handwritten Hindi characters by computer is a complicated task compared with typed characters, which the computer can recognize easily. This paper presents a scheme to recognize Hindi numerals with the help of a neural network.

    Feature Extraction Techniques for Marathi Character Classification using Neural Networks Models

    Handwritten Marathi character recognition is challenging for researchers because of the complex structure of the characters. This paper presents a novel approach for the recognition of unconstrained handwritten Marathi characters. The recognition is carried out using multiple feature extraction methods and a classification scheme. The initial stages of feature extraction are based on pixel value features, and the characters are classified according to structural parameters into 44 classes. The final stage of feature extraction makes use of zoning features. First, pixel values are used as features, and these values are then modified to form another set of features. All these features are then applied to a neural network for recognition. A separate neural network is built for each type of feature. The average recognition rates are 67.96%, 82.67%, 63.46% and 76.46% for feed-forward, radial basis, Elman and pattern recognition neural networks, respectively, for handwritten Marathi characters.
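
    The abstract does not say how its zoning features are computed. As a hedged illustration of one common form of zoning, the sketch below splits a binary character image into a grid and uses the ink density of each zone as a feature. The function name `zoning_features`, the 2x2 grid and the list-of-rows image format are assumptions made for this example.

```python
def zoning_features(image, rows=2, cols=2):
    """Return the ink density of each zone, scanned row by row.

    `image` is a list of rows of 0/1 values (1 = ink). The image is cut
    into `rows` x `cols` zones of near-equal size; each feature is the
    fraction of ink pixels in one zone.
    """
    height = len(image)
    width = len(image[0])
    features = []
    for r in range(rows):
        y0 = r * height // rows
        y1 = (r + 1) * height // rows
        for c in range(cols):
            x0 = c * width // cols
            x1 = (c + 1) * width // cols
            area = (y1 - y0) * (x1 - x0)
            ink = sum(image[y][x] for y in range(y0, y1) for x in range(x0, x1))
            # Guard against empty zones when the grid is finer than the image.
            features.append(ink / area if area else 0.0)
    return features


# 4x4 toy glyph: full top-left zone, partly filled bottom-right zone.
glyph = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
features = zoning_features(glyph)
```

    The resulting vector has a fixed length of `rows * cols` for any image size, which is what makes it usable as input to a neural network classifier.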

    Adaptive Algorithms for Automated Processing of Document Images

    Large scale document digitization projects continue to motivate interesting document understanding technologies such as script and language identification, page classification, segmentation and enhancement. Typically, however, solutions are still limited to narrow domains or regular formats such as books, forms, articles or letters and operate best on clean documents scanned in a controlled environment. More general collections of heterogeneous documents challenge the basic assumptions of state-of-the-art technology regarding quality, script, content and layout. Our work explores the use of adaptive algorithms for the automated analysis of noisy and complex document collections. We first propose, implement and evaluate an adaptive clutter detection and removal technique for complex binary documents. Our distance transform based technique aims to remove irregular and independent unwanted foreground content while leaving text content untouched. The novelty of this approach is in its determination of best approximation to clutter-content boundary with text like structures. Second, we describe a page segmentation technique called Voronoi++ for complex layouts which builds upon the state-of-the-art method proposed by Kise [Kise1999]. Our approach does not assume structured text zones and is designed to handle multi-lingual text in both handwritten and printed form. Voronoi++ is a dynamically adaptive and contextually aware approach that considers components' separation features combined with Docstrum [O'Gorman1993] based angular and neighborhood features to form provisional zone hypotheses. These provisional zones are then verified based on the context built from local separation and high-level content features. Finally, our research proposes a generic model to segment and to recognize characters for any complex syllabic or non-syllabic script, using font-models. This concept is based on the fact that font files contain all the information necessary to render text and thus a model for how to decompose them. Instead of script-specific routines, this work is a step towards a generic character and recognition scheme for both Latin and non-Latin scripts
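
    The clutter removal technique above is described only as distance transform based. The sketch below is not the authors' algorithm; it only illustrates the underlying operation with a standard two-pass city-block distance transform on a binary image stored as a list of rows (1 = foreground). The function name `distance_transform` is hypothetical.

```python
def distance_transform(image):
    """City-block distance from each foreground pixel to the nearest
    background pixel, computed with a forward and a backward pass.

    `image` is a list of rows of 0/1 values (1 = foreground). Pixels
    outside the image are NOT treated as background in this sketch.
    """
    height = len(image)
    width = len(image[0]) if image else 0
    far = height + width  # larger than any distance inside the image
    dist = [[far if image[y][x] else 0 for x in range(width)]
            for y in range(height)]

    # Forward pass: propagate distances from the top and left neighbours.
    for y in range(height):
        for x in range(width):
            if y > 0:
                dist[y][x] = min(dist[y][x], dist[y - 1][x] + 1)
            if x > 0:
                dist[y][x] = min(dist[y][x], dist[y][x - 1] + 1)

    # Backward pass: propagate from the bottom and right neighbours.
    for y in range(height - 1, -1, -1):
        for x in range(width - 1, -1, -1):
            if y < height - 1:
                dist[y][x] = min(dist[y][x], dist[y + 1][x] + 1)
            if x < width - 1:
                dist[y][x] = min(dist[y][x], dist[y][x + 1] + 1)
    return dist


# A 3x3 foreground block surrounded by background.
blob = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
dist = distance_transform(blob)
```

    The largest value inside a connected component grows with its thickness, so thin text strokes and thick blobs get different peak values. How the described technique uses this to locate the clutter-content boundary is not stated in the abstract.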