38 research outputs found

    Malayalam Handwritten Character Recognition using CNN Architecture

    Optical character recognition (OCR) is the process of encoding an input text image into a machine-readable format. Because the characteristics of each language differ, it is difficult to develop a universal method that achieves high accuracy for all languages; a method that produces good results for one language may not produce the same results for another. OCR for printed characters is easier than for handwritten characters because printed characters are uniform. While conventional methods have struggled to improve on existing results, Convolutional Neural Networks (CNNs) have shown drastic improvement in the classification and recognition of other languages. However, there is no OCR model using a CNN for Malayalam characters. Our proposed system uses a new CNN architecture for feature extraction and a softmax layer for classification of characters. This eliminates the manual feature design used in conventional methods. The P-ARTS Kayyezhuthu dataset is used for training the CNN, and an accuracy of 99.75% is obtained on the testing dataset, while a collection of 40 real-time input images yielded an accuracy of 95%.
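
As a rough illustration of the softmax classification step mentioned in the abstract above, the sketch below turns raw class scores into probabilities and picks the most likely class. The class labels and scores are hypothetical and do not come from the paper, and the function names `softmax` and `predict` are chosen only for this example. The last line shows the arithmetic behind the reported 95% figure on 40 images.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits, labels):
    """Return the label with the highest softmax probability and that probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical scores for three character classes (assumption, not from the paper).
labels = ["class_a", "class_b", "class_c"]
label, confidence = predict([1.0, 3.0, 0.5], labels)

# 95% accuracy on 40 images corresponds to 38 correctly recognized images.
correct = round(0.95 * 40)
```

In a real CNN the scores would come from the final fully connected layer; here they are supplied by hand so the example stays self-contained.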

    Handwritten OCR for Indic Scripts: A Comprehensive Overview of Machine Learning and Deep Learning Techniques

    The potential uses of cursive optical character recognition (OCR) in a number of industries, particularly document digitization, archiving, and language preservation, have attracted a lot of interest lately. The goal of this research is to provide a thorough understanding of both cutting-edge OCR methods and the unique difficulties presented by Indic scripts. A thorough literature search was conducted, covering relevant publications, conference proceedings, and scientific files up to the year 2023. Applying inclusion criteria that restrict the review to studies addressing handwritten OCR on Indic scripts, 53 research publications were selected. The review provides a thorough analysis of the methodologies and approaches employed in the chosen studies. Deep neural networks, conventional feature-based methods, machine learning techniques, and hybrid systems have all been investigated as viable answers to the problem of effectively deciphering Indic scripts, which are famously challenging to write. To operate, these systems require pre-processing techniques, segmentation schemes, and language models. The outcomes of this systematic examination demonstrate that although handwritten OCR for Indic scripts has advanced significantly, room for improvement remains. Future research could focus on developing trustworthy models that can handle a range of writing styles and on improving accuracy for less-studied Indic scripts. The field may advance with the creation of collected datasets and defined standards.

    An IoT System for Converting Handwritten Text to Editable Format via Gesture Recognition

    The evolution of the traditional classroom has led to the electronic classroom, i.e. e-learning. The growth of the traditional classroom does not stop at e-learning or distance learning; the next step after the electronic classroom is the smart classroom. One of the most popular features of the electronic classroom is capturing video or photos of lecture content and extracting handwriting for note-taking. Numerous techniques have been implemented to extract handwriting from video or photos of a lecture, but the deficiencies of some of these techniques can still be resolved, which can turn the electronic classroom into a smart classroom. In this thesis, we present a real-time IoT system that converts handwritten text into an editable format by implementing hand gesture recognition (HGR) with a Raspberry Pi and a camera. HGR is built using an edge detection algorithm and is used in this system to reduce the computational complexity of previous systems, i.e. removing redundant images and the lecturer’s body from the image, and recollecting text from previous images to fill the area from which the lecturer’s body has been removed. The Raspberry Pi is used to retrieve and perceive HGR and to build a smart classroom based on IoT. Handwritten images are converted into an editable format using OpenCV and machine learning algorithms. Text conversion includes recognition of uppercase and lowercase letters, numbers, special characters, mathematical symbols, equations, graphs, and figures, along with recognition of words, lines, blocks, and paragraphs. With the help of the Raspberry Pi and IoT, the editable lecture notes are delivered to students via a desktop application, which lets students edit notes and images as needed.
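
The abstract above does not say which edge detection algorithm the thesis uses. As a minimal sketch, assuming a Sobel operator as a stand-in, the code below computes a gradient magnitude on a small grayscale image and thresholds it into an edge mask. The toy image, the threshold value, and the function names are all assumptions made for illustration.

```python
def sobel_magnitude(img):
    """Gradient magnitude of a 2D list of grayscale values; border pixels stay 0."""
    h, w = len(img), len(img[0])
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient kernel
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient kernel
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    p = img[y + dy - 1][x + dx - 1]
                    gx += kx[dy][dx] * p
                    gy += ky[dy][dx] * p
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out

def edge_mask(img, threshold):
    """Mark pixels whose gradient magnitude reaches the threshold."""
    return [[1 if v >= threshold else 0 for v in row] for row in sobel_magnitude(img)]

# Toy 5x6 image: dark left half, bright right half, so there is one vertical edge.
image = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
mask = edge_mask(image, threshold=200)
```

On this toy image the mask marks the two columns on either side of the brightness jump in the interior rows. A system using OpenCV would more likely call a library routine than hand-written loops; the loops here only keep the example dependency-free.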

    Indiscapes: Instance Segmentation Networks for Layout Parsing of Historical Indic Manuscripts

    Historical palm-leaf manuscripts and early paper documents from the Indian subcontinent form an important part of the world's literary and cultural heritage. Despite their importance, large-scale annotated Indic manuscript image datasets do not exist. To address this deficiency, we introduce Indiscapes, the first ever dataset with multi-regional layout annotations for historical Indic manuscripts. To address the challenge of large diversity in scripts and the presence of dense, irregular layout elements (e.g. text lines, pictures, multiple documents per image), we adapt a Fully Convolutional Deep Neural Network architecture for fully automatic, instance-level spatial layout parsing of manuscript images. We demonstrate the effectiveness of the proposed architecture on images from the Indiscapes dataset. For annotation flexibility and keeping the non-technical nature of domain experts in mind, we also contribute a custom, web-based GUI annotation tool and a dashboard-style analytics portal. Overall, our contributions set the stage for enabling downstream applications such as OCR and word-spotting in historical Indic manuscripts at scale. Comment: Oral presentation at International Conference on Document Analysis and Recognition (ICDAR) - 2019. For dataset, pre-trained networks and additional details, visit project page at http://ihdia.iiit.ac.in

    An efficient convolutional neural network based classifier to predict Tamil writer

    Identifying Tamil handwriting at different levels, such as character, word, and paragraph, is complicated compared with Western language scripts. None of the existing methods provides efficient Tamil handwriting writer identification (THWI), and offline Tamil handwriting identification at different levels still offers many motivating challenges to researchers. This paper employs a deep learning algorithm for handwriting image classification. Deep learning can generate new features from a limited training dataset. A Convolutional Neural Network (CNN), a deep feed-forward artificial neural network, is applied to THWI. The dataset collection and classification phases of the CNN enable data access and automatic feature generation. Since the number of parameters is significantly reduced, the training time for THWI is proportionally reduced. The CNN produced a much higher identification rate than a traditional ANN at different levels of handwriting.
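
To make the parameter-reduction claim above concrete, the following sketch compares the weight count of a convolutional layer with that of a fully connected layer producing an output of the same size. The layer sizes (a 32x32 grayscale input, sixteen 3x3 filters) are hypothetical and are not taken from the paper; the formulas are the standard ones for layers with one bias per output unit or filter.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights plus one bias per filter; independent of the image size."""
    return (kernel_h * kernel_w * in_channels + 1) * out_channels

def dense_params(in_features, out_features):
    """Weights plus one bias per output unit."""
    return (in_features + 1) * out_features

# Hypothetical layer sizes (assumption, not from the paper).
height, width, channels, filters = 32, 32, 1, 16

conv = conv_params(3, 3, channels, filters)
# A dense layer mapping every input pixel to every value of a 16-channel 32x32 output.
dense = dense_params(height * width * channels, height * width * filters)
```

With these sizes the convolutional layer has 160 parameters and the dense layer has 16,793,600, because the convolution reuses the same small filter at every image position.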

    Deep Learning Based Real Time Devanagari Character Recognition

    Advances in the technology behind optical character recognition (OCR) have made it one of the technologies with plenty of uses across the industrial space. Today, OCR is available for several languages and is capable of recognizing characters in real time, but there are some languages for which this technology has not developed much. These advancements have been possible because of the introduction of concepts like artificial intelligence and deep learning. Deep Neural Networks have proven to be the best choice for tasks involving recognition, and there are many algorithms and models that can be used for this purpose. This project implements and optimizes a deep learning-based model that recognizes the characters of the Devanagari script in real time by analyzing hand movements.

    A deep learning approach for recognizing the cursive Tamil characters in palm leaf manuscripts

    Tamil is an old Indian language with a large corpus of literature on palm leaves and other media. Palm leaf manuscripts were a versatile medium for recording medicine, literature, theatre, and other subjects. Because of the need for digitization and transcription, recognizing the cursive characters found in palm leaf manuscripts remains an open problem. In this research, a Convolutional Neural Network (CNN) technique is trained on the characteristics of the palm leaf characters so that it can classify them. First, the input image is preprocessed with morphological operations to remove noise. A Text Line Slicing segmentation scheme is then used to segment the palm leaf characters. The feature processing in this study involves several major steps, including text line spacing, spacing without an obstacle, and spacing with an obstacle. Finally, the extracted cursive characters are given as input to the CNN for final classification. The experiments are carried out on collected cursive Tamil palm leaf manuscripts to compare the performance of the proposed CNN with existing deep learning techniques in terms of accuracy, precision, recall, etc. The results show that the proposed network achieved 94% accuracy, whereas an existing ResNet achieved 88% accuracy.
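
The abstract above does not describe how its Text Line Slicing scheme works internally. As a hedged illustration of the general idea of separating text lines, the sketch below uses a horizontal projection profile, a common stand-in technique and not necessarily the paper's method. The function name `segment_lines`, the `min_ink` parameter, and the toy page are assumptions made for this example.

```python
def segment_lines(binary, min_ink=1):
    """Return (start_row, end_row) pairs, end exclusive, for runs of rows containing ink.

    `binary` is a 2D list where 1 marks an ink pixel and 0 marks background.
    """
    lines, start = [], None
    for y, row in enumerate(binary):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y                    # a new text line begins
        elif not has_ink and start is not None:
            lines.append((start, y))     # the current text line ends
            start = None
    if start is not None:                # line runs to the bottom of the image
        lines.append((start, len(binary)))
    return lines

# Toy binarized page with two text lines separated by a blank row.
page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
lines = segment_lines(page)
```

For the toy page this yields two row ranges, one per text line. Real palm leaf images typically have touching or skewed lines, which is presumably why the paper distinguishes spacing with and without obstacles; this sketch handles neither case.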