1,506 research outputs found

    A hypothesize-and-verify framework for Text Recognition using Deep Recurrent Neural Networks

    Full text link
    Deep LSTM is an ideal candidate for text recognition. However text recognition involves some initial image processing steps like segmentation of lines and words which can induce error to the recognition system. Without segmentation, learning very long range context is difficult and becomes computationally intractable. Therefore, alternative soft decisions are needed at the pre-processing level. This paper proposes a hybrid text recognizer using a deep recurrent neural network with multiple layers of abstraction and long range context along with a language model to verify the performance of the deep neural network. In this paper we construct a multi-hypotheses tree architecture with candidate segments of line sequences from different segmentation algorithms at its different branches. The deep neural network is trained on perfectly segmented data and tests each of the candidate segments, generating unicode sequences. In the verification step, these unicode sequences are validated using a sub-string match with the language model and best first search is used to find the best possible combination of alternative hypothesis from the tree structure. Thus the verification framework using language models eliminates wrong segmentation outputs and filters recognition errors

    Handwritten OCR for Indic Scripts: A Comprehensive Overview of Machine Learning and Deep Learning Techniques

    Get PDF
    The potential uses of cursive optical character recognition, commonly known as OCR, in a number of industries, particularly document digitization, archiving, even language preservation, have attracted a lot of interest lately. In the framework of optical character recognition (OCR), the goal of this research is to provide a thorough understanding of both cutting-edge methods and the unique difficulties presented by Indic scripts. A thorough literature search was conducted in order to conduct this study, during which time relevant publications, conference proceedings, and scientific files were looked for up to the year 2023. As a consequence of the inclusion criteria that were developed to concentrate on studies only addressing Handwritten OCR on Indic scripts, 53 research publications were chosen as the process's outcome. The review provides a thorough analysis of the methodology and approaches employed in the chosen study. Deep neural networks, conventional feature-based methods, machine learning techniques, and hybrid systems have all been investigated as viable answers to the problem of effectively deciphering Indian scripts, because they are famously challenging to write. To operate, these systems require pre-processing techniques, segmentation schemes, and language models. The outcomes of this methodical examination demonstrate that despite the fact that Hand Scanning for Indic script has advanced significantly, room still exists for advancement. Future research could focus on developing trustworthy models that can handle a range of writing styles and enhance accuracy using less-studied Indic scripts. This profession may advance with the creation of collected datasets and defined standards

    A new hybrid convolutional neural network and eXtreme gradient boosting classifier for recognizing handwritten Ethiopian characters

    Get PDF
    Handwritten character recognition has been profoundly studied for many years in the field of pattern recognition. Due to its vast practical applications and financial implications, handwritten character recognition is still an important research area. In this research, the Handwritten Ethiopian Character Recognition (HECR) dataset has been prepared to train the model. The images in the HECR dataset were organized with more than one color pen RGB main spaces that have been size normalized to 28 × 28 pixels. The dataset is a combination of scripts (Fidel in Ethiopia), numerical representations, punctuations, tonal symbols, combining symbols, and special characters. These scripts have been used to write ancient histories, science, and arts of Ethiopia and Eritrea. In this study, a hybrid model of two super classifiers: Convolutional Neural Network (CNN) and eXtreme Gradient Boosting (XGBoost) is proposed for classification. In this integrated model, CNN works as a trainable automatic feature extractor from the raw images and XGBoost takes the extracted features as an input for recognition and classification. The output error rates of the hybrid model and CNN with a fully connected layer are compared. A 0.4630 and 0.1612 error rates are achieved in classifying the handwritten testing dataset images, respectively. Thus XGBoost as a classifier performs a better result than the traditional fully connected layer

    Digital Paleography: Using the Digital Representation of Jawi Manuscripts to Support Paleographic Analysis

    Get PDF
    Palaeography is the study of ancient handwritten manuscripts to date the age and to localize ancient and medieval scripts. It also deals with analysing the development of the letters shape. Ancient Jawi manuscripts are one of the least studiedarea. Nowadays, over 7789 known Jawi manuscripts are kept in custody of various libraries in Malaysia. Most of these manuscripts were undated with unknown authors and location of origin. Analysing the different types of writing styles and recognizing the manuscript illuminations can discover this important information. In this paper, we discuss the palaeographical analysis from the perspective of computer science and propose a general framework for that. This process involves investigation of Arabic influence on the Jawi manuscript writings, establishing the palaeographical type of the script, and classification of writing styles based on local and global Jawi image features

    Development of a Feature Extraction Technique for Online Character Recognition System

    Get PDF
    Character recognition has been a popular research area for many years because of its various application potentials. Some of its application areas are postal automation, bank cheque processing, automatic data entry, signature verification and so on. Nevertheless, recognition of handwritten characters is a problem that is currently gathering a lot of attention. It has become a difficult problem because of the high variability and ambiguity in the character shapes written by individuals. A lot of researchers have proposed many approaches to solve this complex problem but none has been able to solve the problem completely in all settings. Some of the problems encountered by researchers include selection of efficient feature extraction method, long network training time, long recognition time and low recognition accuracy. This paper developed a feature extraction technique for online character recognition system using hybrid of geometrical and statistical features. Thus, through the integration of geometrical and statistical features, insights were gained into new character properties, since these types of features were considered to be complementary. Keywords: Character recognition, Feature extraction, Geometrical Feature, Statistical Feature, Character
    • …
    corecore