
    An Efficient Hidden Markov Model for Offline Handwritten Numeral Recognition

    Traditionally, the performance of OCR algorithms and systems is based on the recognition of isolated characters. When a system classifies an individual character, its output is typically a character label or a reject marker that corresponds to an unrecognized character. By comparing output labels with the correct labels, the numbers of correct recognitions, substitution errors (misrecognized characters), and rejects (unrecognized characters) are determined. Although printed isolated characters are now recognized with high accuracy, recognition of handwritten characters remains an open research problem. The ability to identify machine-printed characters in an automated or semi-automated manner has obvious applications in numerous fields. Since creating an algorithm with a one hundred percent correct recognition rate is quite probably impossible in our world of noise and different font styles, it is important to design character recognition algorithms with these failures in mind so that when mistakes are inevitably made, they will at least be understandable and predictable to the person working with the
    Comment: 6 pages, 5 figures
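The evaluation described in the abstract above (counting correct recognitions, substitution errors, and rejects) can be sketched as follows. This is a minimal illustration, not the paper's code: the function name `score_isolated_characters` and the use of `None` as the reject marker are assumptions made for the example.

```python
from collections import Counter

# Hypothetical reject marker: the abstract only says a "reject marker" exists.
REJECT = None


def score_isolated_characters(predicted, expected):
    """Count correct recognitions, substitution errors, and rejects.

    Returns a dict mapping each outcome to a (count, rate) pair.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")

    counts = Counter()
    for guess, truth in zip(predicted, expected):
        if guess is REJECT:
            counts["rejects"] += 1          # unrecognized character
        elif guess == truth:
            counts["correct"] += 1          # correct recognition
        else:
            counts["substitutions"] += 1    # misrecognized character

    total = len(expected)
    return {
        name: (counts[name], counts[name] / total if total else 0.0)
        for name in ("correct", "substitutions", "rejects")
    }


if __name__ == "__main__":
    result = score_isolated_characters(["3", "7", REJECT, "1"], ["3", "1", "4", "1"])
    for outcome, (count, rate) in result.items():
        print(f"{outcome}: {count} ({rate:.0%})")
```

Keeping rejects separate from substitutions matters because a reject can be routed to a human reviewer, whereas a substitution passes through silently.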

    Back-Propagation Artificial Neural Network Techniques for Optical Character Recognition - A Survey

    Character recognition, a digitization concept, is an important research area in the fields of image processing and pattern recognition. Optical character recognition is a method of digitizing printed texts so that they can be searched electronically, stored compactly, and used in machine processes such as text-to-speech and machine translation. This paper describes techniques for converting a typed or handwritten document into machine-readable form using Back-Propagation Artificial Neural Networks.

    Old English Character Recognition Using Neural Networks

    Character recognition has captured the interest of researchers since the beginning of the twentieth century. While Optical Character Recognition for printed material is now robust and widespread, the recognition of handwritten material lags behind. In our digital era, more and more historical handwritten documents are digitized and made available to the general public. However, these digital copies of handwritten materials lack the automatic content recognition available for their printed counterparts. We propose a practical, accurate, and computationally efficient method for Old English character recognition from manuscript images. Our method relies on a modern machine learning model, Artificial Neural Networks, to perform character recognition based on individual character images cropped directly from the images of the manuscript pages. We propose model dimensionality reduction methods that improve accuracy and computational efficiency. Our experimental results show that the model we propose outperforms current automatic text recognition techniques.

    Handwritten and printed text separation in historical documents

    Historical documents present many challenges for Optical Character Recognition (OCR) systems, especially documents of poor quality containing handwritten annotations, stamps, signatures, and historical fonts. As most OCR systems recognize either machine-printed or handwritten text, printed and handwritten parts have to be separated before the respective recognition system is used. This thesis addresses the problem of segmenting handwriting and print in historical Latin text documents. To alleviate the lack of data containing handwritten and machine-printed components located on the same page, or even overlapping each other, together with their pixel-wise annotations, the data synthesis method proposed in [12] was applied and new datasets were generated. The newly created images and their pixel-level labels were used to train the Fully Convolutional Model (FCN) introduced in [5]. The newly trained model has shown better results in the separation of machine-printed and handwritten text in historical documents.

    Simple Character Recognition

    This work deals with locating and recognizing text in an image document. It discusses feature extraction and its use in machine learning. A portion of this work is devoted to the design and implementation of a simple application for recognizing characters of machine-printed text.

    Handwritten Text Recognition Using Convolutional Neural Network

    OCR (Optical Character Recognition) is a technology that offers comprehensive alphanumeric recognition of handwritten and printed characters at electronic speed by merely scanning the document. Recently, the understanding of visual data has been termed Intelligent Character Recognition (ICR). ICR is the OCR module that can convert scans of handwritten or printed characters into ASCII text. ASCII is the standard format for data encoding in electronic communication; it assigns standard numeric values to letters, numerals, symbols, white space, and other characters. In more technical terms, OCR is the process of using an electronic device to transform two-dimensional textual information into machine-encoded text. Anything that contains text, whether machine-written or handwritten, can be scanned with a scanner, or a simple picture of the text is enough for the recognition system to distinguish the text. The goal of this paper is to show the results of a Convolutional Neural Network model trained on a National Institute of Standards and Technology (NIST) dataset containing over 100,000 images. The network learns from the features extracted from the images and uses them to generate the probability of each class to which the picture may belong. We have achieved an accuracy of 90.54% with a loss of 2.53%.
    Comment: 6 pages, 15 figures
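The final step described in the abstract above (turning network outputs into per-class probabilities and then into an ASCII-encoded character) can be sketched as follows. This is an illustrative assumption, not the paper's implementation: the abstract does not say which output layer is used, so a softmax is assumed here, and the names `softmax` and `most_likely_character` as well as the three-letter label set are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def most_likely_character(scores, labels):
    """Return (character, ASCII code, probability) for the top-scoring class."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], ord(labels[best]), probs[best]


if __name__ == "__main__":
    # Hypothetical scores for a three-class toy problem.
    char, code, prob = most_likely_character([0.1, 2.0, -1.0], "ABC")
    print(f"predicted {char!r} (ASCII {code}) with probability {prob:.2f}")
```

`ord` returns the Unicode code point, which coincides with the ASCII value for the letters, digits, and symbols in the ASCII range.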

    EASTER: Efficient and Scalable Text Recognizer

    Recent progress in deep learning has led to the development of Optical Character Recognition (OCR) systems that perform remarkably well. Most research has centered on recurrent networks and complex gated layers, which make the overall solution complex and difficult to scale. In this paper, we present an Efficient And Scalable TExt Recognizer (EASTER) to perform optical character recognition on both machine-printed and handwritten text. Our model uses 1-D convolutional layers without any recurrence, which enables parallel training with considerably less data. We experimented with multiple variations of our architecture, and one of the smallest variants (in depth and number of parameters) performs comparably to complex RNN-based choices. Our deepest, 20-layer variant outperforms RNN architectures by a good margin on benchmark datasets such as IIIT-5k and SVT. We also showcase improvements over the current best results on the offline handwritten text recognition task, and we present data generation pipelines with an augmentation setup to generate synthetic datasets for both handwritten and machine-printed text.
    Comment: 9 pages, fixed typos and minor edit

    Recognition of characters in document images using morphological operation

    In this paper, we deal with the problem of document image rectification from images captured by digital cameras. The improvement in the resolution of digital camera sensors has brought more and more applications for non-contact text capture. It is widely used as a form of data entry from original paper sources such as documents, sales receipts, or other printed records. It is crucial to the computerization of printed texts so that they can be electronically searched, stored more compactly, displayed online, and used in machine processes such as machine translation, text-to-speech, and text mining. Unfortunately, perspective distortion in the resulting image makes it hard to properly identify the contents of the captured text using traditional optical character recognition (OCR) systems. In this work we propose a new technique: a system that provides full alphanumeric recognition of printed or handwritten characters at electronic speed by simply scanning the form. Optical character recognition, usually abbreviated as OCR, is the mechanical or electronic conversion of scanned images of handwritten, typewritten, or printed text into machine-encoded text. OCR software detects and extracts each character in the text of a scanned image and, using the ASCII code set (the American Standard Code for Information Interchange), converts it into a computer-recognizable character. Once each character has been converted, the whole document is saved as an editable text document with an accuracy rate of up to 99.5 per cent, although it is not always this accurate. The basic idea of OCR is to classify optical patterns (often contained in a digital image) corresponding to alphanumeric or other characters.