An Efficient Hidden Markov Model for Offline Handwritten Numeral Recognition
Traditionally, the performance of OCR algorithms and systems is based on the
recognition of isolated characters. When a system classifies an individual
character, its output is typically a character label or a reject marker that
corresponds to an unrecognized character. By comparing output labels with the
correct labels, the numbers of correct recognitions, substitution errors
(misrecognized characters), and rejects (unrecognized characters) are determined.
Nowadays, although recognition of printed isolated characters is performed with
high accuracy, recognition of handwritten characters still remains an open
problem in the research arena. The ability to identify machine-printed
characters in an automated or semi-automated manner has obvious applications
in numerous fields. Since creating an algorithm with a one hundred percent
correct recognition rate is quite probably impossible in our world of noise and
different font styles, it is important to design character recognition
algorithms with these failures in mind so that when mistakes are inevitably
made, they will at least be understandable and predictable to the person
working with the
Comment: 6 pages, 5 figures
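The evaluation procedure described in this abstract (comparing output labels with the correct labels and counting correct recognitions, substitution errors, and rejects) can be sketched roughly as follows. This is a minimal illustration, not the paper's code: the function name `tally_outcomes`, the use of `None` as the reject marker, and the sample labels are all assumptions made for the example.

```python
from collections import Counter

# Assumption: the classifier signals a reject with None (hypothetical marker).
REJECT = None


def tally_outcomes(predicted, truth):
    """Count correct recognitions, substitution errors, and rejects."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    counts = Counter(correct=0, substitution=0, reject=0)
    for p, t in zip(predicted, truth):
        if p is REJECT:
            counts["reject"] += 1        # unrecognized character
        elif p == t:
            counts["correct"] += 1       # label matches the ground truth
        else:
            counts["substitution"] += 1  # misrecognized character
    return counts


# Made-up labels for five isolated numerals.
predicted = ["3", "8", None, "1", "7"]
truth = ["3", "3", "5", "1", "1"]
counts = tally_outcomes(predicted, truth)
print(dict(counts))
```

Dividing each count by the number of characters gives the corresponding recognition, substitution, and reject rates.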
Back-Propagation Artificial Neural Network Techniques for Optical Character Recognition - A Survey
Character recognition, a digitization concept, is an important research area in the fields of image processing and pattern
recognition. Optical character recognition is a method of digitizing printed texts so that they can be searched
electronically, stored compactly, and used in machine processes such as text-to-speech and machine translation. This
paper describes techniques for converting a typed or handwritten document into machine-readable form using Back-Propagation
Artificial Neural Networks.
Old English Character Recognition Using Neural Networks
Character recognition has captured the interest of researchers since the beginning of the twentieth century. While Optical Character Recognition for printed material is very robust and widespread nowadays, the recognition of handwritten materials lags behind. In our digital era, more and more historical handwritten documents are digitized and made available to the general public. However, these digital copies of handwritten materials lack the automatic content recognition feature of their printed counterparts. We propose a practical, accurate, and computationally efficient method for Old English character recognition from manuscript images. Our method relies on a modern machine learning model, Artificial Neural Networks, to perform character recognition based on individual character images cropped directly from the images of the manuscript pages. We propose model dimensionality reduction methods that improve accuracy and computational effectiveness. Our experimental results show that the model we propose outperforms current automatic text recognition techniques.
Handwritten and printed text separation in historical documents
Historical documents present many challenges for Optical Character Recognition (OCR)
systems, especially documents of poor quality containing handwritten annotations,
stamps, signatures, and historical fonts. As most OCR systems recognize either machine-printed
or handwritten text, printed and handwritten parts have to be separated before using
the respective recognition system. This thesis addresses the problem of segmenting
handwriting and printed text in historical Latin text documents. To alleviate the
lack of data containing handwritten and machine-printed components located on the
same page or even overlapping each other, together with their pixel-wise annotations, the data
synthesis method proposed in [12] was applied and new datasets were generated. The
newly created images and their pixel-level labels were used to train the Fully Convolutional
Network (FCN) introduced in [5]. The newly trained model has shown better results in the
separation of machine-printed and handwritten text in historical documents.
Simple Character Recognition
This work deals with the process of locating and recognizing text in an image. It discusses feature extraction and its use in machine learning. A portion of this work is devoted to the design and implementation of a simple application for recognizing characters of machine-printed text.
Handwritten Text Recognition Using Convolutional Neural Network
OCR (Optical Character Recognition) is a technology that offers comprehensive
alphanumeric recognition of handwritten and printed characters at electronic
speed by merely scanning the document. Recently, the understanding of visual
data has been termed Intelligent Character Recognition (ICR). Intelligent
Character Recognition (ICR) is the OCR module that can convert scans of
handwritten or printed characters into ASCII text. ASCII data is the standard
format for data encoding in electronic communication. ASCII assigns standard
numeric values to letters, numerals, symbols, white spaces, and other characters.
In more technical terms, OCR is the process of using an electronic device to
transform two-dimensional textual information into machine-encoded text. Anything
that contains text, whether machine-written or handwritten, can be scanned with
a scanner, or even a simple photograph of the text is enough for the
recognition system to distinguish the text. The goal of this paper is to show
the results of a Convolutional Neural Network model trained on a
National Institute of Standards and Technology (NIST) dataset containing over
100,000 images. The network learns from the features extracted from the images
and uses them to generate the probability of each class to which the picture
may belong. We have achieved an accuracy of 90.54% with a loss of 2.53%.
Comment: 6 pages, 15 figures
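The abstract says the network generates a probability for each class but does not say how. One common choice for a classifier's final layer is the softmax function, sketched below under that assumption; the raw scores and the three-class setup are invented for illustration and are not taken from the paper.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores from a final layer for three character classes.
scores = [1.0, 3.0, 0.5]
probs = softmax(scores)

# The predicted class is the one with the highest probability.
best = max(range(len(probs)), key=probs.__getitem__)
print(probs, best)
```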
EASTER: Efficient and Scalable Text Recognizer
Recent progress in deep learning has led to the development of Optical
Character Recognition (OCR) systems which perform remarkably well. Most
research has been around recurrent networks as well as complex gated layers
which make the overall solution complex and difficult to scale. In this paper,
we present an Efficient And Scalable TExt Recognizer (EASTER) to perform
optical character recognition on both machine printed and handwritten text. Our
model utilises 1-D convolutional layers without any recurrence, which enables
parallel training with a considerably smaller volume of data. We experimented with
multiple variations of our architecture, and one of the smallest variants (in depth
and number of parameters) performs comparably to complex RNN-based choices.
Our deepest variant, with 20 layers, outperforms RNN architectures by a good margin
on benchmark datasets such as IIIT-5k and SVT. We also showcase improvements
over the current best results on the offline handwritten text recognition task. We
also present data generation pipelines with an augmentation setup to generate
synthetic datasets for both handwritten and machine-printed text.
Comment: 9 pages, fixed typos and minor edits
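To illustrate why 1-D convolution without recurrence lends itself to parallel computation, here is a minimal pure-Python sketch of a single 1-D convolution over a feature sequence. It is not the EASTER architecture: the function `conv1d`, the kernel, and the input values are assumptions chosen for the example. Each output depends only on a fixed window of the input and never on a previous output, so all positions could be computed independently.

```python
def conv1d(sequence, kernel):
    """Valid (no padding) 1-D cross-correlation of a sequence with a kernel."""
    k = len(kernel)
    return [
        # Output i reads only inputs i .. i+k-1, never an earlier output.
        sum(sequence[i + j] * kernel[j] for j in range(k))
        for i in range(len(sequence) - k + 1)
    ]


# Hypothetical one-channel feature sequence along the width of a text line.
features = [1.0, 2.0, 4.0, 7.0, 11.0]
kernel = [-1.0, 1.0]  # a simple difference filter
out = conv1d(features, kernel)
print(out)
```

A recurrent layer, by contrast, computes each step from the previous hidden state, which forces the time steps to be processed in order.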
Recognition of characters in document images using morphological operation
In this paper, we deal with the problem of document image rectification from images captured by digital cameras. The improvement in the resolution of digital camera sensors has brought more and more applications for non-contact text capture. It is widely used as a form of data entry from some sort of original paper data source, such as documents, sales receipts, or any number of printed records. It is crucial to the computerization of printed texts so that they can be electronically searched, stored more compactly, displayed on-line, and used in machine processes such as machine translation, text-to-speech, and text mining. Unfortunately, perspective distortion in the resulting image makes it hard to properly identify the contents of the captured text using traditional optical character recognition (OCR) systems. In this work we propose a new technique: a system that provides full alphanumeric recognition of printed or handwritten characters at electronic speed by simply scanning the form. Optical character recognition, usually abbreviated as OCR, is the mechanical or electronic conversion of scanned images of handwritten, typewritten, or printed text into machine-encoded text. OCR software detects and extracts each character in the text of a scanned image and, using the ASCII code set (the American Standard Code for Information Interchange), converts it into a computer-recognizable character. Once each character has been converted, the whole document is saved as an editable text document, with a highest accuracy rate of 99.5 per cent, although it is not always this accurate. The basic idea of Optical Character Recognition (OCR) is to classify optical patterns (often contained in a digital image) corresponding to alphanumeric or other characters.
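The final step described in this abstract, mapping each recognized character to its ASCII code and saving the result as editable text, can be illustrated with Python's built-in `ord` and `chr`. The list `recognized` stands in for the output of a hypothetical character classifier and is an assumption made for the example.

```python
# Hypothetical classifier output: one recognized character per image patch.
recognized = ["O", "C", "R", " ", "9"]

# ASCII assigns each character a standard numeric value.
codes = [ord(ch) for ch in recognized]

# Rebuild the editable text from the codes and encode it as ASCII bytes.
text = "".join(chr(c) for c in codes)
encoded = text.encode("ascii")
print(codes, text, encoded)
```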