Search CORE

222 research outputs found

RAPID ANALYTICAL VERIFICATION OF HANDWRITTEN ALPHANUMERIC ADDRESS FIELDS

Author: Lee C.K.
Leedham G.
Publication venue: Nijmegen: International Unipen Foundation
Publication date: 01/01/2004
Field of study

Microsoft, Motorola, Siemens, Hitachi, IAPR, NICI, IUF This paper presents a combination of fuzzy system and dynamic analytical model to deal with imprecise data derived from feature extraction in handwritten address images which are compared against postulated addresses for address verification. A dynamic buildingnumber locator is able to locate and recognise the buildingnumber, without knowing exactly where the buildingnumber starts in the candidate address line. The overall system achieved a correct sorting rate of 72.9%, 27.1% rejection rate and 0.0% error rate on a blind test set of 450 cursive handwritten addresses.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

University of Groningen Digital Archive

Dissertations of the University of Groningen

Query by String word spotting based on character bi-gram indexing

Author: Ghosh Suman K.
Valveny Ernest
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 28/05/2015
Field of study

In this paper we propose a segmentation-free query by string word spotting method. Both the documents and query strings are encoded using a recently proposed word representa- tion that projects images and strings into a common atribute space based on a pyramidal histogram of characters(PHOC). These attribute models are learned using linear SVMs over the Fisher Vector representation of the images along with the PHOC labels of the corresponding strings. In order to search through the whole page, document regions are indexed per character bi- gram using a similar attribute representation. On top of that, we propose an integral image representation of the document using a simplified version of the attribute model for efficient computation. Finally we introduce a re-ranking step in order to boost retrieval performance. We show state-of-the-art results for segmentation-free query by string word spotting in single-writer and multi-writer standard datasetsComment: To be published in ICDAR201

arXiv.org e-Print Archive

Crossref

Recognition-based Approach of Numeral Extraction in Handwritten Chemistry Documents using Contextual Knowledge

Author: Belaid Abdel
Ghanmi Nabil
Publication venue: HAL CCSD
Publication date: 11/04/2016
Field of study

International audienceThis paper presents a complete procedure that uses contextual and syntactic information to identify and recognize amount fields in the table regions of chemistry documents. The proposed method is composed of two main modules. Firstly, a structural analysis based on connected component (CC) dimensions and positions identifies some special symbols and clusters other CCs into three groups: fragment of characters, isolated characters or connected characters. Then, a specific processing is performed on each group of CCs. The fragment of characters are merged with the nearest character or string using geometric relationship based rules. The characters are sent to a recognition module to identify the numeral components. For the connected characters, the final decision on the string nature (numeric or non-numeric) is made based on a global score computed on the full string using the height regularity property and the recognition probabilities of its segmented fragments. Finally, a simple syntactic verification at table row level is conducted in order to correct eventual errors. The experimental tests are carried out on real-world chemistry documents provided by our industrial partner eNovalys. The obtained results show the effectiveness of the proposed system in extracting amount fields

INRIA a CCSD electronic archive server

Hybrid of HMM and Fuzzy Logic for Isolated Handwritten Character Recognition

Author: Azizah Suliman
Publication venue: 'IntechOpen'
Publication date: 17/08/2010
Field of study

IntechOpen

Crossref

A System for an automatic reading of student information sheets

Author: Belaïd Abdel
Kacem Afef
Saïdani Asma
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 19/09/2011
Field of study

ISBN: 978-1-4577-1350-7International audienceIn this paper we present a student information sheet reading system. Relevant algorithm is proposed to locate and label handwritten answer field. As information sheets can be filled in Arabic and/or in French, automating the script language differentiation is a pre-recognition required in the proposed system. We have developed a robust and fast field classification and script language identification method, based on a decision tree, to make these processing practical for sheet recognition. To this end, the system uses several novel features (loops, descenders, diacritics) and analyses the lower profile of script. The classification rates are 92.5% for numeric fields, 94.34% for Arabic scripts and 94.66% for French scripts. Experimental results, carried on 80 sheets, show our system provides an effective way to convert printed sheets into computerized format or collect information for database from printed sheets

Crossref

INRIA a CCSD electronic archive server

Handwritten Digit Recognition and Classification Using Machine Learning

Author: Zhao Ke
Publication venue: Dublin Institute of Technology
Publication date: 01/01/2018
Field of study

In this paper, multiple learning techniques based on Optical character recognition (OCR) for the handwritten digit recognition are examined, and a new accuracy level for recognition of the MNIST dataset is reported. The proposed framework involves three primary parts, image pre-processing, feature extraction and classification. This study strives to improve the recognition accuracy by more than 99% in handwritten digit recognition. As will be seen, pre-processing and feature extraction play crucial roles in this experiment to reach the highest accuracy

Arrow@TUDublin

An Integrated architecture for recognition of totally unconstrained handwritten numerals

Author
Publication venue: Productivity From Information Technology, "PROFIT" Research Initiative, Sloan School of Management, Massachusetts Institute of Technology
Publication date: 01/01/1993
Field of study

Reprint. Reprinted from the International journal of pattern recognition and artificial intelligence. Vol. 7, no. 4 (1993) "January 1993."Includes bibliographical references (p. 127-128).Supported by the Productivity From Information Technology (PROFIT) Research Initiative at MIT.Amar Gupta ... [et al.

DSpace@MIT

A Tale of Two Transcriptions : Machine-Assisted Transcription of Historical Sources

Author: Andersen Trygve
Cabré Anna,
Eikvil Line
Fornes Bisquerra Alicia
Lladós Josep
Pujadas-Mora Joana Maria
Thorvaldsen Gunnar
Publication venue
Publication date: 01/01/2015
Field of study

This article is part of the "Norwegian Historical Population Register" project financed by the Norwegian Research Council (grant # 225950) and the Advanced Grand Project "Five Centuries of Marriages"(2011-2016) funded by the European Research Council (# ERC 2010-AdG_20100407)This article explains how two projects implement semi-automated transcription routines: for census sheets in Norway and marriage protocols from Barcelona. The Spanish system was created to transcribe the marriage license books from 1451 to 1905 for the Barcelona area; one of the world's longest series of preserved vital records. Thus, in the Project "Five Centuries of Marriages" (5CofM) at the Autonomous University of Barcelona's Center for Demographic Studies, the Barcelona Historical Marriage Database has been built. More than 600,000 records were transcribed by 150 transcribers working online. The Norwegian material is cross-sectional as it is the 1891 census, recorded on one sheet per person. This format and the underlining of keywords for several variables made it more feasible to semi-automate data entry than when many persons are listed on the same page. While Optical Character Recognition (OCR) for printed text is scientifically mature, computer vision research is now focused on more difficult problems such as handwriting recognition. In the marriage project, document analysis methods have been proposed to automatically recognize the marriage licenses. Fully automatic recognition is still a challenge, but some promising results have been obtained. In Spain, Norway and elsewhere the source material is available as scanned pictures on the Internet, opening up the possibility for further international cooperation concerning automating the transcription of historic source materials. Like what is being done in projects to digitize printed materials, the optimal solution is likely to be a combination of manual transcription and machine-assisted recognition also for hand-written sources

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

Diposit Digital de Documents de la UAB

Arbitrary Keyword Spotting in Handwritten Documents

Author: Haji Mehdi
Publication venue
Publication date: 17/04/2012
Field of study

Despite the existence of electronic media in today’s world, a considerable amount of written communications is in paper form such as books, bank cheques, contracts, etc. There is an increasing demand for the automation of information extraction, classification, search, and retrieval of documents. The goal of this research is to develop a complete methodology for the spotting of arbitrary keywords in handwritten document images. We propose a top-down approach to the spotting of keywords in document images. Our approach is composed of two major steps: segmentation and decision. In the former, we generate the word hypotheses. In the latter, we decide whether a generated word hypothesis is a specific keyword or not. We carry out the decision step through a two-level classification where first, we assign an input image to a keyword or non-keyword class; and then transcribe the image if it is passed as a keyword. By reducing the problem from the image domain to the text domain, we do not only address the search problem in handwritten documents, but also the classification and retrieval, without the need for the transcription of the whole document image. The main contribution of this thesis is the development of a generalized minimum edit distance for handwritten words, and to prove that this distance is equivalent to an Ergodic Hidden Markov Model (EHMM). To the best of our knowledge, this work is the first to present an exact 2D model for the temporal information in handwriting while satisfying practical constraints. Some other contributions of this research include: 1) removal of page margins based on corner detection in projection profiles; 2) removal of noise patterns in handwritten images using expectation maximization and fuzzy inference systems; 3) extraction of text lines based on fast Fourier-based steerable filtering; 4) segmentation of characters based on skeletal graphs; and 5) merging of broken characters based on graph partitioning. Our experiments with a benchmark database of handwritten English documents and a real-world collection of handwritten French documents indicate that, even without any word/document-level training, our results are comparable with two state-of-the-art word spotting systems for English and French documents

Concordia University Research Repository