222 research outputs found

    RAPID ANALYTICAL VERIFICATION OF HANDWRITTEN ALPHANUMERIC ADDRESS FIELDS

    Get PDF
    Microsoft, Motorola, Siemens, Hitachi, IAPR, NICI, IUF This paper presents a combination of fuzzy system and dynamic analytical model to deal with imprecise data derived from feature extraction in handwritten address images which are compared against postulated addresses for address verification. A dynamic building­number locator is able to locate and recognise the building­number, without knowing exactly where the building­number starts in the candidate address line. The overall system achieved a correct sorting rate of 72.9%, 27.1% rejection rate and 0.0% error rate on a blind test set of 450 cursive handwritten addresses.

    Query by String word spotting based on character bi-gram indexing

    Full text link
    In this paper we propose a segmentation-free query by string word spotting method. Both the documents and query strings are encoded using a recently proposed word representa- tion that projects images and strings into a common atribute space based on a pyramidal histogram of characters(PHOC). These attribute models are learned using linear SVMs over the Fisher Vector representation of the images along with the PHOC labels of the corresponding strings. In order to search through the whole page, document regions are indexed per character bi- gram using a similar attribute representation. On top of that, we propose an integral image representation of the document using a simplified version of the attribute model for efficient computation. Finally we introduce a re-ranking step in order to boost retrieval performance. We show state-of-the-art results for segmentation-free query by string word spotting in single-writer and multi-writer standard datasetsComment: To be published in ICDAR201

    Recognition-based Approach of Numeral Extraction in Handwritten Chemistry Documents using Contextual Knowledge

    Get PDF
    International audienceThis paper presents a complete procedure that uses contextual and syntactic information to identify and recognize amount fields in the table regions of chemistry documents. The proposed method is composed of two main modules. Firstly, a structural analysis based on connected component (CC) dimensions and positions identifies some special symbols and clusters other CCs into three groups: fragment of characters, isolated characters or connected characters. Then, a specific processing is performed on each group of CCs. The fragment of characters are merged with the nearest character or string using geometric relationship based rules. The characters are sent to a recognition module to identify the numeral components. For the connected characters, the final decision on the string nature (numeric or non-numeric) is made based on a global score computed on the full string using the height regularity property and the recognition probabilities of its segmented fragments. Finally, a simple syntactic verification at table row level is conducted in order to correct eventual errors. The experimental tests are carried out on real-world chemistry documents provided by our industrial partner eNovalys. The obtained results show the effectiveness of the proposed system in extracting amount fields

    A System for an automatic reading of student information sheets

    Get PDF
    ISBN: 978-1-4577-1350-7International audienceIn this paper we present a student information sheet reading system. Relevant algorithm is proposed to locate and label handwritten answer field. As information sheets can be filled in Arabic and/or in French, automating the script language differentiation is a pre-recognition required in the proposed system. We have developed a robust and fast field classification and script language identification method, based on a decision tree, to make these processing practical for sheet recognition. To this end, the system uses several novel features (loops, descenders, diacritics) and analyses the lower profile of script. The classification rates are 92.5% for numeric fields, 94.34% for Arabic scripts and 94.66% for French scripts. Experimental results, carried on 80 sheets, show our system provides an effective way to convert printed sheets into computerized format or collect information for database from printed sheets

    Handwritten Digit Recognition and Classification Using Machine Learning

    Get PDF
    In this paper, multiple learning techniques based on Optical character recognition (OCR) for the handwritten digit recognition are examined, and a new accuracy level for recognition of the MNIST dataset is reported. The proposed framework involves three primary parts, image pre-processing, feature extraction and classification. This study strives to improve the recognition accuracy by more than 99% in handwritten digit recognition. As will be seen, pre-processing and feature extraction play crucial roles in this experiment to reach the highest accuracy

    An Integrated architecture for recognition of totally unconstrained handwritten numerals

    Get PDF
    Reprint. Reprinted from the International journal of pattern recognition and artificial intelligence. Vol. 7, no. 4 (1993) "January 1993."Includes bibliographical references (p. 127-128).Supported by the Productivity From Information Technology (PROFIT) Research Initiative at MIT.Amar Gupta ... [et al.

    A Tale of Two Transcriptions : Machine-Assisted Transcription of Historical Sources

    Get PDF
    This article is part of the "Norwegian Historical Population Register" project financed by the Norwegian Research Council (grant # 225950) and the Advanced Grand Project "Five Centuries of Marriages"(2011-2016) funded by the European Research Council (# ERC 2010-AdG_20100407)This article explains how two projects implement semi-automated transcription routines: for census sheets in Norway and marriage protocols from Barcelona. The Spanish system was created to transcribe the marriage license books from 1451 to 1905 for the Barcelona area; one of the world's longest series of preserved vital records. Thus, in the Project "Five Centuries of Marriages" (5CofM) at the Autonomous University of Barcelona's Center for Demographic Studies, the Barcelona Historical Marriage Database has been built. More than 600,000 records were transcribed by 150 transcribers working online. The Norwegian material is cross-sectional as it is the 1891 census, recorded on one sheet per person. This format and the underlining of keywords for several variables made it more feasible to semi-automate data entry than when many persons are listed on the same page. While Optical Character Recognition (OCR) for printed text is scientifically mature, computer vision research is now focused on more difficult problems such as handwriting recognition. In the marriage project, document analysis methods have been proposed to automatically recognize the marriage licenses. Fully automatic recognition is still a challenge, but some promising results have been obtained. In Spain, Norway and elsewhere the source material is available as scanned pictures on the Internet, opening up the possibility for further international cooperation concerning automating the transcription of historic source materials. Like what is being done in projects to digitize printed materials, the optimal solution is likely to be a combination of manual transcription and machine-assisted recognition also for hand-written sources

    Arbitrary Keyword Spotting in Handwritten Documents

    Get PDF
    Despite the existence of electronic media in today’s world, a considerable amount of written communications is in paper form such as books, bank cheques, contracts, etc. There is an increasing demand for the automation of information extraction, classification, search, and retrieval of documents. The goal of this research is to develop a complete methodology for the spotting of arbitrary keywords in handwritten document images. We propose a top-down approach to the spotting of keywords in document images. Our approach is composed of two major steps: segmentation and decision. In the former, we generate the word hypotheses. In the latter, we decide whether a generated word hypothesis is a specific keyword or not. We carry out the decision step through a two-level classification where first, we assign an input image to a keyword or non-keyword class; and then transcribe the image if it is passed as a keyword. By reducing the problem from the image domain to the text domain, we do not only address the search problem in handwritten documents, but also the classification and retrieval, without the need for the transcription of the whole document image. The main contribution of this thesis is the development of a generalized minimum edit distance for handwritten words, and to prove that this distance is equivalent to an Ergodic Hidden Markov Model (EHMM). To the best of our knowledge, this work is the first to present an exact 2D model for the temporal information in handwriting while satisfying practical constraints. Some other contributions of this research include: 1) removal of page margins based on corner detection in projection profiles; 2) removal of noise patterns in handwritten images using expectation maximization and fuzzy inference systems; 3) extraction of text lines based on fast Fourier-based steerable filtering; 4) segmentation of characters based on skeletal graphs; and 5) merging of broken characters based on graph partitioning. Our experiments with a benchmark database of handwritten English documents and a real-world collection of handwritten French documents indicate that, even without any word/document-level training, our results are comparable with two state-of-the-art word spotting systems for English and French documents
    • …
    corecore