5 research outputs found

    Alpha-Numerical Sequences Extraction in Handwritten Documents

    In this paper, we introduce a system for extracting alpha-numerical sequences (keywords, numerical fields or alpha-numerical sequences) from unconstrained handwritten documents. Contrary to most of the approaches presented in the literature, our system relies on a global handwriting line model describing two kinds of information: i) the relevant information and ii) the irrelevant information represented by a shallow parsing model. The shallow parsing of isolated text lines allows quick information extraction in any document while at the same time rejecting irrelevant information. Results on a public database of French incoming mail show the efficiency of the approach.

    Preprocessing Algorithm for Deciphering Historical Inscriptions Using String Metric

    The article presents improvements to the preprocessing part of the deciphering method (the preprocessing algorithm, for short) for historical inscriptions of unknown origin. Glyphs used in historical inscriptions changed over time; therefore, various versions of the same script may contain different glyphs for each grapheme. The purpose of the preprocessing algorithm is to reduce the running time of the deciphering process by filtering out the less probable interpretations of the examined inscription. However, the first version of the preprocessing algorithm produces an incorrect outcome or no output in certain cases. Therefore, an improved version was developed that finds the most similar words in the dictionary by relaxing the search conditions more accurately while remaining computationally efficient. Moreover, a sophisticated similarity metric used to determine the possible meaning of the unknown inscription is introduced. The results of the evaluations are also detailed.

    A syntax-directed method for numerical field extraction using classifier combination

    In this article, we propose a method for the automatic extraction of numerical fields in handwritten documents. The method exploits the syntax of a numerical field as a priori knowledge to extract the connected component sequences from the document. To do so, we must label each connected component as “belonging to a numerical field” or not. We propose a method for discriminating the connected components, using different families of features and a combination of classifiers. A comparison between the results obtained with the combination of classifiers and our first approach [10] demonstrates the utility of combining different feature sets for discriminating classes with large intra-class variability.