40 research outputs found

    HMM-based Offline Recognition of Handwritten Words Crossed Out with Different Kinds of Strokes

    In this work, we investigate the recognition of words that have been crossed out by their writers and are thus degraded. The degradation consists of one or more ink strokes that span the whole word length and simulate the marks that writers use to cross out words. The simulated strokes are superimposed on the original clean word images. We considered two types of strokes: wave-trajectory strokes created with spline curves and line-trajectory strokes generated with the delta-lognormal model of rapid line movements. The experiments were performed using a recognition system based on hidden Markov models. The results show that the performance decrease is moderate for single-writer data and light strokes, but severe for multiple-writer data.
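
    A minimal sketch of the superimposition step described above, assuming a binary word image stored as nested lists (1 = ink). It uses a plain sine wave as a stand-in for the spline-based wave strokes and does not implement the delta-lognormal model; `cross_out` and its parameters are hypothetical names, not taken from the paper.

```python
import math

def cross_out(image, amplitude=2.0, periods=2.0, half_width=0):
    """Return a copy of a binary image (1 = ink) with a wave stroke
    spanning its whole width. All parameters are illustrative."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]          # leave the clean image untouched
    mid = (height - 1) / 2.0                 # stroke oscillates around mid-height
    for x in range(width):
        phase = 2.0 * math.pi * periods * x / max(width - 1, 1)
        y_center = int(round(mid + amplitude * math.sin(phase)))
        for dy in range(-half_width, half_width + 1):
            y = y_center + dy
            if 0 <= y < height:
                out[y][x] = 1                # superimpose stroke ink
    return out

# Usage on a blank 9 x 20 image, so that only the stroke is visible.
blank = [[0] * 20 for _ in range(9)]
crossed = cross_out(blank)
for row in crossed:
    print("".join("#" if pixel else "." for pixel in row))
```

    Increasing `half_width` gives a heavier stroke, which is one simple way to vary how strongly the word is degraded.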

    Text Line Segmentation of Historical Documents: a Survey

    A huge number of historical documents in libraries and in various National Archives have not yet been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such as word spotting, text/image alignment, authentication and extraction of specific fields are in use today. For all these tasks, a major step is document segmentation into text lines. Because of the low quality and the complexity of these documents (background noise, artifacts due to aging, interfering lines), automatic text line segmentation remains an open research field. The objective of this paper is to present a survey of existing methods, developed during the last decade, and dedicated to documents of historical interest.
    Comment: 25 pages, submitted version. To appear in International Journal on Document Analysis and Recognition. Online version available at http://www.springerlink.com/content/k2813176280456k3

    Solving alignment conflicts in handwritten document segmentation

    Text line segmentation is necessary before performing character recognition. We present here an iterative method for extracting text lines in unconstrained handwritten documents which uses clues from the laws of perceptual organisation. Alignments are built from anchor points, linking components under criteria such as proximity, similarity and direction continuity. Conflicts may appear due to overlapping or interwoven lines. A local procedure, guided by a direction continuity criterion, first seeks to solve the conflict. It may be followed by a global procedure which is based on the configuration of the alignments and their perceptual quality.
    Segmenting a document into lines is a necessary step before tackling the recognition of characters, symbols or words. This article presents an iterative method based on perceptual grouping, suited to unconstrained handwritten documents. Starting from directional anchor points, the components satisfying the criteria of proximity, similarity and direction continuity are grouped to form alignments. Conflicts that arise from interwoven alignments or from overlapping ascenders and descenders are resolved either locally, by applying the direction continuity criterion, or globally, by examining the configuration and the quality of the alignments.
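
    The grouping idea can be sketched roughly as follows, assuming each connected component is reduced to its centroid `(x, y)`. This is not the authors' procedure: it only illustrates the proximity and direction continuity criteria with a greedy left-to-right chain, omits the similarity criterion and both conflict-resolution steps, and uses hypothetical names and thresholds (`grow_alignment`, `max_gap`, `max_turn_deg`).

```python
import math

def grow_alignment(anchor, components, max_gap=30.0, max_turn_deg=20.0):
    """Greedily extend an alignment rightwards from an anchor centroid."""
    alignment = [anchor]
    direction = 0.0  # assumed horizontal start direction, in radians
    remaining = [c for c in components if c != anchor]
    while True:
        last = alignment[-1]
        best = None
        for c in remaining:
            dx, dy = c[0] - last[0], c[1] - last[1]
            dist = math.hypot(dx, dy)
            if dx <= 0 or dist > max_gap:
                continue  # proximity: only nearby components to the right
            turn = abs(math.atan2(dy, dx) - direction)
            if math.degrees(turn) > max_turn_deg:
                continue  # direction continuity: reject sharp turns
            if best is None or dist < best[0]:
                best = (dist, c)
        if best is None:
            return alignment  # nothing acceptable left to link
        nxt = best[1]
        direction = math.atan2(nxt[1] - last[1], nxt[0] - last[0])
        alignment.append(nxt)
        remaining.remove(nxt)

# Two slightly rising text lines, given as made-up component centroids.
centroids = [(0, 0), (20, 1), (40, 2), (60, 3), (20, 25), (40, 27)]
upper = grow_alignment((0, 0), centroids)
lower = grow_alignment((20, 25), centroids)
print(upper)
print(lower)
```

    With lines that are closer together or more strongly curved, a greedy chain like this one would link components from neighbouring lines, which is the kind of conflict the abstract's local and global procedures are meant to resolve.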

    Gender identification through handwriting: An online approach

    The present study was designed to identify a writer's gender through online handwriting and drawing analysis. Two groups of participants, one of 126 males (mean age 24.65, SD=2.45) and the other of 114 females (mean age 24.51, SD=2.50), were recruited for the experiment. They were asked to perform seven writing and drawing tasks using a digitizing tablet and a special writing device. Seventeen writing features grouped into five categories were considered. The results show that the considered features make it possible to discriminate between male and female writers by examining their performance while copying a house drawing (task 2), writing words in capital letters (task 3) and writing a complete sentence in cursive letters (task 7), in particular through the Ductus (number of strokes) and Time categories of writing features.

    Enriching Historical Manuscripts: The Bovary Project

    In this paper we describe the Bovary Project, a manuscript digitization project devoted to the first great work of the famous French writer Gustave FLAUBERT, which should end in 2006 by providing online access to a hypertextual edition of the set of drafts of "Madame Bovary". We first develop the global context of this project and its main objectives, and then focus particularly on the document analysis problem. Finally, we propose a new approach for the segmentation of handwritten documents.

    Is On-Line Handwriting Gender-Sensitive? What Tells us a Combination of Statistical and Machine Learning Approaches

    Handwriting is an everyday human activity. It can be collected off-line by scanning sheets of paper. The resulting images can then be processed by a computer-based system. Thanks to digitizing tablets, handwriting can also be collected on-line. From the collected raw signals (pen position, pressure over time), the dynamics of the writing can be recovered. Since handwriting is unique to each individual, it can be considered a biometric modality. Biometric systems predicting gender from off-line handwriting have thus recently been proposed. However, we observe that, in contrast to other modalities such as speech, it is not straightforward for a human being (even an expert) to predict gender. In this study we explore the limits of automatic gender prediction from on-line handwriting collected from a young adult population, homogeneous in terms of age and education. Statistical analysis of on-line dynamic features can highlight differences between male and female groups [6]. In the present study, we focus on a sentence copying task, and provide statistically significant features to a classifier based on a machine learning approach (SVMs). Since the dataset is relatively small (240 subjects), several evaluation frameworks are explored: cross validation (CV), bootstrap, and fixed train/test partitions. Accuracies obtained from fixed partitions range from 37% to 79%, while those estimated by CV and bootstrap are around 65%. In our opinion, this shows the limits of the gender recognition task for our young adult population dataset.
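
    The gap between fixed partitions and cross validation can be illustrated with a small sketch. Everything here is an assumption made for illustration: the data are synthetic (one overlapping Gaussian feature per class, 240 samples), the classifier is a nearest-class-mean threshold rather than an SVM, bootstrap is left out, and all function names are hypothetical. It does not reproduce the figures reported in the abstract.

```python
import random
import statistics

def make_data(n_per_class, rng):
    """Synthetic, heavily overlapping two-class data: (feature, label) pairs."""
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_class)]
    data += [(rng.gauss(0.8, 1.0), 1) for _ in range(n_per_class)]
    rng.shuffle(data)
    return data

def fit_threshold(train):
    """Nearest-class-mean rule in 1-D: threshold halfway between class means."""
    m0 = statistics.mean(x for x, y in train if y == 0)
    m1 = statistics.mean(x for x, y in train if y == 1)
    return (m0 + m1) / 2.0, m1 >= m0

def accuracy(model, test):
    threshold, one_is_above = model
    hits = sum(((x > threshold) == one_is_above) == (y == 1) for x, y in test)
    return hits / len(test)

def fixed_partition_scores(data, n_splits, test_size, rng):
    """Accuracy on several independent random train/test partitions."""
    scores = []
    for _ in range(n_splits):
        shuffled = data[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:test_size], shuffled[test_size:]
        scores.append(accuracy(fit_threshold(train), test))
    return scores

def cross_validation_score(data, k):
    """Mean accuracy over k folds; every sample is tested exactly once."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        scores.append(accuracy(fit_threshold(train), test))
    return statistics.mean(scores)

rng = random.Random(0)
data = make_data(120, rng)  # 240 synthetic samples
fixed = fixed_partition_scores(data, n_splits=20, test_size=24, rng=rng)
cv = cross_validation_score(data, k=10)
print("fixed partitions: min %.2f, max %.2f" % (min(fixed), max(fixed)))
print("10-fold CV mean: %.2f" % cv)
```

    A single small test set gives a noisy accuracy estimate, so individual fixed partitions can land well above or below the averaged estimate, which is consistent with the wide range reported for fixed partitions.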