795 research outputs found

    Text Line Segmentation of Historical Documents: a Survey

    Full text link
    There is a huge amount of historical documents in libraries and in various National Archives that have not been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such as word spotting, text/image alignment, authentication and extraction of specific fields are in use today. For all these tasks, a major step is document segmentation into text lines. Because of the low quality and the complexity of these documents (background noise, artifacts due to aging, interfering lines),automatic text line segmentation remains an open research field. The objective of this paper is to present a survey of existing methods, developed during the last decade, and dedicated to documents of historical interest.Comment: 25 pages, submitted version, To appear in International Journal on Document Analysis and Recognition, On line version available at http://www.springerlink.com/content/k2813176280456k3

    Decision Fusion and Contextual Information for Arabic Words Recognition for Computing and Informatics

    Get PDF
    The study of multiple classifier systems has become recently an area of intensive research in pattern recognition. Also in handwriting recognition, systems combining several classifiers have been investigated. An approach for recognizing the legal amount for handwritten Arabic bank check is described in this article. The solution uses multiple information sources to recognize words. The recognition step is preformed with a parallel combination of three kinds of classifiers using holistic word structural features. The classification stage results are first normalized, and the sum combination is performed as a decision fusion scheme, after which a syntactic analyzer makes final decision on the candidate words. Using this approach, the obtained results are very interesting and promising

    Dissimilarity Gaussian Mixture Models for Efficient Offline Handwritten Text-Independent Identification using SIFT and RootSIFT Descriptors

    Get PDF
    Handwriting biometrics is the science of identifying the behavioural aspect of an individual’s writing style and exploiting it to develop automated writer identification and verification systems. This paper presents an efficient handwriting identification system which combines Scale Invariant Feature Transform (SIFT) and RootSIFT descriptors in a set of Gaussian mixture models (GMM). In particular, a new concept of similarity and dissimilarity Gaussian mixture models (SGMM and DGMM) is introduced. While a SGMM is constructed for every writer to describe the intra-class similarity that is exhibited between the handwritten texts of the same writer, a DGMM represents the contrast or dissimilarity that exists between the writer’s style on one hand and other different handwriting styles on the other hand. Furthermore, because the handwritten text is described by a number of key point descriptors where each descriptor generates a SGMM/DGMM score, a new weighted histogram method is proposed to derive the intermediate prediction score for each writer’s GMM. The idea of weighted histogram exploits the fact that handwritings from the same writer should exhibit more similar textual patterns than dissimilar ones, hence, by penalizing the bad scores with a cost function, the identification rate can be significantly enhanced. Our proposed system has been extensively assessed using six different public datasets (including three English, two Arabic and one hybrid language) and the results have shown the superiority of the proposed system over state-of-the-art techniques

    Character Recognition

    Get PDF
    Character recognition is one of the pattern recognition technologies that are most widely used in practical applications. This book presents recent advances that are relevant to character recognition, from technical topics such as image processing, feature extraction or classification, to new applications including human-computer interfaces. The goal of this book is to provide a reference source for academic research and for professionals working in the character recognition field

    Novel Perspectives for the Management of Multilingual and Multialphabetic Heritages through Automatic Knowledge Extraction: The DigitalMaktaba Approach

    Get PDF
    The linguistic and social impact of multiculturalism can no longer be neglected in any sector, creating the urgent need of creating systems and procedures for managing and sharing cultural heritages in both supranational and multi-literate contexts. In order to achieve this goal, text sensing appears to be one of the most crucial research areas. The long-term objective of the DigitalMaktaba project, born from interdisciplinary collaboration between computer scientists, historians, librarians, engineers and linguists, is to establish procedures for the creation, management and cataloguing of archival heritage in non-Latin alphabets. In this paper, we discuss the currently ongoing design of an innovative workflow and tool in the area of text sensing, for the automatic extraction of knowledge and cataloguing of documents written in non-Latin languages (Arabic, Persian and Azerbaijani). The current prototype leverages different OCR, text processing and information extraction techniques in order to provide both a highly accurate extracted text and rich metadata content (including automatically identified cataloguing metadata), overcoming typical limitations of current state of the art approaches. The initial tests provide promising results. The paper includes a discussion of future steps (e.g., AI-based techniques further leveraging the extracted data/metadata and making the system learn from user feedback) and of the many foreseen advantages of this research, both from a technical and a broader cultural-preservation and sharing point of view

    Reconocimiento de notación matemática escrita a mano fuera de línea

    Get PDF
    El reconocimiento automático de expresiones matemáticas es uno de los problemas de reconocimiento de patrones, debido a que las matemáticas representan una fuente valiosa de información en muchos a ́reas de investigación. La escritura de expresiones matemáticas a mano es un medio de comunicación utilizado para la transmisión de información y conocimiento, con la cual se pueden generar de una manera sencilla escritos que contienen notación matemática. Este proceso puede volverse tedioso al ser escrito en lenguaje de composición tipográfica que pueda ser procesada por una computadora, tales como LATEX, MathML, entre otros. En los sistemas de reconocimiento de expresiones matem ́aticas existen dos m ́etodos diferentes a saber: fuera de l ́ınea y en l ́ınea. En esta tesis, se estudia el desempen ̃o de un sistema fuera de l ́ınea en donde se describen los pasos b ́asicos para lograr una mejor precisio ́n en el reconocimiento, las cuales esta ́n divididas en dos pasos principales: recono- cimiento de los s ́ımbolos de las ecuaciones matema ́ticas y el ana ́lisis de la estructura en que est ́an compuestos. Con el fin de convertir una expresi ́on matema ́tica escrita a mano en una expresio ́n equivalente en un sistema de procesador de texto, tal como TEX

    Feature Extraction Methods for Character Recognition

    Get PDF
    Not Include
    corecore