173 research outputs found

    Text Line Segmentation of Historical Documents: a Survey

    Full text link
    There is a huge amount of historical documents in libraries and in various National Archives that have not been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such as word spotting, text/image alignment, authentication and extraction of specific fields are in use today. For all these tasks, a major step is document segmentation into text lines. Because of the low quality and the complexity of these documents (background noise, artifacts due to aging, interfering lines),automatic text line segmentation remains an open research field. The objective of this paper is to present a survey of existing methods, developed during the last decade, and dedicated to documents of historical interest.Comment: 25 pages, submitted version, To appear in International Journal on Document Analysis and Recognition, On line version available at http://www.springerlink.com/content/k2813176280456k3

    Deep Adaptive Learning for Writer Identification based on Single Handwritten Word Images

    Get PDF
    There are two types of information in each handwritten word image: explicit information which can be easily read or derived directly, such as lexical content or word length, and implicit attributes such as the author's identity. Whether features learned by a neural network for one task can be used for another task remains an open question. In this paper, we present a deep adaptive learning method for writer identification based on single-word images using multi-task learning. An auxiliary task is added to the training process to enforce the emergence of reusable features. Our proposed method transfers the benefits of the learned features of a convolutional neural network from an auxiliary task such as explicit content recognition to the main task of writer identification in a single procedure. Specifically, we propose a new adaptive convolutional layer to exploit the learned deep features. A multi-task neural network with one or several adaptive convolutional layers is trained end-to-end, to exploit robust generic features for a specific main task, i.e., writer identification. Three auxiliary tasks, corresponding to three explicit attributes of handwritten word images (lexical content, word length and character attributes), are evaluated. Experimental results on two benchmark datasets show that the proposed deep adaptive learning method can improve the performance of writer identification based on single-word images, compared to non-adaptive and simple linear-adaptive approaches.Comment: Under view of Pattern Recognitio

    A study on idiosyncratic handwriting with impact on writer identification

    Full text link
    © 2018 IEEE. In this paper, we study handwriting idiosyncrasy in terms of its structural eccentricity. In this study, our approach is to find idiosyncratic handwritten text components and model the idiosyncrasy analysis task as a machine learning problem supervised by human cognition. We employ the Inception network for this purpose. The experiments are performed on two publicly available databases and an in-house database of Bengali offline handwritten samples. On these samples, subjective opinion scores of handwriting idiosyncrasy are collected from handwriting experts. We have analyzed the handwriting idiosyncrasy on this corpus which comprises the perceptive ground-truth opinion. We also investigate the effect of idiosyncratic text on writer identification by using the SqueezeNet. The performance of our system is promising

    Offline Bengali writer verification by PDF-CNN and siamese net

    Get PDF
    © 2018 IEEE. Automated handwriting analysis is a popular area of research owing to the variation of writing patterns. In this research area, writer verification is one of the most challenging branches, having direct impact on biometrics and forensics. In this paper, we deal with offline writer verification on complex handwriting patterns. Therefore, we choose a relatively complex script, i.e., Indic Abugida script Bengali (or, Bangla) containing more than 250 compound characters. From a handwritten sample, the probability distribution functions (PDFs) of some handcrafted features are obtained and input to a convolutional neural network (CNN). For such a CNN architecture, we coin the term 'PDFCNN', where handcrafted feature PDFs are hybridized with auto-derived CNN features. Such hybrid features are then fed into a Siamese neural network for writer verification. The experiments are performed on a Bengali offline handwritten dataset of 100 writers. Our system achieves encouraging results, which sometimes exceed the results of state-of-The-Art techniques on writer verification

    An empirical study on writer identification and verification from intra-variable individual handwriting

    Full text link
    © 2013 IEEE. The handwriting of a person may vary substantially with factors, such as mood, time, space, writing speed, writing medium/tool, writing a topic, and so on. It becomes challenging to perform automated writer verification/identification on a particular set of handwritten patterns (e.g., speedy handwriting) of an individual, especially when the system is trained using a different set of writing patterns (e.g., normal speed) of that same person. However, it would be interesting to experimentally analyze if there exists any implicit characteristic of individuality which is insensitive to high intra-variable handwriting. In this paper, we study some handcrafted features and auto-derived features extracted from intra-variable writing. Here, we work on writer identification/verification from highly intra-variable offline Bengali writing. To this end, we use various models mainly based on handcrafted features with support vector machine and features auto-derived by the convolutional network. For experimentation, we have generated two handwritten databases from two different sets of 100 writers and enlarged the dataset by a data-augmentation technique. We have obtained some interesting results

    Benchmark Classification of Handwritten Dataset by New Operator

    Get PDF
    In recent years, many new classifiers and feature extraction algorithms were proposed and tested on various OCR databases and these techniques were used in wide applications. Various systematic papers and inventions in OCR were reported in the literature. We can say that OCR is one of the most important and active research areas in the pattern recognition. Today, research OCR is dealing with diverse a character of complex problems. Important research in OCR includes the text degraded (heavy noise) and analysis/recognition of complex documents (including texts, images, graphs, tables and video documents). In this proposed system we are suing a new operator Recognition of Devnagari handwritten Characters one of the biggest problem in present scenario. Devnagari characters are not recognized efficiently and truthfully by electronic device. Many researchers and algorithm have been proposed for recognizing of characters. For recognizing of characters, many processes have to be performed but no single technique or algorithm can perform that recognition and give more accurate result. objective of this dissertation work is to propose a new operator, the name of this operator is Kirsch Operator and algorithm for getting accurate result

    Design of an Offline Handwriting Recognition System Tested on the Bangla and Korean Scripts

    Get PDF
    This dissertation presents a flexible and robust offline handwriting recognition system which is tested on the Bangla and Korean scripts. Offline handwriting recognition is one of the most challenging and yet to be solved problems in machine learning. While a few popular scripts (like Latin) have received a lot of attention, many other widely used scripts (like Bangla) have seen very little progress. Features such as connectedness and vowels structured as diacritics make it a challenging script to recognize. A simple and robust design for offline recognition is presented which not only works reliably, but also can be used for almost any alphabetic writing system. The framework has been rigorously tested for Bangla and demonstrated how it can be transformed to apply to other scripts through experiments on the Korean script whose two-dimensional arrangement of characters makes it a challenge to recognize. The base of this design is a character spotting network which detects the location of different script elements (such as characters, diacritics) from an unsegmented word image. A transcript is formed from the detected classes based on their corresponding location information. This is the first reported lexicon-free offline recognition system for Bangla and achieves a Character Recognition Accuracy (CRA) of 94.8%. This is also one of the most flexible architectures ever presented. Recognition of Korean was achieved with a 91.2% CRA. Also, a powerful technique of autonomous tagging was developed which can drastically reduce the effort of preparing a dataset for any script. The combination of the character spotting method and the autonomous tagging brings the entire offline recognition problem very close to a singular solution. Additionally, a database named the Boise State Bangla Handwriting Dataset was developed. This is one of the richest offline datasets currently available for Bangla and this has been made publicly accessible to accelerate the research progress. Many other tools were developed and experiments were conducted to more rigorously validate this framework by evaluating the method against external datasets (CMATERdb 1.1.1, Indic Word Dataset and REID2019: Early Indian Printed Documents). Offline handwriting recognition is an extremely promising technology and the outcome of this research moves the field significantly ahead

    Spectral Graph-based Features for Recognition of Handwritten Characters: A Case Study on Handwritten Devanagari Numerals

    Full text link
    Interpretation of different writing styles, unconstrained cursiveness and relationship between different primitive parts is an essential and challenging task for recognition of handwritten characters. As feature representation is inadequate, appropriate interpretation/description of handwritten characters seems to be a challenging task. Although existing research in handwritten characters is extensive, it still remains a challenge to get the effective representation of characters in feature space. In this paper, we make an attempt to circumvent these problems by proposing an approach that exploits the robust graph representation and spectral graph embedding concept to characterise and effectively represent handwritten characters, taking into account writing styles, cursiveness and relationships. For corroboration of the efficacy of the proposed method, extensive experiments were carried out on the standard handwritten numeral Computer Vision Pattern Recognition, Unit of Indian Statistical Institute Kolkata dataset. The experimental results demonstrate promising findings, which can be used in future studies.Comment: 16 pages, 8 figure
    corecore