472 research outputs found

    Offline Text-Independent Writer Identification based on word level data

    Full text link
    This paper proposes a novel scheme to identify the authorship of a document based on handwritten input word images of an individual. Our approach is text-independent and does not place any restrictions on the size of the input word images under consideration. To begin with, we employ the SIFT algorithm to extract multiple key points at various levels of abstraction (comprising allograph, character, or combination of characters). These key points are then passed through a trained CNN network to generate feature maps corresponding to a convolution layer. However, owing to the scale corresponding to the SIFT key points, the size of a generated feature map may differ. As an alleviation to this issue, the histogram of gradients is applied on the feature map to produce a fixed representation. Typically, in a CNN, the number of filters of each convolution block increase depending on the depth of the network. Thus, extracting histogram features for each of the convolution feature map increase the dimension as well as the computational load. To address this aspect, we use an entropy-based method to learn the weights of the feature maps of a particular CNN layer during the training phase of our algorithm. The efficacy of our proposed system has been demonstrated on two publicly available databases namely CVL and IAM. We empirically show that the results obtained are promising when compared with previous works

    Leveraging Expert Models for Training Deep Neural Networks in Scarce Data Domains: Application to Offline Handwritten Signature Verification

    Full text link
    This paper introduces a novel approach to leverage the knowledge of existing expert models for training new Convolutional Neural Networks, on domains where task-specific data are limited or unavailable. The presented scheme is applied in offline handwritten signature verification (OffSV) which, akin to other biometric applications, suffers from inherent data limitations due to regulatory restrictions. The proposed Student-Teacher (S-T) configuration utilizes feature-based knowledge distillation (FKD), combining graph-based similarity for local activations with global similarity measures to supervise student's training, using only handwritten text data. Remarkably, the models trained using this technique exhibit comparable, if not superior, performance to the teacher model across three popular signature datasets. More importantly, these results are attained without employing any signatures during the feature extraction training process. This study demonstrates the efficacy of leveraging existing expert models to overcome data scarcity challenges in OffSV and potentially other related domains

    Advances in Character Recognition

    Get PDF
    This book presents advances in character recognition, and it consists of 12 chapters that cover wide range of topics on different aspects of character recognition. Hopefully, this book will serve as a reference source for academic research, for professionals working in the character recognition field and for all interested in the subject

    Towards robust real-world historical handwriting recognition

    Get PDF
    In this thesis, we make a bridge from the past to the future by using artificial-intelligence methods for text recognition in a historical Dutch collection of the Natuurkundige Commissie that explored Indonesia (1820-1850). In spite of the successes of systems like 'ChatGPT', reading historical handwriting is still quite challenging for AI. Whereas GPT-like methods work on digital texts, historical manuscripts are only available as an extremely diverse collections of (pixel) images. Despite the great results, current DL methods are very data greedy, time consuming, heavily dependent on the human expert from the humanities for labeling and require machine-learning experts for designing the models. Ideally, the use of deep learning methods should require minimal human effort, have an algorithm observe the evolution of the training process, and avoid inefficient use of the already sparse amount of labeled data. We present several approaches towards dealing with these problems, aiming to improve the robustness of current methods and to improve the autonomy in training. We applied our novel word and line text recognition approaches on nine data sets differing in time period, language, and difficulty: three locally collected historical Latin-based data sets from Naturalis, Leiden; four public Latin-based benchmark data sets for comparability with other approaches; and two Arabic data sets. Using ensemble voting of just five neural networks, a level of accuracy was achieved which required hundreds of neural networks in earlier studies. Moreover, we increased the speed of evaluation of each training epoch without the need of labeled data

    Information Preserving Processing of Noisy Handwritten Document Images

    Get PDF
    Many pre-processing techniques that normalize artifacts and clean noise induce anomalies due to discretization of the document image. Important information that could be used at later stages may be lost. A proposed composite-model framework takes into account pre-printed information, user-added data, and digitization characteristics. Its benefits are demonstrated by experiments with statistically significant results. Separating pre-printed ruling lines from user-added handwriting shows how ruling lines impact people\u27s handwriting and how they can be exploited for identifying writers. Ruling line detection based on multi-line linear regression reduces the mean error of counting them from 0.10 to 0.03, 6.70 to 0.06, and 0.13 to 0.02, com- pared to an HMM-based approach on three standard test datasets, thereby reducing human correction time by 50%, 83%, and 72% on average. On 61 page images from 16 rule-form templates, the precision and recall of form cell recognition are increased by 2.7% and 3.7%, compared to a cross-matrix approach. Compensating for and exploiting ruling lines during feature extraction rather than pre-processing raises the writer identification accuracy from 61.2% to 67.7% on a 61-writer noisy Arabic dataset. Similarly, counteracting page-wise skew by subtracting it or transforming contours in a continuous coordinate system during feature extraction improves the writer identification accuracy. An implementation study of contour-hinge features reveals that utilizing the full probabilistic probability distribution function matrix improves the writer identification accuracy from 74.9% to 79.5%

    Off-line Arabic Handwriting Recognition System Using Fast Wavelet Transform

    Get PDF
    In this research, off-line handwriting recognition system for Arabic alphabet is introduced. The system contains three main stages: preprocessing, segmentation and recognition stage. In the preprocessing stage, Radon transform was used in the design of algorithms for page, line and word skew correction as well as for word slant correction. In the segmentation stage, Hough transform approach was used for line extraction. For line to words and word to characters segmentation, a statistical method using mathematic representation of the lines and words binary image was used. Unlike most of current handwriting recognition system, our system simulates the human mechanism for image recognition, where images are encoded and saved in memory as groups according to their similarity to each other. Characters are decomposed into a coefficient vectors, using fast wavelet transform, then, vectors, that represent a character in different possible shapes, are saved as groups with one representative for each group. The recognition is achieved by comparing a vector of the character to be recognized with group representatives. Experiments showed that the proposed system is able to achieve the recognition task with 90.26% of accuracy. The system needs only 3.41 seconds a most to recognize a single character in a text of 15 lines where each line has 10 words on average

    Activity Report 2003

    Get PDF
    • …
    corecore