144 research outputs found
Offline Bengali writer verification by PDF-CNN and siamese net
© 2018 IEEE. Automated handwriting analysis is a popular area of research owing to the variation of writing patterns. In this research area, writer verification is one of the most challenging branches, having direct impact on biometrics and forensics. In this paper, we deal with offline writer verification on complex handwriting patterns. Therefore, we choose a relatively complex script, i.e., Indic Abugida script Bengali (or, Bangla) containing more than 250 compound characters. From a handwritten sample, the probability distribution functions (PDFs) of some handcrafted features are obtained and input to a convolutional neural network (CNN). For such a CNN architecture, we coin the term 'PDFCNN', where handcrafted feature PDFs are hybridized with auto-derived CNN features. Such hybrid features are then fed into a Siamese neural network for writer verification. The experiments are performed on a Bengali offline handwritten dataset of 100 writers. Our system achieves encouraging results, which sometimes exceed the results of state-of-The-Art techniques on writer verification
Writer identification approach based on bag of words with OBI features
Handwriter identification aims to simplify the task of forensic experts by providing them with semi-automated tools in order to enable them to narrow down the search to determine the final identification of an unknown handwritten sample. An identification algorithm aims to produce a list of predicted writers of the unknown handwritten sample ranked in terms of confidence measure metrics for use by the forensic expert will make the final decision.
Most existing handwriter identification systems use either statistical or model-based approaches. To further improve the performances this paper proposes to deploy a combination of both approaches using Oriented Basic Image features and the concept of graphemes codebook. To reduce the resulting high dimensionality of the feature vector a Kernel Principal Component Analysis has been used. To gauge the effectiveness of the proposed method a performance analysis, using IAM dataset for English handwriting and ICFHR 2012 dataset for Arabic handwriting, has been carried out. The results obtained achieved an accuracy of 96% thus demonstrating its superiority when compared against similar techniques
Writer Identification on Multi-Script Handwritten Using Optimum Features
Recognizing the writer of a text that has been handwritten is a very intriguing research problem in the field of document analysis and recognition. This study tables an automatic way of recognizing the writer from handwritten samples. Even though much has been done in previous researches that have presented other various methods, it is still clear that the field has a room for improvement. This particular method uses Optimum Features based writer characterization. Here, each of the samples written is grouped according to their set of features that are acquired from a computed codebook. This proposed codebook is different from the others which segment the samples into graphemes by fragmenting a certain part of the writing known as ending strokes. The proposed technique is employed to a locate and extract the handwriting fragments from ending zone and then grouped the similar fragments to generate a new cluster known as ending cluster. The cluster that comes in handy in the process of coming up with the ending codebook through picking out the center of the same fragment group. The process is finalized by evaluating the proposed method on four datasets of the various languages. This method being proposed had an impressive 97.12% identification rate which is rates the best result on the ICFHR dataset
Offline Text-Independent Writer Identification based on word level data
This paper proposes a novel scheme to identify the authorship of a document
based on handwritten input word images of an individual. Our approach is
text-independent and does not place any restrictions on the size of the input
word images under consideration. To begin with, we employ the SIFT algorithm to
extract multiple key points at various levels of abstraction (comprising
allograph, character, or combination of characters). These key points are then
passed through a trained CNN network to generate feature maps corresponding to
a convolution layer. However, owing to the scale corresponding to the SIFT key
points, the size of a generated feature map may differ. As an alleviation to
this issue, the histogram of gradients is applied on the feature map to produce
a fixed representation. Typically, in a CNN, the number of filters of each
convolution block increase depending on the depth of the network. Thus,
extracting histogram features for each of the convolution feature map increase
the dimension as well as the computational load. To address this aspect, we use
an entropy-based method to learn the weights of the feature maps of a
particular CNN layer during the training phase of our algorithm. The efficacy
of our proposed system has been demonstrated on two publicly available
databases namely CVL and IAM. We empirically show that the results obtained are
promising when compared with previous works
Automatic handwriter identification using advanced machine learning
Handwriter identification a challenging problem especially for forensic investigation. This topic has received significant attention from the research community and several handwriter identification systems were developed for various applications including forensic science, document analysis and investigation of the historical documents. This work is part of an investigation to develop new tools and methods for Arabic palaeography, which is is the study of handwritten material, particularly ancient manuscripts with missing writers, dates, and/or places. In particular, the main aim of this research project is to investigate and develop new techniques and algorithms for the classification and analysis of ancient handwritten documents to support palaeographic studies.
Three contributions were proposed in this research. The first is concerned with the development of a text line extraction algorithm on colour and greyscale historical manuscripts. The idea uses a modified bilateral filtering approach to adaptively smooth the images while still preserving the edges through a nonlinear combination of neighboring image values. The proposed algorithm aims to compute a median and a separating seam and has been validated to deal with both greyscale and colour historical documents using different datasets. The results obtained suggest that our proposed technique yields attractive results when compared against a few similar algorithms.
The second contribution proposes to deploy a combination of Oriented Basic Image features and the concept of graphemes codebook in order to improve the recognition performances. The proposed algorithm is capable to effectively extract the most distinguishing handwriter’s patterns. The idea consists of judiciously combining a multiscale feature extraction with the concept of grapheme to allow for the extraction of several discriminating features such as handwriting curvature, direction, wrinkliness and various edge-based features. The technique was validated for identifying handwriters using both Arabic and English writings captured as scanned images using the IAM dataset for English handwriting and ICFHR 2012 dataset for Arabic handwriting. The results obtained clearly demonstrate the effectiveness of the proposed method when compared against some similar techniques.
The third contribution is concerned with an offline handwriter identification approach based on the convolutional neural network technology. At the first stage, the Alex-Net architecture was employed to learn image features (handwritten scripts) and the features obtained from the fully connected layers of the model. Then, a Support vector machine classifier is deployed to classify the writing styles of the various handwriters. In this way, the test scripts can be classified by the CNN training model for further classification. The proposed approach was evaluated based on Arabic Historical datasets; Islamic Heritage Project (IHP) and Qatar National Library (QNL). The obtained results demonstrated that the proposed model achieved superior performances when compared to some similar method
Features selection for offline handwritten signature verification: State of the art
This research comes out with an in-depth review of widely used techniques to handwritten signature verification based, feature selection techniques. The focus of this research is to explore best features selection criteria for signature verification to avoid forgery. This paper further present pros and cons of local and global features selection techniques, reported in the state of art. Experiments are conducted on benchmark databases for signature verification systems (GPDS). Results are tested using two standard protocols; GPDS and the program for rate estimation and feature selection. The current precision of the signature verification techniques reported in state of art are compared on benchmark database and possible solutions are suggested to improve the accuracy. As the equal error rate is an important factor for evaluating the signature verification's accuracy, the results show that the feature selection methods have successfully contributed toward efficient signature verification
Off-line Arabic Handwriting Recognition System Using Fast Wavelet Transform
In this research, off-line handwriting recognition system for Arabic alphabet is
introduced. The system contains three main stages: preprocessing, segmentation and
recognition stage. In the preprocessing stage, Radon transform was used in the design
of algorithms for page, line and word skew correction as well as for word slant
correction. In the segmentation stage, Hough transform approach was used for line
extraction. For line to words and word to characters segmentation, a statistical method
using mathematic representation of the lines and words binary image was used.
Unlike most of current handwriting recognition system, our system simulates the
human mechanism for image recognition, where images are encoded and saved in
memory as groups according to their similarity to each other. Characters are
decomposed into a coefficient vectors, using fast wavelet transform, then, vectors,
that represent a character in different possible shapes, are saved as groups with one
representative for each group. The recognition is achieved by comparing a vector of
the character to be recognized with group representatives.
Experiments showed that the proposed system is able to achieve the recognition task
with 90.26% of accuracy. The system needs only 3.41 seconds a most to recognize a
single character in a text of 15 lines where each line has 10 words on average
End-Shape Analysis for Automatic Segmentation of Arabic Handwritten Texts
Word segmentation is an important task for many methods that are related to document understanding especially word spotting and word recognition. Several approaches of word segmentation have been proposed for Latin-based languages while a few of them have been introduced for Arabic texts. The fact that Arabic writing is cursive by nature and unconstrained with no clear boundaries between the words makes the processing of Arabic handwritten text a more challenging problem.
In this thesis, the design and implementation of an End-Shape Letter (ESL) based segmentation system for Arabic handwritten text is presented. This incorporates four novel aspects: (i) removal of secondary components, (ii) baseline estimation, (iii) ESL recognition, and (iv) the creation of a new off-line CENPARMI ESL database.
Arabic texts include small connected components, also called secondary components. Removing these components can improve the performance of several systems such as baseline estimation. Thus, a robust method to remove secondary components that takes into consideration the challenges in the Arabic handwriting is introduced. The methods reconstruct the image based on some criteria. The results of this method were subsequently compared with those of two other methods that used the same database. The results show that the proposed method is effective.
Baseline estimation is a challenging task for Arabic texts since it includes ligature, overlapping, and secondary components. Therefore, we propose a learning-based approach that addresses these challenges. Our method analyzes the image and extracts baseline dependent features. Then, the baseline is estimated using a classifier.
Algorithms dealing with text segmentation usually analyze the gaps between connected components. These algorithms are based on metric calculation, finding threshold, and/or gap classification. We use two well-known metrics: bounding box and convex hull to test metric-based method on Arabic handwritten texts, and to include this technique in our approach. To determine the threshold, an unsupervised learning approach, known as the Gaussian Mixture Model, is used. Our ESL-based segmentation approach extracts the final letter of a word using rule-based technique and recognizes these letters using the implemented ESL classifier.
To demonstrate the benefit of text segmentation, a holistic word spotting system is implemented. For this system, a word recognition system is implemented. A series of experiments with different sets of features are conducted. The system shows promising results
Off-line Arabic Handwriting Recognition System Using Fast Wavelet Transform
In this research, off-line handwriting recognition system for Arabic alphabet is
introduced. The system contains three main stages: preprocessing, segmentation and
recognition stage. In the preprocessing stage, Radon transform was used in the design
of algorithms for page, line and word skew correction as well as for word slant
correction. In the segmentation stage, Hough transform approach was used for line
extraction. For line to words and word to characters segmentation, a statistical method
using mathematic representation of the lines and words binary image was used.
Unlike most of current handwriting recognition system, our system simulates the
human mechanism for image recognition, where images are encoded and saved in
memory as groups according to their similarity to each other. Characters are
decomposed into a coefficient vectors, using fast wavelet transform, then, vectors,
that represent a character in different possible shapes, are saved as groups with one
representative for each group. The recognition is achieved by comparing a vector of
the character to be recognized with group representatives.
Experiments showed that the proposed system is able to achieve the recognition task
with 90.26% of accuracy. The system needs only 3.41 seconds a most to recognize a
single character in a text of 15 lines where each line has 10 words on average
- …