450 research outputs found

    A Novel Two-Stage Spectrum-Based Approach for Dimensionality Reduction: A Case Study on the Recognition of Handwritten Numerals

    Get PDF
    Dimensionality reduction (feature selection) is an important step in pattern recognition systems. Although there are different conventional approaches for feature selection, such as Principal Component Analysis, Random Projection, and Linear Discriminant Analysis, selecting optimal, effective, and robust features is usually a difficult task. In this paper, a new two-stage approach for dimensionality reduction is proposed. This method is based on one-dimensional and two-dimensional spectrum diagrams of standard deviation and minimum to maximum distributions for initial feature vector elements. The proposed algorithm is validated in an OCR application, by using two big standard benchmark handwritten OCR datasets, MNIST and Hoda. In the beginning, a 133-element feature vector was selected from the most used features, proposed in the literature. Finally, the size of initial feature vector was reduced from 100% to 59.40% (79 elements) for the MNIST dataset, and to 43.61% (58 elements) for the Hoda dataset, in order. Meanwhile, the accuracies of OCR systems are enhanced 2.95% for the MNIST dataset, and 4.71% for the Hoda dataset. The achieved results show an improvement in the precision of the system in comparison to the rival approaches, Principal Component Analysis and Random Projection. The proposed technique can also be useful for generating decision rules in a pattern recognition system using rule-based classifiers

    A fine-grained approach to scene text script identification

    Full text link
    This paper focuses on the problem of script identification in unconstrained scenarios. Script identification is an important prerequisite to recognition, and an indispensable condition for automatic text understanding systems designed for multi-language environments. Although widely studied for document images and handwritten documents, it remains an almost unexplored territory for scene text images. We detail a novel method for script identification in natural images that combines convolutional features and the Naive-Bayes Nearest Neighbor classifier. The proposed framework efficiently exploits the discriminative power of small stroke-parts, in a fine-grained classification framework. In addition, we propose a new public benchmark dataset for the evaluation of joint text detection and script identification in natural scenes. Experiments done in this new dataset demonstrate that the proposed method yields state of the art results, while it generalizes well to different datasets and variable number of scripts. The evidence provided shows that multi-lingual scene text recognition in the wild is a viable proposition. Source code of the proposed method is made available online

    Dissimilarity Gaussian Mixture Models for Efficient Offline Handwritten Text-Independent Identification using SIFT and RootSIFT Descriptors

    Get PDF
    Handwriting biometrics is the science of identifying the behavioural aspect of an individual’s writing style and exploiting it to develop automated writer identification and verification systems. This paper presents an efficient handwriting identification system which combines Scale Invariant Feature Transform (SIFT) and RootSIFT descriptors in a set of Gaussian mixture models (GMM). In particular, a new concept of similarity and dissimilarity Gaussian mixture models (SGMM and DGMM) is introduced. While a SGMM is constructed for every writer to describe the intra-class similarity that is exhibited between the handwritten texts of the same writer, a DGMM represents the contrast or dissimilarity that exists between the writer’s style on one hand and other different handwriting styles on the other hand. Furthermore, because the handwritten text is described by a number of key point descriptors where each descriptor generates a SGMM/DGMM score, a new weighted histogram method is proposed to derive the intermediate prediction score for each writer’s GMM. The idea of weighted histogram exploits the fact that handwritings from the same writer should exhibit more similar textual patterns than dissimilar ones, hence, by penalizing the bad scores with a cost function, the identification rate can be significantly enhanced. Our proposed system has been extensively assessed using six different public datasets (including three English, two Arabic and one hybrid language) and the results have shown the superiority of the proposed system over state-of-the-art techniques

    Spatial and Textural Aspects for Arabic Handwritten Characters Recognition

    Get PDF
    The purpose of the present paper is the recognition of handwritten Arabic characters in their isolated form. The specificity of Arabic characters is taken into consideration, each of the proposed feature extraction method integrates one of the two aspects: spatial and textural. In the first step, a modified Bitmap Sampling method is proposed, which converts the character’s images into a binary Matrix and then constructs a Mask for each class. A matching rate is used between the input binary matrix and the masks to determinate the corresponding class. In the second step we investigate the use of an Artificial Neural Network as classifier with the binary matrices as features and then the histograms of Local Binary Patterns to capture the texture aspect of the characters. Finally, the results of these two methods are combined to take into consideration both aspects at the same time. Tested on the Arabic set of the Isolated Farsi Handwritten Character Database, the proposed method has 2.82% error rate

    Decision Fusion and Contextual Information for Arabic Words Recognition for Computing and Informatics

    Get PDF
    The study of multiple classifier systems has become recently an area of intensive research in pattern recognition. Also in handwriting recognition, systems combining several classifiers have been investigated. An approach for recognizing the legal amount for handwritten Arabic bank check is described in this article. The solution uses multiple information sources to recognize words. The recognition step is preformed with a parallel combination of three kinds of classifiers using holistic word structural features. The classification stage results are first normalized, and the sum combination is performed as a decision fusion scheme, after which a syntactic analyzer makes final decision on the candidate words. Using this approach, the obtained results are very interesting and promising

    Advances in Character Recognition

    Get PDF
    This book presents advances in character recognition, and it consists of 12 chapters that cover wide range of topics on different aspects of character recognition. Hopefully, this book will serve as a reference source for academic research, for professionals working in the character recognition field and for all interested in the subject
    corecore