83 research outputs found

    Feature design and lexicon reduction for efficient offline handwriting recognition

    Get PDF
    This thesis establishes a pattern recognition framework for offline word recognition systems. It focuses on the image level features because they greatly influence the recognition performance. In particular, we consider two complementary aspects of prominent features impact: lexicon reduction and the actual recognition. The first aspect, lexicon reduction, consists in the design of a weak classifier which outputs a set of candidate word hypotheses given a word image. Its main purpose is to reduce the recognition computational time while maintaining (or even improving) the recognition rate. The second aspect is the actual recognition system itself. In fact, several features exist in the literature based on different fields of research, but no consensus exists concerning the most promising ones. The goal of the proposed framework is to improve our understanding of relevant features in order to build better recognition systems. For this purpose, we addressed two specific problems: 1) feature design for lexicon reduction (application to Arabic script), and 2) feature evaluation for cursive handwriting recognition (application to Latin and Arabic scripts). Few methods exist for lexicon reduction in Arabic script, unlike Latin script. Existing methods use salient features of Arabic words such as the number of subwords and diacritics, but totally ignore the shape of the subwords. Therefore, our first goal is to perform lexicon reductionn based on subwords shape. Our approach is based on shape indexing, where the shape of a query subword is compared to a labeled database of sample subwords. For efficient comparison with a low computational overhead, we proposed the weighted topological signature vector (W-TSV) framework, where the subword shape is modeled as a weighted directed acyclic graph (DAG) from which the W-TSV vector is extracted for efficient indexing. The main contributions of this work are to extend the existing TSV framework to weighted DAG and to propose a shape indexing approach for lexicon reduction. Good performance for lexicon reduction is achieved for Arabic subwords. Nevertheless, the performance remains modest for Arabic words. Considering the results of our first work on Arabic lexicon reduction, we propose to build a new index for better performance at the word level. The subword shape and the number of subwords and diacritics are all important components of Arabic word shape. We therefore propose the Arabic word descriptor (AWD) which integrates all the aforementioned components. It is built in two steps. First, a structural descriptor (SD) is computed for each connected component (CC) of the word image. It describes the CC shape using the bag-of-words model, where each visual word represents a different local shape structure. Then, the AWD is formed by concatenating the SDs using an efficient heuristic, implicitly discriminating between subwords and diacritics. In the context of lexicon reduction, the AWD is used to index a reference database. The main contribution of this work is the design of the AWD, which integrates lowlevel cues (subword shape structure) and symbolic information (subword counts and diacritics) into a single descriptor. The proposed method has a low computational overhead, it is simple to implement and it provides state-of-the-art performance for lexicon reduction on two Arabic databases, namely the Ibn Sina database of subwords and the IFN/ENIT database of words. The last part of this thesis focuses on features for word recognition. A large body of features exist in the literature, each of them being motivated by different fields, such as pattern recognition, computer vision or machine learning. Identifying the most promising approaches would improve the design of the next generation of features. Nevertheless, because they are based on different concepts, it is difficult to compare them on a theoretical ground and efficient empirical tools are needed. Therefore, the last objective of the thesis is to provide a method for feature evaluation that assesses the strength and complementarity of existing features. A combination scheme has been designed for this purpose, in which each feature is evaluated through a reference recognition system, based on recurrent neural networks. More precisely, each feature is represented by an agent, which is an instance of the recognition system trained with that feature. The decisions of all the agents are combined using a weighted vote. The weights are jointly optimized during a training phase in order to increase the weighted vote of the true word label. Therefore, they reflect the strength and complementarity of the agents and their features for the given task. Finally, they are converted into a numerical score assigned to each feature, which is easy to interpret under this combination model. To the best of our knowledge, this is the first feature evaluation method able to quantify the importance of each feature, instead of providing a ranking based on the recognition rate. Five state-of-the-art features have been tested, and our results provide interesting insight for future feature design

    Subword Recognition in Historical Arabic Documents using C-GRUs

    Get PDF
    The recent years have witnessed an increased tendency to digitize historical manuscripts that not only ensures the preservation of these collections but also allows researchers and end-users’ direct access to these images. Recognition of Arabic handwriting is challenging due to the highly cursive nature of the script and other challenges associated with historical documents (degradation etc.). This paper presents an end-to-end system to recognize Arabic handwritten sub words in historical documents. More specifically, we introduce a hybrid CNN-GRU model where the shallow convolutional network learns robust feature representations while the GRU layers carry out the sequence modelling and generate the transcription of the text. The proposed system is evaluated on two different datasets, IBN SINA and VML-HD reporting recognition rates of 96.10% and 98.60% respectively. A comparison with existing techniques evaluated on the same datasets validates the effectiveness of our proposed model in characterizing Arabic subwords

    Ensemble learning using multi-objective optimisation for arabic handwritten words

    Get PDF
    Arabic handwriting recognition is a dynamic and stimulating field of study within pattern recognition. This system plays quite a significant part in today's global environment. It is a widespread and computationally costly function due to cursive writing, a massive number of words, and writing style. Based on the literature, the existing features lack data supportive techniques and building geometric features. Most ensemble learning approaches are based on the assumption of linear combination, which is not valid due to differences in data types. Also, the existing approaches of classifier generation do not support decision-making for selecting the most suitable classifier, and it requires enabling multi-objective optimisation to handle these differences in data types. In this thesis, new type of feature for handwriting using Segments Interpolation (SI) to find the best fitting line in each of the windows with a model for finding the best operating point window size for SI features. Multi-Objective Ensemble Oriented (MOEO) formulated to control the classifier topology and provide feedback support for changing the classifiers' topology and weights based on the extension of Non-dominated Sorting Genetic Algorithm (NSGA-II). It is designated as the Random Subset based Parents Selection (RSPS-NSGA-II) to handle neurons and accuracy. Evaluation metrics from two perspectives classification and Multiobjective optimization. The experimental design based on two subsets of the IFN/ENIT database. The first one consists of 10 classes (C10) and 22 classes (C22). The features were tested with Support Vector Machine (SVM) and Extreme Learning Machine (ELM). This work improved due to the SI feature. SI shows a significant result with SVM with 88.53% for C22. RSPS for C10 at k=2 achieved 91% accuracy with fewer neurons than NSGA-II, and for C22 at k=10, accuracy has been increased 81% compared to NSGA-II 78%. Future work may consider introducing more features to the system, applying them to other languages, and integrating it with sequence learning for more accuracy

    Arabic Manuscript Layout Analysis and Classification

    Get PDF

    Word Searching in Scene Image and Video Frame in Multi-Script Scenario using Dynamic Shape Coding

    Full text link
    Retrieval of text information from natural scene images and video frames is a challenging task due to its inherent problems like complex character shapes, low resolution, background noise, etc. Available OCR systems often fail to retrieve such information in scene/video frames. Keyword spotting, an alternative way to retrieve information, performs efficient text searching in such scenarios. However, current word spotting techniques in scene/video images are script-specific and they are mainly developed for Latin script. This paper presents a novel word spotting framework using dynamic shape coding for text retrieval in natural scene image and video frames. The framework is designed to search query keyword from multiple scripts with the help of on-the-fly script-wise keyword generation for the corresponding script. We have used a two-stage word spotting approach using Hidden Markov Model (HMM) to detect the translated keyword in a given text line by identifying the script of the line. A novel unsupervised dynamic shape coding based scheme has been used to group similar shape characters to avoid confusion and to improve text alignment. Next, the hypotheses locations are verified to improve retrieval performance. To evaluate the proposed system for searching keyword from natural scene image and video frames, we have considered two popular Indic scripts such as Bangla (Bengali) and Devanagari along with English. Inspired by the zone-wise recognition approach in Indic scripts[1], zone-wise text information has been used to improve the traditional word spotting performance in Indic scripts. For our experiment, a dataset consisting of images of different scenes and video frames of English, Bangla and Devanagari scripts were considered. The results obtained showed the effectiveness of our proposed word spotting approach.Comment: Multimedia Tools and Applications, Springe
    corecore