883 research outputs found

    Mixtures of Spatial Spline Regressions

    We present an extension of the functional data analysis framework for univariate functions to the analysis of surfaces: functions of two variables. The spatial spline regression (SSR) approach developed can be used to model surfaces that are sampled over a rectangular domain. Furthermore, combining SSR with linear mixed effects models (LMM) allows for the analysis of populations of surfaces, and combining the joint SSR-LMM method with finite mixture models allows for the analysis of populations of surfaces with sub-family structures. Through the mixtures of spatial spline regressions (MSSR) approach developed, we present methodologies for clustering surfaces into sub-families, and for performing surface-based discriminant analysis. The effectiveness of our methodologies, as well as the modeling capabilities of the SSR model, are assessed through an application to handwritten character recognition.

    Bernoulli HMMs for Handwritten Text Recognition

    In recent years, Hidden Markov Models (HMMs) have received significant attention in the task of off-line handwritten text recognition (HTR). As in automatic speech recognition (ASR), HMMs are used to model the probability of an observation sequence, given its corresponding text transcription. However, in contrast to what happens in ASR, in HTR there is no standard set of local features being used by most of the proposed systems. In this thesis we propose the use of raw binary pixels as features, in conjunction with models that deal more directly with the binary data. In particular, we propose the use of Bernoulli HMMs (BHMMs), that is, conventional HMMs in which Gaussian (mixture) distributions have been replaced by Bernoulli (mixture) probability functions. The objective is twofold: on the one hand, this allows us to better model the binary nature of text images (foreground/background) using BHMMs. On the other hand, this guarantees that no discriminative information is filtered out during feature extraction (most available HTR datasets can be easily binarized without a relevant loss of information). In this thesis, all the HMM theory required to develop an HMM-based HTR toolkit is reviewed and adapted to the case of BHMMs.
Specifically, we begin by defining a simple classifier based on BHMMs with Bernoulli probability functions at the states, and we end with an embedded Bernoulli mixture HMM recognizer for continuous HTR. Regarding the binary features, we propose a simple binary feature extraction process without significant loss of information. All input images are scaled and binarized, in order to easily reinterpret them as sequences of binary feature vectors. Two extensions are proposed to this basic feature extraction method: the use of a sliding window in order to better capture the context, and a repositioning method in order to better deal with vertical distortions.
Competitive results were obtained when BHMMs and the proposed methods were applied to well-known HTR databases. In particular, we ranked first at the Arabic Handwriting Recognition Competition organized during the 12th International Conference on Frontiers in Handwriting Recognition (ICFHR 2010), and at the Arabic Recognition Competition: Multi-font Multi-size Digitally Represented Text organized during the 11th International Conference on Document Analysis and Recognition (ICDAR 2011).
In the last part of this thesis we propose a method for training BHMM classifiers using discriminative training criteria, instead of the conventional Maximum Likelihood Estimation (MLE). Specifically, we propose a log-linear classifier for binary data based on the BHMM classifier. Parameter estimation of this model can be carried out using discriminative training criteria for log-linear models. In particular, we show the formulae for several MMI-based criteria. Finally, we prove the equivalence between both classifiers; hence, discriminative training of a BHMM classifier can be carried out by obtaining its equivalent log-linear classifier. Reported results show that discriminative BHMMs clearly outperform conventional generative BHMMs.
Giménez Pastor, A. (2014). Bernoulli HMMs for Handwritten Text Recognition [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/37978
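To make the idea of Bernoulli (mixture) emission functions concrete, the following is a minimal sketch, not the thesis's toolkit. It assumes each image column becomes one binary feature vector and that a state emits it from a mixture of independent Bernoulli prototypes. The function names, the threshold value, and the "dark pixel = ink" convention are all illustrative assumptions.

```python
import math


def bernoulli_log_prob(x, p, eps=1e-9):
    """Log-probability of binary vector x under independent Bernoullis with parameters p."""
    total = 0.0
    for x_d, p_d in zip(x, p):
        p_d = min(max(p_d, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += math.log(p_d) if x_d else math.log(1.0 - p_d)
    return total


def bernoulli_mixture_log_prob(x, weights, prototypes):
    """Log of sum_k w_k * prod_d p_kd^x_d * (1 - p_kd)^(1 - x_d), computed with log-sum-exp."""
    terms = [
        math.log(w) + bernoulli_log_prob(x, p)
        for w, p in zip(weights, prototypes)
        if w > 0
    ]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def binarize_columns(image, threshold=128):
    """Convert a grayscale image (list of rows) into a sequence of binary column vectors.

    Assumption: pixels darker than the threshold are ink (1), the rest background (0).
    """
    height, width = len(image), len(image[0])
    return [
        [1 if image[r][c] < threshold else 0 for r in range(height)]
        for c in range(width)
    ]
```

In an HMM, a function like `bernoulli_mixture_log_prob` would take the place of the Gaussian mixture density when scoring each column vector at a state; the forward and Viterbi recursions themselves are unchanged.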

    Graph-based classification of multiple observation sets

    We consider the problem of classification of an object given multiple observations that possibly include different transformations. The possible transformations of the object generally span a low-dimensional manifold in the original signal space. We propose to take advantage of this manifold structure for the effective classification of the object represented by the observation set. In particular, we design a low-complexity solution that is able to exploit the properties of the data manifolds with a graph-based algorithm. Hence, we formulate the computation of the unknown label matrix as a smoothing process on the manifold under the constraint that all observations represent an object of one single class. This results in a discrete optimization problem, which can be solved by an efficient and low-complexity algorithm. We demonstrate the performance of the proposed graph-based algorithm in the classification of sets of multiple images. Moreover, we show its high potential in video-based face recognition, where it outperforms state-of-the-art solutions that fall short of exploiting the manifold structure of the face image data sets.

    Using generative models for handwritten digit recognition

    We describe a method of recognizing handwritten digits by fitting generative models that are built from deformable B-splines with Gaussian "ink generators" spaced along the length of the spline. The splines are adjusted using a novel elastic matching procedure based on the Expectation Maximization (EM) algorithm that maximizes the likelihood of the model generating the data. This approach has many advantages. (1) After identifying the model most likely to have generated the data, the system not only produces a classification of the digit but also a rich description of the instantiation parameters which can yield information such as the writing style. (2) During the process of explaining the image, generative models can perform recognition-driven segmentation. (3) The method involves a relatively small number of parameters and hence training is relatively easy and fast. (4) Unlike many other recognition schemes it does not rely on some form of pre-normalization of input images, but can handle arbitrary scalings, translations and a limited degree of image rotation. We have demonstrated that our method of fitting models to images does not get trapped in poor local minima. The main disadvantage of the method is that it requires much more computation than more standard OCR techniques.

    Improving Deep Representation Learning with Complex and Multimodal Data.

    Representation learning has emerged as a way to learn meaningful representation from data and made a breakthrough in many applications including visual object recognition, speech recognition, and text understanding. However, learning representation from complex high-dimensional sensory data is challenging since there exist many irrelevant factors of variation (e.g., data transformation, random noise). On the other hand, to build an end-to-end prediction system for structured output variables, one needs to incorporate probabilistic inference to properly model a mapping from single input to possible configurations of output variables. This thesis addresses limitations of current representation learning in two parts. The first part discusses efficient learning algorithms of invariant representation based on restricted Boltzmann machines (RBMs). Pointing out the difficulty of learning, we develop an efficient initialization method for sparse and convolutional RBMs. On top of that, we develop variants of RBM that learn representations invariant to data transformations such as translation, rotation, or scale variation by pooling the filter responses of input data after a transformation, or to irrelevant patterns such as random or structured noise, by jointly performing feature selection and feature learning. We demonstrate improved performance on visual object recognition and weakly supervised foreground object segmentation. The second part discusses conditional graphical models and learning frameworks for structured output variables using deep generative models as prior. For example, we combine the best properties of the CRF and the RBM to enforce both local and global (e.g., object shape) consistencies for visual object segmentation. Furthermore, we develop a deep conditional generative model of structured output variables, which is an end-to-end system trainable by backpropagation. 
We demonstrate the importance of global prior and probabilistic inference for visual object segmentation. Second, we develop a novel multimodal learning framework by casting the problem into structured output representation learning problems, where the output is one data modality to be predicted from the other modalities, and vice versa. We explain how our method could be more effective than maximum likelihood learning and demonstrate state-of-the-art performance on visual-text and visual-only recognition tasks.
PhD, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113549/1/kihyuks_1.pd

    The Prototyping and Focused Discriminating Strategy for Pattern Recognition and one Instantiation: the MELIDIS System

    This paper presents the Prototyping and Focused Discriminating (PFD) strategy for pattern recognition. This strategy benefits from the duality between model generation and discrimination. Both collaborate through a focusing mechanism that detects the conflicts between the class models and drives the discrimination. Classifiers based on this collaboration benefit from a set of useful properties. The Mélidis system illustrates this strategy and extends its possibilities, using a fuzzy framework. As shown by experiments, the resulting system provides an interesting compromise between accuracy and compactness. Experiments also demonstrate the interest of the new strategy and of its focusing mechanism.

    High-Quality Wavelets Features Extraction for Handwritten Arabic Numerals Recognition

    Arabic handwritten digit recognition is the science of recognition and classification of handwritten Arabic digits. It has been a subject of research for many years with rich literature available on the subject. Handwritten digits written by different people are not of the same size, thickness, style, position or orientation. Hence, many different challenges have to be overcome to resolve the problem of handwritten digit recognition. The variation in the digits is due to the writing styles of different people, which can differ significantly. Automatic handwritten digit recognition has wide application, such as automatic processing of bank cheques, postal addresses, and tax forms. A typical handwritten digit recognition application consists of three main stages, namely feature extraction, feature selection, and classification. One of the most important problems is feature extraction. In this paper, a novel feature extraction approach for off-line handwritten digit recognition is presented. Wavelet-based analysis of image data is carried out for feature extraction, and then classification is performed using various classifiers. To further reduce the size of the training data set, high-entropy subbands are selected. To increase the recognition rate, individual subbands providing high classification accuracies are selected from the over-complete tree. The extracted features are also normalized to standardize the range of independent variables before providing them to the classifier. Classification is carried out using k-NN and SVMs. The results show that the quality of the extracted features is high, as almost equally high classification accuracies are obtained for both classifiers, i.e. k-NN and SVMs.
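As a rough illustration of wavelet subband features with entropy-based selection, here is a minimal sketch under stated assumptions. It uses a single level of the 2D Haar transform rather than the over-complete tree mentioned in the abstract, and it takes "entropy" to mean the Shannon entropy of a subband's normalized coefficient energies. Neither choice is confirmed by the abstract, and all function and subband names are hypothetical.

```python
import math


def haar_level(image):
    """One level of the orthonormal 2D Haar transform; height and width must be even.

    Returns four half-size subbands keyed by illustrative names.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must be even")
    bands = {"approx": [], "row_diff": [], "col_diff": [], "diag": []}
    for r in range(0, rows, 2):
        out = {name: [] for name in bands}
        for c in range(0, cols, 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            out["approx"].append((a + b + d + e) / 2.0)    # local average
            out["row_diff"].append((a + b - d - e) / 2.0)  # top row minus bottom row
            out["col_diff"].append((a - b + d - e) / 2.0)  # left column minus right column
            out["diag"].append((a - b - d + e) / 2.0)      # diagonal difference
        for name in bands:
            bands[name].append(out[name])
    return bands


def subband_entropy(band):
    """Shannon entropy (bits) of the normalized coefficient energies in a subband."""
    energies = [v * v for row in band for v in row]
    total = sum(energies)
    if total == 0:
        return 0.0
    return -sum((e / total) * math.log2(e / total) for e in energies if e > 0)


def select_subbands(bands, keep):
    """Names of the `keep` subbands with the highest entropy."""
    ranked = sorted(bands, key=lambda name: subband_entropy(bands[name]), reverse=True)
    return ranked[:keep]
```

The coefficients of the selected subbands could then be flattened, normalized, and passed to a k-NN or SVM classifier.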