
    Computational Techniques in Multispectral Image Processing: Application to the Syriac Galen Palimpsest

    Multispectral and hyperspectral image analysis has seen substantial development in the last decade. Applied to palimpsests, these methods have produced significant results, enabling researchers to recover texts that would otherwise be lost under the visible overtext by improving the contrast between the undertext and the overtext. In this paper we explore an extended set of multispectral and hyperspectral image analysis methods, consisting of supervised and unsupervised dimensionality reduction techniques, on a part of the Syriac Galen Palimpsest dataset (www.digitalgalen.net). Of this extended set, eight methods gave good results: three supervised methods, Generalized Discriminant Analysis (GDA), Linear Discriminant Analysis (LDA), and Neighborhood Component Analysis (NCA); and five unsupervised methods (though still used in a supervised way), Gaussian Process Latent Variable Model (GPLVM), Isomap, Landmark Isomap, Principal Component Analysis (PCA), and Probabilistic Principal Component Analysis (PPCA). The relative success of these methods was assessed visually, using color images, on the basis of how clearly the undertext could be distinguished from the overtext, resulting in the following ranking: LDA, NCA, GDA, Isomap, Landmark Isomap, PPCA, PCA, and GPLVM. These results were compared with those obtained using the Canonical Variates Analysis (CVA) method on the same dataset, which showed remarkable accuracy (LDA is a particular case of CVA where the objects are classified into two classes).
    Comment: Presented at the Second International Conference on Natural Sciences and Technology in Manuscript Analysis, Centre for the Study of Manuscript Cultures, Hamburg, Germany, 29 February - 2 March 2016.
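
    To make the comparison concrete, here is a minimal sketch of how a supervised method (LDA) and two unsupervised methods (PCA, Isomap) from the list above could be applied to per-pixel spectra. The band count, synthetic spectra, and annotated labels are hypothetical stand-ins for the actual palimpsest data; this is not the paper's pipeline.

```python
# A minimal sketch (not the paper's pipeline) comparing a few of the
# dimensionality reduction methods named above on synthetic "multispectral"
# pixel data. Band count, labels, and spectra are all hypothetical.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

rng = np.random.default_rng(0)
n_bands = 23  # hypothetical number of spectral bands per pixel

# Fake pixel spectra for three classes: parchment, overtext ink, undertext ink.
means = rng.normal(size=(3, n_bands))
X = np.vstack([m + 0.3 * rng.normal(size=(500, n_bands)) for m in means])
y = np.repeat([0, 1, 2], 500)  # labels from manually annotated training pixels

# Supervised: LDA projects onto at most (n_classes - 1) = 2 discriminant axes.
Z_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

# Unsupervised (but usable "in a supervised way" by inspecting the result):
Z_pca = PCA(n_components=3).fit_transform(X)      # linear
Z_iso = Isomap(n_components=3).fit_transform(X)   # nonlinear, graph-based

# Each 3-component embedding can be rescaled to [0, 1] and rendered as an
# RGB image for the visual undertext/overtext comparison described above.
```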

    Dimensionality reduction of clustered data sets

    We present a novel probabilistic latent variable model to perform linear dimensionality reduction on data sets that contain clusters. We prove that the maximum likelihood solution of the model is an unsupervised generalisation of linear discriminant analysis. This provides a completely new approach to one of the most established and widely used classification algorithms. The performance of the model is then demonstrated on a number of real and artificial data sets.
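
    The abstract does not spell out the model, but the flavour of an "unsupervised generalisation of LDA" can be sketched by letting a mixture model infer the cluster labels that supervised LDA would otherwise require. The tied-covariance mixture and the two-stage pipeline below are illustrative assumptions, not the paper's actual maximum-likelihood solution.

```python
# Rough illustration (not the paper's model): labels inferred by a
# tied-covariance Gaussian mixture stand in for the class labels that
# supervised LDA would need, giving an unsupervised discriminant projection.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
centers = rng.normal(scale=3.0, size=(4, 10))          # 4 hypothetical clusters
X = np.vstack([c + rng.normal(scale=0.5, size=(200, 10)) for c in centers])

# Unsupervised step: infer cluster assignments (no labels used anywhere).
gmm = GaussianMixture(n_components=4, covariance_type="tied", random_state=1)
labels = gmm.fit_predict(X)

# Discriminant step: LDA on the *inferred* labels yields a linear projection
# that separates the discovered clusters.
Z = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, labels)
```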

    Initializing Probabilistic Linear Discriminant Analysis

    Component Analysis (CA) consists of a set of statistical techniques that decompose data into latent components relevant to the task at hand (e.g., clustering, segmentation, classification, alignment). In the past few years there has been an explosion of research in probabilistic CA, with the introduction of several novel methods (e.g., Probabilistic Principal Component Analysis, Probabilistic Linear Discriminant Analysis (PLDA), and Probabilistic Canonical Correlation Analysis). PLDA is one of the most widely used supervised CA techniques; it extracts suitable, distinct subspaces by exploiting the knowledge of data annotated with different labels. Nevertheless, an inherent difficulty in PLDA variants is the proper initialization of the parameters so as to avoid ending up in poor local maxima. In this light, we propose a novel method to initialize the parameters of PLDA in a consistent and robust way. The performance of the algorithm is demonstrated via a set of experiments on the modified XM2VTS database, which is provided by the authors of the original PLDA model.
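
    As an illustration only (the paper's actual initialization scheme is not reproduced here), one plausible spectral seeding of a simplified PLDA model x = mu + F h + G w + eps initializes the between-class subspace F from the scatter of the class means and the within-class subspace G from the residual scatter:

```python
# Hedged sketch of a spectral initialization for a simplified PLDA model
# x = mu + F h + G w + eps. This is an assumed scheme for illustration,
# NOT necessarily the one proposed in the paper.
import numpy as np

def init_plda(X, y, rank_f, rank_g):
    mu = X.mean(axis=0)
    classes = np.unique(y)
    class_means = np.array([X[y == c].mean(axis=0) for c in classes])

    # Between-class scatter -> top eigenvectors seed F.
    Sb = np.cov((class_means - mu).T)
    # Within-class (residual) scatter -> top eigenvectors seed G.
    resid = X - class_means[np.searchsorted(classes, y)]
    Sw = np.cov(resid.T)

    def top_eigvecs(S, k):
        vals, vecs = np.linalg.eigh(S)  # ascending eigenvalues
        return vecs[:, ::-1][:, :k] * np.sqrt(np.maximum(vals[::-1][:k], 1e-8))

    F = top_eigvecs(Sb, rank_f)  # identity (between-class) subspace
    G = top_eigvecs(Sw, rank_g)  # noise/channel (within-class) subspace
    return mu, F, G
```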

    Bayesian Speaker Adaptation Based on a New Hierarchical Probabilistic Model

    In this paper, a new hierarchical Bayesian speaker adaptation method called HMAP is proposed that combines the advantages of three conventional algorithms, maximum a posteriori (MAP), maximum-likelihood linear regression (MLLR), and eigenvoice, resulting in excellent performance across a wide range of adaptation conditions. The new method efficiently utilizes intra-speaker and inter-speaker correlation information by modeling phone and speaker subspaces in a consistent hierarchical Bayesian way. The phone variations for a specific speaker are assumed to lie in a low-dimensional subspace. The phone coordinates, which are shared among different speakers, implicitly contain the intra-speaker correlation information. For a specific speaker, the phone variations, represented by speaker-dependent eigenphones, are concatenated into a supervector. The eigenphone supervector space is also a low-dimensional speaker subspace, which contains inter-speaker correlation information. Using principal component analysis (PCA), a new hierarchical probabilistic model for the generation of the speech observations is obtained. Speaker adaptation based on the new hierarchical model is derived using the maximum a posteriori criterion in a top-down manner. Both batch adaptation and online adaptation schemes are proposed. With tuned parameters, the new method can handle varying amounts of adaptation data automatically and efficiently. Experimental results on a Mandarin Chinese continuous speech recognition task show good performance under all testing conditions.
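
    Of the three ingredients combined here, the classical MAP mean update has the simplest closed form. The sketch below shows only that single ingredient, for one Gaussian mean with hypothetical feature dimensions; the full HMAP hierarchy with eigenphones and speaker subspaces is far richer.

```python
# Classical MAP adaptation of a single Gaussian mean (Gauvain & Lee style);
# only one ingredient of HMAP, shown for illustration.
import numpy as np

def map_adapt_mean(prior_mean, adaptation_frames, tau=10.0):
    """Shrink the speaker-independent mean toward the adaptation data.

    tau is the prior weight: with few frames the prior dominates (robust),
    with many frames the estimate approaches the speaker's sample mean,
    which is how MAP handles varying amounts of adaptation data.
    """
    n = len(adaptation_frames)
    sample_mean = np.mean(adaptation_frames, axis=0)
    return (tau * prior_mean + n * sample_mean) / (tau + n)

# Hypothetical usage: 39-dim MFCC-like features, 50 adaptation frames.
prior = np.zeros(39)
frames = np.ones((50, 39))
adapted = map_adapt_mean(prior, frames)  # pulled toward 1, tempered by prior
```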

    High-Dimensional Regression with Gaussian Mixtures and Partially-Latent Response Variables

    In this work we address the problem of approximating high-dimensional data with a low-dimensional representation. We make the following contributions. We propose a tractable inverse regression method which exchanges the roles of input and response, such that the low-dimensional variable becomes the regressor. We introduce a mixture of locally-linear probabilistic mappings that starts by estimating the parameters of the inverse regression and then infers closed-form solutions for the forward parameters of the high-dimensional regression problem of interest. Moreover, we introduce a partially-latent paradigm in which the vector-valued response variable is composed of both observed and latent entries, making it possible to deal with data contaminated by experimental artifacts that cannot be explained with noise models. The proposed probabilistic formulation can be viewed as a latent-variable augmentation of regression. We devise expectation-maximization (EM) procedures based on a data-augmentation strategy that facilitates the maximum-likelihood search over the model parameters. We propose two augmentation schemes and describe in detail the associated EM inference procedures, which may be viewed as generalizations of a number of EM regression, dimension reduction, and factor analysis algorithms. The proposed framework is validated with both synthetic and real data. We provide experimental evidence that our method outperforms several existing regression techniques.
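
    The core inverse-regression trick can be sketched for a single linear component (the paper uses a mixture of such local models and adds partially-latent response entries): fit the well-posed low-to-high mapping, then recover the forward predictor by Gaussian conditioning. Everything below, including the standard-normal prior on the response, is a simplifying assumption.

```python
# Bare-bones sketch of inverse regression with one linear component:
# estimate the easy low-to-high mapping x = A y + b + noise, then recover
# the forward predictor E[y | x] by Gaussian conditioning.
import numpy as np

rng = np.random.default_rng(2)
D, L, N = 50, 2, 1000                  # high dim, low dim, sample count
A_true = rng.normal(size=(D, L))
Y = rng.normal(size=(N, L))                       # low-dim responses
X = Y @ A_true.T + 0.1 * rng.normal(size=(N, D))  # high-dim observations

# Inverse regression is well-posed: only D*L + D parameters, not O(D^2).
Y1 = np.hstack([Y, np.ones((N, 1))])
coef, *_ = np.linalg.lstsq(Y1, X, rcond=None)
A, b = coef[:L].T, coef[L]
Sigma = np.cov((X - Y1 @ coef).T)                 # residual noise covariance

# Forward posterior mean under a standard-normal prior on y:
#   E[y | x] = (I + A^T Sigma^{-1} A)^{-1} A^T Sigma^{-1} (x - b)
S_inv_A = np.linalg.solve(Sigma, A)
P = np.linalg.inv(np.eye(L) + A.T @ S_inv_A)
y_hat = (X - b) @ S_inv_A @ P.T                   # forward predictions
print(np.mean((y_hat - Y) ** 2))                  # should be small
```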