Computational Techniques in Multispectral Image Processing : Application to the Syriac Galen Palimpsest
Multispectral and hyperspectral image analysis has experienced much
development in the last decade. The application of these methods to palimpsests
has produced significant results, enabling researchers to recover texts that
would be otherwise lost under the visible overtext, by improving the contrast
between the undertext and the overtext. In this paper we explore an extended
number of multispectral and hyperspectral image analysis methods, consisting of
supervised and unsupervised dimensionality reduction techniques, on a part of
the Syriac Galen Palimpsest dataset (www.digitalgalen.net). Of this extended
set of methods, eight gave good results: three supervised methods, namely
Generalized Discriminant Analysis (GDA), Linear Discriminant Analysis (LDA),
and Neighborhood Component Analysis (NCA); and five unsupervised methods
(though still used in a supervised way), namely Gaussian Process
Latent Variable Model (GPLVM), Isomap, Landmark Isomap, Principal Component
Analysis (PCA), and Probabilistic Principal Component Analysis (PPCA). The
relative success of these methods was determined visually, using color
pictures, on the basis of whether the undertext was distinguishable from the
overtext, resulting in the following ranking of the methods: LDA, NCA, GDA,
Isomap, Landmark Isomap, PPCA, PCA, and GPLVM. These results were compared with
those obtained using the Canonical Variates Analysis (CVA) method on the same
dataset, which showed remarkable accuracy (LDA is a particular case of CVA
in which the objects are classified into two classes).
Comment: 29 February - 2 March 2016, Second International Conference on
Natural Sciences and Technology in Manuscript Analysis, Centre for the
Study of Manuscript Cultures, Hamburg, Germany
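The contrast between supervised methods such as LDA (which use the undertext/overtext labels) and unsupervised methods such as PCA can be illustrated with a minimal sketch. This is not the authors' pipeline: the "multispectral pixels" below are synthetic Gaussian spectra, and the band count and class means are illustrative assumptions.

```python
# Minimal sketch: supervised (LDA) vs. unsupervised (PCA) dimensionality
# reduction on synthetic "multispectral pixel" data, loosely analogous to
# separating undertext from overtext pixels. All data here is synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_bands = 12       # spectral bands per pixel (illustrative)
n_per_class = 200

# Two pixel classes whose mean reflectance differs slightly per band.
undertext = rng.normal(loc=0.4, scale=0.1, size=(n_per_class, n_bands))
overtext = rng.normal(loc=0.6, scale=0.1, size=(n_per_class, n_bands))
X = np.vstack([undertext, overtext])
y = np.array([0] * n_per_class + [1] * n_per_class)

# Unsupervised: PCA projects onto directions of maximum variance.
X_pca = PCA(n_components=2).fit_transform(X)

# Supervised: LDA uses the labels to maximise class separation;
# with two classes the projection is one-dimensional.
X_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)

print(X_pca.shape)  # (400, 2)
print(X_lda.shape)  # (400, 1)
```

In this toy setting the label information lets LDA pick the one direction that best separates the two inks, which mirrors why the supervised methods ranked highest in the paper's visual comparison.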
Dimensionality reduction of clustered data sets
We present a novel probabilistic latent variable model that performs linear dimensionality reduction on data sets containing clusters. We prove that the maximum likelihood solution of the model is an unsupervised generalisation of linear discriminant analysis. This provides a completely new approach to one of the most established and widely used classification algorithms. The performance of the model is then demonstrated on a number of real and artificial data sets.
Initializing Probabilistic Linear Discriminant Analysis
Component Analysis (CA) consists of a set of statistical techniques that decompose data into latent components relevant to the task at hand (e.g., clustering, segmentation, classification, alignment). During the past few years, an explosion of research in probabilistic CA has been witnessed, with the introduction of several novel methods (e.g., Probabilistic Principal Component Analysis, Probabilistic Linear Discriminant Analysis (PLDA), Probabilistic Canonical Correlation Analysis). PLDA is one of the most widely used supervised CA techniques, extracting suitable, distinct subspaces by exploiting the knowledge of data annotated with different labels. Nevertheless, an inherent difficulty in PLDA variants is the proper initialization of the parameters in order to avoid ending up in poor local maxima. In this light, we propose a novel method to initialize the parameters in PLDA in a consistent and robust way. The performance of the algorithm is demonstrated via a set of experiments on the modified XM2VTS database, which is provided by the authors of the original PLDA model.
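For intuition, here is a common-sense way to initialize PLDA-style between-class and within-class factors from labeled data via scatter matrices. This is a generic initialization sketch, not the method the abstract proposes; the factor names `F_init`/`G_init`, the ranks, and the synthetic data are all illustrative assumptions.

```python
# Hedged sketch: initializing low-rank between-class (F) and within-class (G)
# factors for a PLDA-like model from sample scatter matrices, so that
# F @ F.T roughly matches Sb and G @ G.T roughly matches Sw.
import numpy as np

rng = np.random.default_rng(1)
d, n_classes, n_per = 5, 4, 30
means = rng.normal(scale=2.0, size=(n_classes, d))
X = np.vstack([m + rng.normal(size=(n_per, d)) for m in means])
y = np.repeat(np.arange(n_classes), n_per)

mu = X.mean(axis=0)
Sb = np.zeros((d, d))  # between-class scatter
Sw = np.zeros((d, d))  # within-class scatter
for c in range(n_classes):
    Xc = X[y == c]
    mc = Xc.mean(axis=0)
    Sb += len(Xc) * np.outer(mc - mu, mc - mu)
    Sw += (Xc - mc).T @ (Xc - mc)
Sb /= len(X)
Sw /= len(X)

def lowrank_factor(S, rank):
    """Leading eigenvectors of S, scaled by sqrt of eigenvalues."""
    vals, vecs = np.linalg.eigh(S)
    idx = np.argsort(vals)[::-1][:rank]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

F_init = lowrank_factor(Sb, rank=2)  # between-class factor, shape (5, 2)
G_init = lowrank_factor(Sw, rank=3)  # within-class factor, shape (5, 3)
print(F_init.shape, G_init.shape)  # (5, 2) (5, 3)
```

Such scatter-based starting points are one baseline against which a consistent, robust initialization scheme like the one proposed here can be compared.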
Bayesian Speaker Adaptation Based on a New Hierarchical Probabilistic Model
In this paper, a new hierarchical Bayesian speaker adaptation method called HMAP is proposed that combines the advantages of three conventional algorithms, maximum a posteriori (MAP), maximum-likelihood linear regression (MLLR), and eigenvoice, resulting in excellent performance across a wide range of adaptation conditions. The new method efficiently utilizes intra-speaker and inter-speaker correlation information through modeling phone and speaker subspaces in a consistent hierarchical Bayesian way. The phone variations for a specific speaker are assumed to be located in a low-dimensional subspace. The phone coordinate, which is shared among different speakers, implicitly contains the intra-speaker correlation information. For a specific speaker, the phone variations, represented by speaker-dependent eigenphones, are concatenated into a supervector. The eigenphone supervector space is also a low-dimensional speaker subspace, which contains inter-speaker correlation information. Using principal component analysis (PCA), a new hierarchical probabilistic model for the generation of the speech observations is obtained. Speaker adaptation based on the new hierarchical model is derived using the maximum a posteriori criterion in a top-down manner. Both batch adaptation and online adaptation schemes are proposed. With tuned parameters, the new method can handle varying amounts of adaptation data automatically and efficiently. Experimental results on a Mandarin Chinese continuous speech recognition task show good performance under all testing conditions.
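The core supervector-plus-PCA idea behind eigenvoice/eigenphone-style speaker subspaces can be sketched as follows. This is not the HMAP model itself: the speaker count, phone count, feature dimension, and random "phone means" below are all illustrative assumptions.

```python
# Hedged sketch: stack each speaker's per-phone mean vectors into a
# supervector, then use PCA to obtain a low-dimensional speaker subspace
# in which a speaker is summarised by a few coordinates.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
n_speakers, n_phones, feat_dim = 20, 10, 8

# One supervector per speaker: n_phones phone-mean vectors flattened
# into a single vector of length n_phones * feat_dim (synthetic data).
supervectors = rng.normal(size=(n_speakers, n_phones * feat_dim))

pca = PCA(n_components=5)
coords = pca.fit_transform(supervectors)  # speaker coordinates in subspace

# A speaker's supervector can be approximated from its few coordinates:
approx = pca.mean_ + pca.components_.T @ coords[0]
print(coords.shape, approx.shape)  # (20, 5) (80,)
```

Adapting to a new speaker then amounts to estimating a handful of subspace coordinates rather than every model parameter, which is why subspace methods cope well with small amounts of adaptation data.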
High-Dimensional Regression with Gaussian Mixtures and Partially-Latent Response Variables
In this work we address the problem of approximating high-dimensional data
with a low-dimensional representation. We make the following contributions. We
propose an inverse regression method which exchanges the roles of input and
response, such that the low-dimensional variable becomes the regressor, and
which is tractable. We introduce a mixture of locally-linear probabilistic
mapping model that starts with estimating the parameters of inverse regression,
and follows with inferring closed-form solutions for the forward parameters of
the high-dimensional regression problem of interest. Moreover, we introduce a
partially-latent paradigm, such that the vector-valued response variable is
composed of both observed and latent entries, thus being able to deal with data
contaminated by experimental artifacts that cannot be explained with noise
models. The proposed probabilistic formulation could be viewed as a
latent-variable augmentation of regression. We devise expectation-maximization
(EM) procedures based on a data augmentation strategy which facilitates the
maximum-likelihood search over the model parameters. We propose two
augmentation schemes and we describe in detail the associated EM inference
procedures that may well be viewed as generalizations of a number of EM
regression, dimension reduction, and factor analysis algorithms. The proposed
framework is validated with both synthetic and real data. We provide
experimental evidence that our method outperforms several existing regression
techniques.
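The central trick of fitting the tractable low-to-high (inverse) mapping and then inverting it for the forward prediction can be illustrated with a single linear component. This is a deliberately simplified sketch, not the paper's full mixture model with partially-latent responses; the dimensions and noise level are illustrative assumptions.

```python
# Hedged sketch of the inverse-regression idea: fit the low-dimensional ->
# high-dimensional (inverse) linear map, which has few parameters, then
# recover the forward prediction in closed form via a pseudo-inverse.
import numpy as np

rng = np.random.default_rng(3)
n, low_d, high_d = 500, 2, 50

Z = rng.normal(size=(n, low_d))             # low-dimensional variable
A = rng.normal(size=(high_d, low_d))        # true inverse-regression matrix
X = Z @ A.T + 0.01 * rng.normal(size=(n, high_d))  # high-dim observations

# Inverse regression: the regressor is the LOW-dimensional variable, so the
# least-squares fit estimates only low_d * high_d parameters.
A_hat, *_ = np.linalg.lstsq(Z, X, rcond=None)  # shape (low_d, high_d)

# Forward prediction: since x ~= A z, recover z from x via the
# pseudo-inverse of the estimated inverse-regression matrix.
x_new = X[0]
z_hat = np.linalg.pinv(A_hat.T) @ x_new
print(np.allclose(z_hat, Z[0], atol=0.1))
```

In the paper's mixture setting, the same exchange of regressor and response is applied per locally-linear component, and the latent part of the response absorbs structured artifacts that a simple noise model cannot.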