838 research outputs found
Kernel discriminant analysis and clustering with parsimonious Gaussian process models
This work presents a family of parsimonious Gaussian process models which
allow to build, from a finite sample, a model-based classifier in an infinite
dimensional space. The proposed parsimonious models are obtained by
constraining the eigen-decomposition of the Gaussian processes modeling each
class. This allows in particular to use non-linear mapping functions which
project the observations into infinite dimensional spaces. It is also
demonstrated that the building of the classifier can be directly done from the
observation space through a kernel function. The proposed classification method
is thus able to classify data of various types such as categorical data,
functional data or networks. Furthermore, it is possible to classify mixed data
by combining different kernels. The methodology is as well extended to the
unsupervised classification case. Experimental results on various data sets
demonstrate the effectiveness of the proposed method
Kernel discriminant analysis and clustering with parsimonious Gaussian process models
International audienceThis work presents a family of parsimonious Gaussian process models which allow to build, from a finite sample, a model-based classifier in an infinite dimensional space. The proposed parsimonious models are obtained by constraining the eigen-decomposition of the Gaussian processes modeling each class. This allows in particular to use non-linear mapping functions which project the observations into infinite dimensional spaces. It is also demonstrated that the building of the classifier can be directly done from the observation space through a kernel function. The proposed classification method is thus able to classify data of various types such as categorical data, functional data or networks. Furthermore, it is possible to classify mixed data by combining different kernels. The methodology is as well extended to the unsupervised classification case. Experimental results on various data sets demonstrate the effectiveness of the proposed method
Parsimonious Mahalanobis Kernel for the Classification of High Dimensional Data
The classification of high dimensional data with kernel methods is considered
in this article. Exploit- ing the emptiness property of high dimensional
spaces, a kernel based on the Mahalanobis distance is proposed. The computation
of the Mahalanobis distance requires the inversion of a covariance matrix. In
high dimensional spaces, the estimated covariance matrix is ill-conditioned and
its inversion is unstable or impossible. Using a parsimonious statistical
model, namely the High Dimensional Discriminant Analysis model, the specific
signal and noise subspaces are estimated for each considered class making the
inverse of the class specific covariance matrix explicit and stable, leading to
the definition of a parsimonious Mahalanobis kernel. A SVM based framework is
used for selecting the hyperparameters of the parsimonious Mahalanobis kernel
by optimizing the so-called radius-margin bound. Experimental results on three
high dimensional data sets show that the proposed kernel is suitable for
classifying high dimensional data, providing better classification accuracies
than the conventional Gaussian kernel
A Simple Iterative Algorithm for Parsimonious Binary Kernel Fisher Discrimination
By applying recent results in optimization theory variously known as optimization transfer or majorize/minimize algorithms, an algorithm for binary, kernel, Fisher discriminant analysis is introduced that makes use of a non-smooth penalty on the coefficients to provide a parsimonious solution. The problem is converted into a smooth optimization that can be solved iteratively with no greater overhead than iteratively re-weighted least-squares. The result is simple, easily programmed and is shown to perform, in terms of both accuracy and parsimony, as well as or better than a number of leading machine learning algorithms on two well-studied and substantial benchmarks
Variable selection and updating in model-based discriminant analysis for high dimensional data with food authenticity applications
Food authenticity studies are concerned with determining if food samples have been correctly labelled or not. Discriminant analysis methods are an integral part of the methodology for food authentication. Motivated by food authenticity applications, a model-based discriminant analysis method that includes variable selection is presented. The discriminant analysis model is fitted in a semi-supervised manner using both labeled and unlabeled data. The method is shown to give excellent classification
performance on several high-dimensional multiclass food authenticity datasets with more variables than observations. The variables selected by the proposed method provide information about which variables are meaningful for classification purposes. A headlong search strategy for variable selection is shown to be efficient in terms of computation and achieves excellent classification performance. In applications to several food authenticity datasets, our proposed method outperformed default implementations of Random Forests, AdaBoost, transductive SVMs and Bayesian Multinomial Regression by substantial margins
A robust approach to model-based classification based on trimming and constraints
In a standard classification framework a set of trustworthy learning data are
employed to build a decision rule, with the final aim of classifying unlabelled
units belonging to the test set. Therefore, unreliable labelled observations,
namely outliers and data with incorrect labels, can strongly undermine the
classifier performance, especially if the training size is small. The present
work introduces a robust modification to the Model-Based Classification
framework, employing impartial trimming and constraints on the ratio between
the maximum and the minimum eigenvalue of the group scatter matrices. The
proposed method effectively handles noise presence in both response and
exploratory variables, providing reliable classification even when dealing with
contaminated datasets. A robust information criterion is proposed for model
selection. Experiments on real and simulated data, artificially adulterated,
are provided to underline the benefits of the proposed method
- …