Parsimonious Mahalanobis Kernel for the Classification of High Dimensional Data
The classification of high dimensional data with kernel methods is considered
in this article. Exploiting the emptiness property of high dimensional
spaces, a kernel based on the Mahalanobis distance is proposed. The computation
of the Mahalanobis distance requires the inversion of a covariance matrix. In
high dimensional spaces, the estimated covariance matrix is ill-conditioned and
its inversion is unstable or impossible. Using a parsimonious statistical
model, namely the High Dimensional Discriminant Analysis model, the specific
signal and noise subspaces are estimated for each considered class making the
inverse of the class specific covariance matrix explicit and stable, leading to
the definition of a parsimonious Mahalanobis kernel. An SVM-based framework is
used for selecting the hyperparameters of the parsimonious Mahalanobis kernel
by optimizing the so-called radius-margin bound. Experimental results on three
high dimensional data sets show that the proposed kernel is suitable for
classifying high dimensional data, providing better classification accuracies
than the conventional Gaussian kernel.
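The core idea of the abstract above can be sketched in a few lines: keep the leading eigenvalues of the class covariance as the signal subspace and pool the trailing ones as noise, so the inverse needed by the Mahalanobis distance is explicit and stable. This is a minimal illustrative sketch, not the paper's HDDA parameterization; the function name, `d_signal`, and `gamma` are assumptions made for this example.

```python
import numpy as np

def parsimonious_mahalanobis_kernel(X, Y, cov, d_signal, gamma=1.0):
    """Gaussian-type kernel built on a regularized Mahalanobis distance.

    Illustrative sketch: retain the d_signal leading eigenvalues of `cov`
    (the signal subspace) and replace the trailing noise eigenvalues by
    their mean, making the inverse explicit and numerically stable.
    """
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    noise = eigvals[d_signal:].mean()               # pooled noise variance
    inv_vals = np.concatenate([1.0 / eigvals[:d_signal],
                               np.full(len(eigvals) - d_signal, 1.0 / noise)])
    M_inv = (eigvecs * inv_vals) @ eigvecs.T        # explicit stable inverse
    D = X[:, None, :] - Y[None, :, :]               # pairwise differences
    d2 = np.einsum('ijk,kl,ijl->ij', D, M_inv, D)   # squared Mahalanobis distances
    return np.exp(-gamma * d2)
```

Plugging this kernel into an SVM and tuning `d_signal` and `gamma` (the paper optimizes a radius-margin bound for this) would complete the pipeline.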
On Weighted Multivariate Sign Functions
Multivariate sign functions are often used for robust estimation and
inference. We propose using data dependent weights in association with such
functions. The proposed weighted sign functions retain desirable robustness
properties, while significantly improving efficiency in estimation and
inference compared to unweighted multivariate sign-based methods. Using
weighted signs, we demonstrate methods of robust location estimation and robust
principal component analysis. We extend the scope of using robust multivariate
methods to include robust sufficient dimension reduction and functional outlier
detection. Several numerical studies and real data applications demonstrate the
efficacy of the proposed methodology.
Comment: Keywords: Multivariate sign, Principal component analysis, Data depth, Sufficient dimension reduction
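One of the applications named above, robust PCA via spatial signs, can be sketched as follows: eigenvectors of a (weighted) sign covariance matrix serve as robust principal directions. The weight function here is a generic placeholder, not the data-dependent weights proposed in the paper.

```python
import numpy as np

def weighted_sign_pca(X, center=None, weight_fn=None):
    """Robust PCA directions from a weighted spatial-sign covariance.

    Sketch under simple assumptions: the spatial sign of x_i is
    s_i = (x_i - mu) / ||x_i - mu||, optionally weighted by a
    data-dependent factor w_i (illustrative; not the paper's weights).
    """
    mu = np.median(X, axis=0) if center is None else center
    D = X - mu
    norms = np.linalg.norm(D, axis=1)
    keep = norms > 0                              # drop points at the center
    S = D[keep] / norms[keep, None]               # unit-norm sign vectors
    w = np.ones(keep.sum()) if weight_fn is None else weight_fn(norms[keep])
    WS = w[:, None] * S
    C = WS.T @ WS / np.sum(w ** 2)                # weighted sign covariance
    vals, vecs = np.linalg.eigh(C)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]            # descending eigenpairs
```

Because each observation enters only through a unit-norm sign vector, a gross outlier contributes no more than any other point, which is the source of the robustness; the weights then recover efficiency lost by discarding the radii.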
Analysis of a data matrix and a graph: Metagenomic data and the phylogenetic tree
In biological experiments researchers often have information in the form of a
graph that supplements observed numerical data. Incorporating the knowledge
contained in these graphs into an analysis of the numerical data is an
important and nontrivial task. We look at the example of metagenomic
data---data from a genomic survey of the abundance of different species of
bacteria in a sample. Here, the graph of interest is a phylogenetic tree
depicting the interspecies relationships among the bacteria species. We
illustrate that analysis of the data in a nonstandard inner-product space
effectively uses this additional graphical information and produces more
meaningful results.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS402 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
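The "nonstandard inner-product space" idea can be made concrete with a toy example: replace the Euclidean inner product by x' Q y, where Q is positive definite. In the metagenomic setting Q would be derived from the phylogenetic tree (for instance from shared branch lengths); the small matrix below is purely illustrative, not the paper's construction.

```python
import numpy as np

# Toy similarity matrix for three "species": the first two are siblings
# sharing a long branch, so they get a large off-diagonal entry.
# (Illustrative stand-in for a tree-derived positive-definite Q.)
Q = np.array([[2.0, 1.5, 0.5],
              [1.5, 2.0, 0.5],
              [0.5, 0.5, 2.0]])

def q_inner(x, y, Q=Q):
    """Nonstandard inner product <x, y>_Q = x^T Q y."""
    return float(x @ Q @ y)

def q_norm(x, Q=Q):
    """Norm induced by the Q-inner product."""
    return np.sqrt(q_inner(x, x, Q))
```

Under this inner product, abundance profiles concentrated on closely related species are treated as more similar than profiles on distant species, which is how the graph information enters the analysis.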
An Object-Oriented Framework for Robust Multivariate Analysis
Taking advantage of the S4 class system of the programming environment R, which facilitates the creation and maintenance of reusable and modular components, an object-oriented framework for robust multivariate analysis was developed. The framework resides in the packages robustbase and rrcov and includes an almost complete set of algorithms for computing robust multivariate location and scatter, various robust methods for principal component analysis as well as robust linear and quadratic discriminant analysis. The design of these methods follows common patterns which we call statistical design patterns in analogy to the design patterns widely used in software engineering. The application of the framework to data analysis as well as possible extensions by the development of new methods is demonstrated on examples which themselves are part of the package rrcov.
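The rrcov framework itself is R code, but the common pattern behind its robust location/scatter estimators (such as CovMcd) can be illustrated with a minimal numpy sketch of MCD-style concentration steps: repeatedly refit on the h observations with the smallest Mahalanobis distances. This is a bare illustration of the idea, not the package's carefully engineered algorithm.

```python
import numpy as np

def mcd_like(X, h=None, n_iter=5, seed=0):
    """Minimal concentration-step estimator in the spirit of MCD.

    Illustrative only: a production implementation (as in rrcov's CovMcd)
    uses many random starts, consistency corrections, and reweighting.
    """
    n, p = X.shape
    h = (n + p + 1) // 2 if h is None else h      # subset size
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=h, replace=False)    # one random start
    for _ in range(n_iter):
        mu = X[idx].mean(axis=0)
        S = np.cov(X[idx].T)
        d2 = np.einsum('ij,jk,ik->i', X - mu, np.linalg.inv(S), X - mu)
        idx = np.argsort(d2)[:h]                  # keep h closest points
    return X[idx].mean(axis=0), np.cov(X[idx].T)
```

Each concentration step provably does not increase the covariance determinant of the selected subset, which is why iterating it homes in on the clean bulk of the data.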