Some intriguing properties of Tukey's half-space depth
For multivariate data, Tukey's half-space depth is one of the most popular
depth functions available in the literature. It is conceptually simple and
satisfies several desirable properties of depth functions. The Tukey median,
the multivariate median associated with the half-space depth, is also a
well-known measure of center for multivariate data with several interesting
properties. In this article, we derive and investigate some interesting
properties of half-space depth and its associated multivariate median. These
properties, some of which are counterintuitive, have important statistical
consequences in multivariate analysis. We also investigate a natural extension
of Tukey's half-space depth and the related median for probability
distributions on any Banach space (which may be finite- or
infinite-dimensional) and prove some results that demonstrate anomalous
behavior of half-space depth in infinite-dimensional spaces.Comment: Published in at http://dx.doi.org/10.3150/10-BEJ322 the Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm
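For intuition (a sketch, not taken from the article): the empirical half-space depth of a point x is the smallest fraction of data points contained in any closed half-space whose boundary hyperplane passes through x. It can be approximated by minimizing over a random sample of directions; exact algorithms exist in low dimensions.

```python
import numpy as np

def halfspace_depth(x, data, n_dirs=1000, seed=0):
    """Approximate Tukey's half-space depth of point x w.r.t. data.

    Depth(x) = min over unit directions u of the fraction of points
    with u . (X - x) >= 0; here the min is taken over a random sample
    of directions, so the result is an upper-bound approximation.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(size=(n_dirs, data.shape[1]))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    proj = (data - x) @ u.T          # shape (n_points, n_dirs)
    frac = (proj >= 0).mean(axis=0)  # fraction on one side, per direction
    return frac.min()

# The centre of a roughly symmetric cloud should have depth near 1/2,
# while a far-away point should have depth near 0.
rng = np.random.default_rng(1)
cloud = rng.normal(size=(500, 2))
print(halfspace_depth(np.zeros(2), cloud))           # close to 0.5
print(halfspace_depth(np.array([5.0, 5.0]), cloud))  # close to 0.0
```

The Tukey median mentioned in the abstract is then any point maximizing this depth over x.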
DEPTH-BASED CLASSIFICATION FOR FUNCTIONAL DATA
Classification is an important task when data are curves. Recently, the notion of statistical depth has been extended to deal with functional observations. In this paper, we propose robust procedures based on the concept of depth to classify curves. These techniques are applied to a real data example. An extensive simulation study with contaminated models illustrates the good robustness properties of these depth-based classification methods.
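A common depth-based classification rule (a generic illustration, not the specific functional procedures proposed in this paper) assigns a new observation to the class in which it is deepest. A minimal finite-dimensional sketch using approximate half-space depth:

```python
import numpy as np

def approx_depth(x, data, n_dirs=500, seed=0):
    """Approximate half-space depth: min over random directions of the
    fraction of data points on one side of a hyperplane through x."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=(n_dirs, data.shape[1]))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    return ((data - x) @ u.T >= 0).mean(axis=0).min()

def max_depth_classify(x, samples):
    """Maximum-depth rule: assign x to the class (dict key) whose
    training sample gives x the largest depth."""
    return max(samples, key=lambda label: approx_depth(x, samples[label]))

rng = np.random.default_rng(2)
samples = {"A": rng.normal(0, 1, size=(200, 2)),
           "B": rng.normal(4, 1, size=(200, 2))}
print(max_depth_classify(np.array([0.1, -0.2]), samples))  # "A"
print(max_depth_classify(np.array([3.8, 4.1]), samples))   # "B"
```

Because depth is an ordering rather than a density, such rules are insensitive to outlying training points, which is the robustness property the simulation study examines.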
Nonparametrically consistent depth-based classifiers
We introduce a class of depth-based classification procedures that are of a
nearest-neighbor nature. Depth, after symmetrization, indeed provides the
center-outward ordering that is necessary and sufficient to define nearest
neighbors. Like all their depth-based competitors, the resulting classifiers
are affine-invariant, hence in particular are insensitive to unit changes.
Unlike the former, however, the latter achieve Bayes consistency under
virtually any absolutely continuous distribution - a concept we call
nonparametric consistency, to stress the difference with the stronger universal
consistency of the standard NN classifiers. We investigate the finite-sample
performances of the proposed classifiers through simulations and show that they
outperform affine-invariant nearest-neighbor classifiers obtained through an
obvious standardization construction. We illustrate the practical value of our
classifiers on two real data examples. Finally, we shortly discuss the possible
uses of our depth-based neighbors in other inference problems.
Comment: Published at http://dx.doi.org/10.3150/13-BEJ561 in the Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
Weighted distance based discriminant analysis: the R package WeDiBaDis
The WeDiBaDis package provides a user-friendly environment for performing discriminant analysis (supervised classification). WeDiBaDis is an easy-to-use package addressed to the biological and medical communities and, in general, to researchers interested in applied studies. It is suitable when the user wants to construct a discriminant rule on the basis of distances between a relatively small number of instances or units of known, unbalanced-class membership measured on many (possibly thousands of) features of any type. This is a common situation when analyzing genetic biomedical data. The discriminant rule can then be used both as a means of explaining differences among classes and for the important task of assigning class membership to new unlabeled units. Our package implements two discriminant analysis procedures in an R environment: the well-known distance-based discriminant analysis (DB-discriminant) and a weighted-distance-based discriminant (WDB-discriminant), a novel classifier rule that we introduce. The new procedure improves on the DB rule by taking into account the statistical depth of the units. This article presents both classifying procedures and describes the implementation of each in detail. We illustrate the use of the package with an ecological and a genetic experimental example. Finally, we illustrate the effectiveness of the new procedure (WDB) compared with DB. This comparison is carried out using thirty-eight high-dimensional, class-unbalanced cancer data sets, three of which include clinical features.
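As a rough illustration of the distance-based idea the package builds on (a sketch in Python rather than R, and not the package's exact implementation), the DB rule assigns a unit to the class with the smallest proximity value: the mean squared distance from the unit to the class members, minus the class's geometric variability (half the mean of all pairwise squared distances within the class).

```python
import numpy as np

def proximity(x, class_data):
    """Proximity of x to a class: mean squared Euclidean distance from x
    to the class units, minus half the mean of all pairwise squared
    distances within the class (the class's geometric variability)."""
    d2_to_class = ((class_data - x) ** 2).sum(axis=1)
    pair = class_data[:, None, :] - class_data[None, :, :]
    d2_within = (pair ** 2).sum(axis=2)
    return d2_to_class.mean() - 0.5 * d2_within.mean()

def db_classify(x, samples):
    """Distance-based discriminant rule: assign x to the class (dict key)
    with the smallest proximity value."""
    return min(samples, key=lambda label: proximity(x, samples[label]))

rng = np.random.default_rng(3)
groups = {"healthy": rng.normal(0, 1, size=(30, 4)),
          "disease": rng.normal(3, 1, size=(30, 4))}
print(db_classify(np.full(4, 0.2), groups))  # "healthy"
```

With Euclidean distances this reduces to a nearest-centroid rule; the interest of the distance-based formulation is that any dissimilarity appropriate to the feature types can be substituted, which is what makes the approach workable for mixed-type genetic data.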
FEMDA: a unified framework for discriminant analysis
Although linear and quadratic discriminant analysis are widely recognized
classical methods, they can encounter significant challenges when dealing with
non-Gaussian distributions or contaminated datasets. This is primarily due to
their reliance on the Gaussian assumption, which lacks robustness. We first
explain and review the classical methods to address this limitation and then
present a novel approach that overcomes these issues. In this new approach, the
model considered is an arbitrary Elliptically Symmetrical (ES) distribution per
cluster with its own arbitrary scale parameter. This flexible model allows for
potentially diverse and independent samples that may not follow identical
distributions. By deriving a new decision rule, we demonstrate that
maximum-likelihood parameter estimation and classification are simple,
efficient, and robust compared to state-of-the-art methods.
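For context, the classical plug-in quadratic discriminant rule that the abstract contrasts with assigns x to the class k maximizing log pi_k - (1/2) log|Sigma_k| - (1/2)(x - mu_k)^T Sigma_k^{-1} (x - mu_k). A minimal sketch of that Gaussian baseline (not the FEMDA rule itself) makes the reliance on estimated per-class means and covariances explicit:

```python
import numpy as np

def qda_fit(X, y):
    """Estimate per-class mean, covariance, and prior for plug-in QDA."""
    params = {}
    for k in np.unique(y):
        Xk = X[y == k]
        params[k] = (Xk.mean(axis=0),
                     np.cov(Xk, rowvar=False),
                     len(Xk) / len(X))
    return params

def qda_predict(params, x):
    """Assign x to the class maximizing the Gaussian log-discriminant score."""
    def score(mu, S, prior):
        diff = x - mu
        return (np.log(prior) - 0.5 * np.linalg.slogdet(S)[1]
                - 0.5 * diff @ np.linalg.solve(S, diff))
    return max(params, key=lambda k: score(*params[k]))

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(100, 2)),
               rng.normal(5, 1, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)
p = qda_fit(X, y)
print(qda_predict(p, np.array([0.0, 0.0])))  # 0
print(qda_predict(p, np.array([5.0, 5.0])))  # 1
```

The sample mean and covariance used here are exactly the non-robust estimates the abstract criticizes: a few contaminated points can shift them arbitrarily, which motivates replacing the Gaussian model with per-cluster elliptically symmetric distributions with their own scale parameters.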