A Direct Estimation Approach to Sparse Linear Discriminant Analysis
This paper considers sparse linear discriminant analysis of high-dimensional
data. In contrast to the existing methods which are based on separate
estimation of the precision matrix Ω and the difference δ of the mean
vectors, we introduce a simple and effective classifier by estimating the
product Ωδ directly through constrained ℓ₁ minimization. The
estimator can be implemented efficiently using linear programming and the
resulting classifier is called the linear programming discriminant (LPD) rule.
The LPD rule is shown to have desirable theoretical and numerical properties.
It exploits the approximate sparsity of Ωδ and, as a consequence, can still
perform well even in cases where Ω and/or δ cannot be
estimated consistently. Asymptotic properties of the LPD rule are investigated
and consistency and rate of convergence results are given. The LPD classifier
has superior finite sample performance and significant computational advantages
over the existing methods that require separate estimation of Ω and δ.
The LPD rule is also applied to analyze real datasets from lung cancer and
leukemia studies. The classifier performs favorably in comparison to existing
methods. Comment: 39 pages. To appear in the Journal of the American Statistical Association.
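As a rough illustration of the direct-estimation idea, the constrained ℓ₁ minimization behind the LPD rule can be cast as a linear program: minimize the ℓ₁ norm of β subject to the sup-norm constraint that Σβ stays within λ of δ, using the standard split β = u − v with u, v ≥ 0. The sketch below is an assumption-laden reading of the abstract, not the paper's implementation; the function name and the tuning parameter λ are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def lpd_direction(Sigma_hat, delta_hat, lam):
    """Sketch of the LPD estimator: minimize ||beta||_1 subject to
    ||Sigma_hat @ beta - delta_hat||_inf <= lam, written as an LP
    via beta = u - v with u, v >= 0 (hypothetical helper)."""
    p = len(delta_hat)
    c = np.ones(2 * p)                       # objective: sum(u) + sum(v) = ||beta||_1
    A = np.hstack([Sigma_hat, -Sigma_hat])   # Sigma_hat @ (u - v)
    A_ub = np.vstack([A, -A])                # two-sided sup-norm constraint
    b_ub = np.concatenate([delta_hat + lam, lam - delta_hat])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    uv = res.x
    return uv[:p] - uv[p:]                   # recover beta = u - v
```

With an identity covariance and δ = (1, 0), the estimator shrinks the discriminant direction toward zero coordinate-wise while staying within λ of δ, which is the sparsity mechanism the abstract describes.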
Optimal feature selection for sparse linear discriminant analysis and its applications in gene expression data
This work studies the theoretical rules of feature selection in linear
discriminant analysis (LDA), and a new feature selection method is proposed for
sparse linear discriminant analysis. An ℓ₁ minimization method is used to
select the important features from which the LDA will be constructed. The
asymptotic results of this proposed two-stage LDA (TLDA) are studied,
demonstrating that TLDA is an optimal classification rule whose convergence
rate is the best among existing methods. The experiments on simulated and
real datasets are consistent with the theoretical results and show that TLDA
performs favorably in comparison with current methods. Overall, TLDA needs
fewer features or genes than other approaches to achieve a
better result with a reduced misclassification rate. Comment: 20 pages, 3 figures, 5 tables, accepted by Computational Statistics
and Data Analysis.
Supervised Classification Using Sparse Fisher's LDA
It is well known that in a supervised classification setting when the number
of features is smaller than the number of observations, Fisher's linear
discriminant rule is asymptotically Bayes. However, there are numerous modern
applications where classification is needed in the high-dimensional setting.
Naive implementation of Fisher's rule in this case fails to provide good
results because the sample covariance matrix is singular. Moreover, by
constructing a classifier that relies on all features the interpretation of the
results is challenging. Our goal is to provide robust classification that
relies only on a small subset of important features and accounts for the
underlying correlation structure. We apply a lasso-type penalty to the
discriminant vector to ensure sparsity of the solution and use a
shrinkage-type estimator for the covariance matrix. The resulting optimization problem is
solved using an iterative coordinate ascent algorithm. Furthermore, we analyze
the effect of nonconvexity on the sparsity level of the solution and highlight
the difference between the penalized and the constrained versions of the
problem. The simulation results show that the proposed method performs
favorably in comparison to alternatives. The method is used to classify
leukemia patients based on DNA methylation features.
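A minimal sketch of the ingredients this abstract combines, under assumptions it does not spell out: a shrinkage covariance estimate (here a simple convex combination with the identity), a lasso penalty on the discriminant vector, and a coordinate-wise update with soft-thresholding. The objective form and all names below are illustrative, not the paper's.

```python
import numpy as np

def soft_threshold(z, lam):
    """Lasso proximal step: shrink z toward zero by lam."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def sparse_discriminant(Sigma_hat, delta_hat, lam, alpha=0.1, n_iter=100):
    """Hypothetical sketch: minimize 0.5 b'Sb - delta'b + lam*||b||_1
    by coordinate-wise updates, with a shrinkage covariance
    S = (1 - alpha) * Sigma_hat + alpha * I."""
    p = len(delta_hat)
    S = (1 - alpha) * Sigma_hat + alpha * np.eye(p)
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual with coordinate j removed
            r = delta_hat[j] - S[j] @ b + S[j, j] * b[j]
            b[j] = soft_threshold(r, lam) / S[j, j]
    return b
```

Coordinates whose partial residual falls below λ are set exactly to zero, which is how the lasso penalty yields a classifier that depends on only a small subset of features.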
Feature Augmentation via Nonparametrics and Selection (FANS) in High Dimensional Classification
We propose a high dimensional classification method that involves
nonparametric feature augmentation. Knowing that marginal density ratios are
the most powerful univariate classifiers, we use the ratio estimates to
transform the original feature measurements. Subsequently, penalized logistic
regression is invoked, taking as input the newly transformed or augmented
features. This procedure trains models equipped with local complexity and
global simplicity, thereby avoiding the curse of dimensionality while creating
a flexible nonlinear decision boundary. The resulting method is called Feature
Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by
generalizing the Naive Bayes model, writing the log ratio of joint densities as
a linear combination of those of marginal densities. It is related to
generalized additive models, but has better interpretability and computability.
Risk bounds are developed for FANS. In numerical analysis, FANS is compared
with competing methods, so as to provide a guideline on its best application
domain. Real data analysis demonstrates that FANS performs very competitively
on benchmark email spam and gene expression data sets. Moreover, FANS is
implemented by an extremely fast algorithm through parallel computing. Comment: 30 pages, 2 figures.
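The feature-augmentation step described above can be sketched as follows: each original feature is replaced by the estimated log ratio of its class-conditional marginal densities, after which a penalized logistic regression would be fitted on the transformed features. This is a hedged reading of the abstract; the kernel density estimator, the small stabilizing constant, and the function name are all assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

def fans_features(X_train, y_train, X):
    """Sketch of the FANS transform: replace feature j of each row of X
    with log f1_j(x) - log f0_j(x), where f1_j and f0_j are kernel
    density estimates of feature j in classes 1 and 0."""
    Z = np.empty_like(X, dtype=float)
    for j in range(X.shape[1]):
        f1 = gaussian_kde(X_train[y_train == 1, j])
        f0 = gaussian_kde(X_train[y_train == 0, j])
        # small constant guards against log(0) in density tails
        Z[:, j] = np.log(f1(X[:, j]) + 1e-12) - np.log(f0(X[:, j]) + 1e-12)
    return Z
```

A lasso-penalized logistic regression on Z then performs the "Selection" part: marginally informative features get large-magnitude transformed values, while uninformative ones transform to roughly zero and are pruned by the penalty.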
High-dimensional classification using features annealed independence rules
Classification using high-dimensional features arises frequently in many
contemporary statistical studies such as tumor classification using microarray
or other high-throughput data. The impact of dimensionality on classifications
is poorly understood. In a seminal paper, Bickel and Levina [Bernoulli 10
(2004) 989--1010] show that the Fisher discriminant performs poorly due to
diverging spectra and they propose to use the independence rule to overcome the
problem. We first demonstrate that even for the independence classification
rule, classification using all the features can be as poor as random
guessing due to noise accumulation in estimating population centroids in
high-dimensional feature space. In fact, we demonstrate further that almost all
linear discriminants can perform as poorly as random guessing. Thus, it is
important to select a subset of important features for high-dimensional
classification, resulting in Features Annealed Independence Rules (FAIR). The
conditions under which all the important features can be selected by the
two-sample t-statistic are established. The choice of the optimal number of
features, or equivalently, of the threshold value of the test statistics, is
proposed based on an upper bound of the classification error. Simulation
studies and real data analysis support our theoretical results and demonstrate
convincingly the advantage of our new classification procedure. Comment: Published at http://dx.doi.org/10.1214/07-AOS504 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
Sparsifying the Fisher Linear Discriminant by Rotation
Many high dimensional classification techniques have been proposed in the
literature based on sparse linear discriminant analysis (LDA). To efficiently
use them, sparsity of linear classifiers is a prerequisite. However, this might
not be readily available in many applications, and rotations of data are
required to create the needed sparsity. In this paper, we propose a family of
rotations to create the required sparsity. The basic idea is to use the
principal components of the sample covariance matrix of the pooled samples and
its variants to rotate the data first and to then apply an existing high
dimensional classifier. This rotate-and-solve procedure can be combined with
any existing classifiers, and is robust against the sparsity level of the true
model. We show that these rotations do create the sparsity needed for high
dimensional classification and provide a theoretical understanding of why such
a rotation works empirically. The effectiveness of the proposed method is
demonstrated by a number of simulated and real data examples, and the
improvements of our method over some popular high dimensional classification
rules are clearly shown. Comment: 30 pages and 9 figures. This paper has been accepted by the Journal of
the Royal Statistical Society: Series B (Statistical Methodology). The first
two versions of this paper were uploaded to Bin Dong's web site under the
title "A Rotate-and-Solve Procedure for Classification" in 2013 May and 2014
January. This version may be slightly different from the published version.
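The rotate-and-solve step can be illustrated briefly: compute the pooled within-class sample covariance, take its eigenvectors as the rotation, and hand the rotated data to any existing sparse classifier. This is a minimal sketch under assumptions (two classes labeled 0/1, plain eigendecomposition rather than the paper's family of variants), with hypothetical names.

```python
import numpy as np

def rotate_and_solve(X, y):
    """Sketch of the rotate-and-solve idea: rotate data by the principal
    components of the pooled sample covariance so that a sparse LDA
    method can be applied in the rotated coordinates."""
    # pooled within-class covariance, assuming two classes labeled 0 and 1
    Xc = np.vstack([X[y == k] - X[y == k].mean(0) for k in (0, 1)])
    S = Xc.T @ Xc / (len(X) - 2)
    _, V = np.linalg.eigh(S)      # columns of V: principal components
    return X @ V, V               # rotated data, plus V for new points
```

Since V is orthogonal, the rotation loses no information; it only changes the basis, in the hope that the true discriminant direction becomes sparse in the new coordinates, which is the prerequisite the abstract identifies for existing sparse LDA methods.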