High-dimensional classification using features annealed independence rules
Classification using high-dimensional features arises frequently in many
contemporary statistical studies such as tumor classification using microarray
or other high-throughput data. The impact of dimensionality on classification
is poorly understood. In a seminal paper, Bickel and Levina [Bernoulli 10
(2004) 989--1010] show that the Fisher discriminant performs poorly due to
diverging spectra and they propose to use the independence rule to overcome the
problem. We first demonstrate that even for the independence classification
rule, classification using all the features can be as poor as random guessing
due to noise accumulation in estimating population centroids in
high-dimensional feature space. In fact, we demonstrate further that almost all
linear discriminants can perform as poorly as random guessing. Thus, it is
important to select a subset of important features for high-dimensional
classification, resulting in Features Annealed Independence Rules (FAIR). The
conditions under which all the important features can be selected by the
two-sample t-statistic are established. The choice of the optimal number of
features, or equivalently, of the threshold value of the test statistic, is
proposed based on an upper bound of the classification error. Simulation
studies and real data analysis support our theoretical results and demonstrate
convincingly the advantage of our new classification procedure.
Comment: Published at http://dx.doi.org/10.1214/07-AOS504 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
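A minimal Python sketch of a FAIR-style classifier, under assumptions: features are ranked by the absolute two-sample t-statistic, the top m are kept, and the independence (diagonal) rule is applied on them. The paper chooses m by minimizing an upper bound on the classification error; here m is simply passed in.

```python
import numpy as np
from scipy import stats

def fair_fit(X0, X1, m):
    # Rank features by the absolute two-sample t-statistic; keep the top m.
    t_stat, _ = stats.ttest_ind(X0, X1, axis=0)
    keep = np.argsort(-np.abs(t_stat))[:m]
    n0, n1 = len(X0), len(X1)
    mu0, mu1 = X0[:, keep].mean(axis=0), X1[:, keep].mean(axis=0)
    # Pooled per-feature variances: the independence rule ignores correlations.
    var = ((n0 - 1) * X0[:, keep].var(axis=0, ddof=1)
           + (n1 - 1) * X1[:, keep].var(axis=0, ddof=1)) / (n0 + n1 - 2)
    return keep, mu0, mu1, var

def fair_predict(X, keep, mu0, mu1, var):
    # Diagonal rule: assign each row to the closer centroid in standardized distance.
    Z = X[:, keep]
    d0 = (((Z - mu0) ** 2) / var).sum(axis=1)
    d1 = (((Z - mu1) ** 2) / var).sum(axis=1)
    return (d1 < d0).astype(int)
```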
A Direct Estimation Approach to Sparse Linear Discriminant Analysis
This paper considers sparse linear discriminant analysis of high-dimensional
data. In contrast to the existing methods, which are based on separate
estimation of the precision matrix \Omega and the difference \delta of the mean
vectors, we introduce a simple and effective classifier by estimating the
product \Omega\delta directly through constrained \ell_1 minimization. The
estimator can be implemented efficiently using linear programming and the
resulting classifier is called the linear programming discriminant (LPD) rule.
The LPD rule is shown to have desirable theoretical and numerical properties.
It exploits the approximate sparsity of \Omega\delta and as a consequence can
still perform well even when \Omega and/or \delta cannot be estimated
consistently. Asymptotic properties of the LPD rule are investigated,
and consistency and rate of convergence results are given. The LPD classifier
has superior finite sample performance and significant computational advantages
over the existing methods that require separate estimation of \Omega and \delta.
The LPD rule is also applied to analyze real datasets from lung cancer and
leukemia studies. The classifier performs favorably in comparison to existing
methods.
Comment: 39 pages. To appear in the Journal of the American Statistical
Association.
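As a concrete illustration, the constrained \ell_1 problem behind an LPD-type rule, minimize ||\beta||_1 subject to ||\hat\Sigma\beta - \hat\delta||_\infty <= \lambda, can be posed as a linear program by splitting \beta = u - v with u, v >= 0. The sketch below is an assumption-laden outline (lam is a user-chosen tuning constant), not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linprog

def lpd_fit(X1, X2, lam):
    # Pooled sample covariance and mean-difference estimates.
    n1, n2 = len(X1), len(X2)
    Sigma = ((n1 - 1) * np.cov(X1.T) + (n2 - 1) * np.cov(X2.T)) / (n1 + n2 - 2)
    delta = X1.mean(axis=0) - X2.mean(axis=0)
    p = delta.size
    # Minimize ||u||_1 + ||v||_1 subject to |Sigma(u - v) - delta| <= lam.
    c = np.ones(2 * p)
    A_ub = np.vstack([np.hstack([Sigma, -Sigma]),
                      np.hstack([-Sigma, Sigma])])
    b_ub = np.concatenate([delta + lam, lam - delta])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
    beta = res.x[:p] - res.x[p:]
    return beta, (X1.mean(axis=0) + X2.mean(axis=0)) / 2

def lpd_predict(X, beta, midpoint):
    # Linear rule: assign to group 1 when the discriminant score is positive.
    return ((X - midpoint) @ beta > 0).astype(int)
```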
Optimal feature selection for sparse linear discriminant analysis and its applications in gene expression data
This work studies the theoretical rules of feature selection in linear
discriminant analysis (LDA), and a new feature selection method is proposed for
sparse linear discriminant analysis. An \ell_1 minimization method is used to
select the important features from which the LDA will be constructed. The
asymptotic results of this proposed two-stage LDA (TLDA) are studied,
demonstrating that TLDA is an optimal classification rule whose convergence
rate improves on those of existing methods. Experiments on simulated and
real datasets are consistent with the theoretical results and show that TLDA
performs favorably in comparison with current methods. Overall, TLDA requires
fewer features or genes than other approaches to achieve a lower
misclassification rate.
Comment: 20 pages, 3 figures, 5 tables. Accepted by Computational Statistics
and Data Analysis.
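A rough sketch of the two-stage idea, with the caveat that the paper's exact estimator is not reproduced here: an \ell_1 (Lasso) fit screens the features, and classical LDA is then trained on the survivors. The penalty level alpha is an illustrative placeholder.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def tlda_fit(X, y, alpha=0.1):
    # Stage 1: l1 minimization screens the features (nonzero Lasso coefficients).
    selected = np.flatnonzero(Lasso(alpha=alpha).fit(X, y).coef_)
    # Stage 2: classical LDA on the selected low-dimensional feature set
    # (assumes at least one feature survives the screening).
    lda = LinearDiscriminantAnalysis().fit(X[:, selected], y)
    return selected, lda

# Usage: selected, lda = tlda_fit(X_train, y_train)
#        y_hat = lda.predict(X_test[:, selected])
```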
Fast rate of convergence in high dimensional linear discriminant analysis
This paper gives a theoretical analysis of high dimensional linear
discrimination of Gaussian data. We study the excess risk of linear
discriminant rules. We emphasize the poor performance of standard procedures
when the dimension p is larger than the sample size n. The corresponding
theoretical results are non-asymptotic lower bounds. On the other hand, we
propose two discrimination procedures based on dimensionality reduction and
provide associated rates of convergence which can be O(log(p)/n) under sparsity
assumptions. Finally, all our results rely on a theorem that provides simple,
sharp relations between the excess risk and an estimation error associated with
the geometric parameters defining the discrimination rule used.
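As a hedged illustration only (the paper's actual procedures are not reproduced), a dimension-reduction step on the sqrt(log(p)/n) scale, the scale that drives a log(p)/n risk rate, might keep only coordinates whose estimated mean difference clears such a threshold before discriminating:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def reduce_then_discriminate(X, y, c=1.0):
    # Keep coordinates whose mean difference clears a sqrt(log(p)/n) threshold;
    # the constant c tunes how aggressive the reduction is.
    n, p = X.shape
    delta = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    keep = np.flatnonzero(np.abs(delta) > c * np.sqrt(np.log(p) / n))
    # Fisher discrimination on the reduced (low-dimensional) coordinates.
    lda = LinearDiscriminantAnalysis().fit(X[:, keep], y)
    return keep, lda
```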
Sparsifying the Fisher Linear Discriminant by Rotation
Many high dimensional classification techniques have been proposed in the
literature based on sparse linear discriminant analysis (LDA). To efficiently
use them, sparsity of the linear classifier is a prerequisite. However, such
sparsity might not be readily available in many applications, and the data must
be rotated first to create it. In this paper, we propose a family of rotations
that create the required sparsity. The basic idea is to use the
principal components of the sample covariance matrix of the pooled samples and
its variants to rotate the data first and to then apply an existing high
dimensional classifier. This rotate-and-solve procedure can be combined with
any existing classifiers, and is robust against the sparsity level of the true
model. We show that these rotations do create the sparsity needed for high
dimensional classification and provide a theoretical understanding of why such
a rotation works empirically. The effectiveness of the proposed method is
demonstrated by a number of simulated and real data examples, and the
improvements of our method over some popular high dimensional classification
rules are clearly shown.
Comment: 30 pages and 9 figures. This paper has been accepted by the Journal of
the Royal Statistical Society: Series B (Statistical Methodology). The first
two versions of this paper were uploaded to Bin Dong's web site under the
title "A Rotate-and-Solve Procedure for Classification" in May 2013 and
January 2014. This version may be slightly different from the published
version.
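A minimal sketch of the rotate-and-solve idea, assuming a scikit-learn-style classifier object: the data are rotated by the eigenvectors of the pooled sample covariance matrix, and the rotated data are handed to the existing classifier.

```python
import numpy as np

def rotate_and_solve(X_train, y_train, X_test, classifier):
    # Center each group and pool the centered samples.
    groups = [X_train[y_train == g] for g in np.unique(y_train)]
    pooled = np.vstack([G - G.mean(axis=0) for G in groups])
    # Principal components of the pooled sample covariance give the rotation.
    _, R = np.linalg.eigh(np.cov(pooled.T))
    # Rotate, then apply any existing high dimensional classifier.
    classifier.fit(X_train @ R, y_train)
    return classifier.predict(X_test @ R)
```

Here classifier can be any sparse LDA-type rule; an \ell_1-penalized logistic regression is one convenient stand-in.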
On Two Simple and Effective Procedures for High Dimensional Classification of General Populations
In this paper, we generalize two criteria, the determinant-based and
trace-based criteria proposed by Saranadasa (1993), to general populations for
high dimensional classification. These two criteria compare distances
between a new observation and each of the known groups. The
determinant-based criterion performs well for correlated variables by
incorporating the covariance structure and is competitive with many other
existing rules. The criterion, however, requires that the measurement dimension
be smaller than the sample size. The trace-based criterion, in contrast, is an
independence rule and is effective in the "large dimension, small sample size"
scenario. An appealing property of these two criteria is that their
implementation is straightforward, with no need for preliminary variable
selection or tuning
parameters. Their asymptotic misclassification probabilities are derived using
the theory of large dimensional random matrices. Their competitive performance
is illustrated by extensive Monte Carlo experiments and a real data analysis.
Comment: 5 figures; 22 pages. To appear in "Statistical Papers".
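An illustrative simplification (the paper's bias-corrected determinant- and trace-based statistics are not reproduced): the determinant-based flavor is represented below by a pooled-covariance Mahalanobis distance, which needs the dimension below the sample size, and the trace-based flavor by a plain Euclidean distance, which does not.

```python
import numpy as np

def classify_by_distance(x, groups, rule="trace"):
    # Compare the new observation's distance to each group mean.
    means = [G.mean(axis=0) for G in groups]
    if rule == "det":
        # Determinant-based flavor: pooled-covariance (Mahalanobis) distance;
        # the inverse exists only when the dimension is below the sample size.
        pooled = sum((len(G) - 1) * np.cov(G.T) for G in groups)
        pooled /= sum(len(G) for G in groups) - len(groups)
        P = np.linalg.inv(pooled)
        dists = [(x - m) @ P @ (x - m) for m in means]
    else:
        # Trace-based flavor: an independence rule via Euclidean distance,
        # usable when the dimension exceeds the sample size.
        dists = [np.sum((x - m) ** 2) for m in means]
    return int(np.argmin(dists))
```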
Feature Augmentation via Nonparametrics and Selection (FANS) in High Dimensional Classification
We propose a high dimensional classification method that involves
nonparametric feature augmentation. Knowing that marginal density ratios are
the most powerful univariate classifiers, we use the ratio estimates to
transform the original feature measurements. Subsequently, penalized logistic
regression is invoked, taking as input the newly transformed or augmented
features. This procedure trains models equipped with local complexity and
global simplicity, thereby avoiding the curse of dimensionality while creating
a flexible nonlinear decision boundary. The resulting method is called Feature
Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by
generalizing the Naive Bayes model, writing the log ratio of the joint
densities as a linear combination of the log ratios of the marginal densities.
FANS is related to
generalized additive models, but has better interpretability and computability.
Risk bounds are developed for FANS. In numerical studies, FANS is compared
with competing methods to provide a guideline on its best application
domain. Real data analysis demonstrates that FANS performs very competitively
on benchmark email spam and gene expression data sets. Moreover, FANS is
implemented by an extremely fast algorithm through parallel computing.
Comment: 30 pages, 2 figures.
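A compressed sketch of the FANS pipeline, under assumptions: scikit-learn's KernelDensity stands in for the marginal density-ratio estimates and an \ell_1-penalized logistic regression for the selection step. The bandwidth and C values are illustrative, and the paper's parallel implementation is omitted.

```python
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.linear_model import LogisticRegression

def fans_transform(X, kdes):
    # Replace feature j by its estimated marginal log density ratio.
    return np.column_stack(
        [k1.score_samples(X[:, j:j + 1]) - k0.score_samples(X[:, j:j + 1])
         for j, (k0, k1) in enumerate(kdes)])

def fans_fit(X, y, bandwidth=0.5, C=1.0):
    # One kernel density estimate per feature and per class.
    kdes = [(KernelDensity(bandwidth=bandwidth).fit(X[y == 0][:, j:j + 1]),
             KernelDensity(bandwidth=bandwidth).fit(X[y == 1][:, j:j + 1]))
            for j in range(X.shape[1])]
    # l1-penalized logistic regression on the augmented (transformed) features.
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    clf.fit(fans_transform(X, kdes), y)
    return kdes, clf
```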