
    High-dimensional classification using features annealed independence rules

    Classification using high-dimensional features arises frequently in many contemporary statistical studies, such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classification is poorly understood. In a seminal paper, Bickel and Levina [Bernoulli 10 (2004) 989--1010] show that the Fisher discriminant performs poorly due to diverging spectra, and they propose the independence rule to overcome the problem. We first demonstrate that even for the independence classification rule, classification using all the features can be as poor as random guessing due to noise accumulation in estimating population centroids in high-dimensional feature space. In fact, we demonstrate further that almost all linear discriminants can perform as poorly as random guessing. Thus, it is important to select a subset of important features for high-dimensional classification, resulting in Features Annealed Independence Rules (FAIR). The conditions under which all the important features can be selected by the two-sample t-statistic are established. The choice of the optimal number of features, or equivalently, the threshold value of the test statistic, is proposed based on an upper bound of the classification error. Simulation studies and real data analysis support our theoretical results and demonstrate convincingly the advantage of our new classification procedure. Comment: Published at http://dx.doi.org/10.1214/07-AOS504 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
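    A minimal sketch of the FAIR idea described above, assuming two classes and a fixed number m of retained features: rank features by their absolute two-sample t-statistics, keep the top m, and classify with the diagonal (independence) rule on those features only. The helper names and the way m is supplied are illustrative; the paper itself chooses the cutoff from an upper bound on the classification error.

```python
import numpy as np
from scipy import stats

def fair_classifier(X, y, m):
    """Features Annealed Independence Rule (illustrative sketch).

    X : (n, p) training matrix, y : labels in {0, 1},
    m : number of features to retain (tuning parameter, assumed given).
    """
    X0, X1 = X[y == 0], X[y == 1]
    # Per-feature two-sample t-statistics; keep the m largest in magnitude.
    t = stats.ttest_ind(X1, X0, equal_var=False).statistic
    keep = np.argsort(-np.abs(t))[:m]

    mu0, mu1 = X0[:, keep].mean(0), X1[:, keep].mean(0)
    # Per-feature pooled variances: the independence rule ignores correlations.
    var = (X0[:, keep].var(0, ddof=1) + X1[:, keep].var(0, ddof=1)) / 2

    def predict(Xnew):
        Xk = Xnew[:, keep]
        d0 = ((Xk - mu0) ** 2 / var).sum(axis=1)
        d1 = ((Xk - mu1) ** 2 / var).sum(axis=1)
        # Assign to the class whose centroid is closer in the diagonal metric.
        return (d1 < d0).astype(int)

    return predict
```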

    A Direct Estimation Approach to Sparse Linear Discriminant Analysis

    This paper considers sparse linear discriminant analysis of high-dimensional data. In contrast to existing methods, which are based on separate estimation of the precision matrix Ω and the difference δ of the mean vectors, we introduce a simple and effective classifier by estimating the product Ωδ directly through constrained ℓ1 minimization. The estimator can be implemented efficiently using linear programming, and the resulting classifier is called the linear programming discriminant (LPD) rule. The LPD rule is shown to have desirable theoretical and numerical properties. It exploits the approximate sparsity of Ωδ and, as a consequence, can still perform well even when Ω and/or δ cannot be estimated consistently. Asymptotic properties of the LPD rule are investigated, and consistency and rate of convergence results are given. The LPD classifier has superior finite sample performance and significant computational advantages over existing methods that require separate estimation of Ω and δ. The LPD rule is also applied to analyze real datasets from lung cancer and leukemia studies. The classifier performs favorably in comparison to existing methods. Comment: 39 pages. To appear in the Journal of the American Statistical Association.
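    A hedged sketch of the constrained ℓ1 formulation described in the abstract, recast as a linear program via scipy.optimize.linprog. The pooled covariance estimate, the tuning parameter lam, and the helper name lpd_fit are assumptions made for illustration; the paper's exact constraint and tuning scheme may differ.

```python
import numpy as np
from scipy.optimize import linprog

def lpd_fit(X0, X1, lam):
    """Linear programming discriminant (illustrative sketch):
    minimize ||beta||_1  subject to  ||S beta - d||_inf <= lam,
    with S the pooled sample covariance and d = mean(X1) - mean(X0).
    lam is a tuning parameter (assumed supplied, e.g. by cross-validation).
    """
    d = X1.mean(axis=0) - X0.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    S = ((n0 - 1) * np.cov(X0, rowvar=False)
         + (n1 - 1) * np.cov(X1, rowvar=False)) / (n0 + n1 - 2)

    p = d.size
    # Write beta = u - v with u, v >= 0 so the l1 objective becomes linear.
    c = np.ones(2 * p)
    A_ub = np.vstack([np.hstack([S, -S]), np.hstack([-S, S])])
    b_ub = np.concatenate([d + lam, lam - d])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    beta = res.x[:p] - res.x[p:]

    mid = (X0.mean(axis=0) + X1.mean(axis=0)) / 2
    # Score a new point against the midpoint of the two centroids.
    return lambda Xnew: ((Xnew - mid) @ beta > 0).astype(int)
```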

    Optimal feature selection for sparse linear discriminant analysis and its applications in gene expression data

    This work studies the theoretical rules of feature selection in linear discriminant analysis (LDA), and a new feature selection method is proposed for sparse linear discriminant analysis. An ℓ1 minimization method is used to select the important features from which the LDA is constructed. The asymptotic properties of this proposed two-stage LDA (TLDA) are studied, showing that TLDA is an optimal classification rule whose convergence rate is the best among existing methods. Experiments on simulated and real datasets are consistent with the theoretical results and show that TLDA performs favorably in comparison with current methods. Overall, TLDA requires fewer features or genes than other approaches to achieve a better result with a reduced misclassification rate. Comment: 20 pages, 3 figures, 5 tables, accepted by Computational Statistics and Data Analysis.
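    A rough sketch of the two-stage idea, assuming an off-the-shelf ℓ1-penalized fit (sklearn's Lasso) as a stand-in for the paper's ℓ1 minimization step, followed by ordinary LDA on the selected features. The screening penalty alpha and the fallback rule are illustrative choices, not the paper's.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def two_stage_lda(X, y, alpha=0.05):
    """Two-stage LDA (illustrative sketch): an l1-penalized fit screens
    features, then ordinary LDA is trained on the selected coordinates."""
    screen = Lasso(alpha=alpha).fit(X, y)      # stage 1: l1-based selection
    selected = np.flatnonzero(screen.coef_)
    if selected.size == 0:                     # fallback if nothing survives
        selected = np.arange(X.shape[1])
    lda = LinearDiscriminantAnalysis().fit(X[:, selected], y)  # stage 2
    return lambda Xnew: lda.predict(Xnew[:, selected])
```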

    Fast rate of convergence in high dimensional linear discriminant analysis

    This paper gives a theoretical analysis of high-dimensional linear discrimination of Gaussian data. We study the excess risk of linear discriminant rules and emphasize the poor performance of standard procedures when the dimension p is larger than the sample size n. The corresponding theoretical results are non-asymptotic lower bounds. On the other hand, we propose two discrimination procedures based on dimensionality reduction and provide associated rates of convergence, which can be O(log(p)/n) under sparsity assumptions. Finally, all our results rely on a theorem that provides simple and sharp relations between the excess risk and an estimation error associated with the geometric parameters defining the discrimination rule used.
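    The abstract does not spell out the two procedures, so the following is only a generic illustration of the reduce-then-discriminate pattern it refers to: project onto k principal components and run LDA in the reduced space. The choice of PCA and of k is an assumption made purely for the example.

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def reduce_then_discriminate(X, y, k=10):
    """Generic reduce-then-classify pipeline (illustrative only):
    project onto k principal components, then run LDA in the reduced space."""
    pca = PCA(n_components=k).fit(X)           # dimensionality reduction step
    lda = LinearDiscriminantAnalysis().fit(pca.transform(X), y)
    return lambda Xnew: lda.predict(pca.transform(Xnew))
```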

    Sparsifying the Fisher Linear Discriminant by Rotation

    Many high-dimensional classification techniques have been proposed in the literature based on sparse linear discriminant analysis (LDA). To use them efficiently, sparsity of the linear classifier is a prerequisite. However, this might not be readily available in many applications, and rotations of the data are required to create the needed sparsity. In this paper, we propose a family of rotations to create the required sparsity. The basic idea is to use the principal components of the sample covariance matrix of the pooled samples, and its variants, to rotate the data first and then apply an existing high-dimensional classifier. This rotate-and-solve procedure can be combined with any existing classifier and is robust against the sparsity level of the true model. We show that these rotations do create the sparsity needed for high-dimensional classification and provide theoretical understanding of why such a rotation works empirically. The effectiveness of the proposed method is demonstrated by a number of simulated and real data examples, and the improvements of our method over some popular high-dimensional classification rules are clearly shown. Comment: 30 pages and 9 figures. This paper has been accepted by the Journal of the Royal Statistical Society: Series B (Statistical Methodology). The first two versions of this paper were uploaded to Bin Dong's web site under the title "A Rotate-and-Solve Procedure for Classification" in May 2013 and January 2014. This version may be slightly different from the published version.
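    A compact sketch of the rotate-and-solve recipe, assuming the rotation is taken from the eigenvectors of the sample covariance of the pooled data; `sparse_classifier` is a placeholder for whichever existing sparse high-dimensional classifier one wants to apply after the rotation.

```python
import numpy as np

def rotate_and_solve(X, y, sparse_classifier):
    """Rotate-and-solve (illustrative sketch): rotate the data with the
    eigenvectors of the sample covariance of the pooled data, then hand the
    rotated data to any existing sparse high-dimensional classifier.
    `sparse_classifier(X, y)` is assumed to return a prediction function."""
    S = np.cov(X - X.mean(axis=0), rowvar=False)
    _, V = np.linalg.eigh(S)                   # eigenvectors, ascending order
    R = V[:, ::-1]                             # principal directions first
    fit = sparse_classifier(X @ R, y)          # train in rotated coordinates
    return lambda Xnew: fit(Xnew @ R)
```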

    On Two Simple and Effective Procedures for High Dimensional Classification of General Populations

    In this paper, we generalize two criteria, the determinant-based and trace-based criteria proposed by Saranadasa (1993), to general populations for high-dimensional classification. These two criteria compare distances between a new observation and several known groups. The determinant-based criterion performs well for correlated variables by integrating the covariance structure and is competitive with many other existing rules. The criterion, however, requires the measurement dimension to be smaller than the sample size. The trace-based criterion, in contrast, is an independence rule and is effective in the "large dimension, small sample size" scenario. An appealing property of these two criteria is that their implementation is straightforward and there is no need for preliminary variable selection or tuning parameters. Their asymptotic misclassification probabilities are derived using the theory of large-dimensional random matrices. Their competitive performances are illustrated by extensive Monte Carlo experiments and a real data analysis. Comment: 5 figures; 22 pages. To appear in "Statistical Papers".
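    One plausible reading of the trace-based (independence) criterion, written as a nearest-centroid rule with a trace correction for the noise in each estimated centroid. The exact correction terms used in the paper may differ; this sketch only conveys the flavor of comparing corrected distances to several known groups.

```python
import numpy as np

def trace_criterion_classifier(groups):
    """Nearest-centroid rule with a trace correction (assumed simplification
    of the trace-based criterion).  groups : list of (n_g, p) arrays."""
    summaries = []
    for G in groups:
        S_g = np.cov(G, rowvar=False)
        # tr(S_g)/n_g offsets the noise from estimating the centroid.
        summaries.append((G.mean(axis=0), np.trace(S_g) / len(G)))

    def predict(x):
        scores = [np.sum((x - mu) ** 2) - corr for mu, corr in summaries]
        return int(np.argmin(scores))          # index of the chosen group

    return predict
```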

    Feature Augmentation via Nonparametrics and Selection (FANS) in High Dimensional Classification

    We propose a high-dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models but has better interpretability and computability. Risk bounds are developed for FANS. In numerical studies, FANS is compared with competing methods to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing. Comment: 30 pages, 2 figures.
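    A hedged sketch of the FANS pipeline as described above: per-feature kernel density estimates for each class give marginal log density ratios, which become the augmented features fed into an ℓ1-penalized logistic regression. The use of gaussian_kde, the clipping constant, and the penalty strength C are illustrative choices rather than the paper's implementation.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.linear_model import LogisticRegression

def fans_fit(X, y, C=1.0):
    """FANS (illustrative sketch): kernel-estimated marginal log density
    ratios become the augmented features of an l1-penalized logistic fit."""
    X0, X1 = X[y == 0], X[y == 1]
    kdes = [(gaussian_kde(X0[:, j]), gaussian_kde(X1[:, j]))
            for j in range(X.shape[1])]

    def augment(Z):
        cols = []
        for j, (k0, k1) in enumerate(kdes):
            # Clip densities away from zero for numerical stability.
            cols.append(np.log(np.maximum(k1(Z[:, j]), 1e-12)
                               / np.maximum(k0(Z[:, j]), 1e-12)))
        return np.column_stack(cols)

    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    clf.fit(augment(X), y)
    return lambda Xnew: clf.predict(augment(Xnew))
```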