2,696 research outputs found

Feature Augmentation via Nonparametrics and Selection (FANS) in High Dimensional Classification

Authors: Fan Jianqing, Feng Yang, Jiang Jiancheng, Tong Xin
Publication date: 02/01/2015

We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing. Comment: 30 pages, 2 figures
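The augmentation step described in this abstract, replacing each feature by an estimate of its marginal log density ratio, might be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the Gaussian kernel, the rule-of-thumb bandwidth, the `eps` guard, the toy data, and the names `gaussian_kde` and `fit_augmenter` are all assumptions made here. The penalized logistic regression that the abstract applies to the transformed features is not shown.

```python
import math
import random
import statistics


def gaussian_kde(sample):
    """Return a 1-D Gaussian kernel density estimate built from `sample`."""
    n = len(sample)
    # Rule-of-thumb bandwidth: an assumption of this sketch, not the paper's choice.
    h = 1.06 * statistics.stdev(sample) * n ** (-1 / 5)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample)

    return density


def fit_augmenter(X, y, eps=1e-12):
    """Fit one log density-ratio transform per feature from rows X and 0/1 labels y."""
    transforms = []
    for j in range(len(X[0])):
        f1 = gaussian_kde([row[j] for row, lab in zip(X, y) if lab == 1])
        f0 = gaussian_kde([row[j] for row, lab in zip(X, y) if lab == 0])
        # eps guards against log(0) far away from the training data.
        transforms.append(
            lambda x, f1=f1, f0=f0: math.log(f1(x) + eps) - math.log(f0(x) + eps)
        )
    # The returned function maps a raw feature row to its augmented version.
    return lambda row: [t(v) for t, v in zip(transforms, row)]


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
rng = random.Random(0)
X = [[rng.gauss(2.0 * label, 1.0), rng.gauss(0.0, 1.0)]
     for label in (0, 1) for _ in range(100)]
y = [label for label in (0, 1) for _ in range(100)]

augment = fit_augmenter(X, y)
z_pos = augment([2.0, 0.0])  # a typical class-1 point
z_neg = augment([0.0, 0.0])  # a typical class-0 point
print(z_pos, z_neg)
```

In this toy setting the first augmented coordinate should be positive for the class-1 point and negative for the class-0 point, which is the sense in which the transformed feature acts as a univariate classifier.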

arXiv.org e-Print Archive

Princeton University Open Access Repository

CiteSeerX

Crossref

PubMed Central

Partial least squares discriminant analysis: A dimensionality reduction method to classify hyperspectral data

Authors: Bellincontro Andrea, Fordellone Mario, Mencarelli Fabio
Publication date: 01/01/2018

The recent development of more sophisticated spectroscopic methods allows acquisition of high dimensional datasets from which valuable information may be extracted using multivariate statistical analyses, such as dimensionality reduction and automatic classification (supervised and unsupervised). In this work, a supervised classification through a partial least squares discriminant analysis (PLS-DA) is performed on the hyperspectral data. The obtained results are compared with those obtained by the most commonly used classification approaches.

arXiv.org e-Print Archive

Unitus DSpace

Archivio della ricerca- Università di Roma La Sapienza

Partial least squares discriminant analysis: A dimensionality reduction method to classify hyperspectral data

Authors: Andrea Bellincontro, Fabio Mencarelli, Fordellone Mario
Publication venue: Associazione per la statistica applicata
Publication date: 01/01/2018

The recent development of more sophisticated spectroscopic methods allows acquisition of high dimensional datasets from which valuable information may be extracted using multivariate statistical analyses, such as dimensionality reduction and automatic classification (supervised and unsupervised). In this work, a supervised classification through a partial least squares discriminant analysis (PLS-DA) is performed on the hyperspectral data. The obtained results are compared with those obtained by the most commonly used classification approaches.

Archivio della ricerca- Università di Roma La Sapienza

A Direct Estimation Approach to Sparse Linear Discriminant Analysis

Authors: Cai Tony, Liu Weidong
Publication date: 01/01/2011

This paper considers sparse linear discriminant analysis of high-dimensional data. In contrast to the existing methods which are based on separate estimation of the precision matrix $\Omega$ and the difference $\delta$ of the mean vectors, we introduce a simple and effective classifier by estimating the product $\Omega\delta$ directly through constrained $\ell_1$ minimization. The estimator can be implemented efficiently using linear programming and the resulting classifier is called the linear programming discriminant (LPD) rule. The LPD rule is shown to have desirable theoretical and numerical properties. It exploits the approximate sparsity of $\Omega\delta$ and as a consequence allows cases where it can still perform well even when $\Omega$ and/or $\delta$ cannot be estimated consistently. Asymptotic properties of the LPD rule are investigated and consistency and rate of convergence results are given. The LPD classifier has superior finite sample performance and significant computational advantages over the existing methods that require separate estimation of $\Omega$ and $\delta$. The LPD rule is also applied to analyze real datasets from lung cancer and leukemia studies. The classifier performs favorably in comparison to existing methods. Comment: 39 pages. To appear in Journal of the American Statistical Association
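One way to read the constrained $\ell_1$ minimization in this abstract is written out below. Since the target $\beta = \Omega\delta$ satisfies $\Sigma\beta = \delta$, a natural constraint asks that the sample analogue hold approximately. The notation ($\hat\Sigma$, $\hat\delta$, $\lambda$, $u$) is assumed here for illustration, and the display is a sketch of the idea rather than a quotation of the paper's formulation.

```latex
% Assumed notation: \hat\Sigma = pooled sample covariance matrix,
% \hat\delta = \hat\mu_1 - \hat\mu_2 (difference of sample means),
% \lambda \ge 0 a tuning parameter, u \in \mathbb{R}^p auxiliary variables.
\begin{align*}
\hat\beta \in \arg\min_{\beta \in \mathbb{R}^p} \ \|\beta\|_1
  \quad &\text{subject to} \quad
  \|\hat\Sigma\beta - \hat\delta\|_\infty \le \lambda, \\
\intertext{which can be rewritten as the linear program}
\min_{u,\,\beta} \ \sum_{j=1}^{p} u_j
  \quad &\text{subject to} \quad
  -u_j \le \beta_j \le u_j, \quad
  -\lambda \le (\hat\Sigma\beta - \hat\delta)_k \le \lambda,
  \quad j, k = 1, \dots, p.
\end{align*}
```

The second form is linear in $(u, \beta)$ because both the $\ell_1$ objective and the $\ell_\infty$ constraint reduce to componentwise inequalities, which is what makes a linear programming solver applicable.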

arXiv.org e-Print Archive

ScholarlyCommons@Penn

Effective Discriminative Feature Selection with Non-trivial Solutions

Authors: Hou Chenping, Jiao Yuanyuan, Nie Feiping, Tao Hong, Yi Dongyun
Publication date: 21/04/2015

Feature selection and feature transformation, the two main ways to reduce dimensionality, are often presented separately. In this paper, a feature selection method is proposed by combining the popular transformation based dimensionality reduction method Linear Discriminant Analysis (LDA) and sparsity regularization. We impose row sparsity on the transformation matrix of LDA through $\ell_{2,1}$-norm regularization to achieve feature selection, and the resultant formulation optimizes for selecting the most discriminative features and removing the redundant ones simultaneously. The formulation is extended to the $\ell_{2,p}$-norm regularized case, which is more likely to offer better sparsity when $0<p<1$. Thus the formulation is a better approximation to the feature selection problem. An efficient algorithm is developed to solve the $\ell_{2,p}$-norm based optimization problem and it is proved that the algorithm converges when $0<p\le 2$. Systematic experiments are conducted to understand how the proposed method works. Promising experimental results on various types of real-world data sets demonstrate the effectiveness of our algorithm.
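To make the row-sparsity idea in this abstract concrete, the sketch below computes an $\ell_{2,p}$ quantity for a small matrix and ranks features by row norm. It assumes the common definition $\|W\|_{2,p} = (\sum_i \|w^i\|_2^p)^{1/p}$ with $w^i$ the $i$-th row; the abstract does not state the definition, and the function names and the toy matrix are hypothetical. It illustrates the regularizer only, not the paper's optimization algorithm.

```python
import math


def row_norms(W):
    """Euclidean norm of each row of W (one row per original feature)."""
    return [math.sqrt(sum(v * v for v in row)) for row in W]


def l2p_norm(W, p):
    """(sum_i ||w^i||_2^p)^(1/p); for 0 < p < 1 this is a quasi-norm."""
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(r ** p for r in row_norms(W)) ** (1.0 / p)


def top_features(W, k):
    """Indices of the k rows with the largest Euclidean norm, in index order."""
    norms = row_norms(W)
    ranked = sorted(range(len(W)), key=lambda i: norms[i], reverse=True)
    return sorted(ranked[:k])


# Toy transformation matrix (hypothetical): the all-zero row 1 marks a
# feature that contributes nothing to any projected dimension.
W = [[3.0, 4.0],
     [0.0, 0.0],
     [0.0, 1.0]]

norm_21 = l2p_norm(W, 1)    # sum of the row norms
norm_22 = l2p_norm(W, 2)    # coincides with the Frobenius norm
kept = top_features(W, 2)   # features whose rows carry the most weight
print(norm_21, norm_22, kept)
```

A row that is entirely zero drops the corresponding feature from every projected dimension at once, which is why penalizing row norms yields feature selection rather than only entrywise sparsity.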

arXiv.org e-Print Archive

CiteSeerX