Partial least squares discriminant analysis: A dimensionality reduction method to classify hyperspectral data
The recent development of more sophisticated spectroscopic methods allows acquisition of high-dimensional datasets from which valuable information may be extracted using multivariate statistical analyses, such as dimensionality reduction and automatic classification (supervised and unsupervised). In this work, a supervised classification through a partial least squares discriminant analysis (PLS-DA) is performed on the hyperspectral data. The obtained results are compared with those obtained by the most commonly used classification approaches.
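The idea behind PLS-DA — regress a numeric coding of class membership on the spectra, then threshold the fitted response — can be sketched in a few lines. This is a minimal one-component, NIPALS-style illustration on synthetic data, not the paper's implementation; the toy "hyperspectral" data and all constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for hyperspectral data: 60 samples x 50 spectral bands,
# with the second class shifted in a few bands (assumed, for illustration).
n, p = 60, 50
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :5] += 2.0

# PLS-DA sketch: partial least squares regression on a +/-1 class coding,
# one latent component, then thresholding the fitted response at zero.
Xc = X - X.mean(axis=0)
yc = np.where(y == 1, 1.0, -1.0)          # centred class coding (balanced)
w = Xc.T @ yc                             # PLS weight: covariance direction
w /= np.linalg.norm(w)
t = Xc @ w                                # latent scores
b = (t @ yc) / (t @ t)                    # regress class code on scores
y_hat = (b * t > 0).astype(int)

accuracy = (y_hat == y).mean()
```

With a clear class shift, even this single-component sketch separates the two classes on the training data.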
Supervised Classification Using Sparse Fisher's LDA
It is well known that in a supervised classification setting when the number
of features is smaller than the number of observations, Fisher's linear
discriminant rule is asymptotically Bayes. However, there are numerous modern
applications where classification is needed in the high-dimensional setting.
Naive implementation of Fisher's rule in this case fails to provide good
results because the sample covariance matrix is singular. Moreover, by
constructing a classifier that relies on all features the interpretation of the
results is challenging. Our goal is to provide robust classification that
relies only on a small subset of important features and accounts for the
underlying correlation structure. We apply a lasso-type penalty to the
discriminant vector to ensure sparsity of the solution and use a shrinkage type
estimator for the covariance matrix. The resulting optimization problem is
solved using an iterative coordinate ascent algorithm. Furthermore, we analyze
the effect of nonconvexity on the sparsity level of the solution and highlight
the difference between the penalized and the constrained versions of the
problem. The simulation results show that the proposed method performs
favorably in comparison to alternatives. The method is used to classify
leukemia patients based on DNA methylation features.
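The two ingredients described above — a shrinkage-type covariance estimator and a lasso-type penalty on the discriminant vector — can be illustrated with a simplified sketch. This uses a single soft-thresholding step rather than the paper's iterative coordinate ascent, and the data, shrinkage weight, and penalty level are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# High-dimensional two-class data: only the first 3 of 100 features carry
# signal, and p > n makes the sample covariance singular.
n, p = 40, 100
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :3] += 1.5

# Shrinkage covariance: blend the sample covariance with a scaled identity,
# which keeps the matrix invertible when p > n (weight 0.5 is illustrative).
S = np.cov(X, rowvar=False)
alpha = 0.5
Sigma = (1 - alpha) * S + alpha * np.trace(S) / p * np.eye(p)

# Fisher direction from the shrunk covariance, then soft-thresholding to
# mimic a lasso penalty and zero out unimportant coordinates.
delta = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
beta = np.linalg.solve(Sigma, delta)
lam = 0.3 * np.abs(beta).max()
beta_sparse = np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)
n_selected = int((beta_sparse != 0).sum())

# Classify by projecting onto the sparse direction and assigning each
# sample to the nearer projected class mean.
scores = X @ beta_sparse
m0, m1 = scores[y == 0].mean(), scores[y == 1].mean()
y_hat = (np.abs(scores - m1) < np.abs(scores - m0)).astype(int)
accuracy = (y_hat == y).mean()
```

The classifier uses only the retained coordinates, which is the interpretability gain the abstract refers to.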
Discriminant analysis under the common principal components model
For two or more populations of which the covariance matrices have a common set of eigenvectors, but different sets of eigenvalues, the common principal components (CPC) model is appropriate. Pepler et al. (2015) proposed a regularised CPC covariance matrix estimator and showed that this estimator outperforms the unbiased and pooled estimators in situations where the CPC model is applicable. This paper extends their work to the context of discriminant analysis for two groups, by plugging the regularised CPC estimator into the ordinary quadratic discriminant function. Monte Carlo simulation results show that CPC discriminant analysis offers significant improvements in misclassification error rates in certain situations, and at worst performs similarly to ordinary quadratic and linear discriminant analysis. Based on these results, CPC discriminant analysis is recommended for situations where the sample size is small compared to the number of variables, in particular for cases where there is uncertainty about the population covariance matrix structures.
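The plug-in step — substituting a CPC-structured covariance into the ordinary quadratic discriminant function — can be sketched as follows. For simplicity the common eigenvectors and group eigenvalues are taken as known rather than estimated with the regularised CPC procedure; the dimensions and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two populations under the CPC model: shared eigenvectors Q, different
# eigenvalue sets (all values illustrative).
p = 4
Q, _ = np.linalg.qr(rng.normal(size=(p, p)))   # common eigenvectors
lam0 = np.array([4.0, 2.0, 1.0, 0.5])
lam1 = np.array([0.5, 1.0, 2.0, 4.0])
Sigma0 = Q @ np.diag(lam0) @ Q.T
Sigma1 = Q @ np.diag(lam1) @ Q.T
mu0 = np.zeros(p)
mu1 = np.full(p, 1.5)

def qda_score(x, mu, Sigma):
    """Quadratic discriminant score: Gaussian log-density up to a constant."""
    d = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * logdet - 0.5 * d @ np.linalg.solve(Sigma, d)

# Classify points from each group by plugging the CPC covariances into the
# ordinary quadratic discriminant function.
n = 200
X0 = rng.multivariate_normal(mu0, Sigma0, size=n)
X1 = rng.multivariate_normal(mu1, Sigma1, size=n)
correct = 0
for x in X0:
    correct += qda_score(x, mu0, Sigma0) > qda_score(x, mu1, Sigma1)
for x in X1:
    correct += qda_score(x, mu1, Sigma1) > qda_score(x, mu0, Sigma0)
error_rate = 1 - correct / (2 * n)
```

In the paper, the gain comes from replacing these covariances with the regularised CPC estimates when samples are small relative to the dimension.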
Protein sectors: statistical coupling analysis versus conservation
Statistical coupling analysis (SCA) is a method for analyzing multiple
sequence alignments that was used to identify groups of coevolving residues
termed "sectors". The method applies spectral analysis to a matrix obtained by
combining correlation information with sequence conservation. It has been
asserted that the protein sectors identified by SCA are functionally
significant, with different sectors controlling different biochemical
properties of the protein. Here we reconsider the available experimental data
and note that it involves almost exclusively proteins with a single sector. We
show that in this case sequence conservation is the dominating factor in SCA,
and can alone be used to make statistically equivalent functional predictions.
Therefore, we suggest shifting the experimental focus to proteins for which SCA
identifies several sectors. Correlations in protein alignments, which have been
shown to be informative in a number of independent studies, would then be less
dominated by sequence conservation.
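The core SCA construction — weighting a correlation matrix by positional conservation and taking its leading eigenmodes — can be mimicked on a toy binary alignment. The conservation weight below is a simplified stand-in for SCA's relative-entropy weight, and the alignment is synthetic; only the shape of the computation follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy alignment: 200 binary "sequences" of length 10, with positions 0 and 1
# made to coevolve (perfectly correlated); the rest are independent.
n_seq, n_pos = 200, 10
aln = (rng.random((n_seq, n_pos)) < 0.3).astype(float)
aln[:, 1] = aln[:, 0]

# Simplified conservation weight per position: deviation of the observed
# frequency from a 0.5 background (a stand-in for the SCA entropy weight).
freq = aln.mean(axis=0)
phi = np.abs(freq - 0.5)

# SCA-style matrix: conservation-weighted (absolute) covariance, then
# spectral analysis to extract the leading "sector" mode.
C = np.cov(aln, rowvar=False)
sca = np.outer(phi, phi) * np.abs(C)
eigval, eigvec = np.linalg.eigh(sca)      # ascending order
top = np.abs(eigvec[:, -1])               # leading eigenvector

pair_weight = top[0] + top[1]             # weight on the coevolving pair
```

In this toy, the coevolving pair dominates the leading mode; the abstract's point is that with one sector and real data, the conservation weights alone carry most of that signal.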
Penalized Orthogonal Iteration for Sparse Estimation of Generalized Eigenvalue Problem
We propose a new algorithm for sparse estimation of eigenvectors in
generalized eigenvalue problems (GEP). The GEP arises in a number of modern
data-analytic situations and statistical methods, including principal component
analysis (PCA), multiclass linear discriminant analysis (LDA), canonical
correlation analysis (CCA), sufficient dimension reduction (SDR) and invariant
co-ordinate selection. We propose to modify the standard generalized orthogonal
iteration with a sparsity-inducing penalty for the eigenvectors. To achieve
this goal, we generalize the equation-solving step of orthogonal iteration to a
penalized convex optimization problem. The resulting algorithm, called
penalized orthogonal iteration, provides accurate estimation of the true
eigenspace, when it is sparse. Also proposed is a computationally more
efficient alternative, which works well for PCA and LDA problems. Numerical
studies reveal that the proposed algorithms are competitive, and that our
tuning procedure works well. We demonstrate applications of the proposed
algorithm to obtain sparse estimates for PCA, multiclass LDA, CCA and SDR.
Supplementary materials are available online.
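The modification described — inserting a sparsity-inducing step into orthogonal iteration — can be illustrated on the simplest GEP, namely PCA (where the right-hand matrix of the pencil is the identity). A plain soft-threshold stands in for the paper's penalized convex equation-solving step; the covariance model and penalty constant are illustrative assumptions.

```python
import numpy as np

# Covariance with a sparse leading eigenvector supported on the first 3
# of 30 coordinates (a spiked model, assumed for illustration).
p = 30
v = np.zeros(p)
v[:3] = 1 / np.sqrt(3)
Sigma = 5.0 * np.outer(v, v) + np.eye(p)

def soft(x, lam):
    """Soft-thresholding operator, the proximal map of the l1 penalty."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Orthogonal iteration for the leading eigenvector, with a soft-threshold
# after each multiplication to induce sparsity (simplified stand-in for
# the penalized equation-solving step).
u = np.ones(p) / np.sqrt(p)
for _ in range(100):
    u = Sigma @ u
    u = soft(u, 0.1 * np.abs(u).max())
    u /= np.linalg.norm(u)

support = np.flatnonzero(u)               # recovered sparse support
```

The iteration recovers the true sparse eigenspace exactly in this spiked example; the paper's algorithm handles the general pencil and multiple eigenvectors.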
A memory-based method to select the number of relevant components in Principal Component Analysis
We propose a new data-driven method to select the optimal number of relevant
components in Principal Component Analysis (PCA). This new method applies to
correlation matrices whose time autocorrelation function decays more slowly
than an exponential, giving rise to long memory effects. In comparison with
other available methods present in the literature, our procedure does not rely
on subjective evaluations and is computationally inexpensive. The underlying
basic idea is to use a suitable factor model to analyse the residual memory
after sequentially removing more and more components, and stopping the process
when the maximum amount of memory has been accounted for by the retained
components. We validate our methodology on both synthetic and real financial
data, and find in all cases a clear and computationally superior answer
entirely compatible with available heuristic criteria, such as cumulative
variance and cross-validation.
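The stopping idea — strip principal components one at a time and watch how much memory remains in the residuals — can be sketched with a one-lag autocorrelation proxy in place of the paper's factor-model memory measure. The persistent AR(1) factor and all dimensions below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy multivariate time series: one persistent AR(1) factor (a stand-in for
# slowly decaying memory) loaded on all series, plus white noise.
T, N = 2000, 8
f = np.zeros(T)
for t in range(1, T):
    f[t] = 0.95 * f[t - 1] + rng.normal()
X = np.outer(f, rng.normal(size=N)) + rng.normal(size=(T, N))

def lag1_memory(R):
    """Mean absolute lag-1 autocorrelation across columns: a crude one-lag
    proxy for the residual memory analysed in the paper."""
    acs = []
    for col in R.T:
        c = col - col.mean()
        acs.append(abs((c[1:] * c[:-1]).sum() / (c * c).sum()))
    return float(np.mean(acs))

# Sequentially remove the top-k principal components and track the memory
# left in the residuals.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
memory = [lag1_memory(Xc)]
for k in range(1, N + 1):
    P = Vt[:k].T @ Vt[:k]                 # projector onto top-k components
    memory.append(lag1_memory(Xc - Xc @ P))
```

Here a single component absorbs essentially all of the memory, so the criterion would retain one component; on real data the sequence `memory` flattens at the optimal number.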