Partial least squares discriminant analysis: A dimensionality reduction method to classify hyperspectral data
The recent development of more sophisticated spectroscopic methods allows acquisition of high-dimensional datasets from which valuable information may be extracted using multivariate statistical analyses, such as dimensionality reduction and automatic classification (supervised and unsupervised). In this work, a supervised classification through a partial least squares discriminant analysis (PLS-DA) is performed on the hyperspectral data. The obtained results are compared with those obtained by the most commonly used classification approaches.
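The idea behind PLS-DA — regress a numeric coding of class membership on the spectra, then threshold the fitted response — can be sketched in a few lines. This is a minimal one-component, NIPALS-style illustration on synthetic data, not the paper's implementation; the toy "hyperspectral" data and all constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for hyperspectral data: 60 samples x 50 spectral bands,
# with the second class shifted in a few bands (assumed, for illustration).
n, p = 60, 50
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :5] += 2.0

# PLS-DA sketch: partial least squares regression on a +/-1 class coding,
# one latent component, then thresholding the fitted response at zero.
Xc = X - X.mean(axis=0)
yc = np.where(y == 1, 1.0, -1.0)          # centred class coding (balanced)
w = Xc.T @ yc                             # PLS weight: covariance direction
w /= np.linalg.norm(w)
t = Xc @ w                                # latent scores
b = (t @ yc) / (t @ t)                    # regress class code on scores
y_hat = (b * t > 0).astype(int)

accuracy = (y_hat == y).mean()
```

With a clear class shift, even this single-component sketch separates the two classes on the training data.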
Supervised Classification Using Sparse Fisher's LDA
It is well known that in a supervised classification setting when the number
of features is smaller than the number of observations, Fisher's linear
discriminant rule is asymptotically Bayes. However, there are numerous modern
applications where classification is needed in the high-dimensional setting.
Naive implementation of Fisher's rule in this case fails to provide good
results because the sample covariance matrix is singular. Moreover, by
constructing a classifier that relies on all features the interpretation of the
results is challenging. Our goal is to provide robust classification that
relies only on a small subset of important features and accounts for the
underlying correlation structure. We apply a lasso-type penalty to the
discriminant vector to ensure sparsity of the solution and use a shrinkage type
estimator for the covariance matrix. The resulting optimization problem is
solved using an iterative coordinate ascent algorithm. Furthermore, we analyze
the effect of nonconvexity on the sparsity level of the solution and highlight
the difference between the penalized and the constrained versions of the
problem. The simulation results show that the proposed method performs
favorably in comparison to alternatives. The method is used to classify
leukemia patients based on DNA methylation features.
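The two ingredients described above — a shrinkage-type covariance estimator and a lasso-type penalty on the discriminant vector — can be illustrated with a simplified sketch. This uses a single soft-thresholding step rather than the paper's iterative coordinate ascent, and the data, shrinkage weight, and penalty level are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# High-dimensional two-class data: only the first 3 of 100 features carry
# signal, and p > n makes the sample covariance singular.
n, p = 40, 100
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :3] += 1.5

# Shrinkage covariance: blend the sample covariance with a scaled identity,
# which keeps the matrix invertible when p > n (weight 0.5 is illustrative).
S = np.cov(X, rowvar=False)
alpha = 0.5
Sigma = (1 - alpha) * S + alpha * np.trace(S) / p * np.eye(p)

# Fisher direction from the shrunk covariance, then soft-thresholding to
# mimic a lasso penalty and zero out unimportant coordinates.
delta = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
beta = np.linalg.solve(Sigma, delta)
lam = 0.3 * np.abs(beta).max()
beta_sparse = np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)
n_selected = int((beta_sparse != 0).sum())

# Classify by projecting onto the sparse direction and assigning each
# sample to the nearer projected class mean.
scores = X @ beta_sparse
m0, m1 = scores[y == 0].mean(), scores[y == 1].mean()
y_hat = (np.abs(scores - m1) < np.abs(scores - m0)).astype(int)
accuracy = (y_hat == y).mean()
```

The classifier uses only the retained coordinates, which is the interpretability gain the abstract refers to.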
Discriminant analysis under the common principal components model
For two or more populations of which the covariance matrices have a common set of eigenvectors, but different sets of eigenvalues, the common principal components (CPC) model is appropriate. Pepler et al. (2015) proposed a regularised CPC covariance matrix estimator and showed that this estimator outperforms the unbiased and pooled estimators in situations where the CPC model is applicable. This paper extends their work to the context of discriminant analysis for two groups, by plugging the regularised CPC estimator into the ordinary quadratic discriminant function. Monte Carlo simulation results show that CPC discriminant analysis offers significant improvements in misclassification error rates in certain situations, and at worst performs similarly to ordinary quadratic and linear discriminant analysis. Based on these results, CPC discriminant analysis is recommended for situations where the sample size is small compared to the number of variables, in particular for cases where there is uncertainty about the population covariance matrix structures.
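The plug-in step — substituting a CPC-structured covariance into the ordinary quadratic discriminant function — can be sketched as follows. For simplicity the common eigenvectors and group eigenvalues are taken as known rather than estimated with the regularised CPC procedure; the dimensions and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two populations under the CPC model: shared eigenvectors Q, different
# eigenvalue sets (all values illustrative).
p = 4
Q, _ = np.linalg.qr(rng.normal(size=(p, p)))   # common eigenvectors
lam0 = np.array([4.0, 2.0, 1.0, 0.5])
lam1 = np.array([0.5, 1.0, 2.0, 4.0])
Sigma0 = Q @ np.diag(lam0) @ Q.T
Sigma1 = Q @ np.diag(lam1) @ Q.T
mu0 = np.zeros(p)
mu1 = np.full(p, 1.5)

def qda_score(x, mu, Sigma):
    """Quadratic discriminant score: Gaussian log-density up to a constant."""
    d = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * logdet - 0.5 * d @ np.linalg.solve(Sigma, d)

# Classify points from each group by plugging the CPC covariances into the
# ordinary quadratic discriminant function.
n = 200
X0 = rng.multivariate_normal(mu0, Sigma0, size=n)
X1 = rng.multivariate_normal(mu1, Sigma1, size=n)
correct = 0
for x in X0:
    correct += qda_score(x, mu0, Sigma0) > qda_score(x, mu1, Sigma1)
for x in X1:
    correct += qda_score(x, mu1, Sigma1) > qda_score(x, mu0, Sigma0)
error_rate = 1 - correct / (2 * n)
```

In the paper, the gain comes from replacing these covariances with the regularised CPC estimates when samples are small relative to the dimension.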
Protein sectors: statistical coupling analysis versus conservation
Statistical coupling analysis (SCA) is a method for analyzing multiple
sequence alignments that was used to identify groups of coevolving residues
termed "sectors". The method applies spectral analysis to a matrix obtained by
combining correlation information with sequence conservation. It has been
asserted that the protein sectors identified by SCA are functionally
significant, with different sectors controlling different biochemical
properties of the protein. Here we reconsider the available experimental data
and note that it involves almost exclusively proteins with a single sector. We
show that in this case sequence conservation is the dominating factor in SCA,
and can alone be used to make statistically equivalent functional predictions.
Therefore, we suggest shifting the experimental focus to proteins for which SCA
identifies several sectors. Correlations in protein alignments, which have been
shown to be informative in a number of independent studies, would then be less
dominated by sequence conservation.
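The core SCA construction — weighting a correlation matrix by positional conservation and taking its leading eigenmodes — can be mimicked on a toy binary alignment. The conservation weight below is a simplified stand-in for SCA's relative-entropy weight, and the alignment is synthetic; only the shape of the computation follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy alignment: 200 binary "sequences" of length 10, with positions 0 and 1
# made to coevolve (perfectly correlated); the rest are independent.
n_seq, n_pos = 200, 10
aln = (rng.random((n_seq, n_pos)) < 0.3).astype(float)
aln[:, 1] = aln[:, 0]

# Simplified conservation weight per position: deviation of the observed
# frequency from a 0.5 background (a stand-in for the SCA entropy weight).
freq = aln.mean(axis=0)
phi = np.abs(freq - 0.5)

# SCA-style matrix: conservation-weighted (absolute) covariance, then
# spectral analysis to extract the leading "sector" mode.
C = np.cov(aln, rowvar=False)
sca = np.outer(phi, phi) * np.abs(C)
eigval, eigvec = np.linalg.eigh(sca)      # ascending order
top = np.abs(eigvec[:, -1])               # leading eigenvector

pair_weight = top[0] + top[1]             # weight on the coevolving pair
```

In this toy, the coevolving pair dominates the leading mode; the abstract's point is that with one sector and real data, the conservation weights alone carry most of that signal.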
Penalized Orthogonal Iteration for Sparse Estimation of Generalized Eigenvalue Problem
We propose a new algorithm for sparse estimation of eigenvectors in
generalized eigenvalue problems (GEP). The GEP arises in a number of modern
data-analytic situations and statistical methods, including principal component
analysis (PCA), multiclass linear discriminant analysis (LDA), canonical
correlation analysis (CCA), sufficient dimension reduction (SDR) and invariant
co-ordinate selection. We propose to modify the standard generalized orthogonal
iteration with a sparsity-inducing penalty for the eigenvectors. To achieve
this goal, we generalize the equation-solving step of orthogonal iteration to a
penalized convex optimization problem. The resulting algorithm, called
penalized orthogonal iteration, provides accurate estimation of the true
eigenspace, when it is sparse. Also proposed is a computationally more
efficient alternative, which works well for PCA and LDA problems. Numerical
studies reveal that the proposed algorithms are competitive, and that our
tuning procedure works well. We demonstrate applications of the proposed
algorithm to obtain sparse estimates for PCA, multiclass LDA, CCA and SDR.
Supplementary materials are available online.
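The modification described — inserting a sparsity-inducing step into orthogonal iteration — can be illustrated on the simplest GEP, namely PCA (where the right-hand matrix of the pencil is the identity). A plain soft-threshold stands in for the paper's penalized convex equation-solving step; the covariance model and penalty constant are illustrative assumptions.

```python
import numpy as np

# Covariance with a sparse leading eigenvector supported on the first 3
# of 30 coordinates (a spiked model, assumed for illustration).
p = 30
v = np.zeros(p)
v[:3] = 1 / np.sqrt(3)
Sigma = 5.0 * np.outer(v, v) + np.eye(p)

def soft(x, lam):
    """Soft-thresholding operator, the proximal map of the l1 penalty."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Orthogonal iteration for the leading eigenvector, with a soft-threshold
# after each multiplication to induce sparsity (simplified stand-in for
# the penalized equation-solving step).
u = np.ones(p) / np.sqrt(p)
for _ in range(100):
    u = Sigma @ u
    u = soft(u, 0.1 * np.abs(u).max())
    u /= np.linalg.norm(u)

support = np.flatnonzero(u)               # recovered sparse support
```

The iteration recovers the true sparse eigenspace exactly in this spiked example; the paper's algorithm handles the general pencil and multiple eigenvectors.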
A memory-based method to select the number of relevant components in Principal Component Analysis
We propose a new data-driven method to select the optimal number of relevant
components in Principal Component Analysis (PCA). This new method applies to
correlation matrices whose time autocorrelation function decays more slowly
than an exponential, giving rise to long memory effects. In comparison with
other available methods present in the literature, our procedure does not rely
on subjective evaluations and is computationally inexpensive. The underlying
basic idea is to use a suitable factor model to analyse the residual memory
after sequentially removing more and more components, and stopping the process
when the maximum amount of memory has been accounted for by the retained
components. We validate our methodology on both synthetic and real financial
data, and find in all cases a clear and computationally superior answer
entirely compatible with available heuristic criteria, such as cumulative
variance and cross-validation.
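The stopping idea — strip principal components one at a time and watch how much memory remains in the residuals — can be sketched with a one-lag autocorrelation proxy in place of the paper's factor-model memory measure. The persistent AR(1) factor and all dimensions below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy multivariate time series: one persistent AR(1) factor (a stand-in for
# slowly decaying memory) loaded on all series, plus white noise.
T, N = 2000, 8
f = np.zeros(T)
for t in range(1, T):
    f[t] = 0.95 * f[t - 1] + rng.normal()
X = np.outer(f, rng.normal(size=N)) + rng.normal(size=(T, N))

def lag1_memory(R):
    """Mean absolute lag-1 autocorrelation across columns: a crude one-lag
    proxy for the residual memory analysed in the paper."""
    acs = []
    for col in R.T:
        c = col - col.mean()
        acs.append(abs((c[1:] * c[:-1]).sum() / (c * c).sum()))
    return float(np.mean(acs))

# Sequentially remove the top-k principal components and track the memory
# left in the residuals.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
memory = [lag1_memory(Xc)]
for k in range(1, N + 1):
    P = Vt[:k].T @ Vt[:k]                 # projector onto top-k components
    memory.append(lag1_memory(Xc - Xc @ P))
```

Here a single component absorbs essentially all of the memory, so the criterion would retain one component; on real data the sequence `memory` flattens at the optimal number.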