53,525 research outputs found
High Dimensional Classification with combined Adaptive Sparse PLS and Logistic Regression
Motivation: The high dimensionality of genomic data calls for the development
of specific classification methodologies, especially to prevent over-optimistic
predictions. This challenge can be tackled by compression and variable
selection, which combined constitute a powerful framework for classification,
as well as data visualization and interpretation. However, current proposed
combinations lead to instable and non convergent methods due to inappropriate
computational frameworks. We hereby propose a stable and convergent approach
for classification in high dimensional based on sparse Partial Least Squares
(sparse PLS). Results: We start by proposing a new solution for the sparse PLS
problem that is based on proximal operators for the case of univariate
responses. Then we develop an adaptive version of the sparse PLS for
classification, which combines iterative optimization of logistic regression
and sparse PLS to ensure convergence and stability. Our results are confirmed
on synthetic and experimental data. In particular we show how crucial
convergence and stability can be when cross-validation is involved for
calibration purposes. Using gene expression data we explore the prediction of
breast cancer relapse. We also propose a multicategorial version of our method
on the prediction of cell-types based on single-cell expression data.
Availability: Our approach is implemented in the plsgenomics R-package.Comment: 9 pages, 3 figures, 4 tables + Supplementary Materials 8 pages, 3
figures, 10 table
- …