Functional PLS logit regression model
Functional logistic regression has been developed to forecast a binary response variable from a functional predictor. To fit this model, it is usual to assume that the functional observations and the parameter function of the model belong to the same finite-dimensional space generated by a basis of functions. This assumption turns the functional model into a multiple logit model whose design matrix is the product of the matrix of basis coefficients of the sample paths and the matrix of inner products between basis functions. Likelihood estimation of the parameter function of this model is very inaccurate because of the strong dependence among the columns of the resulting design matrix (multicollinearity). Several approaches have been proposed to overcome this drawback; they apply standard multivariate data analysis methods to the design matrix, as in the functional principal component logistic regression model. As an alternative, a functional partial least squares logit regression model is proposed, whose covariates are a set of partial least squares components of the design matrix of the multiple logit model associated with the functional one. Project MTM2004-5992 from Dirección General de Investigación, Ministerio de Ciencia y Tecnologí
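The reduction from the functional model to a multiple logit model described in this abstract can be written out as follows. This is a sketch only: the symbols $\phi_j$, $a_{ij}$, $\beta_k$, $\psi_{jk}$, $\alpha$ and the basis dimension $p$ are notation assumed here for illustration and are not taken from the abstract.

```latex
% Assumed notation: sample paths x_i(t) and parameter function \beta(t)
% expanded in the same basis \{\phi_1, \dots, \phi_p\} on the domain T.
\begin{align*}
  x_i(t) &= \sum_{j=1}^{p} a_{ij}\,\phi_j(t), \qquad
  \beta(t) = \sum_{k=1}^{p} \beta_k\,\phi_k(t), \\
  l_i = \ln\frac{\pi_i}{1-\pi_i}
      &= \alpha + \int_T x_i(t)\,\beta(t)\,dt
       = \alpha + \sum_{j=1}^{p}\sum_{k=1}^{p} a_{ij}\,\psi_{jk}\,\beta_k,
  \qquad \psi_{jk} = \int_T \phi_j(t)\,\phi_k(t)\,dt, \\
  L &= \alpha\,\mathbf{1} + A\,\Psi\,\boldsymbol{\beta},
  \qquad A = (a_{ij}),\; \Psi = (\psi_{jk}).
\end{align*}
```

Under these assumptions the design matrix is $A\Psi$, and its columns tend to be highly correlated, which is the multicollinearity that the principal component and partial least squares approaches are meant to address.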
High Dimensional Classification with combined Adaptive Sparse PLS and Logistic Regression
Motivation: The high dimensionality of genomic data calls for the development
of specific classification methodologies, especially to prevent over-optimistic
predictions. This challenge can be tackled by compression and variable
selection, which combined constitute a powerful framework for classification,
as well as data visualization and interpretation. However, currently proposed
combinations lead to unstable and non-convergent methods due to inappropriate
computational frameworks. We hereby propose a stable and convergent approach
for classification in high dimensions based on sparse Partial Least Squares
(sparse PLS). Results: We start by proposing a new solution for the sparse PLS
problem that is based on proximal operators for the case of univariate
responses. Then we develop an adaptive version of the sparse PLS for
classification, which combines iterative optimization of logistic regression
and sparse PLS to ensure convergence and stability. Our results are confirmed
on synthetic and experimental data. In particular we show how crucial
convergence and stability can be when cross-validation is involved for
calibration purposes. Using gene expression data we explore the prediction of
breast cancer relapse. We also propose a multicategorial version of our method
on the prediction of cell-types based on single-cell expression data.
Availability: Our approach is implemented in the plsgenomics R-package. Comment: 9 pages, 3 figures, 4 tables + Supplementary Materials 8 pages, 3
figures, 10 table
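For a univariate response, one common way to obtain a sparse PLS weight vector with a proximal operator is to soft-threshold the covariance vector between predictors and response and then renormalise. The Python sketch below illustrates that idea only. It assumes an l1-penalised single-component formulation with centred data; the function names and the penalty `lam` are hypothetical, and this is not the plsgenomics implementation.

```python
import math


def soft_threshold(z, lam):
    """Elementwise proximal operator of lam * ||.||_1 (soft-thresholding)."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in z]


def sparse_pls_direction(X, y, lam):
    """First sparse PLS weight vector for a univariate response (sketch).

    X is a list of n rows with p features each and y a list of n responses;
    both are assumed to be centred already. lam is a hypothetical penalty.
    """
    n, p = len(X), len(X[0])
    # Covariance-type vector c = X^T y between each predictor and the response.
    c = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    # Shrink small entries exactly to zero, then rescale to unit length.
    w = soft_threshold(c, lam)
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        return w  # penalty too large: every predictor was dropped
    return [v / norm for v in w]


# Toy data: predictors 0 and 2 track the response, predictor 1 barely does.
X = [[1.0, 0.1, -1.0], [-1.0, -0.1, 1.0], [2.0, 0.0, -2.0], [-2.0, 0.0, 2.0]]
y = [1.0, -1.0, 2.0, -2.0]
w = sparse_pls_direction(X, y, lam=1.0)
```

In this toy example the weakly related second predictor receives a weight of exactly zero, which is the variable-selection effect the abstract combines with compression.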
Generalized Partial Least Squares Approach for Nominal Multinomial Logit Regression Models with a Functional Covariate
Functional Data Analysis (FDA) has attracted substantial attention for the last two decades. Within FDA, classifying curves into two or more categories is consistently of interest to scientists, but multi-class prediction within FDA is challenged in that most classification tools have been limited to binary response applications. The functional logistic regression (FLR) model was developed to forecast a binary response variable in the functional case. In this study, a functional nominal multinomial logit regression (F-NM-LR) model was developed that shifts the FLR model into a multiple logit model. However, the model generates inaccurate parameter function estimates due to multicollinearity in the design matrix. A generalized partial least squares (GPLS) approach with cubic B-spline basis expansions was developed to address the multicollinearity and high dimensionality problems that preclude accurate estimates and curve discrimination with the F-NM-LR model. The GPLS method extends partial least squares (PLS) and improves upon current methodology by introducing a component selection criterion that reconstructs the parameter function with fewer predictors. The GPLS regression estimates are derived via Iteratively ReWeighted Partial Least Squares (IRWPLS), defining a set of uncorrelated latent variables to use as predictors for the F-GPLS-NM-LR model. This methodology was compared to the classic alternative estimation method of principal component regression (PCR) in a simulation study. The performance of the proposed methodology was tested via simulations and applications on a spectrometric dataset. The results indicate that the GPLS method performs well in multi-class prediction with respect to the F-NM-LR model. The main difference between the two approaches was that PCR usually requires more components than GPLS to achieve similar accuracy of parameter function estimates of the F-GPLS-NM-LR model. 
The results of this research imply that the GPLS method is preferable to the F-NM-LR model, and it is a useful contribution to FDA techniques. This method may be particularly appropriate for practical situations where accurate prediction of a response variable with fewer components is a priority.
Methodology and theory for partial least squares applied to functional data
The partial least squares procedure was originally developed to estimate the
slope parameter in multivariate parametric models. More recently it has gained
popularity in the functional data literature. There, the partial least squares
estimator of slope is either used to construct linear predictive models, or as
a tool to project the data onto a one-dimensional quantity that is employed for
further statistical analysis. Although the partial least squares approach is
often viewed as an attractive alternative to projections onto the principal
component basis, its properties are less well known than those of the latter,
mainly because of its iterative nature. We develop an explicit formulation of
partial least squares for functional data, which leads to insightful results
and motivates new theory, demonstrating consistency and establishing
convergence rates. Comment: Published at http://dx.doi.org/10.1214/11-AOS958 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
Assessing multivariate predictors of financial market movements: A latent factor framework for ordinal data
Much of the trading activity in equity markets is directed to brokerage
houses. In exchange they provide so-called "soft dollars," which basically are
amounts spent in "research" for identifying profitable trading opportunities.
Soft dollars represent about USD 1 out of every USD 10 paid in commissions.
Obviously they are costly, and it is interesting for an institutional investor
to determine whether soft dollar inputs are worth being used (and indirectly
paid for) or not, from a statistical point of view. To address this question,
we develop association measures between what broker--dealers predict and what
markets realize. Our data are ordinal predictions by two broker--dealers and
realized values on several markets, on the same ordinal scale. We develop a
structural equation model with latent variables in an ordinal setting which
allows us to test broker--dealer predictive ability of financial market
movements. We use a multivariate logit model in a latent factor framework,
develop a tractable estimator based on a Laplace approximation, and show its
consistency and asymptotic normality. Monte Carlo experiments reveal that both
the estimation method and the testing procedure perform well in small samples.
The method is then used to analyze our dataset. Comment: Published at http://dx.doi.org/10.1214/08-AOAS213 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Forecasting binary longitudinal data by a functional PC-ARIMA model
To forecast the time evolution of a binary response variable from a related continuous time series, a functional logit model is proposed. The estimation of this model from discrete-time observations of the predictor is solved by using functional principal component analysis and ARIMA modelling of the associated discrete time series of principal components. The proposed model is applied to forecast the risk of drought from the El Niño phenomenon. Projects MTM2007-63793 from Dirección General de Investigación, Ministerio de Educación y Ciencia, Spain and P06-FQM-01470 from Consejería de Innovación Ciencia y Empresa, Junta de Andalucía, Spai