
    Functional PLS logit regression model

    Functional logistic regression has been developed to forecast a binary response variable from a functional predictor. To fit this model, it is usual to assume that the functional observations and the parameter function of the model belong to the same finite-dimensional space generated by a basis of functions. This assumption turns the functional model into a multiple logit model whose design matrix is the product of the matrix of basis coefficients of the sample paths and the matrix of inner products between basis functions. Likelihood estimation of the parameter function of this model is very inaccurate because of the strong dependence among the columns of the resulting design matrix (multicollinearity). Several approaches have been proposed to overcome this drawback. These apply standard multivariate data analysis methods to the design matrix, as in the functional principal component logistic regression model. As an alternative, a functional partial least squares logit regression model is proposed whose covariates are a set of partial least squares components of the design matrix of the multiple logit model associated with the functional one. Project MTM2004-5992 from Dirección General de Investigación, Ministerio de Ciencia y Tecnología.
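The reduction described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`gram_matrix`, `pls_scores`, `fit_logit`) are hypothetical, the inner products are approximated by the trapezoidal rule, and ordinary PLS1 on the 0/1 response is swapped in for the PLS step, which the paper may define differently.

```python
import numpy as np


def gram_matrix(basis_values, grid):
    """Psi[j, k] ~ integral of phi_j(t) * phi_k(t) dt (trapezoidal rule).

    basis_values has shape (p, m): basis function j evaluated at the m grid points.
    """
    steps = np.diff(grid)
    weights = np.zeros(len(grid))
    weights[:-1] += steps / 2.0
    weights[1:] += steps / 2.0
    return (basis_values * weights) @ basis_values.T


def pls_scores(X, y, n_components):
    """PLS1 scores of X for response y, computed by NIPALS with deflation."""
    Xk = X - X.mean(axis=0)
    yc = y - y.mean()
    scores = []
    for _ in range(n_components):
        w = Xk.T @ yc                    # direction of maximal covariance with y
        w = w / np.linalg.norm(w)
        t = Xk @ w                       # score vector (one PLS component)
        loading = Xk.T @ t / (t @ t)
        Xk = Xk - np.outer(t, loading)   # deflate so later scores are orthogonal
        scores.append(t)
    return np.column_stack(scores)


def fit_logit(T, y, n_iter=25, ridge=1e-8):
    """Logit coefficients (intercept first) by Newton-Raphson on the scores T."""
    Z = np.column_stack([np.ones(len(y)), T])
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(Z @ beta)))
        hessian = (Z * (prob * (1.0 - prob))[:, None]).T @ Z
        gradient = Z.T @ (y - prob)
        # The tiny ridge term is an assumption added only for numerical safety.
        beta = beta + np.linalg.solve(hessian + ridge * np.eye(len(beta)), gradient)
    return beta
```

Given a coefficient matrix `A` (one row per sample path), the design matrix of the multiple logit model is `X = A @ gram_matrix(basis_values, grid)`; the logit model is then fitted on `pls_scores(X, y, k)` instead of on the collinear columns of `X`.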

    High Dimensional Classification with combined Adaptive Sparse PLS and Logistic Regression

    Motivation: The high dimensionality of genomic data calls for the development of specific classification methodologies, especially to prevent over-optimistic predictions. This challenge can be tackled by compression and variable selection, which together constitute a powerful framework for classification as well as for data visualization and interpretation. However, currently proposed combinations lead to unstable and non-convergent methods because of inappropriate computational frameworks. We propose a stable and convergent approach for classification in high dimensions based on sparse Partial Least Squares (sparse PLS). Results: We start by proposing a new solution for the sparse PLS problem that is based on proximal operators for the case of univariate responses. Then we develop an adaptive version of sparse PLS for classification, which combines iterative optimization of logistic regression and sparse PLS to ensure convergence and stability. Our results are confirmed on synthetic and experimental data. In particular, we show how crucial convergence and stability can be when cross-validation is involved for calibration purposes. Using gene expression data, we explore the prediction of breast cancer relapse. We also propose a multi-category version of our method for the prediction of cell types based on single-cell expression data. Availability: Our approach is implemented in the plsgenomics R-package. Comment: 9 pages, 3 figures, 4 tables + Supplementary Materials 8 pages, 3 figures, 10 tables
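As a rough illustration of the proximal-operator idea mentioned in this abstract, the sketch below assumes an L1 penalty, whose proximal operator is soft-thresholding, and applies it to the covariance vector between the predictors and a univariate response. The function names and the `frac` parameter are hypothetical; the exact formulation in the paper and in plsgenomics may differ.

```python
import numpy as np


def soft_threshold(z, lam):
    """Proximal operator of lam * ||.||_1: shrink each entry toward zero by lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def sparse_pls_direction(X, y, frac):
    """First sparse PLS weight vector for a univariate response.

    frac in [0, 1) sets the threshold as a fraction of the largest absolute
    covariance, so at least one predictor is always kept.
    """
    Xc = X - X.mean(axis=0)
    c = Xc.T @ (y - y.mean())        # covariance direction used by ordinary PLS
    w = soft_threshold(c, frac * np.max(np.abs(c)))
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w
```

Setting `frac=0` recovers the ordinary (dense) first PLS direction, and larger values zero out more predictors, which is how compression and variable selection are combined in one step.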

    Generalized Partial Least Squares Approach for Nominal Multinomial Logit Regression Models with a Functional Covariate

    Functional Data Analysis (FDA) has attracted substantial attention for the last two decades. Within FDA, classifying curves into two or more categories is consistently of interest to scientists, but multi-class prediction within FDA is challenged in that most classification tools have been limited to binary response applications. The functional logistic regression (FLR) model was developed to forecast a binary response variable in the functional case. In this study, a functional nominal multinomial logit regression (F-NM-LR) model was developed that shifts the FLR model into a multiple logit model. However, the model generates inaccurate parameter function estimates due to multicollinearity in the design matrix. A generalized partial least squares (GPLS) approach with cubic B-spline basis expansions was developed to address the multicollinearity and high dimensionality problems that preclude accurate estimates and curve discrimination with the F-NM-LR model. The GPLS method extends partial least squares (PLS) and improves upon current methodology by introducing a component selection criterion that reconstructs the parameter function with fewer predictors. The GPLS regression estimates are derived via Iteratively ReWeighted Partial Least Squares (IRWPLS), defining a set of uncorrelated latent variables to use as predictors for the F-GPLS-NM-LR model. This methodology was compared to the classic alternative estimation method of principal component regression (PCR) in a simulation study. The performance of the proposed methodology was tested via simulations and applications on a spectrometric dataset. The results indicate that the GPLS method performs well in multi-class prediction with respect to the F-NM-LR model. The main difference between the two approaches was that PCR usually requires more components than GPLS to achieve similar accuracy of parameter function estimates of the F-GPLS-NM-LR model. 
The results of this research imply that the GPLS method is preferable to the F-NM-LR model, and it is a useful contribution to FDA techniques. This method may be particularly appropriate for practical situations where accurate prediction of a response variable with fewer components is a priority.

    Methodology and theory for partial least squares applied to functional data

    The partial least squares procedure was originally developed to estimate the slope parameter in multivariate parametric models. More recently it has gained popularity in the functional data literature. There, the partial least squares estimator of slope is used either to construct linear predictive models or as a tool to project the data onto a one-dimensional quantity that is employed for further statistical analysis. Although the partial least squares approach is often viewed as an attractive alternative to projections onto the principal component basis, its properties are less well known than those of the latter, mainly because of its iterative nature. We develop an explicit formulation of partial least squares for functional data, which leads to insightful results and motivates new theory, demonstrating consistency and establishing convergence rates. Comment: Published at http://dx.doi.org/10.1214/11-AOS958 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Assessing multivariate predictors of financial market movements: A latent factor framework for ordinal data

    Much of the trading activity in equity markets is directed to brokerage houses. In exchange they provide so-called "soft dollars," which are essentially amounts spent on "research" for identifying profitable trading opportunities. Soft dollars represent about USD 1 out of every USD 10 paid in commissions. They are costly, so an institutional investor has an interest in determining, from a statistical point of view, whether soft dollar inputs are worth using (and indirectly paying for). To address this question, we develop association measures between what broker-dealers predict and what markets realize. Our data are ordinal predictions by two broker-dealers and realized values on several markets, on the same ordinal scale. We develop a structural equation model with latent variables in an ordinal setting, which allows us to test the ability of broker-dealers to predict financial market movements. We use a multivariate logit model in a latent factor framework, develop a tractable estimator based on a Laplace approximation, and show its consistency and asymptotic normality. Monte Carlo experiments reveal that both the estimation method and the testing procedure perform well in small samples. The method is then used to analyze our dataset. Comment: Published at http://dx.doi.org/10.1214/08-AOAS213 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Forecasting binary longitudinal data by a functional PC-ARIMA model

    To forecast the time evolution of a binary response variable from a related continuous time series, a functional logit model is proposed. The estimation of this model from discrete time observations of the predictor is solved by using functional principal component analysis and ARIMA modelling of the associated discrete time series of principal components. The proposed model is applied to forecast the risk of drought from the El Niño phenomenon. Projects MTM2007-63793 from Dirección General de Investigación, Ministerio de Educación y Ciencia, Spain and P06-FQM-01470 from Consejería de Innovación Ciencia y Empresa, Junta de Andalucía, Spain.
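The two-stage idea in this abstract (principal component scores first, then a time series model on each score series) can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' procedure: curves are assumed to be observed on a common equally spaced grid, so that principal components can be taken from a plain SVD, and a least-squares AR(1) fit, the simplest special case ARIMA(1,0,0), stands in for full ARIMA modelling. The function names are hypothetical.

```python
import numpy as np


def fpca_scores(curves, n_components):
    """Principal component scores of discretized curves (one curve per row)."""
    mean = curves.mean(axis=0)
    U, s, Vt = np.linalg.svd(curves - mean, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    return scores, Vt[:n_components], mean


def ar1_forecast(series):
    """One-step-ahead forecast from a least-squares AR(1) fit."""
    series = np.asarray(series, dtype=float)
    centered = series - series.mean()
    phi = (centered[:-1] @ centered[1:]) / (centered[:-1] @ centered[:-1])
    return series.mean() + phi * centered[-1]
```

Forecasting each column of `scores` with `ar1_forecast` gives predicted component scores for the next period; in the setting of the abstract, such predicted scores would then enter the functional logit model as covariates.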