26 research outputs found

    Comparative study of different B-spline approaches for functional data

    The sample observations of a functional variable are functions that arise from observing a statistical variable over a continuous argument, which in most cases is time. In practice, however, the sample functions are observed only at a finite set of points, so the first step in functional data analysis is to reconstruct the functional form of the sample curves from discrete observations. The sample curves are usually represented in terms of basis functions, and the basis coefficients are fitted by interpolation when the data are observed without error, or by least squares approximation otherwise. The main purpose of this paper is to compare three different approaches for estimating smooth sample curves observed with error in terms of a B-spline basis: regression splines (non-penalized least squares approximation), smoothing splines (continuous roughness penalty) and P-splines (discrete roughness penalty). The performance of these spline smoothing approaches is studied via a simulation study and several applications with real data. Cross-validation and generalized cross-validation are adapted to select a common smoothing parameter for all sample curves with the roughness penalty approaches. From the results, it is concluded that both penalized approaches drastically reduced the mean squared errors with respect to the original smooth sample curves, with P-splines giving the best approximations at lower computational cost. Project MTM2010-20502 from Dirección General de Investigación, Ministerio de Educación y Ciencia, Spain. Project P11-FQM-8068 from Consejería de Innovación, Ciencia y Empresa, Junta de Andalucía, Spain.
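    As a rough illustration of what a discrete roughness penalty does, the sketch below smooths a noisy sequence by penalizing squared second-order differences. It is a simplification and not the paper's method: it assumes an identity basis in place of a B-spline basis (so the penalty acts on the fitted values directly), and the function names and the smoothing parameter `lam` are hypothetical.

```python
def second_differences(n):
    """Rows of the (n - 2) x n second-order difference matrix D."""
    rows = []
    for i in range(n - 2):
        row = [0.0] * n
        row[i], row[i + 1], row[i + 2] = 1.0, -2.0, 1.0
        rows.append(row)
    return rows


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def difference_penalty_smooth(y, lam):
    """Minimize sum (y_i - z_i)^2 + lam * sum (second differences of z)^2.

    Assumption: identity basis, so the minimizer solves (I + lam * D^T D) z = y.
    """
    n = len(y)
    d = second_differences(n)
    a = [
        [(1.0 if i == j else 0.0) + lam * sum(row[i] * row[j] for row in d)
         for j in range(n)]
        for i in range(n)
    ]
    return solve(a, list(y))


def roughness(z):
    """Sum of squared second-order differences of z."""
    return sum((z[i] - 2 * z[i + 1] + z[i + 2]) ** 2 for i in range(len(z) - 2))


noisy = [0.0, 1.2, 0.7, 2.1, 1.6, 3.2, 2.4, 4.1]
smoothed = difference_penalty_smooth(noisy, lam=5.0)
print(roughness(noisy), roughness(smoothed))
```

    A larger `lam` pulls the fit toward a straight line, since a second-order difference penalty leaves linear sequences unpenalized. In the setting the abstract describes, a single value would be chosen for all curves by cross-validation or generalized cross-validation.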

    Prediction of functional data with spatial dependence: a penalized approach

    This paper focuses on spatial functional variables whose observations are a set of spatially correlated sample curves obtained as realizations of a spatio-temporal stochastic process. In this context, as an alternative to other geostatistical techniques (kriging, kernel smoothing, among others), a new method is proposed to predict the curves of temporal evolution of the process at unsampled locations and also the surfaces of geographical evolution of the variable at unobserved time points. To test the performance of the proposed method, two simulation studies and an application with real climatological data have been carried out. Finally, the results were compared with ordinary functional kriging. Project P11-FQM-8068 from Consejería de Innovación, Ciencia y Empresa, Junta de Andalucía, Spain. Projects MTM2013-47929-P, MTM2011-28285-C02-C2 and MTM 2014-52184-P from Secretaría de Estado Investigación, Desarrollo e Innovación, Ministerio de Economía y Competitividad, Spain.

    Functional PCA and Base-Line Logit Models

    In many statistical applications, data are curves measured as functions of a continuous parameter such as time. Despite their functional nature, and because they are observed at discrete time points, data of this type are usually analyzed with multivariate statistical methods that do not take into account the high correlation between observations of a single curve at nearby time points. Functional data analysis methodologies have been developed to solve this type of problem. In order to predict the class membership (multi-category response variable) associated with an observed curve (functional data), a functional generalized logit model is proposed. Base-line category logit formulations will be considered, with their estimation based on basis expansions of the sample curves of the functional predictor and of the parameters. Functional principal component analysis will be used to obtain an accurate estimation of the functional parameters and to classify sample curves into the categories of the response variable. The performance of the proposed methodology will be studied through an experimental study with simulated and real data. Projects MTM2010-20502 from Dirección General de Investigación del MEC, Spain, and FQM-08068 from Consejería de Innovación, Ciencia y Empresa de la Junta de Andalucía, Spain.

    On the estimation of functional random effects

    Functional regression modelling has become one of the most vibrant areas of research in recent years. This discussion provides some alternative approaches to one of the key issues of functional data analysis: the basis representation of curves and, in particular, of functional random effects. First, we propose the estimation of functional principal components by penalizing the norm and, as an alternative, we provide an efficient and unified approach based on a B-spline basis and quadratic penalties.

    Penalized function-on-function partial least-squares regression

    This paper deals with the "function-on-function" or "fully functional" linear regression problem. We address the problem by proposing a novel penalized Function-on-Function Partial Least-Squares (pFFPLS) approach that imposes smoothness on the PLS weights. Our proposal introduces an appropriate finite-dimensional functional space with an associated set of bases on which to represent the data, and controls smoothness with a roughness penalty operator. Penalizing the PLS weights imposes smoothness on the resulting coefficient function, improving its interpretability. In a simulation study, we demonstrate the advantages of pFFPLS compared to non-penalized FFPLS. Our comparisons indicate a higher accuracy of pFFPLS when predicting the response and estimating the true coefficient function from which the data were generated. We also illustrate the advantages of our proposal with two case studies involving two well-known datasets from the functional data analysis literature. In the first one, we predict log precipitation curves from the yearly temperature profiles recorded at 35 weather stations in Canada. In the second case study, we predict the hip angle profiles of children during a gait cycle from their corresponding knee angle profiles.

    A quantile based dimension reduction technique

    Partial least squares (PLS) is a dimensionality reduction technique used as an alternative to ordinary least squares (OLS) in situations where the data are collinear or high dimensional. Both PLS and OLS provide mean-based estimates, which are extremely sensitive to the presence of outliers or heavy-tailed distributions. In contrast, quantile regression is an alternative to OLS that computes robust quantile-based estimates. In this work, multivariate PLS is extended to the quantile regression framework, obtaining a theoretical formulation of the problem and a robust dimensionality reduction technique, which we call fast partial quantile regression (fPQR), that provides quantile-based estimates. An efficient implementation of fPQR is also derived, and its performance is studied through simulation experiments and the well-known biscuit dough dataset from chemometrics, a real high-dimensional example.
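    The contrast between mean-based and quantile-based estimates can be seen in a small sketch. It is not fPQR itself; it only shows the check (pinball) loss that underlies quantile regression, in the simplest case of estimating a single quantile of a sample. The function names are hypothetical.

```python
def check_loss(u, tau):
    """Check (pinball) loss for residual u at quantile level tau in (0, 1)."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def quantile_by_check_loss(values, tau):
    """Return the sample point that minimizes the total check loss.

    A minimizer of the total check loss is always attained at one of the
    observed values, so searching over the sample is enough.
    """
    return min(values, key=lambda q: sum(check_loss(v - q, tau) for v in values))


sample = [1.0, 2.0, 3.0, 4.0, 100.0]  # one gross outlier
mean = sum(sample) / len(sample)
median = quantile_by_check_loss(sample, tau=0.5)
print(mean, median)
```

    With `tau=0.5` the check loss is half the absolute error, so the minimizer is the median, which the outlier barely affects, whereas the mean is pulled far toward it.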

    Quantile regression: a penalization approach

    Sparse group LASSO (SGL) is a penalization technique used in regression problems where the covariates have a natural grouped structure, and it provides solutions that are sparse both between and within groups. In this paper, SGL is introduced to the quantile regression (QR) framework, and a more flexible version, the adaptive sparse group LASSO (ASGL), is proposed. This proposal adds weights to the penalization, improving prediction accuracy. Usually, adaptive weights are taken as a function of the original non-penalized solution model, an approach that is only feasible in the n > p framework. In this work, a solution that allows using adaptive weights in high-dimensional scenarios is proposed. The benefits of this proposal are studied on both synthetic and real datasets. In this research we have made use of Uranus, a supercomputer cluster located at University Carlos III of Madrid and funded jointly by EU-FEDER funds and by the Spanish Government via the National Projects No. UNC313-4E-2361, No. ENE2009-12213-C03-03, No. ENE2012-33219 and No. ENE2015-68265-P. This research was partially supported by research grants and Project ECO2015-66593-P from Ministerio de Economía, Industria y Competitividad, Project MTM2017-88708-P from Ministerio de Economía y Competitividad, FEDER funds and Project IJCI-2017-34038 from Agencia Estatal de Investigación, Ministerio de Ciencia, Innovación y Universidades.

    Stepwise selection of functional covariates in forecasting peak levels of olive pollen

    High levels of airborne olive pollen represent a problem for a large proportion of the population because of the many allergies it causes. Many attempts have been made to forecast the concentration of airborne olive pollen, using methods such as time series, linear regression, neural networks, a combination of fuzzy systems and neural networks, and functional models. This paper presents a functional logistic regression model used to study the relationship between olive pollen concentration and different climatic factors, and on this basis to predict the probability of high (and possibly extreme) levels of airborne pollen, selecting the best subset of functional climatic variables by means of a stepwise method based on the conditional likelihood ratio test. Projects MTM2010-20502 from Dirección General de Investigación del MEC, Spain, and FQM-307 from Consejería de Innovación, Ciencia y Empresa de la Junta de Andalucía, Spain.

    Advanced Statistical Techniques for Noninvasive Hyperglycemic States Detection in Mice Using Millimeter-Wave Spectroscopy

    In this article, we discuss the use of advanced statistical techniques (functional data analysis) in millimeter-wave (mm-wave) spectroscopy for biomedical applications. We employ a W-band transmit-receive unit with a reference channel to acquire spectral data. The choice of the W-band is based on a tradeoff between penetration through the skin, which provides an upper bound for the frequencies, and spectral content across the band. The data obtained are processed using functional principal component logit regression (FPCLoR), which yields a predictive model for sustained hyperglycemia, typically associated with diabetes. The predictions are based on the transmission data from a noninvasive mm-wave spectrometer at W-band. We show that there exists a frequency range most suitable for identification, classification, and prediction of sustained hyperglycemia when evaluating the functional parameter of the functional logit model (β). This allows for the optimization of the spectroscopic instrument with the aim of obtaining a compact and potentially low-cost noninvasive instrument for hyperglycemia assessment. Furthermore, we also demonstrate that the statistical tools alleviate the problem of calibration, which is a serious obstacle in similar measurements at terahertz and IR frequencies.

    Iterative variable selection for high-dimensional data: prediction of pathological response in triple-negative breast cancer

    In the last decade, regularized regression methods have offered alternatives for performing multi-marker analysis and feature selection in a whole genome context. The process of defining a list of genes that will characterize an expression profile remains unclear. This procedure oscillates between selecting the genes or transcripts of interest based on previous clinical evidence and performing a whole transcriptome analysis that rests on advanced statistics. This paper introduces a methodology to deal with the variable selection and model estimation problems in the high-dimensional set-up, which can be particularly useful in the whole genome context. Results are validated using simulated data and a real dataset from a triple-negative breast cancer study.