Comparative study of different B-spline approaches for functional data
The sample observations of a functional variable are functions that come from the
observation of a statistical variable in a continuous argument that in most cases is the
time. But in practice, the sample functions are observed in a finite set of points. Then, the
first step in functional data analysis is to reconstruct the functional form of sample curves
from discrete observations. The sample curves are usually represented in terms of basis
functions and the basis coefficients are fitted by interpolation, when data are observed
without error, or by least squares approximation, in the other case. The main purpose of
this paper is to compare three different approaches for estimating smooth sample curves
observed with error in terms of B-spline basis: regression splines (non-penalized least
squares approximation), smoothing splines (continuous roughness penalty) and P-splines
(discrete roughness penalty). The performance of these spline smoothing approaches is
studied via a simulation study and several applications with real data. Cross-validation and
generalized cross-validation are adapted to select a common smoothing parameter for all
sample curves with the roughness penalty approaches. From the results, it is concluded
that both penalized approaches drastically reduced the mean squared errors with respect
to the original smooth sample curves with P-splines giving the best approximations with
less computational cost. Project MTM2010-20502 from Dirección General de Investigación, Ministerio de Educación y Ciencia, Spain. Project P11-FQM-8068 from Consejería de Innovación, Ciencia y Empresa, Junta de Andalucía, Spain.
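The contrast between regression splines and P-splines described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes NumPy, equally spaced knots, a cubic basis and a second-order difference penalty. The names `bspline_basis` and `pspline_fit` and the simulated data are hypothetical. Setting `lam=0` gives the non-penalized regression spline.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions at x (Cox-de Boor recursion)."""
    x = np.asarray(x, dtype=float)
    # Degree 0: indicators of the half-open knot intervals.
    B = np.array([(x >= knots[i]) & (x < knots[i + 1])
                  for i in range(len(knots) - 1)], dtype=float).T
    for d in range(1, degree + 1):
        nxt = np.zeros((len(x), len(knots) - d - 1))
        for i in range(len(knots) - d - 1):
            left = knots[i + d] - knots[i]
            right = knots[i + d + 1] - knots[i + 1]
            if left > 0:
                nxt[:, i] += (x - knots[i]) / left * B[:, i]
            if right > 0:
                nxt[:, i] += (knots[i + d + 1] - x) / right * B[:, i + 1]
        B = nxt
    return B

def pspline_fit(x, y, nseg=20, degree=3, lam=1.0, order=2):
    """Penalized least squares fit: min ||y - B a||^2 + lam * ||D a||^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lo, hi = x.min(), x.max()
    h = (hi - lo) / nseg
    # Equally spaced knots, extended beyond the data range on both sides.
    knots = lo + h * np.arange(-degree, nseg + degree + 1)
    B = bspline_basis(x, knots, degree)
    # Discrete roughness penalty: differences of adjacent coefficients.
    D = np.diff(np.eye(B.shape[1]), n=order, axis=0)
    coef = np.linalg.solve(B.T @ B + lam * (D.T @ D), B.T @ y)
    return B @ coef, coef

# Hypothetical noisy sample curve (assumption, not data from the paper).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)
fit_reg, coef_reg = pspline_fit(x, y, lam=0.0)   # regression spline
fit_pen, coef_pen = pspline_fit(x, y, lam=10.0)  # P-spline
```

In practice the value of `lam` would be chosen by cross-validation or generalized cross-validation, as the abstract describes. The fixed values here are only for demonstration.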
Prediction of functional data with spatial dependence: a penalized approach
This paper focuses on spatial functional variables whose observations are a set of spatially correlated
sample curves obtained as realizations of a spatio-temporal
stochastic process. In this context, as an alternative to other
geostatistical techniques (kriging, kernel smoothing,
among others), a new method to predict the curves of
temporal evolution of the process at unsampled locations
and also the surfaces of geographical evolution of the
variable at unobserved time points is proposed. In order to
test the good performance of the proposed method, two
simulation studies and an application with real climatological data have been carried out. Finally, the results were
compared with ordinary functional kriging. Project P11-FQM-8068 from Consejería de Innovación, Ciencia y Empresa, Junta de Andalucía, Spain. Projects MTM2013-47929-P, MTM2011-28285-C02-C2 and MTM2014-52184-P from Secretaría de
Estado de Investigación, Desarrollo e Innovación, Ministerio de Economía y Competitividad, Spain.
Functional PCA and Base-Line Logit Models
In many statistical applications data are curves measured as functions of a
continuous parameter such as time. Despite their functional nature, and because they are observed at discrete time points, these types of data are usually analyzed with multivariate statistical
methods that do not take into account the high correlation between observations of a
single curve at nearby time points. Functional data analysis methodologies have been
developed to solve these types of problems. In order to predict the class membership
(multi-category response variable) associated with an observed curve (functional data),
a functional generalized logit model is proposed. Base-line category logit formulations
will be considered, together with their estimation based on basis expansions of the sample
curves of the functional predictor and of the functional parameters. Functional principal component
analysis will be used to get an accurate estimation of the functional parameters and
to classify sample curves in the categories of the response variable. The good performance of the proposed methodology will be studied by developing an experimental
study with simulated and real data. Projects MTM2010-20502 from Dirección General de Investigación del MEC, Spain, and FQM-08068 from Consejería de Innovación, Ciencia y Empresa de la Junta de Andalucía, Spain.
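For illustration, the baseline-category logit step can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code. Each curve is assumed to be already summarized by a few functional principal component scores, the last category is taken as the baseline, and the function name `baseline_logit_probs` and all parameter values are hypothetical.

```python
import math

def baseline_logit_probs(scores, intercepts, slopes):
    """Class probabilities under a baseline-category logit model.

    scores        : principal component scores of one curve
    intercepts[k] : intercept for category k versus the baseline
    slopes[k]     : coefficients on the scores for category k
    The last category is the baseline, with linear predictor 0, so that
    log(p_k / p_baseline) equals the linear predictor of category k.
    """
    etas = [a + sum(b * s for b, s in zip(bs, scores))
            for a, bs in zip(intercepts, slopes)]
    etas.append(0.0)  # baseline category
    m = max(etas)     # subtract the maximum for numerical stability
    expo = [math.exp(e - m) for e in etas]
    total = sum(expo)
    return [e / total for e in expo]

# Hypothetical scores and parameters for a three-category response.
probs = baseline_logit_probs(
    scores=[0.5, -1.0],
    intercepts=[0.2, -0.3],
    slopes=[[1.0, 0.5], [0.0, 2.0]],
)
# Classify the curve into the category with the largest probability.
predicted = max(range(len(probs)), key=lambda k: probs[k])
```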
On the estimation of functional random effects
Functional regression modelling has become one of the most vibrant areas of research in recent years. This discussion provides some alternative approaches to one of the key issues of functional data analysis: the basis representation of curves and, in particular, of functional random effects. First, we propose estimating functional principal components by penalizing the norm; as an alternative, we provide an efficient and unified approach based on B-spline bases and quadratic penalties.
Penalized function-on-function partial least squares regression
This paper deals with the "function-on-function" or "fully functional" linear regression problem. We address the problem by proposing a novel penalized Function-on-Function Partial Least-Squares (pFFPLS) approach that imposes smoothness on the PLS weights. Our proposal introduces an appropriate finite-dimensional functional space with an associated set of bases on which to represent the data and controls smoothness with a roughness penalty operator. Penalizing the PLS weights imposes smoothness on the resulting coefficient function, improving its interpretability. In a simulation study, we demonstrate the advantages of pFFPLS compared to non-penalized FFPLS. Our comparisons indicate a higher accuracy of pFFPLS when predicting the response and estimating the true coefficient function from which the data were generated. We also illustrate the advantages of our proposal with two case studies involving two well-known datasets from the functional data analysis literature. In the first one, we predict log precipitation curves from the yearly temperature profiles recorded in 35 weather stations in Canada. In the second case study, we predict the hip angle profiles during a gait cycle of children from their corresponding knee angle profiles.
A quantile based dimension reduction technique
Partial least squares (PLS) is a dimensionality reduction technique used as an alternative to ordinary least squares (OLS) in situations where the data are collinear or high dimensional. Both PLS and OLS provide mean-based estimates, which are extremely sensitive to the presence of outliers or heavy-tailed distributions. In contrast, quantile regression is an alternative to OLS that computes robust quantile-based estimates. In this work, multivariate PLS is extended to the quantile regression framework, obtaining a theoretical formulation of the problem and a robust dimensionality reduction technique, which we call fast partial quantile regression (fPQR), that provides quantile-based estimates. An efficient implementation of fPQR is also derived, and its performance is studied through simulation experiments and the well-known biscuit dough dataset from chemometrics, a real high-dimensional example.
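The robustness contrast between mean-based and quantile-based estimates can be made concrete with the check (pinball) loss that underlies quantile regression. The following is a minimal sketch of that loss only, not of fPQR itself. The function names and the toy data are hypothetical.

```python
def check_loss(u, tau):
    """Quantile (pinball) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def total_loss(c, data, tau):
    """Sum of check losses when every observation is predicted by c."""
    return sum(check_loss(y - c, tau) for y in data)

# Toy sample with one gross outlier (assumption, for illustration only).
data = [1.0, 2.0, 3.0, 4.0, 100.0]

# Mean-based estimate: pulled strongly towards the outlier.
mean = sum(data) / len(data)

# Quantile-based estimate for tau = 0.5: the candidate among the observed
# values that minimizes the total check loss, which is the sample median.
median = min(data, key=lambda c: total_loss(c, data, 0.5))
```

Here the mean is 22.0 while the check-loss minimizer is 3.0, so the single outlier dominates the mean-based estimate but leaves the quantile-based one unchanged.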
Quantile regression: a penalization approach
Sparse group LASSO (SGL) is a penalization technique used in regression problems where the covariates have a natural grouped structure; it provides solutions that are sparse both between and within groups. In this paper the SGL is introduced to the quantile regression (QR) framework, and a more flexible version, the adaptive sparse group LASSO (ASGL), is proposed. This proposal adds weights to the penalization, improving prediction accuracy. Usually, adaptive weights are taken as a function of the original non-penalized solution model. This approach is only feasible in the n > p framework. In this work, a solution that allows using adaptive weights in high-dimensional scenarios is proposed. The benefits of this proposal are studied on both synthetic and real datasets. In this research we have made use of Uranus, a supercomputer cluster located
at University Carlos III of Madrid and funded jointly by EU-FEDER funds
and by the Spanish Government via the National Projects No. UNC313-4E-2361,
No. ENE2009-12213-C03-03, No. ENE2012-33219 and No. ENE2015-68265-P.
This research was partially supported by research grants and Project
ECO2015-66593-P from Ministerio de Economía, Industria y Competitividad,
Project MTM2017-88708-P from Ministerio de Economía y Competitividad,
FEDER funds and Project IJCI-2017-34038 from Agencia Estatal
de Investigación, Ministerio de Ciencia, Innovación y Universidades.
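To make the structure of the penalty concrete, the sketch below evaluates a weighted sparse group LASSO penalty for a given coefficient vector. It assumes the commonly used form lam * (alpha * sum_j w_j |beta_j| + (1 - alpha) * sum_g v_g sqrt(p_g) ||beta_g||_2), where p_g is the size of group g. The function name `asgl_penalty`, the weights and the coefficients are hypothetical, and the way the paper builds its adaptive weights in high-dimensional settings is not reproduced here.

```python
import math

def asgl_penalty(beta, groups, lam, alpha, w=None, v=None):
    """Weighted sparse group LASSO penalty (sketch under stated assumptions).

    beta   : list of coefficients
    groups : list of index lists, one per group of covariates
    lam    : overall penalization strength
    alpha  : balance between the lasso term and the group lasso term
    w, v   : optional adaptive weights per coefficient and per group;
             unit weights give the non-adaptive SGL penalty
    """
    w = w if w is not None else [1.0] * len(beta)
    v = v if v is not None else [1.0] * len(groups)
    # Within-group sparsity: weighted L1 norm of the coefficients.
    lasso = sum(wj * abs(bj) for wj, bj in zip(w, beta))
    # Between-group sparsity: weighted L2 norms, scaled by sqrt(group size).
    group = sum(vg * math.sqrt(len(g)) * math.sqrt(sum(beta[j] ** 2 for j in g))
                for vg, g in zip(v, groups))
    return lam * (alpha * lasso + (1.0 - alpha) * group)

# Hypothetical coefficients in two groups: {0, 1} and {2}.
beta = [3.0, 4.0, 0.0]
groups = [[0, 1], [2]]
plain = asgl_penalty(beta, groups, lam=1.0, alpha=0.5)
adaptive = asgl_penalty(beta, groups, lam=1.0, alpha=0.5,
                        w=[0.5, 0.5, 10.0], v=[0.5, 10.0])
```

In this toy case the adaptive weights halve the penalty on the active coefficients, while the large weights on the zero coefficient and its group cost nothing. This is the mechanism by which adaptive weighting can penalize relevant variables less than irrelevant ones.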
Stepwise selection of functional covariates in forecasting peak levels of olive pollen
High levels of airborne olive pollen represent a
problem for a large proportion of the population because of
the many allergies it causes. Many attempts have been
made to forecast the concentration of airborne olive pollen,
using methods such as time series, linear regression, neural
networks, a combination of fuzzy systems and neural networks, and functional models. This paper presents a
functional logistic regression model used to study the
relationship between olive pollen concentration and different climatic factors, and on this basis to predict the
probability of high (and possibly extreme) levels of airborne pollen, selecting the best subset of functional climatic variables by means of a stepwise method based on
the conditional likelihood ratio test. Projects MTM2010-20502 from Dirección General de Investigación del MEC, Spain, and FQM-307 from Consejería de Innovación, Ciencia y Empresa de la Junta de Andalucía, Spain.
Advanced Statistical Techniques for Noninvasive Hyperglycemic States Detection in Mice Using Millimeter-Wave Spectroscopy
In this article, we discuss the use of advanced statistical techniques (functional data analysis) in millimeter-wave (mm-wave) spectroscopy for biomedical applications. We employ a W-band transmit-receive unit with a reference channel to acquire spectral data. The choice of the W-band is based on a tradeoff between penetration through the skin, which sets an upper bound on the frequencies, and spectral content across the band. The data obtained are processed using functional principal component logit regression (FPCLoR), which yields a predictive model for sustained hyperglycemia, typically associated with diabetes. The predictions are based on the transmission data from a noninvasive mm-wave spectrometer at W-band. We show that there exists a frequency range most suitable for identification, classification, and prediction of sustained hyperglycemia when evaluating the functional parameter of the functional logit model (β). This allows for the optimization of the spectroscopic instrument with the aim of obtaining a compact and potentially low-cost noninvasive instrument for hyperglycemia assessment. Furthermore, we also demonstrate that the statistical tools alleviate the problem of calibration, which is a serious obstacle in similar measurements at terahertz and IR frequencies.
Iterative variable selection for high-dimensional data: prediction of pathological response in triple-negative breast cancer
In the last decade, regularized regression methods have offered alternatives for performing multi-marker analysis and feature selection in a whole genome context. The process of defining a list of genes that will characterize an expression profile remains unclear. This procedure oscillates between selecting the genes or transcripts of interest based on previous clinical evidence and performing a whole transcriptome analysis that rests on advanced statistics. This paper introduces a methodology to deal with the variable selection and model estimation problems in the high-dimensional set-up, which can be particularly useful in the whole genome context. Results are validated using simulated data and a real dataset from a triple-negative breast cancer study.