A goodness-of-fit test for the functional linear model with scalar response
In this work, a goodness-of-fit test for the null hypothesis of a functional
linear model with scalar response is proposed. The test generalizes to the
functional framework a previous test, designed for the goodness-of-fit of
regression models with multivariate covariates using random projections. The
test statistic is easy to compute using geometrical and matrix arguments, and
its distribution is simple to calibrate by a wild bootstrap on the residuals.
The finite sample properties of the test are illustrated by a simulation study
for several types of bases and under different alternatives. Finally, the test
is applied to two datasets to check the assumption of the functional linear
model, and a graphical tool is introduced. Supplementary materials are
available online.
Comment: Paper: 17 pages, 2 figures, 3 tables. Supplementary material: 8
pages, 6 figures, 10 tables
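A minimal sketch of this kind of test follows. It rests on several assumptions that the abstract does not state. Each functional covariate is already reduced to a vector of basis coefficients, so the null model is a linear model in those coefficients. The coefficients are projected onto one random Gaussian direction. The statistic is a Cramér-von Mises norm of the residual marked empirical process indexed by that projection. Calibration uses Rademacher wild-bootstrap weights. The function names (`gof_test`, `cvm_statistic`, `ols_fitted`, `solve`) are hypothetical.

```python
import random


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_fitted(design, y):
    """Fitted values of the least squares regression of y on the design rows."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return [sum(b * v for b, v in zip(beta, row)) for row in design]


def cvm_statistic(proj, resid):
    """Cramer-von Mises norm of R(u) = n^(-1/2) * sum_i resid_i * 1{proj_i <= u},
    evaluated at the observed projections (assumes no ties)."""
    n = len(proj)
    total, cum = 0.0, 0.0
    for i in sorted(range(n), key=lambda j: proj[j]):
        cum += resid[i]
        total += cum * cum / n
    return total / n


def gof_test(coefs, y, n_boot=199, seed=0):
    """Random-projection goodness-of-fit test of a linear model in the basis coefficients."""
    rng = random.Random(seed)
    design = [[1.0] + list(c) for c in coefs]      # intercept + basis coefficients
    h = [rng.gauss(0.0, 1.0) for _ in coefs[0]]    # one random Gaussian direction
    proj = [sum(a * b for a, b in zip(h, c)) for c in coefs]
    fitted = ols_fitted(design, y)
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    t_obs = cvm_statistic(proj, resid)
    exceed = 0
    for _ in range(n_boot):
        # Wild bootstrap: flip residual signs with Rademacher weights, then refit under the null
        y_star = [f + e * rng.choice((-1.0, 1.0)) for f, e in zip(fitted, resid)]
        f_star = ols_fitted(design, y_star)
        r_star = [a - b for a, b in zip(y_star, f_star)]
        if cvm_statistic(proj, r_star) >= t_obs:
            exceed += 1
    return t_obs, (exceed + 1) / (n_boot + 1)
```

Using several random directions and combining their p-values would reduce the dependence on a single projection. That refinement is omitted here.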
Efficient Bayesian hierarchical functional data analysis with basis function approximations using Gaussian-Wishart processes
Functional data are defined as realizations of random functions (mostly
smooth functions) varying over a continuum, which are usually collected with
measurement errors on discretized grids. In order to accurately smooth noisy
functional observations and deal with the issue of high-dimensional observation
grids, we propose a novel Bayesian method based on the Bayesian hierarchical
model with a Gaussian-Wishart process prior and basis function representations.
We first derive an induced model for the basis-function coefficients of the
functional data, and then use this model to conduct posterior inference through
Markov chain Monte Carlo. Compared to standard Bayesian inference, which
suffers from a heavy computational burden and instability when analyzing
high-dimensional functional data, our method greatly improves the computational
scalability and stability, while inheriting the advantage of simultaneously
smoothing raw observations and estimating the mean-covariance functions in a
nonparametric way. In addition, our method can naturally handle functional data
observed on random or non-common grids. Simulation and real-data studies demonstrate
that our method produces results similar to those of standard Bayesian inference
with low-dimensional common grids, while efficiently smoothing and estimating
functional data with random and high-dimensional observation grids where the
standard Bayesian inference fails. In conclusion, our method can efficiently
smooth and estimate high-dimensional functional data, providing one way to
resolve the curse of dimensionality for Bayesian functional data analysis with
Gaussian-Wishart processes.
Comment: Under review
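The induced model for basis coefficients can be illustrated in its simplest conjugate form. Suppose a curve is f(t) = sum_k z_k phi_k(t) and is observed with Gaussian noise at its own, possibly random, times. Then y = B z + e, and a Gaussian prior on z gives a Gaussian posterior.

The sketch below assumes a Fourier basis, a fixed diagonal prior covariance and a known noise variance. It does not reproduce the Gaussian-Wishart prior or the MCMC scheme of the paper, and the function names are hypothetical.

```python
import math


def fourier_row(t, n_basis):
    """Orthonormal Fourier basis on [0, 1]: 1, sqrt(2)cos(2*pi*k*t), sqrt(2)sin(2*pi*k*t), ..."""
    row, k = [1.0], 1
    while len(row) < n_basis:
        row.append(math.sqrt(2.0) * math.cos(2.0 * math.pi * k * t))
        if len(row) < n_basis:
            row.append(math.sqrt(2.0) * math.sin(2.0 * math.pi * k * t))
        k += 1
    return row


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def posterior_coefficients(ts, ys, n_basis, prior_mean, prior_var, noise_var):
    """Posterior mean of z in y = B z + e with z ~ N(prior_mean, prior_var * I)
    and e ~ N(0, noise_var * I): (B'B/s2 + I/v)^(-1) (B'y/s2 + prior_mean/v)."""
    rows = [fourier_row(t, n_basis) for t in ts]  # B: one row per observation time
    precision = [[sum(r[i] * r[j] for r in rows) / noise_var + (1.0 / prior_var if i == j else 0.0)
                  for j in range(n_basis)] for i in range(n_basis)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) / noise_var + prior_mean[i] / prior_var
           for i in range(n_basis)]
    return solve(precision, rhs)


def evaluate(z, t):
    """Smoothed curve value at t given basis coefficients z."""
    return sum(c * b for c, b in zip(z, fourier_row(t, len(z))))
```

Because each curve only enters through its own rows of B, curves observed on different grids are handled without interpolating them onto a common grid.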
Functional kernel estimators of conditional extreme quantiles
We address the estimation of "extreme" conditional quantiles, i.e., quantiles
whose order converges to one as the sample size increases. Conditions on the
rate of convergence of their order to one are provided to obtain asymptotically
Gaussian distributed kernel estimators. A Weissman-type estimator and kernel
estimators of the conditional tail-index are derived, making it possible to
estimate extreme conditional quantiles of arbitrary order.
Comment: arXiv admin note: text overlap with arXiv:1107.226
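For reference, the unconditional Weissman construction extrapolates from an intermediate order statistic using a Hill estimate of the tail index: q(alpha) = X_{n-k,n} * (k / (n * alpha))^gamma_hat.

The paper works conditionally, with kernel estimators. This sketch ignores the covariate entirely and assumes a heavy-tailed (Pareto-type) sample with a positive tail index. The names `hill_estimator` and `weissman_quantile` are hypothetical.

```python
import math


def hill_estimator(sample, k):
    """Hill estimate of the tail index from the k largest observations."""
    xs = sorted(sample, reverse=True)
    threshold = xs[k]  # the (k+1)-th largest value, X_{n-k,n}
    return sum(math.log(xs[i] / threshold) for i in range(k)) / k


def weissman_quantile(sample, k, alpha):
    """Weissman extrapolation of the quantile of order 1 - alpha; alpha may be below 1/n."""
    n = len(sample)
    xs = sorted(sample, reverse=True)
    gamma = hill_estimator(sample, k)
    return xs[k] * (k / (n * alpha)) ** gamma
```

The choice of k trades bias against variance. Larger k uses more data but reaches further from the tail.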
Approximating nonequilibrium processes using a collection of surrogate diffusion models
The surrogate process approximation (SPA) is applied to model the
nonequilibrium dynamics of a reaction coordinate (RC) associated with the
unfolding and refolding processes of a deca-alanine peptide at 300 K. The RC
dynamics, which correspond to the evolution of the end-to-end distance of the
polypeptide, are produced by steered molecular dynamics (SMD) simulations and
approximated using overdamped diffusion models. We show that the collection of
(estimated) SPA models contains structural information "orthogonal" to the RC
monitored in this study. Functional data analysis ideas are used to correlate
functions associated with the fitted SPA models with the work done on the
system in SMD simulations. It is demonstrated that the shape of the
nonequilibrium work distributions for the unfolding and refolding processes of
deca-alanine can be predicted with functional data analysis ideas using a
relatively small number of simulated SMD paths for calibrating the SPA
diffusion models.
Comment: 13 pages, 7 figures
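The sketch below makes "overdamped diffusion model" concrete in two steps. First, it simulates dX = -theta (X - mu) dt + sigma dW with the Euler-Maruyama scheme. Second, it fits a linear drift and a constant diffusion coefficient back from the sampled path, using least squares on increments plus residual quadratic variation.

This linear-drift (Ornstein-Uhlenbeck) form is an assumption for illustration. The SPA models in the paper need not be restricted to it, and nothing here reproduces the SMD data or the work computations. The function names are hypothetical.

```python
import math
import random


def simulate_ou(theta, mu, sigma, x0, dt, n_steps, seed=0):
    """Euler-Maruyama path of the overdamped diffusion dX = -theta (X - mu) dt + sigma dW."""
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(n_steps):
        x = xs[-1]
        xs.append(x - theta * (x - mu) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return xs


def fit_linear_drift(xs, dt):
    """Fit drift a + b*x and constant diffusion sigma^2 from a path sampled every dt."""
    x = xs[:-1]
    rate = [(x1 - x0) / dt for x0, x1 in zip(xs[:-1], xs[1:])]  # increments per unit time
    n = len(x)
    mx, mr = sum(x) / n, sum(rate) / n
    b = sum((xi - mx) * (ri - mr) for xi, ri in zip(x, rate)) / sum((xi - mx) ** 2 for xi in x)
    a = mr - b * mx
    # Residual quadratic variation: squared residual increments divided by total time
    sigma2 = sum(((ri - a - b * xi) * dt) ** 2 for xi, ri in zip(x, rate)) / (n * dt)
    return a, b, sigma2
```

For the OU model the fitted drift a + b*x corresponds to theta = -b and mu = -a / b.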
Forecasting basketball players’ performance using sparse functional data
Statistics and analytic methods are becoming increasingly important in
basketball. In particular, predicting players’ performance from past
observations is a considerable challenge. The purpose of this study is to
forecast the future behavior of basketball players. The available data are
sparse functional data, which are very common in sports. So far, however, no
forecasting method designed for sparse functional data has been used in sports.
We propose a methodology that combines two methods for handling sparse and
irregular data with the analogous method and functional archetypoid analysis.
Comparisons with traditional methods show that our approach is competitive and
additionally provides prediction intervals. The methodology can also be used in
other sports when sparse longitudinal data are available.
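One common reading of an "analogous" (analog) forecast is nearest-neighbour matching. A player's sparse history is compared with past players on the ages both have observed, and the forecast averages what the closest matches did at the target age.

The sketch below assumes that reading, an RMSE distance over shared ages and a minimum overlap. It does not include the sparse-data methods, archetypoid analysis or prediction intervals of the paper, and all names are hypothetical.

```python
def analog_forecast(target, library, future_age, k=3, min_common=2):
    """Forecast the target's value at future_age from the k closest historical curves.

    target and each library entry map age -> observed value; ages may differ per player.
    Returns the point forecast and the analog values it averages.
    """
    candidates = []
    for history in library:
        if future_age not in history:
            continue  # this analog cannot inform the forecast
        common = [age for age in target if age in history]
        if len(common) < min_common:
            continue  # too little overlap to compare the two curves
        rmse = (sum((target[a] - history[a]) ** 2 for a in common) / len(common)) ** 0.5
        candidates.append((rmse, history[future_age]))
    if not candidates:
        raise ValueError("no analog with enough overlapping observations")
    candidates.sort(key=lambda c: c[0])
    values = [v for _, v in candidates[:k]]
    return sum(values) / len(values), values
```

Comparing only on shared ages is what lets irregularly observed histories be matched without first imputing missing seasons.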
Adaptive estimation in circular functional linear models
We consider the problem of estimating the slope parameter in circular
functional linear regression, where scalar responses Y1,...,Yn are modeled as
depending on 1-periodic, second-order stationary random functions X1,...,Xn.
We consider an orthogonal series estimator of the slope function, obtained by
replacing the first m theoretical coefficients of its expansion in the
trigonometric basis by adequate estimators. We propose a model selection
procedure for m in a set of admissible values, by defining a contrast function
minimized by our estimator and a theoretical penalty function; this first step
assumes the degree of ill-posedness to be known. Then we generalize the
procedure to a random set of admissible m's and a random penalty function. The
resulting estimator is completely data driven and automatically reaches what is
known to be the optimal minimax rate of convergence, in terms of a general
weighted L2-risk. This means that we provide adaptive estimators of both the
slope function and its derivatives.
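Under second-order stationarity, the trigonometric coefficients of a periodic process are uncorrelated across basis functions. Each slope coefficient can therefore be estimated separately as a ratio of empirical moments, sum_i Y_i X_ij / sum_i X_ij^2.

The sketch below assumes centred curves and responses sampled on an equispaced grid of [0, 1), and it uses a fixed cut-off m. The contrast-and-penalty selection of m described above is not implemented, and the names are hypothetical.

```python
import math


def trig_coefficients(curve, m):
    """Coefficients of a 1-periodic curve sampled at t = g/G on the orthonormal basis
    sqrt(2)cos(2*pi*k*t), sqrt(2)sin(2*pi*k*t), k = 1..m, computed with Riemann sums."""
    G = len(curve)
    coefs = []
    for k in range(1, m + 1):
        for trig in (math.cos, math.sin):
            coefs.append(sum(v * math.sqrt(2.0) * trig(2.0 * math.pi * k * g / G)
                             for g, v in enumerate(curve)) / G)
    return coefs


def slope_coefficients(curves, ys, m):
    """Orthogonal series estimate of the slope coefficients up to frequency m,
    ordered [cos1, sin1, cos2, sin2, ...]."""
    X = [trig_coefficients(c, m) for c in curves]
    return [sum(x[j] * y for x, y in zip(X, ys)) / sum(x[j] ** 2 for x in X)
            for j in range(2 * m)]


def slope_function(coefs, t):
    """Evaluate the estimated slope function at t from its cos/sin coefficients."""
    total = 0.0
    for idx, c in enumerate(coefs):
        k = idx // 2 + 1
        trig = math.cos if idx % 2 == 0 else math.sin
        total += c * math.sqrt(2.0) * trig(2.0 * math.pi * k * t)
    return total
```

On an equispaced grid of G points, the Riemann sums are exact for trigonometric polynomials of degree below G/2. The truncation level m is therefore the only approximation introduced here.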
Model-Based Clustering and Classification of Functional Data
The problem of complex data analysis is a central topic of modern statistical
science and learning systems and is becoming of broader interest with the
increasing prevalence of high-dimensional data. The challenge is to develop
statistical models and autonomous algorithms that are able to acquire knowledge
from raw data, either for exploratory analysis, which can be achieved through
clustering techniques, or to make predictions of future data via classification
(i.e., discriminant analysis) techniques. Latent data models, including mixture
model-based approaches, are among the most popular and successful approaches in
both the unsupervised context (i.e., clustering) and the supervised one (i.e.,
classification or discrimination). Although these are traditionally tools of
multivariate analysis, they are growing in popularity in the framework of
functional data analysis (FDA). FDA is the data analysis paradigm in which the
individual data units are functions (e.g., curves, surfaces), rather than
simple vectors. In many areas of application, the analyzed data are indeed
often available in the form of discretized values of functions or curves (e.g.,
time series, waveforms) and surfaces (e.g., 2d-images, spatio-temporal data).
This functional aspect of the data introduces additional difficulties compared
to classical multivariate (non-functional) data analysis. We review and
present approaches for model-based clustering and classification of functional
data. We derive well-established statistical models along with efficient
algorithmic tools to address problems regarding the clustering and the
classification of these high-dimensional data, including their heterogeneity,
missing information, and dynamical hidden structure. The presented models and
algorithms are illustrated on real-world functional data analysis problems from
several application areas.
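A common baseline for model-based clustering of curves is to summarise each curve by its basis coefficients and then fit a Gaussian mixture by EM.

The sketch below assumes the coefficients are already computed. It uses spherical components, each with its own variance, and a farthest-point initialisation. These are illustrative choices, not the specific models reviewed above, and `em_gaussian_mixture` is a hypothetical name.

```python
import math
import random


def em_gaussian_mixture(points, k, n_iter=100, seed=0):
    """EM for a mixture of k spherical Gaussians; returns hard labels and component means."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])

    def sqdist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Farthest-point initialisation of the component means
    means = [list(points[rng.randrange(n)])]
    while len(means) < k:
        means.append(list(max(points, key=lambda p: min(sqdist(p, mu) for mu in means))))
    variances, weights = [1.0] * k, [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior membership probabilities, computed with log-sum-exp
        resp = []
        for p in points:
            logs = [math.log(weights[j]) - 0.5 * d * math.log(2 * math.pi * variances[j])
                    - 0.5 * sqdist(p, means[j]) / variances[j] for j in range(k)]
            top = max(logs)
            ex = [math.exp(v - top) for v in logs]
            total = sum(ex)
            resp.append([e / total for e in ex])
        # M-step: update mixing weights, means and per-component variances
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = [sum(r[j] * p[i] for r, p in zip(resp, points)) / nj for i in range(d)]
            ss = sum(r[j] * sqdist(p, means[j]) for r, p in zip(resp, points))
            variances[j] = max(ss / (nj * d), 1e-6)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, means
```

The variance floor guards against a component collapsing onto a single curve, a known degeneracy of mixture likelihoods.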
Lazy Lasso for local regression
Locally weighted regression is a technique that predicts the response for new data items from their neighbors in the training data set, where closer data items are assigned higher weights in the prediction. However, the original method may suffer from overfitting and fail to select the relevant variables. In this paper we propose combining a regularization approach with locally weighted regression to achieve sparse models. Specifically, we use the lasso, a shrinkage and selection method for linear regression. We present an algorithm that embeds the lasso in an iterative procedure that alternately computes weights and performs lasso-wise regression. The algorithm is tested on three synthetic scenarios and two real data sets. Results show that the proposed method outperforms linear and local models for several kinds of scenarios.
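A minimal sketch of a locally weighted lasso follows. It assumes Gaussian kernel weights and coordinate descent for the weighted lasso. It also assumes one particular reading of the alternation: after each fit, the kernel distances are recomputed using only the variables the lasso kept. The bandwidth, penalty, stopping rule and all function names are hypothetical, not taken from the paper.

```python
import math


def soft_threshold(z, g):
    """Soft-thresholding operator used by lasso coordinate descent."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def weighted_lasso(X, y, w, lam, n_iter=200):
    """Coordinate descent for min (1/2) sum_i w_i (y_i - b0 - x_i.b)^2 + lam * ||b||_1."""
    p = len(X[0])
    b0, b = 0.0, [0.0] * p
    sw = sum(w)
    for _ in range(n_iter):
        # Unpenalized intercept: weighted mean of the current partial residuals
        b0 = sum(wi * (yi - sum(bj * xj for bj, xj in zip(b, xi)))
                 for wi, yi, xi in zip(w, y, X)) / sw
        for j in range(p):
            rho, denom = 0.0, 0.0
            for wi, yi, xi in zip(w, y, X):
                partial = yi - b0 - sum(b[q] * xi[q] for q in range(p) if q != j)
                rho += wi * xi[j] * partial
                denom += wi * xi[j] ** 2
            b[j] = soft_threshold(rho, lam) / denom if denom > 0 else 0.0
    return b0, b


def lazy_lasso_predict(X, y, query, lam=0.01, bandwidth=1.0, n_outer=5):
    """Alternate kernel weighting around the query with weighted lasso fits."""
    active = list(range(len(query)))
    for _ in range(n_outer):
        # Gaussian kernel weights from distances on the currently selected variables
        w = [math.exp(-sum((xi[j] - query[j]) ** 2 for j in active) / (2 * bandwidth ** 2))
             for xi in X]
        b0, b = weighted_lasso(X, y, w, lam)
        selected = [j for j in range(len(query)) if b[j] != 0.0]
        if not selected or selected == active:
            break  # selected set unchanged, or nothing left to measure distances on
        active = selected
    return b0 + sum(bj * qj for bj, qj in zip(b, query)), b
```

Recomputing distances on the selected variables means that irrelevant inputs stop influencing which neighbours count as close.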