Functional Regression
Functional data analysis (FDA) involves the analysis of data whose ideal
units of observation are functions defined on some continuous domain, and the
observed data consist of a sample of functions taken from some population,
sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the
development of this field, which has accelerated in the past 10 years to become
one of the fastest growing areas of statistics, fueled by the growing number of
applications yielding this type of data. One unique characteristic of FDA is
the need to combine information both across and within functions, which Ramsay
and Silverman called replication and regularization, respectively. This article
will focus on functional regression, the area of FDA that has received the most
attention in applications and methodological development. First will be an
introduction to basis functions, key building blocks for regularization in
functional regression methods, followed by an overview of functional regression
methods, split into three types: [1] functional predictor regression
(scalar-on-function), [2] functional response regression (function-on-scalar)
and [3] function-on-function regression. For each, the role of replication and
regularization will be discussed and the methodological development described
in a roughly chronological manner, at times deviating from the historical
timeline to group together similar methods. The primary focus is on modeling
and methodology, highlighting the modeling structures that have been developed
and the various regularization approaches employed. At the end is a brief
discussion describing potential areas of future development in this field.
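The role of basis functions in scalar-on-function regression can be illustrated with a small sketch. It is not taken from the article: the Fourier basis, the equally spaced grid, the Riemann-sum approximation of the integral, and the names `fourier_basis` and `fit_scalar_on_function` are all assumptions chosen for illustration. Regularization here comes from truncating the basis to a few terms, plus a tiny ridge penalty for numerical stability.

```python
import numpy as np

def fourier_basis(t, n_basis):
    """Evaluate a Fourier basis (constant, then sine/cosine pairs) on a grid t in [0, 1]."""
    cols = [np.ones_like(t)]
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sin(2 * np.pi * k * t))
        if len(cols) < n_basis:
            cols.append(np.cos(2 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)

def fit_scalar_on_function(X, y, t, n_basis=5, ridge=1e-8):
    """Fit y_i = alpha + integral of X_i(t) * beta(t) dt + error, with beta(t) = B(t) @ c.

    X is (n_curves, n_grid); t is an equally spaced grid (an assumption of this sketch).
    """
    B = fourier_basis(t, n_basis)              # (n_grid, n_basis)
    dt = t[1] - t[0]
    Z = (X @ B) * dt                           # Riemann approximation of the integrals
    D = np.column_stack([np.ones(len(y)), Z])  # design matrix with intercept
    P = ridge * np.eye(D.shape[1])
    P[0, 0] = 0.0                              # do not penalize the intercept
    coef = np.linalg.solve(D.T @ D + P, D.T @ y)
    return coef[0], B @ coef[1:]               # intercept and beta(t) on the grid

# Simulated example (hypothetical data): the true coefficient function is sin(2*pi*t).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
X = rng.normal(size=(200, 5)) @ fourier_basis(t, 5).T
beta_true = np.sin(2 * np.pi * t)
y = 2.0 + (X @ beta_true) * (t[1] - t[0])
alpha_hat, beta_hat = fit_scalar_on_function(X, y, t)
```

In this noise-free toy setting the fitted `beta_hat` should track `beta_true` closely; with noisy or irregularly sampled curves, a roughness penalty or a different basis would likely be a better choice.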
Selective machine learning of doubly robust functionals
While model selection is a well-studied topic in parametric and nonparametric
regression or density estimation, selection of possibly high-dimensional
nuisance parameters in semiparametric problems is far less developed. In this
paper, we propose a selective machine learning framework for making inferences
about a finite-dimensional functional defined on a semiparametric model, when
the latter admits a doubly robust estimating function and several candidate
machine learning algorithms are available for estimating the nuisance
parameters. We introduce two new selection criteria for bias reduction in
estimating the functional of interest, each based on a novel definition of
pseudo-risk for the functional that embodies the double robustness property and
thus is used to select the pair of learners that is nearest to fulfilling this
property. We establish an oracle property for a multi-fold cross-validation
version of the new selection criteria which states that our empirical criteria
perform nearly as well as an oracle with a priori knowledge of the pseudo-risk
for each pair of candidate learners. We also describe a smooth approximation to
the selection criteria which allows for valid post-selection inference.
Finally, we apply the approach to model selection of a semiparametric estimator
of average treatment effect given an ensemble of candidate machine learners to
account for confounding in an observational study.
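For readers unfamiliar with doubly robust estimating functions, the following sketch shows the standard augmented inverse probability weighted (AIPW) estimator of the average treatment effect. It does not implement the selection criteria proposed in the paper. The nuisance models (logistic regression for the propensity score, linear regression for the outcome), the simulated data, and the names `fit_logistic` and `aipw_ate` are assumptions made for illustration.

```python
import numpy as np

def fit_logistic(X, a, n_iter=25):
    """Logistic regression by Newton-Raphson; X must already include an intercept column."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        H = X.T @ (X * (p * (1.0 - p))[:, None]) + 1e-8 * np.eye(X.shape[1])
        w += np.linalg.solve(H, X.T @ (a - p))
    return w

def aipw_ate(X, a, y):
    """AIPW estimate of E[Y(1) - Y(0)] from covariates X, binary treatment a, outcome y."""
    Xd = np.column_stack([np.ones(len(y)), X])
    ps = 1.0 / (1.0 + np.exp(-Xd @ fit_logistic(Xd, a)))      # propensity scores
    b1 = np.linalg.lstsq(Xd[a == 1], y[a == 1], rcond=None)[0]  # outcome model, treated
    b0 = np.linalg.lstsq(Xd[a == 0], y[a == 0], rcond=None)[0]  # outcome model, controls
    m1, m0 = Xd @ b1, Xd @ b0
    # Outcome-model contrast plus inverse-probability-weighted residual corrections.
    psi = m1 - m0 + a * (y - m1) / ps - (1 - a) * (y - m0) / (1.0 - ps)
    return psi.mean()

# Simulated observational data (hypothetical) with a true treatment effect of 2.
rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 2))
p_true = 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1])))
a = rng.binomial(1, p_true)
y = 1.0 + 2.0 * a + X[:, 0] + X[:, 1] + rng.normal(size=n)
ate_hat = aipw_ate(X, a, y)
```

The estimator remains consistent if either the propensity model or the outcome model is correctly specified, which is the property the paper's pseudo-risk is built around.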
Optimal designs for random effect models with correlated errors with applications in population pharmacokinetics
We consider the problem of constructing optimal designs for population
pharmacokinetics which use random effect models. It is common practice in the
design of experiments in such studies to assume uncorrelated errors for each
subject. In the present paper a new approach is introduced to determine
efficient designs for nonlinear least squares estimation which addresses the
problem of correlation between observations corresponding to the same subject.
We use asymptotic arguments to derive optimal design densities, and the designs
for finite sample sizes are constructed from the quantiles of the corresponding
optimal distribution function. It is demonstrated that compared to the optimal
exact designs, whose determination is a hard numerical problem, these designs
are very efficient. Alternatively, the designs derived from asymptotic theory
could be used as starting designs for the numerical computation of exact
optimal designs. Several examples of linear and nonlinear models are presented
in order to illustrate the methodology. In particular, it is demonstrated that
naively chosen equally spaced designs may lead to less accurate estimation.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org) at http://dx.doi.org/10.1214/09-AOAS324
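The step of building a finite-sample design from the quantiles of a design distribution can be sketched as follows. The paper derives a specific optimal density; here the density is simply passed in as a function. The quantile levels (i-1)/(n-1), the trapezoidal integration, and the name `quantile_design` are assumptions of this sketch, not details confirmed by the abstract.

```python
import numpy as np

def quantile_design(density, lower, upper, n_points, grid_size=2001):
    """Place n_points (at least 2) design points at quantiles of a density on [lower, upper]."""
    t = np.linspace(lower, upper, grid_size)
    h = density(t)
    # Cumulative distribution function by the trapezoidal rule, normalized to end at 1.
    cdf = np.concatenate([[0.0], np.cumsum(0.5 * (h[1:] + h[:-1]) * np.diff(t))])
    cdf /= cdf[-1]
    levels = np.linspace(0.0, 1.0, n_points)   # assumed levels (i - 1) / (n - 1)
    return np.interp(levels, cdf, t)           # invert the CDF by linear interpolation

# A uniform density gives an equally spaced design; a density increasing in t
# pushes design points toward the upper end of the interval.
uniform_design = quantile_design(lambda t: np.ones_like(t), 0.0, 1.0, 5)
skewed_design = quantile_design(lambda t: 2.0 * t, 0.0, 1.0, 5)
```

Such a design could also serve as a starting point for a numerical search for an exact optimal design, as the abstract suggests.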
Multilevel functional principal component analysis
The Sleep Heart Health Study (SHHS) is a comprehensive landmark study of
sleep and its impacts on health outcomes. A primary metric of the SHHS is the
in-home polysomnogram, which includes two electroencephalographic (EEG)
channels for each subject, at two visits. The volume and importance of these
data present enormous challenges for analysis. To address these challenges, we
introduce multilevel functional principal component analysis (MFPCA), a novel
statistical methodology designed to extract core intra- and inter-subject
geometric components of multilevel functional data. Though motivated by the
SHHS, the proposed methodology is generally applicable, with potential
relevance to many modern scientific studies of hierarchical or longitudinal
functional outcomes. Notably, using MFPCA, we identify and quantify
associations between EEG activity during sleep and adverse cardiovascular
outcomes.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org) at http://dx.doi.org/10.1214/08-AOAS206
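The idea of separating inter-subject from intra-subject variation can be sketched for the two-visit case with a simple method-of-moments decomposition. This is a simplified illustration under stated assumptions, not the published MFPCA procedure: it assumes every subject has both visits on a common grid, ignores measurement error and smoothing, and uses the hypothetical name `mfpca_two_visits`.

```python
import numpy as np

def mfpca_two_visits(Y1, Y2):
    """Y1, Y2: (n_subjects, n_grid) curves at visits 1 and 2 for the same subjects."""
    R1 = Y1 - Y1.mean(axis=0)    # remove the visit-specific mean function
    R2 = Y2 - Y2.mean(axis=0)
    n = Y1.shape[0]
    K_total = (R1.T @ R1 + R2.T @ R2) / (2 * n)
    # Cross-visit products share only the subject-level effect (symmetrized).
    K_between = (R1.T @ R2 + R2.T @ R1) / (2 * n)
    K_within = K_total - K_between

    def eig(K):
        vals, vecs = np.linalg.eigh(K)
        order = np.argsort(vals)[::-1]          # sort by decreasing eigenvalue
        return vals[order], vecs[:, order]

    return eig(K_between), eig(K_within)

# Simulated example (hypothetical): subject effect along sin, visit effect along cos.
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 50, endpoint=False)
phi_b = np.sqrt(2) * np.sin(2 * np.pi * t)
phi_w = np.sqrt(2) * np.cos(2 * np.pi * t)
xi = 2.0 * rng.normal(size=(500, 1))            # subject-level scores
Y1 = xi * phi_b + rng.normal(size=(500, 1)) * phi_w
Y2 = xi * phi_b + rng.normal(size=(500, 1)) * phi_w
(vals_b, vecs_b), (vals_w, vecs_w) = mfpca_two_visits(Y1, Y2)
```

The leading eigenvector of the between-subject covariance should align with the subject-level direction, and the leading eigenvector of the within-subject covariance with the visit-level direction.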
Generalized Functional Additive Mixed Models
We propose a comprehensive framework for additive regression models for
non-Gaussian functional responses, allowing for multiple (partially) nested or
crossed functional random effects with flexible correlation structures for,
e.g., spatial, temporal, or longitudinal functional data as well as linear and
nonlinear effects of functional and scalar covariates that may vary smoothly
over the index of the functional response. Our implementation handles
functional responses from any exponential family distribution as well as many
others like Beta- or scaled non-central t-distributions. Development is
motivated by and evaluated on an application to large-scale longitudinal
feeding records of pigs. Results in extensive simulation studies as well as
replications of two previously published simulation studies for generalized
functional mixed models demonstrate the good performance of our proposal. The
approach is implemented in well-documented open source software in the "pffr()"
function in R-package "refund".