Functional Regression
Functional data analysis (FDA) involves the analysis of data whose ideal
units of observation are functions defined on some continuous domain, and the
observed data consist of a sample of functions taken from some population,
sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the
development of this field, which has accelerated in the past 10 years to become
one of the fastest growing areas of statistics, fueled by the growing number of
applications yielding this type of data. One unique characteristic of FDA is
the need to combine information both across and within functions, which Ramsay
and Silverman called replication and regularization, respectively. This article
will focus on functional regression, the area of FDA that has received the most
attention in applications and methodological development. First comes an
introduction to basis functions, the key building blocks for regularization in
functional regression methods, followed by an overview of functional regression
methods, split into three types: [1] functional predictor regression
(scalar-on-function), [2] functional response regression (function-on-scalar),
and [3] function-on-function regression. For each, the role of replication and
regularization will be discussed and the methodological development described
in a roughly chronological manner, at times deviating from the historical
timeline to group together similar methods. The primary focus is on modeling
and methodology, highlighting the modeling structures that have been developed
and the various regularization approaches employed. The article closes with a
brief discussion of potential areas of future development in this field.
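To make the basis-function idea concrete, here is a minimal sketch of penalized
scalar-on-function regression in Python. The grid sizes, toy curves, B-spline
basis, second-difference penalty, and fixed smoothing parameter are all
illustrative assumptions, not a method taken from the article itself.

```python
# A minimal sketch of penalized scalar-on-function regression,
#   y_i = integral x_i(t) beta(t) dt + e_i,
# with beta(t) expanded in a B-spline basis and a second-difference
# roughness penalty on the basis coefficients. Everything here is a
# toy illustration under assumed data.
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(0)
n, m, K = 100, 50, 12                    # curves, grid points, basis size
t = np.linspace(0.0, 1.0, m)             # common sampling grid

# B-spline design matrix B[j, k] = phi_k(t_j) on a clamped knot vector.
knots = np.concatenate((np.zeros(3), np.linspace(0, 1, K - 2), np.ones(3)))
B = np.column_stack([BSpline(knots, (np.arange(K) == k).astype(float), 3)(t)
                     for k in range(K)])

X = rng.standard_normal((n, m)).cumsum(axis=1) / np.sqrt(m)  # toy curves
beta_true = np.sin(2 * np.pi * t)
y = X @ beta_true / m + 0.1 * rng.standard_normal(n)  # Riemann-sum integral

Z = X @ B / m                        # basis expansion: y ~ Z c, c in R^K
D = np.diff(np.eye(K), 2, axis=0)    # second-difference penalty matrix
lam = 1e-2                           # smoothing parameter, fixed here
c = np.linalg.solve(Z.T @ Z + lam * D.T @ D, Z.T @ y)
beta_hat = B @ c                     # estimated coefficient function on t
```

Here replication enters through the n curves pooled in Z, and regularization
through the penalty lam * D.T @ D, which shrinks beta_hat toward smoothness.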
Projected principal component analysis in factor models
This paper introduces Projected Principal Component Analysis
(Projected-PCA), which applies principal component analysis to the data matrix
projected (smoothed) onto a given linear space spanned by covariates. When
applied to high-dimensional factor analysis, the projection removes noise
components. We show that the unobserved latent factors can be estimated more
accurately than by conventional PCA if the projection is genuine, or more
precisely, when the factor loading matrices are related to the projected linear
space. When the dimensionality is large, the factors can be estimated
accurately even when the sample size is finite. We propose a flexible
semiparametric factor model, which decomposes the factor loading matrix into
the component that can be explained by subject-specific covariates and the
orthogonal residual component. The covariates' effects on the factor loadings
are further modeled by an additive model via sieve approximations. By using
the newly proposed Projected-PCA, the rates of convergence of the smooth factor
loading matrices are obtained, which are much faster than those of the
conventional factor analysis. The convergence is achieved even when the sample
size is finite and is particularly appealing in the
high-dimension-low-sample-size situation. This leads us to develop
nonparametric tests of whether observed covariates have explanatory power for
the loadings and whether they fully explain the loadings. The proposed method
is illustrated by both simulated data and the returns of the components of the
S&P 500 index.
Comment: Published at http://dx.doi.org/10.1214/15-AOS1364 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
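As a rough illustration of the projection step, the sketch below smooths a toy
panel onto a polynomial sieve basis of a scalar covariate and then runs PCA on
the projected matrix. The basis choice, dimensions, and factor structure are
assumptions made for the example, not the paper's setup.

```python
# Hedged sketch of the Projected-PCA idea: project each observation
# vector onto a sieve basis of subject covariates, then run PCA on the
# fitted (projected, i.e. smoothed) data matrix.
import numpy as np

rng = np.random.default_rng(1)
p, T, J = 200, 60, 4                     # dimension, sample size, sieve size
w = rng.uniform(-1, 1, p)                # one covariate per cross-section unit
Phi = np.vander(w, J, increasing=True)   # sieve basis Phi(w): p x J

# Toy factor model whose loadings depend smoothly on the covariate w.
F = rng.standard_normal((T, 2))                     # latent factors
Lam = np.column_stack([np.sin(np.pi * w), w ** 2])  # p x 2 loadings
X = Lam @ F.T + 0.5 * rng.standard_normal((p, T))   # p x T data panel

P = Phi @ np.linalg.solve(Phi.T @ Phi, Phi.T)  # projection onto span(Phi)
X_proj = P @ X                                 # smoothed data matrix

# PCA on the projected matrix: leading eigenvectors of X_proj X_proj'.
vals, vecs = np.linalg.eigh(X_proj @ X_proj.T)
Lam_hat = vecs[:, ::-1][:, :2] * np.sqrt(p)    # loadings, up to rotation
```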
Covariance Estimation: The GLM and Regularization Perspectives
Finding an unconstrained and statistically interpretable reparameterization
of a covariance matrix is still an open problem in statistics. Its solution is
of central importance in covariance estimation, particularly in the recent
high-dimensional data environment where enforcing the positive-definiteness
constraint could be computationally expensive. We provide a survey of the
progress made in modeling covariance matrices from two relatively complementary
perspectives: (1) generalized linear models (GLM) or parsimony and use of
covariates in low dimensions, and (2) regularization or sparsity for
high-dimensional data. An emerging, unifying and powerful trend in both
perspectives is that of reducing a covariance estimation problem to that of
estimating a sequence of regression problems. We point out several instances of
the regression-based formulation. A notable case is in sparse estimation of a
precision matrix or a Gaussian graphical model leading to the fast graphical
LASSO algorithm. Some advantages and limitations of the regression-based
Cholesky decomposition relative to the classical spectral (eigenvalue) and
variance-correlation decompositions are highlighted. The former provides an
unconstrained and statistically interpretable reparameterization, and
guarantees the positive-definiteness of the estimated covariance matrix. It
reduces the unintuitive task of covariance estimation to that of modeling a
sequence of regressions at the cost of imposing an a priori order among the
variables. Elementwise regularization of the sample covariance matrix such as
banding, tapering and thresholding has desirable asymptotic properties and the
sparse estimated covariance matrix is positive definite with probability
tending to one for large samples and dimensions.
Comment: Published at http://dx.doi.org/10.1214/11-STS358 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
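A minimal sketch of the regression-based Cholesky idea follows, assuming a
fixed variable ordering and using a ridge penalty as an illustrative stand-in
for whichever regularizer one prefers: each variable is regressed on its
predecessors, and the coefficients and residual variances rebuild a covariance
estimate that is positive definite by construction.

```python
# Modified Cholesky sketch: with T unit lower triangular holding the
# negated regression coefficients, T Sigma T' = D, so the reconstruction
# Sigma = T^{-1} D T^{-T} is positive definite whenever all d_j > 0.
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 10
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p)) / np.sqrt(p)
X -= X.mean(axis=0)

T = np.eye(p)            # unit lower-triangular coefficient matrix
d = np.empty(p)          # innovation (residual) variances
d[0] = X[:, 0].var()
lam = 0.1                # ridge penalty on the sequential regressions
for j in range(1, p):
    Z, y = X[:, :j], X[:, j]            # regress variable j on 1..j-1
    phi = np.linalg.solve(Z.T @ Z + lam * np.eye(j), Z.T @ y)
    T[j, :j] = -phi
    d[j] = np.mean((y - Z @ phi) ** 2)

Tinv = np.linalg.inv(T)
Sigma_hat = Tinv @ np.diag(d) @ Tinv.T   # positive definite by construction
```

The a priori order mentioned in the abstract is visible in the loop: permuting
the columns of X changes the regressions and hence the resulting estimate.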
Functional Linear Mixed Models for Irregularly or Sparsely Sampled Data
We propose an estimation approach to analyse correlated functional data which
are observed on unequal grids or even sparsely. The model we use is a
functional linear mixed model, a functional analogue of the linear mixed model.
Estimation is based on dimension reduction via functional principal component
analysis and on mixed model methodology. Our procedure allows the decomposition
of the variability in the data as well as the estimation of mean effects of
interest and borrows strength across curves. Confidence bands for mean effects
can be constructed conditional on estimated principal components. We provide
R-code implementing our approach. The method is motivated by and applied to
data from speech production research.
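The paper provides R code; the Python sketch below is only a toy illustration
of the general FPCA-plus-prediction recipe on a dense common grid. Handling
irregular or sparse grids requires the smoothing and mixed-model machinery of
the paper, and the noise variance is treated as known here purely for brevity.

```python
# Hedged sketch: estimate the covariance, take its leading eigenvectors,
# then predict each curve's principal component scores by a BLUP-type
# conditional expectation, borrowing strength across curves.
import numpy as np

rng = np.random.default_rng(3)
n, m, q = 60, 40, 3
t = np.linspace(0, 1, m)
basis = np.column_stack([np.ones(m),
                         np.sqrt(2) * np.sin(2 * np.pi * t),
                         np.sqrt(2) * np.cos(2 * np.pi * t)])
xi = rng.standard_normal((n, 3)) * np.array([1.0, 0.5, 0.25])
sigma2 = 0.05                            # noise variance, assumed known
Y = xi @ basis.T + np.sqrt(sigma2) * rng.standard_normal((n, m))

mu = Y.mean(axis=0)                      # estimated mean function
R = Y - mu
C = R.T @ R / n - sigma2 * np.eye(m)     # covariance with noise removed
vals, vecs = np.linalg.eigh(C)
G = np.diag(vals[::-1][:q])              # leading eigenvalues
Phi = vecs[:, ::-1][:, :q]               # leading eigenvectors

# BLUP of the scores: E[xi_i | y_i] = G Phi' V^{-1} (y_i - mu).
V = Phi @ G @ Phi.T + sigma2 * np.eye(m)
xi_hat = R @ np.linalg.solve(V, Phi @ G)   # n x q predicted scores
Y_hat = mu + xi_hat @ Phi.T                # shrunken curve reconstructions
```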
Covariate adjusted functional principal components analysis for longitudinal data
Classical multivariate principal component analysis has been extended to
functional data and termed functional principal component analysis (FPCA). Most
existing FPCA approaches do not accommodate covariate information, and it is
the goal of this paper to develop two methods that do. In the first approach,
both the mean and covariance functions depend on the covariate and on time,
while in the second approach only the mean function depends on the covariate.
Both new approaches accommodate additional measurement errors and handle
functional data sampled at regular time grids as well as sparse longitudinal
data sampled at irregular time grids. The first approach, which fully adjusts
both the mean and covariance functions, adapts more closely to the data but is
computationally more intensive than the second, which adjusts the covariate
effects on the mean function only. We develop general asymptotic theory for
both approaches and compare their performance numerically through simulation
studies and a data set.
Comment: Published at http://dx.doi.org/10.1214/09-AOS742 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
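A hedged sketch of the second ("mean-only") adjustment: estimate a
covariate-dependent mean mu(t, z) with a simple kernel smoother over subjects,
then apply ordinary FPCA to the residual curves. The Gaussian kernel, the
bandwidth, and the simulated data are illustrative assumptions; the paper's
estimators and asymptotics are far more general.

```python
# Covariate-adjusted mean via Nadaraya-Watson weights, followed by FPCA
# on the residuals. Dense common grid assumed for simplicity.
import numpy as np

rng = np.random.default_rng(4)
n, m = 80, 30
t = np.linspace(0, 1, m)
z = rng.uniform(0, 1, n)                           # one covariate per subject
Y = np.sin(2 * np.pi * t)[None, :] * (1 + z)[:, None] \
    + 0.3 * rng.standard_normal((n, m))

def mu_hat(j, z0, h=0.15):
    """Kernel-weighted mean of Y[:, j] over subjects with covariate near z0."""
    w = np.exp(-0.5 * ((z - z0) / h) ** 2)
    return w @ Y[:, j] / w.sum()

# Remove the covariate-adjusted mean, then run FPCA on the residuals.
M = np.array([[mu_hat(j, zi) for j in range(m)] for zi in z])
R = Y - M
vals, vecs = np.linalg.eigh(R.T @ R / n)
phi_hat = vecs[:, ::-1][:, :2]                     # leading eigenfunctions
```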
High Dimensional Semiparametric Scale-Invariant Principal Component Analysis
We propose a new high dimensional semiparametric principal component analysis
(PCA) method, named Copula Component Analysis (COCA). The semiparametric model
assumes that, after unspecified monotone transformations of the marginals, the
distribution is multivariate Gaussian. COCA improves upon PCA and sparse PCA
in three aspects: (i) It is robust to modeling assumptions; (ii) It is robust
to outliers and data contamination; (iii) It is scale-invariant and yields more
interpretable results. We prove that the COCA estimators obtain fast estimation
rates and are feature selection consistent when the dimension is nearly
exponentially large relative to the sample size. Careful experiments confirm
that COCA outperforms sparse PCA on both synthetic and real-world datasets.
Comment: Accepted in IEEE Transactions on Pattern Analysis and Machine
Intelligence (TPAMI).
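A hedged sketch of the rank-based construction behind this kind of copula
model: estimate the latent Gaussian correlation matrix from pairwise Kendall's
tau via sin(pi * tau / 2), which is invariant to monotone marginal
transformations, then extract leading eigenvectors. The hard threshold at the
end is an illustrative stand-in for the paper's tuned sparse estimator.

```python
# Scale-invariant PCA via the Kendall's-tau correlation estimate.
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(5)
n, p = 150, 8
Sigma = 0.5 * np.eye(p) + 0.5            # equicorrelated latent Gaussian
Z = rng.standard_normal((n, p)) @ np.linalg.cholesky(Sigma).T
X = np.exp(Z)                            # unknown monotone marginal transform

S = np.eye(p)                            # latent correlation estimate
for j in range(p):
    for k in range(j + 1, p):
        tau, _ = kendalltau(X[:, j], X[:, k])
        S[j, k] = S[k, j] = np.sin(np.pi * tau / 2)

vals, vecs = np.linalg.eigh(S)
u = vecs[:, -1]                          # leading latent direction
u_sparse = np.where(np.abs(u) > 0.1, u, 0.0)   # crude sparsification
u_sparse /= np.linalg.norm(u_sparse)     # sparse, scale-invariant PC
```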
A generalized Fellner-Schall method for smoothing parameter estimation with application to Tweedie location, scale and shape models
We consider the estimation of smoothing parameters and variance components in
models with a regular log likelihood subject to quadratic penalization of the
model coefficients, via a generalization of the method of Fellner (1986) and
Schall (1991). In particular: (i) we generalize the original method to the case
of penalties that are linear in several smoothing parameters, thereby covering
the important cases of tensor product and adaptive smoothers; (ii) we show why
the method's steps increase the restricted marginal likelihood of the model,
show that it tends to converge faster than the EM algorithm or obvious
accelerations thereof, and investigate its relation to Newton optimization;
(iii) we generalize the method to any Fisher regular likelihood. The method
represents a considerable simplification over existing methods of estimating
smoothing parameters in the context of regular likelihoods, without sacrificing
generality: for example, it is only necessary to compute with the same first
and second derivatives of the log-likelihood required for coefficient
estimation, and not with the third or fourth order derivatives required by
alternative approaches. Examples are provided which would have been impossible
or impractical with pre-existing Fellner-Schall methods, along with an example
of a Tweedie location, scale and shape model, which would be a challenge for
alternative methods.
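The sketch below illustrates what I understand to be the single-penalty
Gaussian special case of a Fellner-Schall-type multiplicative update; the
basis, penalty, and fixed error variance are illustrative assumptions, and the
paper's method covers multiple penalties and any Fisher regular likelihood.

```python
# Fellner-Schall-type update for one smoothing parameter in a Gaussian
# P-spline fit. Each step reuses only quantities from the penalized fit:
#   lam <- lam * sigma2 * [rank(S)/lam - tr(A^{-1} S)] / (b' S b),
# whose fixed point satisfies the REML stationarity condition and whose
# numerator and denominator are non-negative, keeping lam >= 0.
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(6)
n, K = 300, 20
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(3 * np.pi * x) + 0.2 * rng.standard_normal(n)

knots = np.concatenate((np.zeros(3), np.linspace(0, 1, K - 2), np.ones(3)))
X = np.column_stack([BSpline(knots, (np.arange(K) == k).astype(float), 3)(x)
                     for k in range(K)])
D = np.diff(np.eye(K), 2, axis=0)
S = D.T @ D                              # penalty matrix, rank K - 2

lam, sigma2 = 1.0, 0.04                  # error variance fixed for clarity
for _ in range(30):
    A = X.T @ X + lam * S
    beta = np.linalg.solve(A, X.T @ y)   # penalized coefficient estimate
    num = (K - 2) / lam - np.trace(np.linalg.solve(A, S))
    lam *= sigma2 * num / (beta @ S @ beta)
```

Note that the loop needs only the first and second derivative quantities
already computed for the coefficient fit, in line with the abstract's claim.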