Functional Regression
Functional data analysis (FDA) involves the analysis of data whose ideal
units of observation are functions defined on some continuous domain, and the
observed data consist of a sample of functions taken from some population,
sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the
development of this field, which has accelerated in the past 10 years to become
one of the fastest growing areas of statistics, fueled by the growing number of
applications yielding this type of data. One unique characteristic of FDA is
the need to combine information both across and within functions, which Ramsay
and Silverman called replication and regularization, respectively. This article
will focus on functional regression, the area of FDA that has received the most
attention in applications and methodological development. First will be an
introduction to basis functions, key building blocks for regularization in
functional regression methods, followed by an overview of functional regression
methods, split into three types: [1] functional predictor regression
(scalar-on-function), [2] functional response regression (function-on-scalar)
and [3] function-on-function regression. For each, the role of replication and
regularization will be discussed and the methodological development described
in a roughly chronological manner, at times deviating from the historical
timeline to group together similar methods. The primary focus is on modeling
and methodology, highlighting the modeling structures that have been developed
and the various regularization approaches employed. At the end is a brief
discussion describing potential areas of future development in this field.
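To make the role of basis functions concrete, here is a minimal scalar-on-function regression sketch: the coefficient function is expanded in a small basis and the fit is regularized with a ridge penalty. The Gaussian-bump basis, grid sizes and penalty value are illustrative assumptions, not anything prescribed in the article.

```python
# Minimal sketch of scalar-on-function (functional predictor) regression,
# assuming densely observed curves on a common grid; all names and data are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 100, 50, 8            # number of curves, grid points, basis functions
t = np.linspace(0, 1, m)        # common observation grid

# Gaussian-bump basis as a dependency-free stand-in for B-splines
centers = np.linspace(0, 1, k)
B = np.exp(-0.5 * ((t[:, None] - centers[None, :]) / 0.1) ** 2)    # (m, k)

# Simulated functional predictors X_i(t) and scalar responses y_i
X = rng.normal(size=(n, m)).cumsum(axis=1) / np.sqrt(m)            # rough random curves
beta_true = np.sin(2 * np.pi * t)
y = X @ beta_true / m + rng.normal(scale=0.1, size=n)              # y_i ~ integral of X_i(t) beta(t) dt

# y_i = int X_i(t) beta(t) dt with beta(t) = sum_j c_j B_j(t), so the design is Z = X B / m
Z = X @ B / m                                                       # (n, k)
lam = 1e-3                                                          # ridge penalty (regularization)
c = np.linalg.solve(Z.T @ Z + lam * np.eye(k), Z.T @ y)
beta_hat = B @ c                                                    # estimated coefficient function
```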
Posterior mean and variance approximation for regression and time series problems
This paper develops a methodology for approximating the first two moments of the posterior distribution in Bayesian inference. Partially specified probability models, defined only through their means and variances, are constructed based upon second-order conditional independence in order to facilitate posterior updating and prediction of required distributional quantities. Such models are formulated particularly for multivariate regression and time series analysis with unknown observational variance-covariance components. The similarities and differences of these models with the Bayes linear approach are established. Several subclasses of important models, including regression and time series models with errors following multivariate t, inverted multivariate t and Wishart distributions, are discussed in detail. Two numerical examples, one with simulated data and one with US investment and change-in-inventory data, illustrate the proposed methodology.
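As a rough illustration of the kind of second-order moment updating compared here with the Bayes linear approach, the sketch below computes an adjusted posterior mean and variance from prior means, variances and covariances only. The formulas are the standard Bayes linear adjustments; the function name and the numbers are illustrative, not the paper's models.

```python
# Second-order (Bayes linear) posterior moment update from means/variances only;
# a generic sketch, not the paper's partially specified regression/time-series models.
import numpy as np

def bayes_linear_update(m_theta, V_theta, m_y, V_y, C_theta_y, y_obs):
    """Adjusted expectation and variance of theta given the observation y_obs."""
    K = C_theta_y @ np.linalg.inv(V_y)              # "gain" matrix Cov(theta, y) Var(y)^{-1}
    m_post = m_theta + K @ (y_obs - m_y)            # adjusted mean
    V_post = V_theta - K @ C_theta_y.T              # adjusted variance
    return m_post, V_post

# Illustrative numbers: a scalar parameter observed through two correlated noisy measurements
m_post, V_post = bayes_linear_update(
    m_theta=np.array([0.0]), V_theta=np.array([[1.0]]),
    m_y=np.array([0.0, 0.0]), V_y=np.array([[1.5, 1.0], [1.0, 1.5]]),
    C_theta_y=np.array([[1.0, 1.0]]), y_obs=np.array([0.8, 1.2]),
)
```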
Marginal integration for nonparametric causal inference
We consider the problem of inferring the total causal effect of a single
variable intervention on a (response) variable of interest. We propose a
certain marginal integration regression technique for a very general class of
potentially nonlinear structural equation models (SEMs) with known structure,
or at least known superset of adjustment variables: we call the procedure
S-mint regression. We show that it achieves the convergence rate of
one-dimensional nonparametric regression: for example, single-variable
intervention effects can be estimated at rate n^{-2/5} under the assumption of
twice differentiable functions. Our result can also be seen as a major
robustness property with respect to model misspecification which goes much
beyond the notion of double robustness. Furthermore, when the structure of the
SEM is not known, we can estimate (the equivalence class of) the directed
acyclic graph corresponding to the SEM, and then proceed by using S-mint based
on these estimates. We empirically compare the S-mint regression method with
more classical approaches and argue that the former is indeed more robust, more
reliable and substantially simpler.
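A simplified stand-in for the marginal-integration idea: regress the response nonparametrically on the intervention variable and the adjustment set, then average the fitted surface over the empirical distribution of the adjustment variables to estimate E[Y | do(X = x)]. The random-forest regressor and the simulated SEM below are assumptions for illustration, not the paper's S-mint implementation.

```python
# Marginal integration sketch: fit E[Y | x, z] nonparametrically, then average over z.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=(n, 2))                      # adjustment variables
x = np.sin(z[:, 0]) + 0.5 * rng.normal(size=n)   # intervention variable influenced by z
y = x ** 2 + z[:, 1] + 0.3 * rng.normal(size=n)  # response; true E[Y | do(X=x)] = x^2 here

model = RandomForestRegressor(n_estimators=200, min_samples_leaf=20, random_state=0)
model.fit(np.column_stack([x, z]), y)

def do_effect(x_val):
    """Average the fitted regression over observed z at the intervened value x_val."""
    grid = np.column_stack([np.full(n, x_val), z])
    return model.predict(grid).mean()

print(do_effect(0.0), do_effect(1.0))            # roughly 0 and 1 under this simulation
```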
Sufficient Covariate, Propensity Variable and Doubly Robust Estimation
Statistical causal inference from observational studies often requires
adjustment for a possibly multi-dimensional variable, where dimension reduction
is crucial. The propensity score, first introduced by Rosenbaum and Rubin, is a
popular approach to such reduction. We address causal inference within Dawid's
decision-theoretic framework, where it is essential to pay attention to
sufficient covariates and their properties. We examine the role of a propensity
variable in a normal linear model. We investigate both population-based and
sample-based linear regressions, with adjustments for a multivariate covariate
and for a propensity variable. In addition, we study the augmented inverse
probability weighted estimator, involving a combination of a response model and
a propensity model. In a linear regression with homoscedasticity, a propensity
variable is proved to provide the same estimated causal effect as multivariate
adjustment. An estimated propensity variable may, but need not, yield better
precision than the true propensity variable. The augmented inverse probability
weighted estimator is doubly robust and can improve precision if the propensity
model is correctly specified.
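The augmented inverse probability weighted estimator mentioned above can be sketched as follows, combining an outcome-regression model with a propensity model. The logistic/linear model choices and the simulated data are illustrative assumptions rather than the paper's normal linear setting.

```python
# Augmented inverse probability weighted (AIPW) doubly robust estimate of an average
# treatment effect; model choices here are a generic illustration.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def aipw_ate(y, t, X):
    """Doubly robust estimate of the average effect of binary treatment t on y."""
    e = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]  # propensity model
    m1 = LinearRegression().fit(X[t == 1], y[t == 1]).predict(X)            # response model, T=1
    m0 = LinearRegression().fit(X[t == 0], y[t == 0]).predict(X)            # response model, T=0
    return np.mean(m1 - m0
                   + t * (y - m1) / e
                   - (1 - t) * (y - m0) / (1 - e))

# Illustrative data with a true treatment effect of 2
rng = np.random.default_rng(2)
X = rng.normal(size=(5000, 3))
t = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
y = 2 * t + X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=5000)
print(aipw_ate(y, t, X))                         # close to 2
```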
Inference in Additively Separable Models With a High-Dimensional Set of Conditioning Variables
This paper studies nonparametric series estimation and inference for the
effect of a single variable of interest x on an outcome y in the presence of
potentially high-dimensional conditioning variables z. The context is an
additively separable model E[y|x, z] = g0(x) + h0(z). The model is
high-dimensional in the sense that the series of approximating functions for
h0(z) can have more terms than the sample size, thereby allowing z to have
potentially very many measured characteristics. The model is required to be
approximately sparse: h0(z) can be approximated using only a small subset of
series terms whose identities are unknown. This paper proposes an estimation
and inference method for g0(x) called Post-Nonparametric Double Selection which
is a generalization of Post-Double Selection. Standard rates of convergence and
asymptotic normality for the estimator are shown to hold uniformly over a large
class of sparse data generating processes. A simulation study illustrates
finite sample estimation properties of the proposed estimator and coverage
properties of the corresponding confidence intervals. Finally, an empirical
application to college admissions policy demonstrates the practical
implementation of the proposed method.
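For intuition, here is a sketch of the post-double-selection idea that Post-Nonparametric Double Selection generalizes, in the simple case where the variable of interest enters linearly: lasso-select controls from z in both the outcome and treatment equations, take the union, and refit by OLS. The data-generating process and tuning choices are illustrative assumptions, not the paper's series estimator.

```python
# Post-double selection sketch with a linear-in-x special case; illustrative only.
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(3)
n, p = 500, 200
Z = rng.normal(size=(n, p))                       # high-dimensional controls
x = Z[:, 0] + 0.5 * rng.normal(size=n)            # variable of interest depends on a few controls
y = 1.0 * x + Z[:, 1] + rng.normal(size=n)        # true effect of x is 1.0

sel_y = np.flatnonzero(LassoCV(cv=5).fit(Z, y).coef_)   # controls predictive of the outcome
sel_x = np.flatnonzero(LassoCV(cv=5).fit(Z, x).coef_)   # controls predictive of the treatment
selected = np.union1d(sel_y, sel_x)                     # union of both selections

design = np.column_stack([x, Z[:, selected]])
fit = LinearRegression().fit(design, y)
print(fit.coef_[0])                               # estimated effect of x, close to 1.0
```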