
    Functional Regression

    Functional data analysis (FDA) involves the analysis of data whose ideal units of observation are functions defined on some continuous domain; the observed data consist of a sample of functions from some population, recorded on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the development of this field, which has accelerated in the past 10 years to become one of the fastest growing areas of statistics, fueled by the growing number of applications yielding this type of data. One unique characteristic of FDA is the need to combine information both across and within functions, which Ramsay and Silverman called replication and regularization, respectively. This article focuses on functional regression, the area of FDA that has received the most attention in applications and methodological development. It first introduces basis functions, the key building blocks for regularization in functional regression methods, and then surveys functional regression methods of three types: (1) functional predictor regression (scalar-on-function), (2) functional response regression (function-on-scalar) and (3) function-on-function regression. For each, the roles of replication and regularization are discussed and the methodological development is described in roughly chronological order, at times deviating from the historical timeline to group similar methods together. The primary focus is on modeling and methodology, highlighting the modeling structures that have been developed and the various regularization approaches employed. The article ends with a brief discussion of potential areas of future development in this field.
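
    To make the scalar-on-function setting concrete, here is a minimal sketch of functional predictor regression with a Fourier basis and a ridge penalty, on simulated data observed on a regular grid. Everything below (grid, basis size, penalty level `lam`) is an illustrative assumption, not the article's specific method; replication enters because all n curves share the same basis coefficients, and regularization enters through the small smooth basis plus the penalty.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, K = 200, 101, 7                      # curves, grid points, basis size
t = np.linspace(0.0, 1.0, T)
dt = t[1] - t[0]

# Fourier basis: constant term plus sine/cosine pairs.
freqs = np.arange(1, (K - 1) // 2 + 1)
B = np.column_stack([np.ones(T)]
                    + [np.sin(2 * np.pi * f * t) for f in freqs]
                    + [np.cos(2 * np.pi * f * t) for f in freqs])

# Simulated predictors x_i(t) and scalar responses y_i = ∫ x_i(t) β(t) dt + ε_i.
X = rng.normal(size=(n, K)) @ B.T          # n curves sampled on the grid
beta_true = np.sin(2 * np.pi * t)
y = X @ beta_true * dt + 0.1 * rng.normal(size=n)

# Z[i, k] ≈ ∫ x_i(t) φ_k(t) dt (Riemann quadrature), then penalized least squares.
Z = X @ B * dt
lam = 1e-6
c = np.linalg.solve(Z.T @ Z + lam * np.eye(K), Z.T @ y)
beta_hat = B @ c                           # estimated coefficient function on the grid
```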

    Posterior mean and variance approximation for regression and time series problems

    This paper develops a methodology for approximating the first two moments of the posterior distribution in Bayesian inference. Partially specified probability models, defined only through means and variances, are constructed based upon second-order conditional independence in order to facilitate posterior updating and prediction of required distributional quantities. Such models are formulated particularly for multivariate regression and time series analysis with unknown observational variance-covariance components. The similarities and differences between these models and the Bayes linear approach are established. Several subclasses of important models, including regression and time series models with errors following multivariate t, inverted multivariate t and Wishart distributions, are discussed in detail. Two numerical examples, one with simulated data and one with US investment and change-in-inventory data, illustrate the proposed methodology.
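
    The abstract's partially specified models update only means and variances. The sketch below shows the Bayes-linear-style second-order adjustment such models relate to, for a linear observation equation with known variance components. The function name and toy numbers are illustrative assumptions; the paper's second-order conditional independence machinery and unknown variance-covariance components are not reproduced here.

```python
import numpy as np

def second_order_update(m_theta, V_theta, H, y, V_e):
    """Second-order (Bayes-linear-style) update for y = H @ theta + e with
    E[e] = 0 and Var[e] = V_e: only means and variances are specified, and
    the posterior is summarized by its adjusted mean and variance."""
    C = V_theta @ H.T                       # Cov(theta, y)
    V_y = H @ V_theta @ H.T + V_e           # Var(y)
    K = np.linalg.solve(V_y, C.T).T         # gain  C @ V_y^{-1}
    m_post = m_theta + K @ (y - H @ m_theta)
    V_post = V_theta - K @ C.T
    return m_post, V_post

# Toy regression: two coefficients, three observations.
H = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
m, V = second_order_update(np.zeros(2), np.eye(2), H,
                           y=np.array([0.9, 2.1, 3.2]), V_e=0.1 * np.eye(3))
```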

    Marginal integration for nonparametric causal inference

    We consider the problem of inferring the total causal effect of a single-variable intervention on a (response) variable of interest. We propose a certain marginal integration regression technique for a very general class of potentially nonlinear structural equation models (SEMs) with known structure, or at least a known superset of the adjustment variables: we call the procedure S-mint regression. We show that it achieves the same convergence rate as nonparametric regression: for example, single-variable intervention effects can be estimated at rate $n^{-2/5}$ assuming twice differentiable functions. Our result can also be seen as a major robustness property with respect to model misspecification, going well beyond the notion of double robustness. Furthermore, when the structure of the SEM is not known, we can estimate (the equivalence class of) the directed acyclic graph corresponding to the SEM and then proceed by applying S-mint based on these estimates. We empirically compare S-mint with more classical approaches and argue that the former is indeed more robust, more reliable and substantially simpler.
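
    A rough reading of the marginal integration idea: estimate m(x, z) = E[Y | X = x, Z = z] nonparametrically, then average over the empirical distribution of the adjustment variables Z to get the intervention curve x -> E[Y | do(X = x)]. The sketch below uses a Nadaraya-Watson smoother with fixed bandwidths as a stand-in for the paper's local-polynomial machinery; the function name, bandwidths, and simulated data are all illustrative assumptions.

```python
import numpy as np

def s_mint_curve(x_grid, X, Z, Y, bw_x=0.3, bw_z=0.5):
    """Marginal-integration estimate of x -> E[Y | do(X = x)] = E_Z[m(x, Z)],
    with m(x, z) = E[Y | x, z] fitted by Nadaraya-Watson regression and the
    outer expectation taken over the empirical distribution of Z."""
    Z = Z.reshape(len(Y), -1)
    dz = (Z[:, None, :] - Z[None, :, :]) / bw_z
    wz = np.exp(-0.5 * (dz ** 2).sum(-1))            # kernel of obs i vs point j
    curve = []
    for x in x_grid:
        wx = np.exp(-0.5 * ((X - x) / bw_x) ** 2)    # kernel in the x direction
        w = wx[:, None] * wz                         # weight of obs i at (x, Z_j)
        m_xz = (w * Y[:, None]).sum(axis=0) / w.sum(axis=0)
        curve.append(m_xz.mean())                    # integrate m(x, .) over Z
    return np.array(curve)

# Toy example with a single confounder: E[Y | do(X = x)] = sin(x) here.
rng = np.random.default_rng(1)
z = rng.normal(size=300)
x = z + rng.normal(size=300)
y = np.sin(x) + z + 0.3 * rng.normal(size=300)
curve = s_mint_curve(np.linspace(-2.0, 2.0, 9), x, z, y)
```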

    Sufficient Covariate, Propensity Variable and Doubly Robust Estimation

    Statistical causal inference from observational studies often requires adjustment for a possibly multi-dimensional variable, where dimension reduction is crucial. The propensity score, first introduced by Rosenbaum and Rubin, is a popular approach to such reduction. We address causal inference within Dawid's decision-theoretic framework, where it is essential to pay attention to sufficient covariates and their properties. We examine the role of a propensity variable in a normal linear model. We investigate both population-based and sample-based linear regressions, with adjustment for a multivariate covariate and for a propensity variable. In addition, we study the augmented inverse probability weighted (AIPW) estimator, which combines a response model and a propensity model. In a linear regression with homoscedasticity, a propensity variable is shown to provide the same estimated causal effect as multivariate adjustment. An estimated propensity variable may, but need not, yield better precision than the true propensity variable. The AIPW estimator is doubly robust and can improve precision if the propensity model is correctly specified.
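
    The AIPW estimator studied in the abstract combines an outcome regression with a propensity model. A minimal sketch of the standard AIPW average-treatment-effect formula, assuming a linear outcome model (natural given the paper's normal linear setting) and a logistic propensity model; the function name and clipping threshold are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def aipw_ate(X, t, y):
    """Doubly robust AIPW estimate of the average treatment effect:
    consistent if either the outcome regressions (linear here) or the
    propensity model (logistic here) is correctly specified."""
    e = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]  # propensity scores
    e = np.clip(e, 1e-3, 1 - 1e-3)                # guard against extreme weights
    m1 = LinearRegression().fit(X[t == 1], y[t == 1]).predict(X)  # E[Y | X, T=1]
    m0 = LinearRegression().fit(X[t == 0], y[t == 0]).predict(X)  # E[Y | X, T=0]
    psi = m1 - m0 + t * (y - m1) / e - (1 - t) * (y - m0) / (1 - e)
    return psi.mean()
```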

    Inference in Additively Separable Models With a High-Dimensional Set of Conditioning Variables

    This paper studies nonparametric series estimation and inference for the effect of a single variable of interest $x$ on an outcome $y$ in the presence of potentially high-dimensional conditioning variables $z$. The context is an additively separable model $E[y \mid x, z] = g_0(x) + h_0(z)$. The model is high-dimensional in the sense that the series of approximating functions for $h_0(z)$ can have more terms than the sample size, thereby allowing $z$ to have very many measured characteristics. The model is required to be approximately sparse: $h_0(z)$ can be approximated using only a small subset of series terms whose identities are unknown. This paper proposes an estimation and inference method for $g_0(x)$, called Post-Nonparametric Double Selection, which is a generalization of Post-Double Selection. Standard rates of convergence and asymptotic normality for the estimator are shown to hold uniformly over a large class of sparse data-generating processes. A simulation study illustrates the finite-sample estimation properties of the proposed estimator and the coverage properties of the corresponding confidence intervals. Finally, an empirical application to college admissions policy demonstrates the practical implementation of the proposed method.
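
    Post-Double Selection in its original form Lasso-selects the controls that predict the outcome and the controls that predict the treatment, then refits by OLS on the union. Below is a minimal sketch of that logic, extended naively to a polynomial series for $g_0(x)$; the penalty level, series choice, and function name are illustrative assumptions, and none of the paper's uniform validity guarantees carry over to this toy version.

```python
import numpy as np
from sklearn.linear_model import Lasso

def post_double_selection(x, Z, y, n_series=4, alpha=0.1):
    """Sketch of post-(nonparametric-)double-selection: approximate g_0(x)
    by a polynomial series, Lasso-select the z-columns that predict y and
    those that predict each series term, then refit OLS of y on the series
    terms plus the union of selected controls."""
    P = np.column_stack([x ** k for k in range(1, n_series + 1)])  # series for g_0(x)
    keep = set(np.flatnonzero(Lasso(alpha=alpha).fit(Z, y).coef_))
    for j in range(P.shape[1]):
        keep |= set(np.flatnonzero(Lasso(alpha=alpha).fit(Z, P[:, j]).coef_))
    W = np.column_stack([np.ones(len(y)), P, Z[:, sorted(keep)]])
    beta, *_ = np.linalg.lstsq(W, y, rcond=None)
    return beta[1:1 + n_series]        # coefficients on the g_0(x) series terms
```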