
    On a Nonparametric Notion of Residual and its Applications

    Let $(X, \mathbf{Z})$ be a continuous random vector in $\mathbb{R} \times \mathbb{R}^d$, $d \ge 1$. In this paper, we define the notion of a nonparametric residual of $X$ on $\mathbf{Z}$ that is always independent of the predictor $\mathbf{Z}$. We study its properties and show that the proposed notion of residual matches the usual residual (error) in a multivariate normal regression model. Given a random vector $(X, Y, \mathbf{Z})$ in $\mathbb{R} \times \mathbb{R} \times \mathbb{R}^d$, we use this notion of residual to show that the conditional independence between $X$ and $Y$, given $\mathbf{Z}$, is equivalent to the mutual independence of the residuals (of $X$ on $\mathbf{Z}$ and of $Y$ on $\mathbf{Z}$) and $\mathbf{Z}$. This result is used to develop a test for conditional independence. We propose a bootstrap scheme to approximate the critical value of this test. We compare the proposed test, which is easily implementable, with some of the existing procedures through a simulation study. Comment: 19 pages, 2 figures
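
The abstract does not spell out how the residual is constructed. A natural candidate, assumed here rather than taken from the text above, is the conditional distribution transform $F_{X \mid \mathbf{Z}}(X \mid \mathbf{Z})$, which is Uniform(0, 1) and independent of $\mathbf{Z}$ whenever the conditional distribution of $X$ is continuous. The Python sketch below checks this for a standard bivariate normal $(X, Z)$ with correlation `rho`, where the transform reduces to $\Phi$ applied to the standardized regression error, i.e. a monotone function of the usual residual. The helper names (`conditional_cdf_residual`, `simulate`, `pearson`) are hypothetical, and near-zero correlations are only a necessary, not a sufficient, check of independence.

```python
import math
import random


def normal_cdf(u: float) -> float:
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def conditional_cdf_residual(x: float, z: float, rho: float) -> float:
    """Hypothetical residual F_{X|Z}(x | z) for a standard bivariate normal
    (X, Z) with correlation rho, where X | Z = z ~ N(rho * z, 1 - rho**2)."""
    return normal_cdf((x - rho * z) / math.sqrt(1.0 - rho ** 2))


def simulate(n: int, rho: float, seed: int = 0):
    """Draw n pairs (x, z) from a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    xs, zs = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        # X = rho * Z + sqrt(1 - rho^2) * noise gives corr(X, Z) = rho.
        x = rho * z + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        xs.append(x)
        zs.append(z)
    return xs, zs


def pearson(a, b) -> float:
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


if __name__ == "__main__":
    rho = 0.8
    xs, zs = simulate(5000, rho)
    res = [conditional_cdf_residual(x, z, rho) for x, z in zip(xs, zs)]
    print("mean residual (0.5 for Uniform(0, 1)):", sum(res) / len(res))
    print("corr(residual, Z):  ", pearson(res, zs))
    print("corr(residual, Z^2):", pearson(res, [z * z for z in zs]))
    print("corr(X, Z):         ", pearson(xs, zs))
```

X itself is strongly correlated with Z, while the transformed residual shows no linear or quadratic association with Z, consistent with the independence property stated in the abstract.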

    Invariant Causal Prediction for Nonlinear Models

    An important problem in many domains is to predict how a system will respond to interventions. This task is inherently linked to estimating the system's underlying causal structure. To this end, Invariant Causal Prediction (ICP) (Peters et al., 2016) has been proposed, which learns a causal model by exploiting the invariance of causal relations using data from different environments. When considering linear models, the implementation of ICP is relatively straightforward. However, the nonlinear case is more challenging due to the difficulty of performing nonparametric tests for conditional independence. In this work, we present and evaluate an array of methods for nonlinear and nonparametric versions of ICP for learning the causal parents of given target variables. We find that an approach which first fits a nonlinear model with data pooled over all environments and then tests for differences between the residual distributions across environments is quite robust across a large variety of simulation settings. We call this procedure the "invariant residual distribution test". In general, we observe that the performance of all approaches is critically dependent on the true (unknown) causal structure, and it becomes challenging to achieve high power if the parental set includes more than two variables. As a real-world example, we consider fertility rate modelling, which is central to world population projections. We explore predicting the effect of hypothetical interventions using the accepted models from nonlinear ICP. The results reaffirm the previously observed central causal role of child mortality rates.
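
As a rough illustration of the "invariant residual distribution test" idea, the Python sketch below fits one nonparametric regression on data pooled across two environments and then compares the residual distributions of the two environments with a two-sample Kolmogorov-Smirnov statistic. Everything here is an assumption for illustration: a binned-mean regressogram stands in for whatever nonlinear regressor the authors use, the simulated model and helper names are hypothetical, and no p-value or correction over candidate sets is computed. Under an intervention that shifts X, the residuals of Y on its true parent {X} keep the same distribution in both environments, while the residuals for the empty set do not.

```python
import math
import random


def regressogram_fit(xs, ys, n_bins=40):
    """Pooled nonparametric fit: the mean of y within equal-width bins of x."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins or 1.0

    def bin_of(x):
        return min(max(int((x - lo) / width), 0), n_bins - 1)

    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, y in zip(xs, ys):
        b = bin_of(x)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[bin_of(x)]


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic sup_t |F_a(t) - F_b(t)|."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        t = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to t (handles ties).
        while i < len(a) and a[i] <= t:
            i += 1
        while j < len(b) and b[j] <= t:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def residual_ks(xs, ys, env, use_x=True):
    """Fit on pooled data, then compare residual distributions of env 0 and env 1."""
    if use_x:
        f = regressogram_fit(xs, ys)
        res = [y - f(x) for x, y in zip(xs, ys)]
    else:  # empty candidate set: the "model" is just the pooled mean
        m = sum(ys) / len(ys)
        res = [y - m for y in ys]
    r0 = [r for r, e in zip(res, env) if e == 0]
    r1 = [r for r, e in zip(res, env) if e == 1]
    return ks_statistic(r0, r1)


def simulate(n_per_env=1000, shift=1.5, seed=0):
    """Env 1 intervenes on X (mean shift); Y = sin(X) + 0.5 X + noise in both."""
    rng = random.Random(seed)
    xs, ys, env = [], [], []
    for e in (0, 1):
        for _ in range(n_per_env):
            x = rng.gauss(shift * e, 1.0)
            xs.append(x)
            ys.append(math.sin(x) + 0.5 * x + rng.gauss(0.0, 0.5))
            env.append(e)
    return xs, ys, env


if __name__ == "__main__":
    xs, ys, env = simulate()
    print("KS, residuals of Y on {X}:", residual_ks(xs, ys, env, use_x=True))
    print("KS, residuals of Y on {}: ", residual_ks(xs, ys, env, use_x=False))
```

A small KS value for {X} and a large one for the empty set mirrors the accept/reject pattern that nonlinear ICP relies on; a full procedure would turn each statistic into a p-value and loop over all candidate parent sets.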

    Joint asymptotics for semi-nonparametric regression models with partially linear structure

    We consider a joint asymptotic framework for studying semi-nonparametric regression models where (finite-dimensional) Euclidean parameters and (infinite-dimensional) functional parameters are both of interest. The class of models under consideration shares a partially linear structure, and these models are estimated in two general contexts: (i) quasi-likelihood and (ii) true likelihood. We first show that the Euclidean estimator and the (pointwise) functional estimator, which are rescaled at different rates, jointly converge to a zero-mean Gaussian vector. This weak convergence result reveals a surprising joint asymptotics phenomenon: these two estimators are asymptotically independent. A major goal of this paper is to gain first-hand insights into the above phenomenon. Moreover, a likelihood ratio test is proposed for a set of joint local hypotheses, where a new version of the Wilks phenomenon [Ann. Math. Stat. 9 (1938) 60-62; Ann. Statist. 1 (2001) 153-193] is unveiled. A novel technical tool, called a joint Bahadur representation, is developed for studying these joint asymptotics results. Comment: Published at http://dx.doi.org/10.1214/15-AOS1313 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
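
To make the partially linear structure concrete, consider the model $Y = \beta X + g(T) + \varepsilon$ with a scalar Euclidean parameter $\beta$ and an unknown function $g$. The Python sketch below is not the quasi-likelihood or likelihood procedure studied in the paper; it swaps in Robinson's well-known partialling-out (double-residual) estimator with binned-mean smoothers, only to show how the Euclidean part $\hat\beta$ and the functional part $\hat g$ are estimated together. The data-generating model and the function names are hypothetical.

```python
import math
import random


def regressogram(ts, vs, n_bins=25):
    """Binned-mean estimate of E[v | t] on equal-width bins of t."""
    lo, hi = min(ts), max(ts)
    width = (hi - lo) / n_bins or 1.0

    def bin_of(t):
        return min(max(int((t - lo) / width), 0), n_bins - 1)

    sums, counts = [0.0] * n_bins, [0] * n_bins
    for t, v in zip(ts, vs):
        b = bin_of(t)
        sums[b] += v
        counts[b] += 1
    overall = sum(vs) / len(vs)
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda t: means[bin_of(t)]


def fit_partially_linear(xs, ts, ys, n_bins=25):
    """Robinson-style estimate of (beta, g) in Y = beta * X + g(T) + noise."""
    m_y = regressogram(ts, ys, n_bins)  # estimates E[Y | T]
    m_x = regressogram(ts, xs, n_bins)  # estimates E[X | T]
    rx = [x - m_x(t) for x, t in zip(xs, ts)]
    ry = [y - m_y(t) for y, t in zip(ys, ts)]
    # Least squares of the Y-residuals on the X-residuals gives beta_hat.
    beta = sum(a * b for a, b in zip(rx, ry)) / sum(a * a for a in rx)
    # Smooth Y - beta_hat * X against T to estimate g.
    g_hat = regressogram(ts, [y - beta * x for x, y in zip(xs, ys)], n_bins)
    return beta, g_hat


def simulate(n=3000, beta=1.5, seed=0):
    """X depends on T, so regressing Y on X alone would be biased."""
    rng = random.Random(seed)
    xs, ts, ys = [], [], []
    for _ in range(n):
        t = rng.random()
        x = math.sin(2 * math.pi * t) + rng.gauss(0.0, 1.0)
        y = beta * x + math.exp(t) + rng.gauss(0.0, 1.0)
        xs.append(x)
        ts.append(t)
        ys.append(y)
    return xs, ts, ys


if __name__ == "__main__":
    xs, ts, ys = simulate()
    beta_hat, g_hat = fit_partially_linear(xs, ts, ys)
    print("beta_hat:", beta_hat)
    print("g_hat(0.5):", g_hat(0.5), "vs exp(0.5) =", math.exp(0.5))
```

Note that $\hat\beta$ averages over all observations and converges at the parametric rate, whereas $\hat g(t)$ uses only the observations near $t$; this is the "different rates" rescaling that the abstract refers to.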

    On the power of conditional independence testing under model-X

    For testing conditional independence (CI) of a response Y and a predictor X given covariates Z, the recently introduced model-X (MX) framework has been the subject of active methodological research, especially in the context of MX knockoffs and their successful application to genome-wide association studies. In this paper, we study the power of MX CI tests, yielding quantitative explanations for empirically observed phenomena and novel insights to guide the design of MX methodology. We show that any valid MX CI test must also be valid conditionally on Y and Z; this conditioning allows us to reformulate the problem as testing a point null hypothesis involving the conditional distribution of X. The Neyman-Pearson lemma then implies that the conditional randomization test (CRT) based on a likelihood statistic is the most powerful MX CI test against a point alternative. We also obtain a related optimality result for MX knockoffs. Switching to an asymptotic framework with arbitrarily growing covariate dimension, we derive an expression for the limiting power of the CRT against local semiparametric alternatives in terms of the prediction error of the machine learning algorithm on which its test statistic is based. Finally, we exhibit a resampling-free test with uniform asymptotic Type-I error control under the assumption that only the first two moments of X given Z are known, a significant relaxation of the MX assumption.
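
The conditional randomization test (CRT) mentioned above can be sketched in a few lines once the conditional law of X given Z is known, which is the model-X assumption. The Python sketch below assumes, purely for illustration, that X | Z ~ N(0.5 Z, 1), and uses the hypothetical statistic |Σ (x_i - 0.5 z_i) y_i|; it is not the likelihood-based statistic that the paper shows to be most powerful. Redrawing X from its known conditional law while holding Y and Z fixed, the p-value (1 + #{resampled statistics ≥ observed}) / (B + 1) is valid under the null for any choice of statistic.

```python
import random


def crt_pvalue(xs, ys, zs, sample_x_given_z, statistic, n_draws=199, seed=0):
    """Conditional randomization test p-value, assuming X | Z is fully known."""
    rng = random.Random(seed)
    t_obs = statistic(xs, ys, zs)
    exceed = 0
    for _ in range(n_draws):
        # Redraw X from its known conditional law, keeping Y and Z fixed.
        x_tilde = [sample_x_given_z(z, rng) for z in zs]
        if statistic(x_tilde, ys, zs) >= t_obs:
            exceed += 1
    return (1 + exceed) / (n_draws + 1)


# Hypothetical model-X assumption: X | Z ~ N(A * Z, 1) with A known.
A = 0.5


def sample_x_given_z(z, rng):
    return rng.gauss(A * z, 1.0)


def stat(xs, ys, zs):
    """|sum of (x - E[X | Z]) * y|: large when Y tracks X beyond what Z explains."""
    return abs(sum((x - A * z) * y for x, y, z in zip(xs, ys, zs)))


def simulate(n=500, effect=1.0, seed=0):
    """Y = Z + effect * X + noise; effect = 0 gives X independent of Y given Z."""
    rng = random.Random(seed)
    zs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    xs = [sample_x_given_z(z, rng) for z in zs]
    ys = [z + effect * x + rng.gauss(0.0, 1.0) for x, z in zip(xs, zs)]
    return xs, ys, zs


if __name__ == "__main__":
    for effect in (0.0, 1.0):
        xs, ys, zs = simulate(effect=effect)
        p = crt_pvalue(xs, ys, zs, sample_x_given_z, stat)
        print(f"effect={effect}: CRT p-value = {p:.3f}")
```

With B = 199 resamples the smallest attainable p-value is 1/200; the cost of the CRT is B recomputations of the statistic, which is the resampling burden that a resampling-free test avoids.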

    Developments in the Analysis of Spatial Data

    Disregarding spatial dependence can invalidate methods for analyzing cross-sectional and panel data. We discuss ongoing work on developing methods that allow for, test for, or estimate, spatial dependence. Much of the stress is on nonparametric and semiparametric methods.
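
One classical way to test for spatial dependence, chosen here as an illustration rather than taken from the paper, is Moran's I, $I = \frac{n}{W} \frac{\sum_{i,j} w_{ij} z_i z_j}{\sum_i z_i^2}$, where $z_i = x_i - \bar{x}$, the $w_{ij}$ are spatial weights and $W = \sum_{i,j} w_{ij}$. Under no spatial autocorrelation its expectation is $-1/(n-1)$. The Python sketch below places sites on a line with unit weights between neighbours and uses a permutation test for positive dependence; the weight scheme and helper names are hypothetical.

```python
import random


def morans_i(values, weights):
    """Moran's I for values at n sites with an n x n spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / w_total) * (num / den)


def line_weights(n):
    """Hypothetical weights: sites 0..n-1 on a line, neighbours get weight 1."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


def permutation_pvalue(values, weights, n_perm=999, seed=0):
    """One-sided p-value for positive spatial autocorrelation by shuffling sites."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            exceed += 1
    return (1 + exceed) / (n_perm + 1)


if __name__ == "__main__":
    w = line_weights(10)
    trend = [float(i) for i in range(10)]  # smooth spatial trend
    alternating = [1.0, -1.0] * 5          # neighbours always disagree
    print("I(trend):", morans_i(trend, w), "p:", permutation_pvalue(trend, w))
    print("I(alternating):", morans_i(alternating, w))
```

A smooth trend gives I = 7/9, well above the null expectation of -1/9 for ten sites, while a perfectly alternating pattern gives I = -1; shuffling values across sites destroys any spatial arrangement, which is why it serves as a reference distribution.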