On a Nonparametric Notion of Residual and its Applications
Let $(X, Z)$ be a continuous random vector in $\mathbb{R} \times \mathbb{R}^d$, $d \geq 1$. In this paper, we define the notion of a
nonparametric residual of $X$ on $Z$ that is always independent of the
predictor $Z$. We study its properties and show that the proposed
notion of residual matches the usual residual (error) in a multivariate
normal regression model. Given a random vector $(X, Y, Z)$ in
$\mathbb{R} \times \mathbb{R} \times \mathbb{R}^d$, we use this notion of
residual to show that the conditional independence between $X$ and $Y$, given
$Z$, is equivalent to the mutual independence of the residuals (of $X$
on $Z$ and of $Y$ on $Z$) and $Z$. This result is used
to develop a test for conditional independence. We propose a bootstrap scheme
to approximate the critical value of this test. We compare the proposed test,
which is easily implementable, with some of the existing procedures through a
simulation study.
Comment: 19 pages, 2 figures
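The abstract does not spell out how the residual is built. One standard construction, assumed in the sketch below, is the conditional distribution transform $F_{X|Z}(X \mid Z)$: whenever $X$ given $Z$ has a continuous distribution, this quantity is Uniform(0, 1) conditionally on every value of $Z$, and hence independent of $Z$. Under a hypothetical Gaussian model $X = a + bZ + \sigma e$ it equals $\Phi((X - a - bZ)/\sigma)$, a strictly increasing function of the ordinary regression residual, which illustrates the stated agreement with the normal-model residual. The parameters `a`, `b`, `sigma` and all function names are illustrative assumptions; in practice $F_{X|Z}$ would have to be estimated nonparametrically.

```python
import math
import random


def normal_cdf(t: float) -> float:
    """Standard normal CDF Phi(t), computed with the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def gaussian_np_residual(x: float, z: float, a: float, b: float, sigma: float) -> float:
    """Conditional-CDF residual F_{X|Z}(x | z), assuming X | Z = z ~ N(a + b*z, sigma^2).

    Equals Phi((x - a - b*z) / sigma): a strictly increasing transform of the
    ordinary regression residual x - a - b*z.
    """
    return normal_cdf((x - a - b * z) / sigma)


def correlation(u: list[float], v: list[float]) -> float:
    """Pearson correlation of two equal-length samples."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = math.sqrt(sum((p - mu) ** 2 for p in u))
    sv = math.sqrt(sum((q - mv) ** 2 for q in v))
    return cov / (su * sv)


def simulate_residuals(n: int, seed: int, a: float = 1.0, b: float = 2.0, sigma: float = 0.5):
    """Draw (X, Z) from the hypothetical Gaussian model; return (residuals, Z)."""
    rng = random.Random(seed)
    zs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    xs = [a + b * z + sigma * rng.gauss(0.0, 1.0) for z in zs]
    return [gaussian_np_residual(x, z, a, b, sigma) for x, z in zip(xs, zs)], zs


if __name__ == "__main__":
    res, zs = simulate_residuals(5000, seed=0)
    print("mean of residuals:", sum(res) / len(res))   # near 0.5, the Uniform(0, 1) mean
    print("corr(residual, Z):", correlation(res, zs))  # near 0, as expected under independence
```

A near-zero correlation is only a sanity check: independence from $Z$ is a stronger property, which is why the paper's conditional independence test works with the full joint law of the residuals and $Z$.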
Invariant Causal Prediction for Nonlinear Models
An important problem in many domains is to predict how a system will respond
to interventions. This task is inherently linked to estimating the system's
underlying causal structure. To this end, Invariant Causal Prediction (ICP)
(Peters et al., 2016) was proposed; it learns a causal model by exploiting
the invariance of causal relations across data from different environments. When
considering linear models, the implementation of ICP is relatively
straightforward. However, the nonlinear case is more challenging due to the
difficulty of performing nonparametric tests for conditional independence. In
this work, we present and evaluate an array of methods for nonlinear and
nonparametric versions of ICP for learning the causal parents of given target
variables. We find that an approach which first fits a nonlinear model with
data pooled over all environments and then tests for differences between the
residual distributions across environments is quite robust across a large
variety of simulation settings. We call this procedure "invariant residual
distribution test". In general, we observe that the performance of all
approaches is critically dependent on the true (unknown) causal structure and
it becomes challenging to achieve high power if the parental set includes more
than two variables. As a real-world example, we consider fertility rate
modelling which is central to world population projections. We explore
predicting the effect of hypothetical interventions using the accepted models
from nonlinear ICP. The results reaffirm the previously observed central causal
role of child mortality rates.
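The invariant residual distribution test can be sketched as follows. Following the general ICP recipe of Peters et al. (2016), each candidate predictor set $S$ is accepted when its residual distribution looks the same in every environment, and the estimated parents are the intersection of all accepted sets. The concrete choices here are hypothetical stand-ins, not the models or tests used in the paper: a pooled k-nearest-neighbour regression as the nonlinear fit, the two-sample Kolmogorov-Smirnov statistic to compare residuals between two environments, and a label-permutation p-value. Function names and defaults (`k`, `n_perm`, `alpha`) are illustrative assumptions.

```python
import itertools
import math
import random


def knn_fit_predict(X, y, k):
    """Pooled k-NN regression: predict each y_i by the mean response of its k
    nearest neighbours in X (the point itself included). X is a list of tuples."""
    n = len(y)
    preds = []
    for i in range(n):
        if not X[i]:  # empty predictor set: fall back to the pooled mean
            preds.append(sum(y) / n)
            continue
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j) for j in range(n)
        )
        preds.append(sum(y[j] for _, j in dists[:k]) / k)
    return preds


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic sup_t |F_a(t) - F_b(t)|."""
    sa, sb = sorted(a), sorted(b)
    ia = ib = 0
    best = 0.0
    for t in sorted(set(a) | set(b)):
        while ia < len(sa) and sa[ia] <= t:
            ia += 1
        while ib < len(sb) and sb[ib] <= t:
            ib += 1
        best = max(best, abs(ia / len(sa) - ib / len(sb)))
    return best


def permutation_pvalue(res, env, n_perm, rng):
    """Approximate p-value for 'residuals have the same law in env 0 and env 1',
    obtained by permuting the environment labels."""
    def stat(labels):
        r0 = [r for r, e in zip(res, labels) if e == 0]
        r1 = [r for r, e in zip(res, labels) if e == 1]
        return ks_statistic(r0, r1)

    observed = stat(env)
    labels = list(env)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if stat(labels) >= observed:
            count += 1
    return (1 + count) / (1 + n_perm)


def nonlinear_icp(rows, y, env, k=10, n_perm=199, alpha=0.05, seed=0):
    """Accept each predictor subset S whose pooled-fit residuals look invariant
    across environments; return the intersection of the accepted sets."""
    rng = random.Random(seed)
    p = len(rows[0])
    accepted = []
    for size in range(p + 1):
        for S in itertools.combinations(range(p), size):
            X_S = [tuple(row[c] for c in S) for row in rows]
            res = [yi - fi for yi, fi in zip(y, knn_fit_predict(X_S, y, k))]
            if permutation_pvalue(res, env, n_perm, rng) > alpha:
                accepted.append(set(S))
    return set.intersection(*accepted) if accepted else set()


if __name__ == "__main__":
    rng = random.Random(1)
    rows, y, env = [], [], []
    for e in (0, 1):
        for _ in range(150):
            x1 = rng.gauss(1.5 * e, 1.0)                     # env 1 shifts X1 (intervention)
            yi = math.sin(2.0 * x1) + 0.3 * rng.gauss(0.0, 1.0)
            x2 = yi + (1.0 + e) * 0.3 * rng.gauss(0.0, 1.0)  # X2 is a child of Y
            rows.append((x1, x2))
            y.append(yi)
            env.append(e)
    print("estimated parents of Y (column indices):", nonlinear_icp(rows, y, env))
```

Permuting environment labels ignores the fact that the residuals come from a fit on the pooled data, so these p-values are only approximate; the sketch shows the structure of the procedure (pooled fit, per-environment residual comparison, intersection over accepted sets) rather than a calibrated test.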
Specification testing
Published
Joint asymptotics for semi-nonparametric regression models with partially linear structure
We consider a joint asymptotic framework for studying semi-nonparametric
regression models where (finite-dimensional) Euclidean parameters and
(infinite-dimensional) functional parameters are both of interest. The class of
models under consideration shares a partially linear structure and is estimated in
two general contexts: (i) quasi-likelihood and (ii) true likelihood. We first
show that the Euclidean estimator and (pointwise) functional estimator, which
are re-scaled at different rates, jointly converge to a zero-mean Gaussian
vector. This weak convergence result reveals a surprising joint asymptotics
phenomenon: these two estimators are asymptotically independent. A major goal
of this paper is to gain first-hand insights into the above phenomenon.
Moreover, a likelihood ratio test is proposed for a set of joint local
hypotheses, where a new version of the Wilks phenomenon [Ann. Math. Stat. 9
(1938) 60-62; Ann. Statist. 29 (2001) 153-193] is unveiled. A novel technical
tool, called a joint Bahadur representation, is developed for studying these
joint asymptotics results.
Comment: Published at http://dx.doi.org/10.1214/15-AOS1313 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
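To make the two kinds of parameters concrete, consider the partially linear model $Y = \beta X + g(T) + \varepsilon$ with a scalar Euclidean parameter $\beta$ and a functional parameter $g$. The paper works with quasi-likelihood and likelihood estimators; the sketch below instead uses Robinson's (1988) double-residual estimator with hypothetical k-nearest-neighbour smoothers, a swapped-in technique chosen only because it fits in a few lines. It estimates $\beta$ by regressing $Y - \hat E[Y \mid T]$ on $X - \hat E[X \mid T]$, then estimates $g(t)$ pointwise by smoothing $Y - \hat\beta X$ on $T$. Function names, the neighbourhood size `k`, and the simulation settings are assumptions.

```python
import math
import random


def knn_smooth(t, v, k):
    """Estimate E[v | T = t_i] at every t_i by averaging v over the k nearest t_j
    (t_i itself included)."""
    n = len(t)
    out = []
    for i in range(n):
        nbrs = sorted(range(n), key=lambda j: abs(t[j] - t[i]))[:k]
        out.append(sum(v[j] for j in nbrs) / k)
    return out


def partially_linear_beta(x, t, y, k=15):
    """Double-residual estimate of beta in Y = beta * X + g(T) + eps."""
    rx = [xi - m for xi, m in zip(x, knn_smooth(t, x, k))]  # X - E[X | T]
    ry = [yi - m for yi, m in zip(y, knn_smooth(t, y, k))]  # Y - E[Y | T]
    return sum(a * b for a, b in zip(rx, ry)) / sum(a * a for a in rx)


def estimate_g(x, t, y, beta, k=15):
    """Pointwise estimate of g(t_i) = E[Y - beta * X | T = t_i]."""
    return knn_smooth(t, [yi - beta * xi for xi, yi in zip(x, y)], k)


if __name__ == "__main__":
    rng = random.Random(0)
    n, beta_true = 400, 1.5                     # hypothetical simulation settings
    t = [rng.random() for _ in range(n)]
    x = [ti + rng.gauss(0.0, 1.0) for ti in t]  # X correlated with T
    y = [beta_true * xi + math.sin(2 * math.pi * ti) + 0.2 * rng.gauss(0.0, 1.0)
         for xi, ti in zip(x, t)]
    beta_hat = partially_linear_beta(x, t, y)
    g_hat = estimate_g(x, t, y, beta_hat)
    print("beta_hat:", beta_hat)
    print("g_hat(t_0):", g_hat[0], "true g(t_0):", math.sin(2 * math.pi * t[0]))
```

In semiparametric models of this kind, the Euclidean estimator typically converges at the $\sqrt{n}$ rate while a pointwise functional estimator converges more slowly, consistent with the abstract's remark that the two estimators are re-scaled at different rates before their joint limit is studied.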
On the power of conditional independence testing under model-X
For testing conditional independence (CI) of a response Y and a predictor X
given covariates Z, the recently introduced model-X (MX) framework has been the
subject of active methodological research, especially in the context of MX
knockoffs and their successful application to genome-wide association studies.
In this paper, we study the power of MX CI tests, yielding quantitative
explanations for empirically observed phenomena and novel insights to guide the
design of MX methodology. We show that any valid MX CI test must also be valid
conditionally on Y and Z; this conditioning allows us to reformulate the
problem as testing a point null hypothesis involving the conditional
distribution of X. The Neyman-Pearson lemma then implies that the conditional
randomization test (CRT) based on a likelihood statistic is the most powerful
MX CI test against a point alternative. We also obtain a related optimality
result for MX knockoffs. Switching to an asymptotic framework with arbitrarily
growing covariate dimension, we derive an expression for the limiting power of
the CRT against local semiparametric alternatives in terms of the prediction
error of the machine learning algorithm on which its test statistic is based.
Finally, we exhibit a resampling-free test with uniform asymptotic Type-I error
control under the assumption that only the first two moments of X given Z are
known, a significant relaxation of the MX assumption.
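Both kinds of test can be sketched in a few lines, assuming a hypothetical model-X setting with i.i.d. samples in which $X \mid Z \sim N(Z, 1)$ is known. The conditional randomization test (CRT) redraws $X$ from its known law given $Z$ while holding $(Y, Z)$ fixed, and returns $(1 + \#\{\tilde T \ge T\})/(1 + M)$ for $M$ redraws. It uses a simple covariance-type statistic here, not the likelihood statistic the abstract identifies as most powerful against a point alternative. The second function is an illustrative resampling-free statistic that needs only $\mu(Z) = E[X \mid Z]$ and $\sigma^2(Z) = \mathrm{Var}(X \mid Z)$: under the null and conditionally on $(Y, Z)$, the sum $\sum_i (X_i - \mu(Z_i))\, g(Y_i, Z_i)$ has mean zero and variance $\sum_i \sigma^2(Z_i)\, g(Y_i, Z_i)^2$, so its standardized value is approximately standard normal for large $n$. This is a generic construction, not necessarily the test proposed in the paper; the function `g`, the statistics, and all names are assumptions.

```python
import math
import random


def crt_pvalue(x, y, z, sample_x_given_z, stat, n_resamples=199, seed=0):
    """Conditional randomization test: redraw X_i ~ P(X | Z = z_i) with (Y, Z)
    held fixed and compare the observed statistic with its resampled values."""
    rng = random.Random(seed)
    observed = stat(x, y, z)
    exceed = 0
    for _ in range(n_resamples):
        x_tilde = [sample_x_given_z(zi, rng) for zi in z]
        if stat(x_tilde, y, z) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_resamples)


def moment_z_pvalue(x, y, z, mu, sigma2, g):
    """Two-sided p-value from a standardized sum that needs only the first two
    conditional moments mu(z) = E[X | Z = z] and sigma2(z) = Var(X | Z = z)."""
    num = sum((xi - mu(zi)) * g(yi, zi) for xi, yi, zi in zip(x, y, z))
    den = math.sqrt(sum(sigma2(zi) * g(yi, zi) ** 2 for yi, zi in zip(y, z)))
    return math.erfc(abs(num / den) / math.sqrt(2.0))  # = 2 * (1 - Phi(|num / den|))


def draw_x(zi, rng):
    """Hypothetical known model-X law: X | Z = zi ~ N(zi, 1)."""
    return zi + rng.gauss(0.0, 1.0)


def cov_stat(x, y, z):
    """|sum_i (X_i - E[X_i | Z_i]) * Y_i| under the assumed E[X | Z] = Z."""
    return abs(sum((xi - zi) * yi for xi, yi, zi in zip(x, y, z)))


def simulate(n, seed, dependent):
    """Draw (X, Y, Z); Y depends on X beyond Z only if `dependent` is True."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [draw_x(zi, rng) for zi in z]
    y = [(xi if dependent else zi) + rng.gauss(0.0, 1.0) for xi, zi in zip(x, z)]
    return x, y, z


if __name__ == "__main__":
    for dependent in (False, True):
        x, y, z = simulate(200, seed=1, dependent=dependent)
        p_crt = crt_pvalue(x, y, z, draw_x, cov_stat)
        p_mom = moment_z_pvalue(x, y, z, lambda zi: zi, lambda zi: 1.0, lambda yi, zi: yi)
        print("dependent" if dependent else "null", p_crt, p_mom)
```

The CRT needs the full conditional law of $X$ given $Z$ to draw `x_tilde`, while `moment_z_pvalue` only calls `mu` and `sigma2`, which mirrors the relaxation described at the end of the abstract.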
Developments in the Analysis of Spatial Data
Disregarding spatial dependence can invalidate methods for analyzing cross-sectional and panel data. We discuss ongoing work on developing methods that allow for, test for, or estimate spatial dependence. Much of the emphasis is on nonparametric and semiparametric methods.
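As a classical illustration of testing for spatial dependence (not one of the nonparametric or semiparametric methods the abstract refers to), Moran's I measures spatial autocorrelation of a variable observed at $n$ sites with spatial weights $w_{ij}$:
$I = \frac{n}{\sum_{i,j} w_{ij}} \cdot \frac{\sum_{i,j} w_{ij}(x_i - \bar x)(x_j - \bar x)}{\sum_i (x_i - \bar x)^2}$.
The sketch assumes hypothetical sites on a line with binary contiguity weights and calibrates the statistic by randomly permuting the values across sites; the function names and settings are illustrative.

```python
import random


def morans_i(values, weights):
    """Moran's I = (n / S0) * sum_ij w_ij d_i d_j / sum_i d_i^2, where
    d_i = x_i - mean and S0 is the sum of all weights."""
    n = len(values)
    mean = sum(values) / n
    d = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * d[i] * d[j] for i in range(n) for j in range(n))
    return (n / s0) * num / sum(di * di for di in d)


def chain_weights(n):
    """Binary contiguity weights for n sites on a line (neighbours i - 1 and i + 1)."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


def morans_i_pvalue(values, weights, n_perm=999, seed=0):
    """One-sided permutation p-value against positive spatial autocorrelation:
    shuffle the values across sites and count statistics at least as large."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            count += 1
    return (1 + count) / (1 + n_perm)


if __name__ == "__main__":
    w = chain_weights(30)
    trend = [float(i) for i in range(30)]             # varies smoothly along the line
    rng = random.Random(1)
    noise = [rng.gauss(0.0, 1.0) for _ in range(30)]  # no spatial structure
    for name, vals in (("trend", trend), ("noise", noise)):
        print(name, morans_i(vals, w), morans_i_pvalue(vals, w, n_perm=199))
```

For example, `morans_i([1.0, -1.0, 1.0, -1.0], chain_weights(4))` evaluates to -1.0 (up to rounding), the value for perfectly alternating neighbours.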