Model-X approaches to testing conditional independence between a predictor
and an outcome variable given a vector of covariates usually assume exact
knowledge of the conditional distribution of the predictor given the
covariates. Nevertheless, model-X methodologies are often deployed with this
conditional distribution learned in sample. We investigate the consequences of
this choice through the lens of the distilled conditional randomization test
(dCRT). We find that Type-I error control is still possible, but only if the
mean of the outcome variable given the covariates is estimated well enough.
This demonstrates that the dCRT is doubly robust, and motivates a comparison to
the generalized covariance measure (GCM) test, another doubly robust
conditional independence test. We prove that these two tests are asymptotically
equivalent and, leveraging semiparametric efficiency theory, show that the GCM
test is in fact optimal against (generalized) partially linear alternatives.
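A minimal pure-Python sketch can make the comparison between the two tests concrete. It rests on several assumptions that are not from the paper. There is a single scalar covariate Z. Ordinary least squares stands in for the lasso or post-lasso fits. For the dCRT, X given Z is modeled as Gaussian with constant variance. All of these are learned in sample. The helper names (`ols_fit`, `gcm_test`, `dcrt_test`) are hypothetical. Both tests work from the same products of in-sample residuals. The GCM test divides their mean by its empirical standard deviation and compares the result to a standard normal. This Monte Carlo dCRT instead compares the observed sum to sums recomputed with X resampled from the fitted conditional distribution.

```python
import math
import random
import statistics


def ols_fit(z, v):
    """Least-squares fit of v on [1, z]; returns the in-sample fitted values."""
    zbar, vbar = statistics.fmean(z), statistics.fmean(v)
    sxx = sum((zi - zbar) ** 2 for zi in z)
    sxy = sum((zi - zbar) * (vi - vbar) for zi, vi in zip(z, v))
    slope = sxy / sxx
    intercept = vbar - slope * zbar
    return [intercept + slope * zi for zi in z]


def normal_two_sided_p(t):
    """Two-sided p-value 2 * (1 - Phi(|t|)) for a standard normal statistic."""
    return math.erfc(abs(t) / math.sqrt(2))


def gcm_test(x, y, z):
    """GCM-style test: normalized mean of products of residuals (OLS fits assumed)."""
    mx, my = ols_fit(z, x), ols_fit(z, y)
    r = [(xi - a) * (yi - b) for xi, a, yi, b in zip(x, mx, y, my)]
    t = math.sqrt(len(r)) * statistics.fmean(r) / statistics.pstdev(r)
    return t, normal_two_sided_p(t)


def dcrt_test(x, y, z, n_resamples=1000, seed=0):
    """Monte Carlo dCRT with X | Z ~ N(mu_x(Z), sigma^2) learned in sample (assumption)."""
    rng = random.Random(seed)
    mx, my = ols_fit(z, x), ols_fit(z, y)
    ry = [yi - b for yi, b in zip(y, my)]  # Y distilled on Z; does not involve X
    sigma = statistics.pstdev([xi - a for xi, a in zip(x, mx)])

    def stat(xs):
        # Two-sided statistic: |sum of (X - mu_x(Z)) * (Y - mu_y(Z))|.
        return abs(sum((xi - a) * b for xi, a, b in zip(xs, mx, ry)))

    t_obs = stat(x)
    exceed = 0
    for _ in range(n_resamples):
        # Resample X from the fitted conditional model, keeping Y and Z fixed.
        x_tilde = [a + sigma * rng.gauss(0.0, 1.0) for a in mx]
        if stat(x_tilde) >= t_obs:
            exceed += 1
    return (1 + exceed) / (1 + n_resamples)


if __name__ == "__main__":
    rng = random.Random(1)
    n = 200
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.8 * zi + rng.gauss(0, 1) for zi in z]
    y = [1.5 * zi + rng.gauss(0, 1) for zi in z]  # Y independent of X given Z
    t, p_gcm = gcm_test(x, y, z)
    p_dcrt = dcrt_test(x, y, z)
    print(f"GCM: T = {t:.3f}, p = {p_gcm:.3f}; dCRT: p = {p_dcrt:.3f}")
```

Under the null in the usage example, both p-values are valid numbers in (0, 1], and on simulated data they tend to agree closely. This is in line with the asymptotic equivalence stated above, but the sketch does not prove it. Swapping `ols_fit` for a lasso or post-lasso fit is where the in-sample estimation questions discussed in the abstract would come into play.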
In an extensive simulation study comparing the dCRT to the GCM test, we find
that the two tests are quite similar in terms of both Type-I error and power,
and that post-lasso-based test statistics (as opposed to lasso-based ones) can
dramatically improve Type-I error control for both methods.