The conditional permutation test for independence while controlling for confounders
We propose a general new method, the conditional permutation test, for
testing the conditional independence of variables X and Y given a
potentially high-dimensional random vector Z that may contain confounding
factors. The proposed test permutes entries of X non-uniformly, so as to
respect the existing dependence between X and Z and thus account for the
presence of these confounders. Like the conditional randomization test of
Candès et al. (2018), our test relies on the availability of an approximation
to the distribution of X | Z. While the test of Candès et al. (2018) uses
this estimate to draw new X values, for our test we use this approximation to
design an appropriate non-uniform distribution on permutations of the X
values already seen in the true data. We provide an efficient Markov chain
Monte Carlo sampler for the implementation of our method, and establish bounds
on the Type I error in terms of the error in the approximation of the
conditional distribution of X | Z, finding that, for the worst-case test
statistic, the inflation in Type I error of the conditional permutation test is
no larger than that of the conditional randomization test. We validate these
theoretical results with experiments on simulated data and on the Capital
Bikeshare data set. Comment: 31 pages, 4 figures
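The idea of permuting the observed values non-uniformly can be illustrated with a minimal sketch. It is not the authors' implementation. It writes X and Y for the two variables tested and Z for the conditioning vector, and it assumes a Gaussian approximation X | Z ~ N(mu(Z), sigma^2) with a known mean `mu`. The function names (`gaussian_log_q`, `pairwise_swap_step`, `cpt_p_value`), the pairwise-swap sampler, and the arrangement of chains around a common "hub" permutation are all choices made for this illustration.

```python
import numpy as np


def gaussian_log_q(x, mu, sigma):
    """Entry [i, j] is log q(x_j | z_i) under N(mu_i, sigma^2), up to a constant.

    The shared normalising constant is dropped because it cancels in swap odds.
    """
    return -0.5 * ((x[None, :] - mu[:, None]) / sigma) ** 2


def pairwise_swap_step(perm, log_q, rng):
    """One MCMC step: propose swaps on disjoint, randomly chosen index pairs."""
    n = len(perm)
    idx = rng.permutation(n)
    pairs = idx[: 2 * (n // 2)].reshape(-1, 2)
    i, j = pairs[:, 0], pairs[:, 1]
    # Log-odds of exchanging the X values currently sitting at positions i and j.
    log_odds = (log_q[i, perm[j]] + log_q[j, perm[i]]
                - log_q[i, perm[i]] - log_q[j, perm[j]])
    swap_prob = 1.0 / (1.0 + np.exp(-np.clip(log_odds, -50.0, 50.0)))
    swap = rng.random(len(i)) < swap_prob
    new_perm = perm.copy()
    new_perm[i[swap]], new_perm[j[swap]] = perm[j[swap]], perm[i[swap]]
    return new_perm


def cpt_p_value(x, y, mu, sigma, stat, n_copies=200, n_steps=50, seed=0):
    """Permutation p-value using non-uniform permutations of the observed x."""
    rng = np.random.default_rng(seed)
    log_q = gaussian_log_q(x, mu, sigma)
    # Walk from the identity to a hub permutation, then branch out independently.
    hub = np.arange(len(x))
    for _ in range(n_steps):
        hub = pairwise_swap_step(hub, log_q, rng)
    t_obs = stat(x, y)
    exceed = 0
    for _ in range(n_copies):
        perm = hub
        for _ in range(n_steps):
            perm = pairwise_swap_step(perm, log_q, rng)
        exceed += int(stat(x[perm], y) >= t_obs)
    return (1 + exceed) / (1 + n_copies)
```

A call might look like `cpt_p_value(x, y, mu=z, sigma=1.0, stat=lambda a, b: abs(np.corrcoef(a, b)[0, 1]))`. In practice `mu` and `sigma` would come from a fitted model for X given Z, and the statistic would usually depend on Z as well.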
A Double-Robust Test For High-Dimensional Gene Coexpression Networks Conditioning On Clinical information
It has been increasingly appealing to evaluate whether expression levels of two genes in a gene coexpression network are still dependent given samples' clinical information, in which the conditional independence test plays an essential role. For enhanced robustness regarding model assumptions, we propose a class of double-robust tests for evaluating the dependence of bivariate outcomes after controlling for known clinical information. Although the proposed test relies on the marginal density functions of bivariate outcomes given clinical information, the test remains valid as long as one of the density functions is correctly specified. Because of the closed-form variance formula, the proposed test procedure enjoys computational efficiency without requiring a resampling procedure or tuning parameters. We acknowledge the need to infer the conditional independence network with high-dimensional gene expressions, and further develop a procedure for multiple testing by controlling the false discovery rate. Numerical results show that our method accurately controls both the type-I error and false discovery rate, and it provides certain levels of robustness regarding model misspecification. We apply the method to a gastric cancer study with gene expression data to understand the associations between genes belonging to the transforming growth factor β signaling pathway given cancer-stage information.
Model-Augmented Estimation of Conditional Mutual Information for Feature Selection
Markov blanket feature selection, while theoretically optimal, is generally
challenging to implement. This is due to the shortcomings of existing
approaches to conditional independence (CI) testing, which tend to struggle
either with the curse of dimensionality or computational complexity. We propose
a novel two-step approach which facilitates Markov blanket feature selection in
high dimensions. First, neural networks are used to map features to
low-dimensional representations. In the second step, CI testing is performed by
applying the k-NN conditional mutual information estimator to the learned
feature maps. The mappings are designed to ensure that mapped samples both
preserve information and share similar information about the target variable if
and only if they are close in Euclidean distance. We show that these properties
boost the performance of the k-NN estimator in the second step. The
performance of the proposed method is evaluated on both synthetic and real
data. Comment: Accepted to UAI 202
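For readers unfamiliar with nearest-neighbour estimation of conditional mutual information, the following is a minimal sketch of one common variant (a Frenzel-Pompe / KSG-style estimator with max-norm distances). The abstract does not say which variant the authors use, so this choice is an assumption, as are the function name `knn_cmi` and the brute-force distance computation, which is only practical for small samples. The neural-network mapping step is omitted, and the inputs are assumed to be continuous with no tied points.

```python
import numpy as np


def _cheb_dist(a):
    """Pairwise max-norm (Chebyshev) distances between the rows of a."""
    return np.abs(a[:, None, :] - a[None, :, :]).max(axis=2)


def knn_cmi(x, y, z, k=5):
    """k-NN estimate of I(X; Y | Z) in nats; requires k < number of samples."""
    x, y, z = (np.asarray(a, dtype=float).reshape(len(a), -1) for a in (x, y, z))
    n = len(x)
    # Digamma at positive integers: psi(m) = -gamma + H_{m-1}.
    harmonic = np.concatenate([[0.0], np.cumsum(1.0 / np.arange(1, n + 1))])

    def psi(m):
        return -np.euler_gamma + harmonic[m - 1]

    d_x, d_y, d_z = _cheb_dist(x), _cheb_dist(y), _cheb_dist(z)
    d_xz = np.maximum(d_x, d_z)
    d_yz = np.maximum(d_y, d_z)
    d_xyz = np.maximum(d_xz, d_y)
    # Distance from each point to its k-th nearest neighbour in the joint space
    # (column 0 of each sorted row is the point itself, at distance 0).
    eps = np.sort(d_xyz, axis=1)[:, k]

    def count(d):
        # Neighbours strictly inside the radius, excluding the point itself.
        return np.maximum((d < eps[:, None]).sum(axis=1) - 1, 0)

    n_xz, n_yz, n_z = count(d_xz), count(d_yz), count(d_z)
    return psi(k) + np.mean(psi(n_z + 1) - psi(n_xz + 1) - psi(n_yz + 1))
```

An estimate near zero is consistent with conditional independence of X and Y given Z, while a clearly positive value suggests conditional dependence. Turning the estimate into a formal test would additionally require a threshold or a permutation scheme, which is not shown here.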