
The conditional permutation test for independence while controlling for confounders

Author: Athey
Barber
Belloni
Candès
Cover
Dawid
Doran
Ernst
Fukumizu
Gretton
Hennessy
Kojadinovic
Pfister
Rosenbaum
Runge
Sen
Song
Stigler
Strobl
Su
Su
Su
Székely
Székely
Veraverbeke
Weihs
Zhang
Publication date: 07/05/2019

We propose a general new method, the conditional permutation test, for testing the conditional independence of variables $X$ and $Y$ given a potentially high-dimensional random vector $Z$ that may contain confounding factors. The proposed test permutes entries of $X$ non-uniformly, so as to respect the existing dependence between $X$ and $Z$ and thus account for the presence of these confounders. Like the conditional randomization test of Candès et al. (2018), our test relies on the availability of an approximation to the distribution of $X \mid Z$. The test of Candès et al. (2018) uses this estimate to draw new $X$ values. Our test instead uses the approximation to design an appropriate non-uniform distribution on permutations of the $X$ values already seen in the true data. We provide an efficient Markov Chain Monte Carlo sampler for the implementation of our method. We also establish bounds on the Type I error in terms of the error in the approximation of the conditional distribution of $X \mid Z$. For the worst-case test statistic, the inflation in Type I error of the conditional permutation test is no larger than that of the conditional randomization test. We validate these theoretical results with experiments on simulated data and on the Capital Bikeshare data set.
Comment: 31 pages, 4 figures

arXiv.org e-Print Archive

Crossref

Warwick Research Archives Portal Repository
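The Python sketch below illustrates the conditional permutation test summarized in the first entry. It rests on several illustrative assumptions:

- It stands in a known Gaussian model $X \mid Z \sim N(Z, 1)$ for the approximation to the conditional distribution.
- It uses $|\mathrm{corr}(X, Y)|$ as the test statistic.
- It samples permutations with a pairwise-swap Markov chain. Each sweep pairs indices at random. It then swaps the $X$ values in a pair $(i, j)$ with probability $q(X_j \mid Z_i)\,q(X_i \mid Z_j) / [q(X_i \mid Z_i)\,q(X_j \mid Z_j) + q(X_j \mid Z_i)\,q(X_i \mid Z_j)]$.

The chain runs first from the observed data to a "hub", and each permuted copy is then drawn by an independent run started from that hub. This keeps the copies exchangeable with the original under the null. The names `log_q`, `swap_sweep`, `cpt_pvalue`, `n_perm` and `n_sweeps` are hypothetical. Treat this as one plausible instance of an MCMC sampler over permutations, not as the authors' exact algorithm.

```python
import numpy as np

def log_q(x, z, sigma=1.0):
    """HYPOTHETICAL model for X | Z: Gaussian N(z, sigma^2) log-density, up to a constant."""
    return -0.5 * ((x - z) / sigma) ** 2

def swap_sweep(x, z, rng):
    """One sweep of random disjoint pairwise swaps. Each swap is a heat-bath step
    for the target P(pi) proportional to prod_i q(X_pi(i) | Z_i)."""
    x = x.copy()
    idx = rng.permutation(len(x))
    for k in range(0, len(x) - 1, 2):
        i, j = idx[k], idx[k + 1]
        # Log-weight of swapping the two X values minus log-weight of keeping them.
        d = (log_q(x[j], z[i]) + log_q(x[i], z[j])) - (log_q(x[i], z[i]) + log_q(x[j], z[j]))
        # Swap probability is logistic(d), written via tanh to avoid overflow.
        if rng.random() < 0.5 * (1.0 + np.tanh(0.5 * d)):
            x[i], x[j] = x[j], x[i]
    return x

def cpt_pvalue(x, y, z, stat, n_perm=199, n_sweeps=50, seed=0):
    """Conditional permutation p-value, assuming log_q models X | Z."""
    rng = np.random.default_rng(seed)
    x, z = np.asarray(x, dtype=float), np.asarray(z, dtype=float)
    t_obs = stat(x, y, z)
    # Run the reversible chain from the observed X to obtain a "hub" ...
    hub = x
    for _ in range(n_sweeps):
        hub = swap_sweep(hub, z, rng)
    # ... then draw each permuted copy by an independent run started at the hub.
    count = 0
    for _ in range(n_perm):
        xs = hub
        for _ in range(n_sweeps):
            xs = swap_sweep(xs, z, rng)
        count += int(stat(xs, y, z) >= t_obs)
    return (1 + count) / (1 + n_perm)
```

Example: `cpt_pvalue(x, y, z, lambda a, b, c: abs(np.corrcoef(a, b)[0, 1]))`. In this construction every sweep is reversible with respect to the target distribution on permutations. Validity under the null (given a correct model for $X \mid Z$) therefore does not hinge on how well the chain mixes; more sweeps mainly improve power.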

A Double-Robust Test For High-Dimensional Gene Coexpression Networks Conditioning On Clinical information

Author: Ding Maomao
Li Ruosha
Ning Jing
Qin Jin
Publication venue: DigitalCommons@TMC
Publication date: 01/12/2023

There is growing interest in evaluating whether the expression levels of two genes in a gene coexpression network remain dependent given the samples' clinical information, a question in which conditional independence testing plays an essential role. For enhanced robustness to model assumptions, we propose a class of double-robust tests for evaluating the dependence of bivariate outcomes after controlling for known clinical information. The proposed test relies on the marginal density functions of the bivariate outcomes given clinical information. Even so, it remains valid as long as one of the density functions is correctly specified. Because the variance has a closed-form formula, the test is computationally efficient and requires neither a resampling procedure nor tuning parameters. Recognizing the need to infer the conditional independence network from high-dimensional gene expression data, we further develop a multiple testing procedure that controls the false discovery rate. Numerical results show that our method accurately controls both the type I error and the false discovery rate, and that it provides a degree of robustness to model misspecification. We apply the method to a gastric cancer study with gene expression data to understand the associations between genes in the transforming growth factor β signaling pathway given cancer-stage information.

DigitalCommons@The Texas Medical Center
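The second entry's multiple testing step controls the false discovery rate across many gene pairs. The abstract does not say which procedure it uses. Purely for illustration, the sketch below assumes the standard Benjamini-Hochberg step-up rule, applied to a hypothetical array holding one p-value per gene pair. The function name `benjamini_hochberg` is illustrative.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up rule (ASSUMED stand-in for the paper's FDR procedure).
    Returns a boolean mask of rejected hypotheses, in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Compare the i-th smallest p-value with alpha * i / m.
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank meeting its threshold
        reject[order[: k + 1]] = True     # reject every hypothesis up to that rank
    return reject
```

With $G$ genes there are $G(G-1)/2$ pairwise hypotheses, so `pvals` would hold that many entries, one per conditional independence test.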

Model-Augmented Estimation of Conditional Mutual Information for Feature Selection

Author: Ghassami AmirEmad
Kiyavash Negar
Raginsky Maxim
Rosenbaum Elyse
Yang Alan
Publication date: 19/06/2020

Markov blanket feature selection, while theoretically optimal, is generally challenging to implement. This is due to the shortcomings of existing approaches to conditional independence (CI) testing, which tend to struggle either with the curse of dimensionality or with computational complexity. We propose a novel two-step approach that facilitates Markov blanket feature selection in high dimensions. First, neural networks are used to map features to low-dimensional representations. In the second step, CI testing is performed by applying the $k$-NN conditional mutual information estimator to the learned feature maps. The mappings are designed to ensure that mapped samples both preserve information and share similar information about the target variable if and only if they are close in Euclidean distance. We show that these properties boost the performance of the $k$-NN estimator in the second step. The performance of the proposed method is evaluated on both synthetic and real data.
Comment: Accepted to UAI 202

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne