    The conditional permutation test for independence while controlling for confounders

    We propose a general new method, the conditional permutation test, for testing the conditional independence of variables X and Y given a potentially high-dimensional random vector Z that may contain confounding factors. The proposed test permutes entries of X non-uniformly, so as to respect the existing dependence between X and Z and thus account for the presence of these confounders. Like the conditional randomization test of Candès et al. (2018), our test relies on the availability of an approximation to the distribution of X | Z. While the test of Candès et al. (2018) uses this estimate to draw new X values, for our test we use this approximation to design an appropriate non-uniform distribution on permutations of the X values already seen in the true data. We provide an efficient Markov Chain Monte Carlo sampler for the implementation of our method, and establish bounds on the Type I error in terms of the error in the approximation of the conditional distribution of X | Z, finding that, for the worst case test statistic, the inflation in Type I error of the conditional permutation test is no larger than that of the conditional randomization test. We validate these theoretical results with experiments on simulated data and on the Capital Bikeshare data set. Comment: 31 pages, 4 figures
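The idea in the abstract above (permute the observed X values non-uniformly, guided by an approximation to X | Z) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the approximation is a Gaussian with known conditional means `mu` and a common `sigma`, the sampler proposes swaps within random disjoint pairs, every chain starts from a shared intermediate permutation (`x_hub`), and the names `log_q`, `swap_chain`, `cpt_pvalue`, and `abs_corr` are hypothetical.

```python
import numpy as np


def log_q(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x, up to an additive constant."""
    return -0.5 * ((x - mu) / sigma) ** 2


def swap_chain(x, mu, sigma, rng, n_steps=50):
    """Run pairwise-swap MCMC steps and return a permuted copy of x.

    mu[k] is the assumed conditional mean of X given Z_k (an illustrative
    Gaussian model, not necessarily the one used in the paper).
    """
    n = len(x)
    perm = np.arange(n)  # perm[k] = index of the x value placed at position k
    half = n // 2
    for _ in range(n_steps):
        order = rng.permutation(n)
        i, j = order[:half], order[half:2 * half]  # disjoint random pairs
        xi, xj = x[perm[i]], x[perm[j]]
        # Log odds of the swapped configuration versus the current one.
        log_odds = (log_q(xj, mu[i], sigma) + log_q(xi, mu[j], sigma)
                    - log_q(xi, mu[i], sigma) - log_q(xj, mu[j], sigma))
        p_swap = 1.0 / (1.0 + np.exp(-np.clip(log_odds, -50.0, 50.0)))
        swap = rng.random(half) < p_swap
        perm[i[swap]], perm[j[swap]] = perm[j[swap]], perm[i[swap]]
    return x[perm]


def abs_corr(x, y):
    """Illustrative test statistic: absolute sample correlation."""
    return abs(np.corrcoef(x, y)[0, 1])


def cpt_pvalue(x, y, mu, sigma, statistic, n_copies=200, n_steps=50, seed=0):
    """Permutation p-value comparing the observed statistic to permuted copies."""
    rng = np.random.default_rng(seed)
    t_obs = statistic(x, y)
    # Shared starting point for all chains (a design choice of this sketch).
    x_hub = swap_chain(x, mu, sigma, rng, n_steps)
    t_perm = np.array([
        statistic(swap_chain(x_hub, mu, sigma, rng, n_steps), y)
        for _ in range(n_copies)
    ])
    return (1 + np.sum(t_perm >= t_obs)) / (1 + n_copies)
```

With simulated data where `x = z + noise`, one would call `cpt_pvalue(x, y, mu=z, sigma=1.0, statistic=abs_corr)`. Because swaps are favoured only when the exchanged values remain plausible under the assumed X | Z model, the permuted copies keep roughly the same dependence on Z as the observed X, which is what lets the test account for confounding.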

    A Double-Robust Test For High-Dimensional Gene Coexpression Networks Conditioning On Clinical information

    It has been increasingly appealing to evaluate whether expression levels of two genes in a gene coexpression network are still dependent given samples' clinical information, in which the conditional independence test plays an essential role. For enhanced robustness regarding model assumptions, we propose a class of double-robust tests for evaluating the dependence of bivariate outcomes after controlling for known clinical information. Although the proposed test relies on the marginal density functions of bivariate outcomes given clinical information, the test remains valid as long as one of the density functions is correctly specified. Because of the closed-form variance formula, the proposed test procedure enjoys computational efficiency without requiring a resampling procedure or tuning parameters. We acknowledge the need to infer the conditional independence network with high-dimensional gene expressions, and further develop a procedure for multiple testing by controlling the false discovery rate. Numerical results show that our method accurately controls both the type-I error and false discovery rate, and it provides certain levels of robustness regarding model misspecification. We apply the method to a gastric cancer study with gene expression data to understand the associations between genes belonging to the transforming growth factor β signaling pathway given cancer-stage information.

    Model-Augmented Estimation of Conditional Mutual Information for Feature Selection

    Markov blanket feature selection, while theoretically optimal, is generally challenging to implement. This is due to the shortcomings of existing approaches to conditional independence (CI) testing, which tend to struggle either with the curse of dimensionality or computational complexity. We propose a novel two-step approach which facilitates Markov blanket feature selection in high dimensions. First, neural networks are used to map features to low-dimensional representations. In the second step, CI testing is performed by applying the k-NN conditional mutual information estimator to the learned feature maps. The mappings are designed to ensure that mapped samples both preserve information and share similar information about the target variable if and only if they are close in Euclidean distance. We show that these properties boost the performance of the k-NN estimator in the second step. The performance of the proposed method is evaluated on both synthetic and real data. Comment: Accepted to UAI 202