55,519 research outputs found

Feature-to-feature regression for a two-step conditional independence test

Authors: Filippi SL, Flaxman S, Sejdinovic D, Zhang Q
Publication date: 01/01/2017

Algorithms for causal discovery, and more broadly for learning the structure of graphical models, require well-calibrated and consistent conditional independence (CI) tests. We revisit the CI tests that are based on two-step procedures, involving regression followed by an (unconditional) independence test on the regression residuals (RESIT), and investigate the assumptions under which these tests operate. In particular, we demonstrate that when going beyond simple functional relationships with additive noise, such tests can lead to an inflated number of false discoveries. We study the relationship of these tests with those based on dependence measures using reproducing kernel Hilbert spaces (RKHS) and propose an extension of RESIT which uses RKHS-valued regression. The resulting test inherits the simple two-step testing procedure of RESIT, while giving correct Type I control and competitive power. When used as a component of the PC algorithm, the proposed test is more robust to the case where hidden variables induce a switching behaviour in the associations present in the data.
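The two-step procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' method: it assumes linear least-squares regression for the first step and a permutation test on the absolute Pearson correlation of the residuals for the second step, whereas the paper discusses kernel-based dependence measures and RKHS-valued regression. The function names `residuals` and `two_step_ci_pvalue` are hypothetical.

```python
import numpy as np

def residuals(target, z):
    """Residuals of a least-squares regression of target on z (with intercept)."""
    design = np.column_stack([np.ones(len(z)), z])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef

def two_step_ci_pvalue(x, y, z, n_perm=200, seed=0):
    """Step 1: regress x and y on z. Step 2: permutation test on the residuals.

    Assumption: absolute Pearson correlation stands in for the independence
    measure, so only linear residual dependence is detected.
    """
    rng = np.random.default_rng(seed)
    rx, ry = residuals(x, z), residuals(y, z)
    observed = abs(np.corrcoef(rx, ry)[0, 1])
    exceed = sum(
        abs(np.corrcoef(rng.permutation(rx), ry)[0, 1]) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction keeps the p-value strictly positive.
    return (1 + exceed) / (1 + n_perm)
```

As the abstract warns, a test of this form is only calibrated when the regression model captures how X and Y depend on Z; outside simple additive-noise relationships the residuals can remain dependent on Z and the test can over-reject.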

Oxford University Research Archive

Spiral - Imperial College Digital Repository

Massively-Parallel Feature Selection for Big Data

Authors: Borboudakis Giorgos, Christophides Vassilis, Katsogridakis Pavlos, Pratikakis Polyvios, Tsamardinos Ioannis
Publication date: 23/08/2017

We present the Parallel, Forward-Backward with Pruning (PFBP) algorithm for feature selection (FS) in Big Data settings (high dimensionality and/or sample size). To tackle the challenges of Big Data FS, PFBP partitions the data matrix both in terms of rows (samples, training examples) and columns (features). By employing the concepts of p-values of conditional independence tests and meta-analysis techniques, PFBP manages to rely only on computations local to a partition while minimizing communication costs. Then, it employs powerful and safe (asymptotically sound) heuristics to make early, approximate decisions, such as Early Dropping of features from consideration in subsequent iterations, Early Stopping of consideration of features within the same iteration, or Early Return of the winner in each iteration. PFBP provides asymptotic guarantees of optimality for data distributions faithfully representable by a causal network (Bayesian network or maximal ancestral graph). Our empirical analysis confirms a super-linear speedup of the algorithm with increasing sample size, linear scalability with respect to the number of features and processing cores, while dominating other competitive algorithms in its class.
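To illustrate how p-values computed locally on separate row partitions might be merged with a meta-analysis technique, here is a minimal sketch using Fisher's combined probability test. The abstract does not say which combination rule PFBP uses, so treating Fisher's method as the rule is an assumption, and the function name `fisher_combine` is hypothetical. The sketch relies on the closed-form chi-square survival function for even degrees of freedom, so only the standard library is needed.

```python
import math

def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic -2 * sum(log p_i) is chi-square with 2k degrees of freedom
    under the null. For even degrees of freedom the survival function is
    exp(-h) * sum_{i<k} h**i / i!, with h equal to half the statistic.
    Assumes every p-value is strictly positive.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)
    tail_sum = sum(half_stat**i / math.factorial(i) for i in range(k))
    return math.exp(-half_stat) * tail_sum

# Hypothetical p-values for one feature, one per data partition.
local_p_values = [0.04, 0.03, 0.05]
combined = fisher_combine(local_p_values)
```

Each partition only needs to report a single number per feature, which is consistent with the abstract's point about keeping computation local and communication cheap.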

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Hal-Diderot

Concepts and a case study for a flexible class of graphical Markov models

Authors: Cox D. R., Wermuth Nanny
Publication date: 01/01/2013

With graphical Markov models, one can investigate complex dependences, summarize some results of statistical analyses with graphs and use these graphs to understand implications of well-fitting models. The models have a rich history and form an area that has been intensively studied and developed in recent years. We give a brief review of the main concepts and describe in more detail a flexible subclass of models, called traceable regressions. These are sequences of joint response regressions for which regression graphs permit one to trace and thereby understand pathways of dependence. We use these methods to reanalyze and interpret data from a prospective study of child development, now known as the Mannheim Study of Children at Risk. The two related primary features concern cognitive and motor development of a child at the ages of 4.5 and 8 years. Deficits in these features form a sequence of joint responses. Several possible risks are assessed at the birth of the child and when the child reached the ages of 3 months and 2 years. Comment: 21 pages, 7 figures, 7 tables; invited, refereed chapter in a boo

arXiv.org e-Print Archive

Chalmers Research

Chalmers Publication Library

The conditional permutation test for independence while controlling for confounders

Authors: Athey, Barber, Belloni, Candès, Cover, Dawid, Doran, Ernst, Fukumizu, Gretton, Hennessy, Kojadinovic, Pfister, Rosenbaum, Runge, Sen, Song, Stigler, Strobl, Su, Su, Su, Székely, Székely, Veraverbeke, Weihs, Zhang
Publication date: 07/05/2019

We propose a general new method, the conditional permutation test, for testing the conditional independence of variables $X$ and $Y$ given a potentially high-dimensional random vector $Z$ that may contain confounding factors. The proposed test permutes entries of $X$ non-uniformly, so as to respect the existing dependence between $X$ and $Z$ and thus account for the presence of these confounders. Like the conditional randomization test of Candès et al. (2018), our test relies on the availability of an approximation to the distribution of $X \mid Z$. While Candès et al. (2018)'s test uses this estimate to draw new $X$ values, for our test we use this approximation to design an appropriate non-uniform distribution on permutations of the $X$ values already seen in the true data. We provide an efficient Markov Chain Monte Carlo sampler for the implementation of our method, and establish bounds on the Type I error in terms of the error in the approximation of the conditional distribution of $X \mid Z$, finding that, for the worst case test statistic, the inflation in Type I error of the conditional permutation test is no larger than that of the conditional randomization test. We validate these theoretical results with experiments on simulated data and on the Capital Bikeshare data set. Comment: 31 pages, 4 figure
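The idea of permuting $X$ non-uniformly can be sketched as below. This is a simplified illustration under stated assumptions, not necessarily the sampler from the paper: it targets permutations with probability proportional to the product of $q(x_{\pi(i)} \mid z_i)$ by proposing swaps of disjoint pairs and accepting each with Barker's rule (odds / (1 + odds)). The names `cpt_sample`, `cpt_pvalue`, `log_q` and `gaussian_log_q` are hypothetical, $Z$ is taken to be one-dimensional, and every permuted copy is started from the observed $X$ for brevity.

```python
import numpy as np

def cpt_sample(x, z, log_q, n_steps=50, rng=None):
    """Permute x by pairwise swaps that respect the approximate law of X given Z.

    log_q(x_vals, z_vals) is an assumed, vectorised log-density of X given Z.
    """
    rng = np.random.default_rng() if rng is None else rng
    x_perm = np.array(x, dtype=float)
    n, half = len(x_perm), len(x_perm) // 2
    for _ in range(n_steps):
        order = rng.permutation(n)
        i, j = order[:half], order[half:2 * half]  # disjoint index pairs
        log_odds = (log_q(x_perm[j], z[i]) + log_q(x_perm[i], z[j])
                    - log_q(x_perm[i], z[i]) - log_q(x_perm[j], z[j]))
        log_odds = np.clip(log_odds, -50.0, 50.0)  # avoid overflow in exp
        swap = rng.random(half) < 1.0 / (1.0 + np.exp(-log_odds))
        i_s, j_s = i[swap], j[swap]
        x_perm[i_s], x_perm[j_s] = x_perm[j_s], x_perm[i_s]
    return x_perm

def cpt_pvalue(x, y, z, log_q, statistic, n_copies=100, seed=0):
    """Compare the observed statistic with its value on permuted copies of x."""
    rng = np.random.default_rng(seed)
    observed = statistic(x, y, z)
    exceed = sum(
        statistic(cpt_sample(x, z, log_q, rng=rng), y, z) >= observed
        for _ in range(n_copies)
    )
    return (1 + exceed) / (1 + n_copies)

def gaussian_log_q(x_vals, z_vals):
    """Assumed model for illustration: X given Z=z is Normal(z, 1)."""
    return -0.5 * (x_vals - z_vals) ** 2
```

Any test statistic can be plugged in; for example, `statistic=lambda x, y, z: abs(np.corrcoef(x, y)[0, 1])` gives a simple correlation-based test. The quality of the result depends on how well `log_q` approximates the true conditional distribution, which is the quantity the abstract's Type I error bounds are stated in terms of.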

arXiv.org e-Print Archive

Crossref

Warwick Research Archives Portal Repository