Causal Inference by Stochastic Complexity
The algorithmic Markov condition states that the most likely causal direction
between two random variables X and Y can be identified as that direction with
the lowest Kolmogorov complexity. Due to the halting problem, however, this
notion is not computable.
We hence propose to do causal inference by stochastic complexity. That is, we
propose to approximate Kolmogorov complexity via the Minimum Description Length
(MDL) principle, using a score that is mini-max optimal with regard to the
model class under consideration. This means that even in an adversarial
setting, such as when the true distribution is not in this class, we still
obtain the optimal encoding for the data relative to the class.
We instantiate this framework, which we call CISC, for pairs of univariate
discrete variables, using the class of multinomial distributions. Experiments
show that CISC is highly accurate on synthetic, benchmark, as well as
real-world data, outperforming the state of the art by a margin, and scales
extremely well with regard to sample and domain sizes.
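
As a rough illustration of the idea in this abstract, the sketch below scores both causal directions for a pair of discrete variables with the stochastic complexity of a multinomial model (maximum-likelihood code length plus the NML regret term). It is a simplified sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the regret term uses the known linear-time recurrence for multinomials, the conditional score is assumed to be the sum of per-group scores, and the direct summation for the two-category case is only suitable for small samples.

```python
import math
from collections import Counter, defaultdict


def multinomial_regret(n, k):
    """Return log2 C(k, n), the NML normalising term for k categories, n samples."""
    if n == 0 or k <= 1:
        return 0.0
    # C(2, n) by direct summation; fine for small n (large n needs log-space terms).
    c2 = sum(
        math.comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
        for h in range(n + 1)
    )
    c_prev, c_cur = 1.0, c2  # C(1, n) and C(2, n)
    for j in range(3, k + 1):
        # Recurrence: C(j, n) = C(j-1, n) + n / (j-2) * C(j-2, n)
        c_prev, c_cur = c_cur, c_cur + n / (j - 2) * c_prev
    return math.log2(c_cur)


def stochastic_complexity(values, k):
    """Code length in bits: maximum-likelihood cost plus regret (hypothetical helper)."""
    n = len(values)
    counts = Counter(values)
    ml_cost = -sum(c * math.log2(c / n) for c in counts.values())
    return ml_cost + multinomial_regret(n, k)


def conditional_complexity(targets, givens, k_target):
    """Assumed conditional score: sum of complexities of targets grouped by givens."""
    groups = defaultdict(list)
    for g, t in zip(givens, targets):
        groups[g].append(t)
    return sum(stochastic_complexity(grp, k_target) for grp in groups.values())


def infer_direction(xs, ys):
    """Prefer the direction with the shorter total description length."""
    kx, ky = len(set(xs)), len(set(ys))
    x_to_y = stochastic_complexity(xs, kx) + conditional_complexity(ys, xs, ky)
    y_to_x = stochastic_complexity(ys, ky) + conditional_complexity(xs, ys, kx)
    if x_to_y < y_to_x:
        return "X->Y"
    if y_to_x < x_to_y:
        return "Y->X"
    return "undecided"
```

Calling `infer_direction(xs, ys)` on two equal-length lists of category labels returns the preferred direction, or `"undecided"` when both directions compress equally well.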
Efficient Computation of Counterfactual Bounds
We assume that we are given structural equations over discrete variables inducing a
directed acyclic graph, namely, a structural causal model, together with data
about its internal nodes. The question we want to answer is how we can compute
bounds for partially identifiable counterfactual queries from such an input. We
start by giving a map from structural causal models to credal networks. This
allows us to compute exact counterfactual bounds via algorithms for credal nets
on a subclass of structural causal models. Exact computation is going to be
inefficient in general given that, as we show, causal inference is NP-hard even
on polytrees. We then target approximate bounds via a causal EM scheme. We
evaluate their accuracy by providing credible intervals on the quality of the
approximation; we show through a synthetic benchmark that the EM scheme
delivers accurate results in a fair number of runs. In the course of the
discussion, we also point out what seems to be a neglected limitation to the
trending idea that counterfactual bounds can be computed without knowledge of
the structural equations. We also present a real case study on palliative care
to show how our algorithms can readily be used for practical purposes.
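
To make "bounds for partially identifiable counterfactual queries" concrete, here is a minimal sketch for the simplest possible case: a binary treatment X and binary outcome Y with no confounding, where the query is the probability of necessity and sufficiency (PNS). This is not the credal-network or EM method of the abstract; it only enumerates the four response types of Y (always 0, copy X, negate X, always 1) and derives the interval analytically. The function name and parameters are hypothetical.

```python
def pns_bounds(p_y_do_x1, p_y_do_x0):
    """Bounds on P(Y would be 1 under X=1 and 0 under X=0) for binary X, Y.

    Let p_always1 be the mass of the "always 1" response type. The data fix
        p_copy   + p_always1 = P(Y=1 | do(X=1)) = a
        p_negate + p_always1 = P(Y=1 | do(X=0)) = b
    and all four type probabilities must be non-negative and sum to one.
    PNS equals p_copy = a - p_always1, so bounding p_always1 bounds PNS.
    """
    a, b = p_y_do_x1, p_y_do_x0
    if not (0.0 <= a <= 1.0 and 0.0 <= b <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    # Feasible range of the unidentified "always 1" mass.
    low_always1 = max(0.0, a + b - 1.0)
    high_always1 = min(a, b)
    # PNS decreases as the "always 1" mass grows.
    return a - high_always1, a - low_always1


# Example: the treatment raises the outcome rate from 0.2 to 0.7.
lower, upper = pns_bounds(0.7, 0.2)
```

The interval is non-degenerate because the interventional distributions alone do not pin down how mass is split across response types, which is exactly what makes such queries only partially identifiable.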
MALTS: Matching After Learning to Stretch
We introduce a flexible framework that produces high-quality almost-exact
matches for causal inference. Most prior work in matching uses ad-hoc distance
metrics, often leading to poor quality matches, particularly when there are
irrelevant covariates. In this work, we learn an interpretable distance metric
for matching, which leads to substantially higher quality matches. The learned
distance metric stretches the covariate space according to each covariate's
contribution to outcome prediction: this stretching means that mismatches on
important covariates carry a larger penalty than mismatches on irrelevant
covariates. Our ability to learn flexible distance metrics leads to matches
that are interpretable and useful for the estimation of conditional average
treatment effects. Comment: 40 pages, 5 tables, 12 figures
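
The stretching idea can be sketched as a per-covariate weighted distance followed by nearest-neighbour matching. In this minimal sketch the stretch weights are fixed by hand rather than learned from outcome prediction, and the function names are hypothetical; it only illustrates why a near-zero weight makes mismatches on an irrelevant covariate almost free.

```python
import math


def stretched_distance(a, b, stretch):
    """Euclidean distance after scaling each covariate by its stretch weight."""
    return math.sqrt(
        sum((s * (ai - bi)) ** 2 for s, ai, bi in zip(stretch, a, b))
    )


def match_controls(treated_unit, controls, stretch, k=1):
    """Return indices of the k control units closest to the treated unit."""
    ranked = sorted(
        range(len(controls)),
        key=lambda i: stretched_distance(treated_unit, controls[i], stretch),
    )
    return ranked[:k]


# Two covariates; the second is assumed irrelevant to the outcome.
treated = (1.0, 0.0)
controls = [(1.1, 5.0), (3.0, 0.0)]

# With the irrelevant covariate ignored, control 0 is the closest match.
informed = match_controls(treated, controls, stretch=[1.0, 0.0])
# With equal weights, the mismatch on the irrelevant covariate dominates.
naive = match_controls(treated, controls, stretch=[1.0, 1.0])
```

Because the weights act directly on individual covariates, the resulting matches can be inspected covariate by covariate, which is the sense in which such a metric is interpretable.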
Telling Cause from Effect using MDL-based Local and Global Regression
We consider the fundamental problem of inferring the causal direction between
two univariate numeric random variables X and Y from observational data.
The two-variable case is especially difficult to solve since it is not possible
to use standard conditional independence tests between the variables.
To tackle this problem, we follow an information theoretic approach based on
Kolmogorov complexity and use the Minimum Description Length (MDL) principle to
provide a practical solution. In particular, we propose a compression scheme to
encode local and global functional relations using MDL-based regression. We
infer that X causes Y in case it is shorter to describe Y as a function of X
than the inverse direction. In addition, we introduce Slope, an efficient
linear-time algorithm that, as we show through thorough empirical evaluation on
both synthetic and real-world data, outperforms the state of the art by a wide
margin. Comment: 10 pages, To appear in ICDM1
Who Learns Better Bayesian Network Structures: Accuracy and Speed of Structure Learning Algorithms
Three classes of algorithms to learn the structure of Bayesian networks from
data are common in the literature: constraint-based algorithms, which use
conditional independence tests to learn the dependence structure of the data;
score-based algorithms, which use goodness-of-fit scores as objective functions
to maximise; and hybrid algorithms that combine both approaches.
Constraint-based and score-based algorithms have been shown to learn the same
structures when conditional independence and goodness of fit are both assessed
using entropy and the topological ordering of the network is known (Cowell,
2001).
In this paper, we investigate how these three classes of algorithms perform
outside the assumptions above in terms of speed and accuracy of network
reconstruction for both discrete and Gaussian Bayesian networks. We approach
this question by recognising that structure learning is defined by the
combination of a statistical criterion and an algorithm that determines how the
criterion is applied to the data. Removing the confounding effect of different
choices for the statistical criterion, we find using both simulated and
real-world complex data that constraint-based algorithms are often less
accurate than score-based algorithms, but are seldom faster (even at large
sample sizes); and that hybrid algorithms are neither faster nor more accurate
than constraint-based algorithms. This suggests that commonly held beliefs on
structure learning in the literature are strongly influenced by the choice of
particular statistical criteria rather than just by the properties of the
algorithms themselves. Comment: 27 pages, 8 figures