Causal Confusion in Imitation Learning
Behavioral cloning reduces policy learning to supervised learning by training
a discriminative model to predict expert actions given observations. Such
discriminative models are non-causal: the training procedure is unaware of the
causal structure of the interaction between the expert and the environment. We
point out that ignoring causality is particularly damaging because of the
distributional shift in imitation learning. In particular, it leads to a
counter-intuitive "causal misidentification" phenomenon: access to more
information can yield worse performance. We investigate how this problem
arises, and propose a solution to combat it through targeted
interventions---either environment interaction or expert queries---to determine
the correct causal model. We show that causal misidentification occurs in
several benchmark control domains as well as realistic driving settings, and
validate our solution against DAgger and other baselines and ablations.
Comment: Published at NeurIPS 2019. 9 pages, plus references and appendices.
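To make the setup concrete, here is a minimal sketch of behavioral cloning as plain supervised learning, on a hypothetical expert dataset (all names and the logistic-regression model are illustrative, not the paper's implementation):

```python
import numpy as np

# Hypothetical expert dataset: observations -> discrete expert actions.
rng = np.random.default_rng(0)
obs = rng.normal(size=(500, 4))               # 4-dimensional observations
expert_actions = (obs[:, 0] > 0).astype(int)  # expert acts on feature 0 only

# Behavioral cloning: fit a discriminative model p(action | observation)
# by ordinary supervised learning (here, logistic regression via gradient descent).
w = np.zeros(4)
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-obs @ w))                   # predicted action probabilities
    w -= 0.1 * obs.T @ (p - expert_actions) / len(obs)   # gradient step

# The fitted model may latch onto any feature correlated with the expert's
# action, including nuisance features: the causal structure of the
# expert-environment interaction is never consulted.
policy = lambda o: int(o @ w > 0)
```

Under distributional shift at deployment time, such a non-causal policy can degrade even when given strictly more observational input, which is the "causal misidentification" phenomenon the abstract describes.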
iSAGE: An Incremental Version of SAGE for Online Explanation on Data Streams
Existing methods for explainable artificial intelligence (XAI), including
popular feature importance measures such as SAGE, are mostly restricted to the
batch learning scenario. However, machine learning is often applied in dynamic
environments, where data arrives continuously and learning must be done in an
online manner. Therefore, we propose iSAGE, a time- and memory-efficient
incrementalization of SAGE, which is able to react to changes in the model as
well as to drift in the data-generating process. We further provide efficient
feature removal methods that break (interventional) and retain (observational)
feature dependencies. Moreover, we formally analyze our explanation method to
show that iSAGE adheres to similar theoretical properties as SAGE. Finally, we
evaluate our approach in a thorough experimental analysis based on
well-established data sets and data streams with concept drift.
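The distinction between feature removal that breaks dependencies (interventional) and removal that retains them (observational) can be sketched on toy data; the two helper functions below are illustrative stand-ins, not the iSAGE implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical correlated features: x2 closely tracks x1.
x1 = rng.normal(size=1000)
x2 = x1 + 0.1 * rng.normal(size=1000)
X = np.column_stack([x1, x2])
model = lambda X: X[:, 0] + X[:, 1]   # toy model to be explained

def remove_interventional(X, j, rng):
    """Break feature dependencies: replace feature j with an independent
    draw from its marginal distribution (here, a column permutation)."""
    Xr = X.copy()
    Xr[:, j] = rng.permutation(Xr[:, j])
    return Xr

def remove_observational(X, j):
    """Retain feature dependencies: impute feature j from the remaining
    features (here, a simple least-squares fit on the other column)."""
    Xr = X.copy()
    other = 1 - j
    coef = np.polyfit(X[:, other], X[:, j], 1)
    Xr[:, j] = np.polyval(coef, X[:, other])
    return Xr
```

For strongly dependent features the two removal schemes perturb the model's output very differently, which is why the choice matters for the resulting importance values.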
Sparse Linear Identifiable Multivariate Modeling
In this paper we consider sparse and identifiable linear latent variable
(factor) and linear Bayesian network models for parsimonious analysis of
multivariate data. We propose a computationally efficient method for joint
parameter and model inference, and model comparison. It consists of a fully
Bayesian hierarchy for sparse models using slab and spike priors (two-component
delta-function and continuous mixtures), non-Gaussian latent factors and a
stochastic search over the ordering of the variables. The framework, which we
call SLIM (Sparse Linear Identifiable Multivariate modeling), is validated and
bench-marked on artificial and real biological data sets. SLIM is closest in
spirit to LiNGAM (Shimizu et al., 2006), but differs substantially in
inference, Bayesian network structure learning and model comparison.
Experimentally, SLIM performs equally well or better than LiNGAM with
comparable computational complexity. We attribute this mainly to the stochastic
search strategy used, and to parsimony (sparsity and identifiability), which is
an explicit part of the model. We propose two extensions to the basic i.i.d.
linear framework: non-linear dependence on observed variables, called SNIM
(Sparse Non-linear Identifiable Multivariate modeling) and allowing for
correlations between latent variables, called CSLIM (Correlated SLIM), for the
temporal and/or spatial data. The source code and scripts are available from
http://cogsys.imm.dtu.dk/slim/.
Comment: 45 pages, 17 figures.
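The two-component "slab and spike" prior mentioned above mixes a point mass at zero with a continuous component; a minimal sampling sketch (a generic spike-and-slab draw, not SLIM's full hierarchy):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_spike_and_slab(p_slab, slab_scale, size, rng):
    """Draw coefficients from a two-component spike-and-slab prior:
    with probability 1 - p_slab a coefficient is exactly zero (the
    delta-function 'spike'); otherwise it comes from a continuous
    Gaussian 'slab'. Sparsity is governed by p_slab."""
    slab = rng.normal(scale=slab_scale, size=size)
    is_slab = rng.random(size) < p_slab
    return np.where(is_slab, slab, 0.0)

coefs = sample_spike_and_slab(p_slab=0.2, slab_scale=2.0, size=10_000, rng=rng)
```

The exact zeros produced by the spike are what makes sparsity an explicit part of the model rather than a post-hoc thresholding step.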
Unifying Gaussian LWF and AMP Chain Graphs to Model Interference
An intervention may have an effect on units other than those to which it was
administered. This phenomenon is called interference and it usually goes
unmodeled. In this paper, we propose to combine Lauritzen-Wermuth-Frydenberg
and Andersson-Madigan-Perlman chain graphs to create a new class of causal
models that can represent both interference and non-interference relationships
for Gaussian distributions. Specifically, we define the new class of models,
introduce global, local, and pairwise Markov properties for them, and prove
their equivalence. We also propose an algorithm for maximum likelihood
parameter estimation for the new models, and report experimental results.
Finally, we show how to compute the effects of interventions in the new models.
Partition MCMC for inference on acyclic digraphs
Acyclic digraphs are the underlying representation of Bayesian networks, a
widely used class of probabilistic graphical models. Learning the underlying
graph from data is a way of gaining insights about the structural properties of
a domain. Structure learning forms one of the inference challenges of
statistical graphical models.
MCMC methods that sample graphs from the posterior distribution given the
data, notably structure MCMC, are probably the only viable option for Bayesian
model averaging. Score modularity and restrictions on the number of parents of
each node allow the graphs to be grouped into larger collections, which can be
scored as a whole to improve the chain's convergence. Current examples of
algorithms taking advantage of grouping are the biased order MCMC, which acts
on the alternative space of permuted triangular matrices, and non-ergodic edge
reversal moves.
Here we propose a novel algorithm, which employs the underlying combinatorial
structure of DAGs to define a new grouping. As a result convergence is improved
compared to structure MCMC, while still retaining the property of producing an
unbiased sample. Finally, the method can be combined with edge reversal moves
to improve the sampler further.
Comment: Revised version. 34 pages, 16 figures. R code available at
https://github.com/annlia/partitionMCM
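The basic ingredients of structure MCMC over DAGs, an acyclicity check plus a single-edge proposal, can be sketched as follows (a generic proposal move only, without the score and Metropolis-Hastings acceptance step, and not the partition-based grouping the paper introduces):

```python
import random

def is_acyclic(adj):
    """Acyclicity test for a directed graph given as a nested-list boolean
    adjacency matrix, by repeatedly peeling nodes with no remaining parents."""
    remaining = set(range(len(adj)))
    peeled = True
    while remaining and peeled:
        peeled = False
        for i in list(remaining):
            if not any(adj[j][i] for j in remaining):  # i has no parents left
                remaining.discard(i)
                peeled = True
    return not remaining  # a cycle leaves nodes that can never be peeled

def propose_edge_toggle(adj, rng):
    """One structure-MCMC style single-edge proposal: toggle a random
    directed edge, rejecting the move if it would create a cycle."""
    i, j = rng.sample(range(len(adj)), 2)
    new = [row[:] for row in adj]
    new[i][j] = not new[i][j]
    return new if is_acyclic(new) else adj

rng = random.Random(0)
dag = [[False, True,  True],
       [False, False, True],
       [False, False, False]]   # 0 -> 1 -> 2, 0 -> 2
```

Moves of this single-edge kind mix slowly, which motivates grouping DAGs into larger scorable collections such as the partitions proposed in the paper.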
Identifying the consequences of dynamic treatment strategies: A decision-theoretic overview
We consider the problem of learning about and comparing the consequences of
dynamic treatment strategies on the basis of observational data. We formulate
this within a probabilistic decision-theoretic framework. Our approach is
compared with related work by Robins and others: in particular, we show how
Robins's 'G-computation' algorithm arises naturally from this
decision-theoretic perspective. Careful attention is paid to the mathematical
and substantive conditions required to justify the use of this formula. These
conditions revolve around a property we term stability, which relates the
probabilistic behaviours of observational and interventional regimes. We show
how an assumption of 'sequential randomization' (or 'no unmeasured
confounders'), or an alternative assumption of 'sequential irrelevance', can be
used to infer stability. Probabilistic influence diagrams are used to simplify
manipulations, and their power and limitations are discussed. We compare our
approach with alternative formulations based on causal DAGs or potential
response models. We aim to show that formulating the problem of assessing
dynamic treatment strategies as a problem of decision analysis brings clarity,
simplicity and generality.
Comment: 49 pages, 15 figures.
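For a single time point, the G-computation formula amounts to standardizing the outcome regression over the covariate distribution with treatment set by the strategy; a Monte Carlo sketch on hypothetical simulated data (all variable names and the data-generating process are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Hypothetical observational data: confounder L affects both treatment A
# and outcome Y; treatment raises Y by exactly 1.
L = rng.binomial(1, 0.5, n)
A = rng.binomial(1, 0.2 + 0.6 * L)        # confounded treatment assignment
Y = L + A + rng.normal(size=n)

def g_computation(strategy):
    """Single-time-point G-computation: average the outcome regression
    E[Y | L, A] over the marginal distribution of L, with treatment A
    fixed by the strategy rather than as observed."""
    total = 0.0
    for l in (0, 1):
        a = strategy(l)
        mean_y = Y[(L == l) & (A == a)].mean()  # outcome regression estimate
        total += mean_y * np.mean(L == l)       # weight by P(L = l)
    return total

# Compare the static strategies "always treat" and "never treat".
effect = g_computation(lambda l: 1) - g_computation(lambda l: 0)
```

Justifying this standardization step is exactly where the stability and sequential-randomization conditions discussed above enter; for sequential (dynamic) strategies the same recipe is applied recursively over time points.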
Diffusion Causal Models for Counterfactual Estimation
We consider the task of counterfactual estimation from observational imaging
data given a known causal structure. In particular, quantifying the causal
effect of interventions for high-dimensional data with neural networks remains
an open challenge. Herein we propose Diff-SCM, a deep structural causal model
that builds on recent advances of generative energy-based models. In our
setting, inference is performed by iteratively sampling gradients of the
marginal and conditional distributions entailed by the causal model.
Counterfactual estimation is achieved by firstly inferring latent variables
with deterministic forward diffusion, then intervening on a reverse diffusion
process using the gradients of an anti-causal predictor w.r.t. the input.
Furthermore, we propose a metric for evaluating the generated counterfactuals.
We find that Diff-SCM produces more realistic and minimal counterfactuals than
baselines on MNIST data and can also be applied to ImageNet data. Code is
available at https://github.com/vios-s/Diff-SCM.
Comment: Accepted at CLeaR (Causal Learning and Reasoning) 202
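The abduction-then-intervention recipe that Diff-SCM instantiates with diffusion models is the classical three-step counterfactual procedure; a minimal sketch on a toy linear structural causal model (not the diffusion machinery itself):

```python
# Toy structural causal model: cause -> effect,
#   x = u_x
#   y = 2 * x + u_y
f_y = lambda x, u_y: 2 * x + u_y

# Observed factual instance.
x_obs, y_obs = 1.0, 2.5

# Step 1 -- abduction: infer the exogenous noise consistent with the
# observation (Diff-SCM does this via deterministic forward diffusion).
u_y = y_obs - 2 * x_obs          # = 0.5

# Step 2 -- action: intervene on the cause, do(x = 3).
x_cf = 3.0

# Step 3 -- prediction: push the inferred noise through the modified
# model (Diff-SCM does this via a guided reverse diffusion process).
y_cf = f_y(x_cf, u_y)            # = 6.5
```

Keeping the inferred noise fixed across steps 1 and 3 is what makes the result a counterfactual for this particular instance, rather than a fresh sample from the interventional distribution.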