Shapley values underlie one of the most popular model-agnostic methods within
explainable artificial intelligence. These values are designed to attribute the
difference between a model's prediction and an average baseline to the
different features used as input to the model. Being based on solid
game-theoretic principles, Shapley values uniquely satisfy several desirable
properties, which is why they are increasingly used to explain the predictions
of possibly complex and highly non-linear machine learning models. Shapley
values are well calibrated to a user's intuition when features are independent,
but may lead to undesirable, counterintuitive explanations when the
independence assumption is violated.
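
As a minimal sketch of the attribution described above, the following Python snippet computes exact Shapley values by enumerating all feature coalitions. It uses a marginal value function, which replaces out-of-coalition features with background samples drawn independently of the coalition. This is the independence-based variant, not the causal method proposed in this paper. The toy model, the synthetic background data, and all function names (`shapley_values`, `marginal_value`) are illustrative assumptions.

```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_values(value, n_features):
    """Exact Shapley values for a coalition value function value(S)."""
    phi = np.zeros(n_features)
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(n_features):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def marginal_value(model, x, background):
    """Hypothetical helper: v(S) is the mean prediction with features in S
    fixed to x and the rest taken from the background data, ignoring any
    dependence between features (the independence assumption)."""
    def value(subset):
        data = background.copy()
        idx = np.array(sorted(subset), dtype=int)
        data[:, idx] = x[idx]
        return model(data).mean()
    return value


# Toy setup (assumed for illustration): three independent Gaussian features.
rng = np.random.default_rng(0)
background = rng.normal(size=(500, 3))
model = lambda data: 2.0 * data[:, 0] + data[:, 1] * data[:, 2]
x = np.array([1.0, 2.0, -1.0])

phi = shapley_values(marginal_value(model, x, background), n_features=3)
baseline = model(background).mean()
prediction = model(x[None, :])[0]

# Efficiency: the attributions sum to prediction minus the average baseline.
print(phi, phi.sum(), prediction - baseline)
```

The enumeration costs on the order of 2^n value-function evaluations per feature, so this sketch is only practical for a handful of features. Swapping `marginal_value` for a value function based on a different sampling distribution changes the explanation but leaves `shapley_values` untouched.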
In this paper, we propose a novel framework for computing Shapley values that
generalizes recent work aimed at circumventing the independence assumption. By
employing Pearl's do-calculus, we show how these 'causal' Shapley values can be
derived for general causal graphs without sacrificing any of their desirable
properties. Moreover, causal Shapley values enable us to separate the
contribution of direct and indirect effects. We provide a practical
implementation for computing causal Shapley values based on causal chain graphs
when only partial information is available and illustrate their utility on a
real-world example.
Comment: Accepted at the 34th Conference on Neural Information Processing Systems
(NeurIPS 2020)