Explaining Predictions by Approximating the Local Decision Boundary
Constructing accurate model-agnostic explanations for opaque machine learning
models remains a challenging task. Classification models for high-dimensional
data, such as images, are often complex and highly parameterized. To reduce this
complexity, various authors attempt to explain individual predictions locally,
either in terms of a simpler local surrogate model or by communicating how the
predictions contrast with those of another class. However, existing approaches
still fall short in one or more of the following ways: a) they measure locality
using a (Euclidean) metric that is not meaningful for non-linear,
high-dimensional data; b) they do not attempt to explain the decision boundary,
which is the most relevant characteristic of classifiers optimized for
classification accuracy; or c) they give the user no freedom to specify
attributes that are meaningful to them. We address these issues with a new
procedure for
local decision boundary approximation (DBA). To construct a meaningful metric,
we train a variational autoencoder to learn a Euclidean latent space of encoded
data representations. We impose interpretability by exploiting attribute
annotations to map the latent space to attributes that are meaningful to the
user. A difficulty in evaluating explainability approaches is the lack of a
ground truth. We address this by introducing a new benchmark data set with
artificially generated Iris images, and showing that we can recover the latent
attributes that locally determine the class. We further evaluate our approach
on the CelebA image data set.
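
To make the procedure concrete, the following is a minimal Python sketch of the
core idea, assuming a pre-trained VAE: encode the instance into the Euclidean
latent space, sample a local neighbourhood there, label the decoded samples with
the black-box classifier, and fit a linear surrogate whose coefficient vector
approximates the local decision boundary. The encode, decode, and
black_box_predict functions are hypothetical stand-ins, and the
logistic-regression surrogate is an illustrative choice, not necessarily the
paper's implementation.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    LATENT_DIM = 8

    def encode(x):
        # Hypothetical stand-in for the VAE encoder mean mu(x); a real
        # implementation would use a trained variational autoencoder.
        return x[:LATENT_DIM]

    def decode(z):
        # Hypothetical stand-in for the VAE decoder, mapping latent codes
        # back to input space.
        return np.concatenate([z, np.zeros(4)])

    def black_box_predict(x):
        # Hypothetical stand-in for the opaque classifier being explained.
        return int(x[:LATENT_DIM].sum() > 0.5)

    def local_dba(x, n_samples=500, radius=0.5):
        # Approximate the classifier's decision boundary around x by fitting
        # a linear surrogate in the VAE's Euclidean latent space.
        z0 = encode(x)
        # Sample a local neighbourhood around the encoded instance.
        Z = z0 + radius * rng.standard_normal((n_samples, LATENT_DIM))
        # Query the black box on the decoded neighbours.
        y = np.array([black_box_predict(decode(z)) for z in Z])
        surrogate = LogisticRegression().fit(Z - z0, y)
        # The coefficient vector is the approximate normal of the local
        # decision boundary in latent space.
        return surrogate.coef_.ravel()

    # Instance chosen to lie near the stand-in classifier's boundary.
    x = np.full(12, 0.0625)
    print("local boundary normal:", local_dba(x))

In the setting described in the abstract, the recovered coefficient vector
would additionally be passed through the learned latent-to-attribute map, so
that the explanation is expressed in attributes that are meaningful to the user.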