975 research outputs found
Explaining Machine Learning Classifiers through Diverse Counterfactual Explanations
Post-hoc explanations of machine learning models are crucial for people to
understand and act on algorithmic predictions. An intriguing class of
explanations is through counterfactuals, hypothetical examples that show people
how to obtain a different prediction. We posit that effective counterfactual
explanations should satisfy two properties: feasibility of the counterfactual
actions given user context and constraints, and diversity among the
counterfactuals presented. To this end, we propose a framework for generating
and evaluating a diverse set of counterfactual explanations based on
determinantal point processes. To evaluate the actionability of
counterfactuals, we provide metrics that enable comparison of
counterfactual-based methods to other local explanation methods. We further
address necessary tradeoffs and point to causal implications in optimizing for
counterfactuals. Our experiments on four real-world datasets show that our
framework can generate a set of counterfactuals that are diverse and well
approximate local decision boundaries, outperforming prior approaches to
generating diverse counterfactuals. We provide an implementation of the
framework at https://github.com/microsoft/DiCE.Comment: 13 page
Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR
There has been much discussion of the right to explanation in the EU General
Data Protection Regulation, and its existence, merits, and disadvantages.
Implementing a right to explanation that opens the black box of algorithmic
decision-making faces major legal and technical barriers. Explaining the
functionality of complex algorithmic decision-making systems and their
rationale in specific cases is a technically challenging problem. Some
explanations may offer little meaningful information to data subjects, raising
questions around their value. Explanations of automated decisions need not
hinge on the general public understanding how algorithmic systems function.
Even though such interpretability is of great importance and should be pursued,
explanations can, in principle, be offered without opening the black box.
Looking at explanations as a means to help a data subject act rather than
merely understand, one could gauge the scope and content of explanations
according to the specific goal or action they are intended to support. From the
perspective of individuals affected by automated decision-making, we propose
three aims for explanations: (1) to inform and help the individual understand
why a particular decision was reached, (2) to provide grounds to contest the
decision if the outcome is undesired, and (3) to understand what would need to
change in order to receive a desired result in the future, based on the current
decision-making model. We assess how each of these goals finds support in the
GDPR. We suggest data controllers should offer a particular type of
explanation, unconditional counterfactual explanations, to support these three
aims. These counterfactual explanations describe the smallest change to the
world that can be made to obtain a desirable outcome, or to arrive at the
closest possible world, without needing to explain the internal logic of the
system
LIMEtree: Interactively Customisable Explanations Based on Local Surrogate Multi-output Regression Trees
Systems based on artificial intelligence and machine learning models should
be transparent, in the sense of being capable of explaining their decisions to
gain humans' approval and trust. While there are a number of explainability
techniques that can be used to this end, many of them are only capable of
outputting a single one-size-fits-all explanation that simply cannot address
all of the explainees' diverse needs. In this work we introduce a
model-agnostic and post-hoc local explainability technique for black-box
predictions called LIMEtree, which employs surrogate multi-output regression
trees. We validate our algorithm on a deep neural network trained for object
detection in images and compare it against Local Interpretable Model-agnostic
Explanations (LIME). Our method comes with local fidelity guarantees and can
produce a range of diverse explanation types, including contrastive and
counterfactual explanations praised in the literature. Some of these
explanations can be interactively personalised to create bespoke, meaningful
and actionable insights into the model's behaviour. While other methods may
give an illusion of customisability by wrapping, otherwise static, explanations
in an interactive interface, our explanations are truly interactive, in the
sense of allowing the user to "interrogate" a black-box model. LIMEtree can
therefore produce consistent explanations on which an interactive exploratory
process can be built
Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations
Neural networks are among the most accurate supervised learning methods in
use today, but their opacity makes them difficult to trust in critical
applications, especially when conditions in training differ from those in test.
Recent work on explanations for black-box models has produced tools (e.g. LIME)
to show the implicit rules behind predictions, which can help us identify when
models are right for the wrong reasons. However, these methods do not scale to
explaining entire datasets and cannot correct the problems they reveal. We
introduce a method for efficiently explaining and regularizing differentiable
models by examining and selectively penalizing their input gradients, which
provide a normal to the decision boundary. We apply these penalties both based
on expert annotation and in an unsupervised fashion that encourages diverse
models with qualitatively different decision boundaries for the same
classification problem. On multiple datasets, we show our approach generates
faithful explanations and models that generalize much better when conditions
differ between training and test
Explainability for Machine Learning Models: From Data Adaptability to User Perception
This thesis explores the generation of local explanations for already
deployed machine learning models, aiming to identify optimal conditions for
producing meaningful explanations considering both data and user requirements.
The primary goal is to develop methods for generating explanations for any
model while ensuring that these explanations remain faithful to the underlying
model and comprehensible to the users.
The thesis is divided into two parts. The first enhances a widely used
rule-based explanation method. It then introduces a novel approach for
evaluating the suitability of linear explanations to approximate a model.
Additionally, it conducts a comparative experiment between two families of
counterfactual explanation methods to analyze the advantages of one over the
other. The second part focuses on user experiments to assess the impact of
three explanation methods and two distinct representations. These experiments
measure how users perceive their interaction with the model in terms of
understanding and trust, depending on the explanations and representations.
This research contributes to a better explanation generation, with potential
implications for enhancing the transparency, trustworthiness, and usability of
deployed AI systems.Comment: PhD Thesi
Instance-based Counterfactual Explanations for Time Series Classification
In recent years, there has been a rapidly expanding focus on explaining the
predictions made by black-box AI systems that handle image and tabular data.
However, considerably less attention has been paid to explaining the
predictions of opaque AI systems handling time series data. In this paper, we
advance a novel model-agnostic, case-based technique -- Native Guide -- that
generates counterfactual explanations for time series classifiers. Given a
query time series, , for which a black-box classification system
predicts class, , a counterfactual time series explanation shows how
could change, such that the system predicts an alternative class, . The
proposed instance-based technique adapts existing counterfactual instances in
the case-base by highlighting and modifying discriminative areas of the time
series that underlie the classification. Quantitative and qualitative results
from two comparative experiments indicate that Native Guide generates
plausible, proximal, sparse and diverse explanations that are better than those
produced by key benchmark counterfactual methods
The Pragmatic Turn in Explainable Artificial Intelligence (XAI)
In this paper I argue that the search for explainable models and interpretable decisions in AI must be reformulated in terms of the broader project of offering a pragmatic and naturalistic account of understanding in AI. Intuitively, the purpose of providing an explanation of a model or a decision is to make it understandable to its stakeholders. But without a previous grasp of what it means to say that an agent understands a model or a decision, the explanatory strategies will lack a well-defined goal. Aside from providing a clearer objective for XAI, focusing on understanding also allows us to relax the factivity condition on explanation, which is impossible to fulfill in many machine learning models, and to focus instead on the pragmatic conditions that determine the best fit between a model and the methods and devices deployed to understand it. After an examination of the different types of understanding discussed in the philosophical and psychological literature, I conclude that interpretative or approximation models not only provide the best way to achieve the objectual understanding of a machine learning model, but are also a necessary condition to achieve post hoc interpretability. This conclusion is partly based on the shortcomings of the purely functionalist approach to post hoc interpretability that seems to be predominant in most recent literature
Feature-based Learning for Diverse and Privacy-Preserving Counterfactual Explanations
Interpretable machine learning seeks to understand the reasoning process of
complex black-box systems that are long notorious for lack of explainability.
One flourishing approach is through counterfactual explanations, which provide
suggestions on what a user can do to alter an outcome. Not only must a
counterfactual example counter the original prediction from the black-box
classifier but it should also satisfy various constraints for practical
applications. Diversity is one of the critical constraints that however remains
less discussed. While diverse counterfactuals are ideal, it is computationally
challenging to simultaneously address some other constraints. Furthermore,
there is a growing privacy concern over the released counterfactual data. To
this end, we propose a feature-based learning framework that effectively
handles the counterfactual constraints and contributes itself to the limited
pool of private explanation models. We demonstrate the flexibility and
effectiveness of our method in generating diverse counterfactuals of
actionability and plausibility. Our counterfactual engine is more efficient
than counterparts of the same capacity while yielding the lowest
re-identification risks
- …