Measurable counterfactual local explanations for any classifier
We propose a novel method for explaining the predictions of any classifier. In our approach, local explanations are expected to explain both the outcome of a prediction and how that prediction would change if 'things had been different'. Furthermore, we argue that satisfactory explanations cannot be dissociated from a notion and measure of fidelity, as advocated in the early days of neural networks' knowledge extraction. We introduce a definition of fidelity to the underlying classifier for local explanation models which is based on distances to a target decision boundary. A system called CLEAR (Counterfactual Local Explanations via Regression) is introduced and evaluated. CLEAR generates b-counterfactual explanations that state minimum changes necessary to flip a prediction's classification. CLEAR then builds local regression models, using the b-counterfactuals to measure and improve the fidelity of its regressions. By contrast, the popular LIME method [17], which also uses regression to generate local explanations, neither measures its own fidelity nor generates counterfactuals. CLEAR's regressions are found to have significantly higher fidelity than LIME's, averaging over 40% higher in this paper's five case studies.
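A minimal sketch of the idea described above, not the authors' implementation: find a b-counterfactual by perturbing one feature of a scikit-learn style binary classifier until its prediction flips, fit a local regression around the instance, and measure fidelity as the gap between the regression's implied flip point and the actual one. The names `clf`, `x`, the step size, and the sampling scale are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge

def b_counterfactual(clf, x, feature, step=0.01, max_steps=1000):
    """Increase one feature until the predicted class flips; return the flip value."""
    original = clf.predict(x.reshape(1, -1))[0]
    xc = x.copy()
    for _ in range(max_steps):
        xc[feature] += step
        if clf.predict(xc.reshape(1, -1))[0] != original:
            return xc[feature]
    return None  # no flip found within the search range

def local_fidelity(clf, x, feature, n_samples=500, scale=0.5):
    """Fit a local regression around x and compare its implied flip point
    with the classifier's actual b-counterfactual (a CLEAR-style fidelity error)."""
    rng = np.random.default_rng(0)
    X_local = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    y_local = clf.predict_proba(X_local)[:, 1]
    reg = Ridge().fit(X_local, y_local)

    actual = b_counterfactual(clf, x, feature)
    if actual is None:
        return None
    # Feature value at which the *regression* crosses p = 0.5, holding the
    # other features at x (solve w.x' + b = 0.5 for one coordinate).
    w, b = reg.coef_, reg.intercept_
    others = np.dot(w, x) - w[feature] * x[feature]
    estimated = (0.5 - b - others) / w[feature]
    return abs(estimated - actual)  # smaller = higher fidelity
```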
Actionable Recourse in Linear Classification
Machine learning models are increasingly used to automate decisions that affect humans - deciding who should receive a loan, a job interview, or a social service. In such applications, a person should have the ability to change the decision of a model. When a person is denied a loan by a credit score, for example, they should be able to alter its input variables in a way that guarantees approval. Otherwise, they will be denied the loan as long as the model is deployed. More importantly, they will lack the ability to influence a decision that affects their livelihood.
In this paper, we frame these issues in terms of recourse, which we define as the ability of a person to change the decision of a model by altering actionable input variables (e.g., income vs. age or marital status). We present integer programming tools to ensure recourse in linear classification problems without interfering in model development. We demonstrate how our tools can inform stakeholders through experiments on credit scoring problems. Our results show that recourse can be significantly affected by standard practices in model development, and motivate the need to evaluate recourse in practice.
Comment: Extended version. ACM Conference on Fairness, Accountability and Transparency [FAT2019]
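A rough sketch of the recourse problem for a linear classifier, using a continuous (LP) relaxation rather than the paper's actual integer programming formulation: find the least-cost non-negative change to actionable features that flips a denial into an approval. The classifier weights `w`, `b`, the applicant `x`, and the per-feature costs and limits are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def minimal_recourse(w, b, x, cost, actionable, upper):
    """Least-cost change `a` to actionable features such that w.(x + a) + b >= 0."""
    n = len(x)
    # a_j is fixed to 0 for immutable features (e.g., age, marital status).
    bounds = [(0.0, upper[j]) if actionable[j] else (0.0, 0.0) for j in range(n)]
    # Constraint w.a >= -(w.x + b), rewritten for linprog as -w.a <= w.x + b.
    res = linprog(c=cost, A_ub=[-np.asarray(w)], b_ub=[np.dot(w, x) + b],
                  bounds=bounds, method="highs")
    return res.x if res.success else None  # None: no recourse within limits

# Example: income is actionable, age is not.
w, b = np.array([0.8, 0.1]), -5.0            # score = 0.8*income + 0.1*age - 5
x = np.array([2.0, 10.0])                    # currently denied (score < 0)
action = minimal_recourse(w, b, x, cost=[1.0, 1.0],
                          actionable=[True, False], upper=[10.0, 0.0])
print(action)                                # required increase per feature
```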
Right for the Wrong Reason: Can Interpretable ML Techniques Detect Spurious Correlations?
While deep neural network models offer unmatched classification performance, they are prone to learning spurious correlations in the data. Such dependencies on confounding information can be difficult to detect using performance metrics if the test data comes from the same distribution as the training data. Interpretable ML methods such as post-hoc explanations or inherently interpretable classifiers promise to identify faulty model reasoning. However, there is mixed evidence whether many of these techniques are actually able to do so. In this paper, we propose a rigorous evaluation strategy to assess an explanation technique's ability to correctly identify spurious correlations. Using this strategy, we evaluate five post-hoc explanation techniques and one inherently interpretable method for their ability to detect three types of artificially added confounders in a chest x-ray diagnosis task. We find that the post-hoc technique SHAP, as well as the inherently interpretable Attri-Net, provide the best performance and can be used to reliably identify faulty model behavior.
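An illustrative sketch of this kind of evaluation, not the paper's exact protocol: inject a synthetic confounder (here a small bright tag) into positive-class images, retrain, and score how much of an explanation's attribution mass lands on the confounder region. The `explain` call is a hypothetical stand-in for any saliency method returning a per-pixel map.

```python
import numpy as np

def add_tag_confounder(img, size=8, value=1.0):
    """Paste a small bright square (the artificial confounder) into the image corner."""
    out = img.copy()
    out[:size, :size] = value
    return out

def confounder_attribution_ratio(attribution, size=8):
    """Fraction of absolute attribution mass on the confounder region.
    A method that reliably detects the shortcut should score high on models
    trained with the confounder and low otherwise."""
    a = np.abs(attribution)
    return a[:size, :size].sum() / (a.sum() + 1e-12)

# Usage outline (pseudo-data): confound the positives, retrain, then evaluate.
# x_pos = np.stack([add_tag_confounder(im) for im in x_pos])
# attribution = explain(model, x_pos[0])   # e.g. a SHAP or Grad-CAM map
# print(confounder_attribution_ratio(attribution))
```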
Inherently Interpretable Multi-Label Classification Using Class-Specific Counterfactuals
Interpretability is essential for machine learning algorithms in high-stakes application fields such as medical image analysis. However, high-performing black-box neural networks do not provide explanations for their predictions, which can lead to mistrust and suboptimal human-ML collaboration. Post-hoc explanation techniques, which are widely used in practice, have been shown to suffer from severe conceptual problems. Furthermore, as we show in this paper, current explanation techniques do not perform adequately in the multi-label scenario, in which multiple medical findings may co-occur in a single image. We propose Attri-Net, an inherently interpretable model for multi-label classification. Attri-Net is a powerful classifier that provides transparent, trustworthy, and human-understandable explanations. The model first generates class-specific attribution maps based on counterfactuals to identify which image regions correspond to certain medical findings. Then a simple logistic regression classifier is used to make predictions based solely on these attribution maps. We compare Attri-Net to five post-hoc explanation techniques and one inherently interpretable classifier on three chest X-ray datasets. We find that Attri-Net produces high-quality multi-label explanations consistent with clinical knowledge and has comparable classification performance to state-of-the-art classification models.
Comment: Accepted to MIDL 202
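A rough sketch of the transparent second stage described in the abstract: a per-class logistic regression that predicts each label only from its class-specific attribution map. The counterfactual-based map generator itself (the core of Attri-Net) is abstracted away here as a hypothetical input `attribution_maps`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_transparent_head(attribution_maps, labels):
    """attribution_maps: (N, H, W) maps for ONE finding; labels: (N,) binary.
    Fits a logistic regression on the flattened maps, so the final decision
    is a readable weighted sum over map regions."""
    X = attribution_maps.reshape(len(attribution_maps), -1)  # flatten each map
    return LogisticRegression(max_iter=1000).fit(X, labels)

# In the multi-label setting one such head is trained per finding, so every
# prediction can be traced back to the regions of its own attribution map.
```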
On the Rationality of Explanations in Classification Algorithms
This paper is a first step towards studying the rationality of explanations produced by up-to-date AI systems. Based on the thesis that designing rational explanations for accomplishing trustworthy AI is fundamental for ethics in AI, we study the rationality criteria that explanations in classification algorithms have to meet. In this way, we identify, define, and exemplify characteristic criteria of rational explanations in classification algorithms.
To trust or not to trust an explanation: using LEAF to evaluate local linear XAI methods
The main objective of eXplainable Artificial Intelligence (XAI) is to provide effective explanations for black-box classifiers. The existing literature lists many desirable properties for explanations to be useful, but there is no consensus on how to quantitatively evaluate explanations in practice. Moreover, explanations are typically used only to inspect black-box models, and the proactive use of explanations as a decision support is generally overlooked. Among the many approaches to XAI, a widely adopted paradigm is Local Linear Explanations - with LIME and SHAP emerging as state-of-the-art methods. We show that these methods are plagued by many defects including unstable explanations, divergence of actual implementations from the promised theoretical properties, and explanations for the wrong label. This highlights the need to have standard and unbiased evaluation procedures for Local Linear Explanations in the XAI field. In this paper we address the problem of identifying a clear and unambiguous set of metrics for the evaluation of Local Linear Explanations. This set includes both existing and novel metrics defined specifically for this class of explanations. All metrics have been included in an open Python framework, named LEAF. The purpose of LEAF is to provide a reference for end users to evaluate explanations in a standardised and unbiased way, and to guide researchers towards developing improved explainable techniques.
Comment: 16 pages, 8 figures
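A minimal sketch of one metric of the kind the abstract motivates, a stability check, under the assumption that `explain_fn(x, k)` is any wrapper around a local linear explainer (LIME, SHAP, ...) that returns the indices of the k most important features for instance x. Unstable explainers return different top-k sets across repeated runs on the same instance.

```python
import numpy as np

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def stability(explain_fn, x, k=5, runs=10):
    """Mean pairwise Jaccard similarity of the top-k feature sets obtained by
    re-running the explainer on the same instance; 1.0 = perfectly stable."""
    tops = [explain_fn(x, k) for _ in range(runs)]
    scores = [jaccard(tops[i], tops[j])
              for i in range(runs) for j in range(i + 1, runs)]
    return float(np.mean(scores))
```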