Search CORE

563 research outputs found

The Pragmatic Turn in Explainable Artificial Intelligence (XAI)

Author: Páez Andrés
Publication venue
Publication date: 01/01/2019
Field of study

In this paper I argue that the search for explainable models and interpretable decisions in AI must be reformulated in terms of the broader project of offering a pragmatic and naturalistic account of understanding in AI. Intuitively, the purpose of providing an explanation of a model or a decision is to make it understandable to its stakeholders. But without a previous grasp of what it means to say that an agent understands a model or a decision, the explanatory strategies will lack a well-defined goal. Aside from providing a clearer objective for XAI, focusing on understanding also allows us to relax the factivity condition on explanation, which is impossible to fulfill in many machine learning models, and to focus instead on the pragmatic conditions that determine the best fit between a model and the methods and devices deployed to understand it. After an examination of the different types of understanding discussed in the philosophical and psychological literature, I conclude that interpretative or approximation models not only provide the best way to achieve the objectual understanding of a machine learning model, but are also a necessary condition to achieve post hoc interpretability. This conclusion is partly based on the shortcomings of the purely functionalist approach to post hoc interpretability that seems to be predominant in most recent literature

PhilPapers

arXiv.org e-Print Archive

Explaining classifiers’ outputs with causal models and argumentation

Author: Albini E
Baroni P
Rago A
Russo F
Toni F
Publication venue: College Publications
Publication date: 01/05/2023
Field of study

We introduce a conceptualisation for generating argumentation frameworks (AFs) from causal models for the purpose of forging explanations for mod-els’ outputs. The conceptualisation is based on reinterpreting properties of semantics of AFs as explanation moulds, which are means for characterising argumentative relations. We demonstrate our methodology by reinterpreting the property of bi-variate reinforcement in bipolar AFs, showing how the ex-tracted bipolar AFs may be used as relation-based explanations for the outputs of causal models. We then evaluate our method empirically when the causal models represent (Bayesian and neural network) machine learning models for classification. The results show advantages over a popular approach from the literature, both in highlighting specific relationships between feature and classification variables and in generating counterfactual explanations with respect to a commonly used metric

Spiral - Imperial College Digital Repository

Clarity: an improved gradient method for producing quality visual counterfactual explanations

Author: Conan-Guez Brieuc
Couceiro Miguel
Napoli Amedeo
Pennerath Frédéric
Theobald Claire
Publication venue
Publication date: 22/11/2022
Field of study

Visual counterfactual explanations identify modifications to an image that would change the prediction of a classifier. We propose a set of techniques based on generative models (VAE) and a classifier ensemble directly trained in the latent space, which all together, improve the quality of the gradient required to compute visual counterfactuals. These improvements lead to a novel classification model, Clarity, which produces realistic counterfactual explanations over all images. We also present several experiments that give insights on why these techniques lead to better quality results than those in the literature. The explanations produced are competitive with the state-of-the-art and emphasize the importance of selecting a meaningful input space for training

arXiv.org e-Print Archive

LIMEtree: Interactively Customisable Explanations Based on Local Surrogate Multi-output Regression Trees

Author: Flach Peter
Sokol Kacper
Publication venue
Publication date: 04/05/2020
Field of study

Systems based on artificial intelligence and machine learning models should be transparent, in the sense of being capable of explaining their decisions to gain humans' approval and trust. While there are a number of explainability techniques that can be used to this end, many of them are only capable of outputting a single one-size-fits-all explanation that simply cannot address all of the explainees' diverse needs. In this work we introduce a model-agnostic and post-hoc local explainability technique for black-box predictions called LIMEtree, which employs surrogate multi-output regression trees. We validate our algorithm on a deep neural network trained for object detection in images and compare it against Local Interpretable Model-agnostic Explanations (LIME). Our method comes with local fidelity guarantees and can produce a range of diverse explanation types, including contrastive and counterfactual explanations praised in the literature. Some of these explanations can be interactively personalised to create bespoke, meaningful and actionable insights into the model's behaviour. While other methods may give an illusion of customisability by wrapping, otherwise static, explanations in an interactive interface, our explanations are truly interactive, in the sense of allowing the user to "interrogate" a black-box model. LIMEtree can therefore produce consistent explanations on which an interactive exploratory process can be built

arXiv.org e-Print Archive

Explore Bristol Research

Explaining Machine Learning Classifiers through Diverse Counterfactual Explanations

Author: Dai Wuyang
Kim Been
Kingma Diederik P
Lundberg Scott M
Tan S
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 06/12/2019
Field of study

Post-hoc explanations of machine learning models are crucial for people to understand and act on algorithmic predictions. An intriguing class of explanations is through counterfactuals, hypothetical examples that show people how to obtain a different prediction. We posit that effective counterfactual explanations should satisfy two properties: feasibility of the counterfactual actions given user context and constraints, and diversity among the counterfactuals presented. To this end, we propose a framework for generating and evaluating a diverse set of counterfactual explanations based on determinantal point processes. To evaluate the actionability of counterfactuals, we provide metrics that enable comparison of counterfactual-based methods to other local explanation methods. We further address necessary tradeoffs and point to causal implications in optimizing for counterfactuals. Our experiments on four real-world datasets show that our framework can generate a set of counterfactuals that are diverse and well approximate local decision boundaries, outperforming prior approaches to generating diverse counterfactuals. We provide an implementation of the framework at https://github.com/microsoft/DiCE.Comment: 13 page

arXiv.org e-Print Archive

Crossref

Achieving descriptive accuracy in explanations via argumentation: the case of probabilistic classifiers

Author: Albini E
Baroni P
Rago A
Toni F
Publication venue: 'Frontiers Media SA'
Publication date: 20/03/2023
Field of study

The pursuit of trust in and fairness of AI systems in order to enable human-centric goals has been gathering pace of late, often supported by the use of explanations for the outputs of these systems. Several properties of explanations have been highlighted as critical for achieving trustworthy and fair AI systems, but one that has thus far been overlooked is that of descriptive accuracy (DA), i.e., that the explanation contents are in correspondence with the internal working of the explained system. Indeed, the violation of this core property would lead to the paradoxical situation of systems producing explanations which are not suitably related to how the system actually works: clearly this may hinder user trust. Further, if explanations violate DA then they can be deceitful, resulting in an unfair behavior toward the users. Crucial as the DA property appears to be, it has been somehow overlooked in the XAI literature to date. To address this problem, we consider the questions of formalizing DA and of analyzing its satisfaction by explanation methods. We provide formal definitions of naive, structural and dialectical DA, using the family of probabilistic classifiers as the context for our analysis. We evaluate the satisfaction of our given notions of DA by several explanation methods, amounting to two popular feature-attribution methods from the literature, variants thereof and a novel form of explanation that we propose. We conduct experiments with a varied selection of concrete probabilistic classifiers and highlight the importance, with a user study, of our most demanding notion of dialectical DA, which our novel method satisfies by design and others may violate. We thus demonstrate how DA could be a critical component in achieving trustworthy and fair systems, in line with the principles of human-centric AI

Spiral - Imperial College Digital Repository

Interpretability and Explainability: A Machine Learning Zoo Mini-tour

Author: Marcinkevičs Ričards
Vogt Julia E.
Publication venue
Publication date: 03/12/2020
Field of study

In this review, we examine the problem of designing interpretable and explainable machine learning models. Interpretability and explainability lie at the core of many machine learning and statistical applications in medicine, economics, law, and natural sciences. Although interpretability and explainability have escaped a clear universal definition, many techniques motivated by these properties have been developed over the recent 30 years with the focus currently shifting towards deep learning methods. In this review, we emphasise the divide between interpretability and explainability and illustrate these two different research directions with concrete examples of the state-of-the-art. The review is intended for a general machine learning audience with interest in exploring the problems of interpretation and explanation beyond logistic regression or random forest variable importance. This work is not an exhaustive literature survey, but rather a primer focusing selectively on certain lines of research which the authors found interesting or informative

arXiv.org e-Print Archive