Evaluating explanations of artificial intelligence decisions : the explanation quality rubric and survey

Abstract

The use of Artificial Intelligence (AI) algorithms is growing rapidly (Vilone & Longo, 2020). With this growth comes an increasing demand for reliable, robust explanations of AI decisions, and a pressing need for a way to evaluate their quality. This thesis examines three research questions: What would a rigorous, empirically justified, human-centred scheme for evaluating AI-decision explanations look like? How can such a scheme be created? Can such a scheme be used to improve explanations? Current Explainable Artificial Intelligence (XAI) research lacks an accepted, widely employed method for evaluating AI explanations. This thesis offers a method for creating such a scheme and uses it to create an evaluation methodology, the XQ Rubric and XQ Survey. The XQ Rubric and XQ Survey are then employed to improve explanations of AI decisions. The thesis asks what constitutes a good explanation in the context of XAI. It provides: (1) a model of good explanation for use in XAI research; (2) a method of gathering non-expert evaluations of XAI explanations; and (3) an evaluation scheme for non-experts to employ in assessing XAI explanations (the XQ Rubric and XQ Survey). The thesis begins with a literature review, primarily an exploration of previous attempts to evaluate XAI explanations formally. This is followed by an account of the development and iterative refinement of a solution to the problem, the eXplanation Quality Rubric (XQ Rubric). A Design Science methodology was used to guide the development of the XQ Rubric and XQ Survey. The thesis limits itself to XAI explanations appropriate for non-experts. It proposes and tests an evaluation rubric and survey method that is both stable and robust: that is, readily usable and consistently reliable in a variety of XAI-explanation tasks.

Doctor of Philosophy
