Are Your Explanations Reliable? Investigating the Stability of LIME in
  Explaining Text Classifiers by Marrying XAI and Adversarial Attack

Burger, Christopher; Chen, Lingwei; Le, Thai

Authors: Christopher Burger, Lingwei Chen, Thai Le
Publication date: 15 October 2023

Abstract

LIME has emerged as one of the most commonly referenced tools in explainable AI (XAI) frameworks and is integrated into critical machine learning applications (e.g., healthcare and finance). However, its stability remains little explored, especially for text data, because of the unique constraints of the text space. To address this gap, we first evaluate the inherent instability of LIME on text data to establish a baseline. We then propose a novel algorithm, XAIFooler, which perturbs text inputs to manipulate explanations and casts the investigation of LIME's stability as a text perturbation optimization problem. XAIFooler conforms to constraints that preserve text semantics and the original prediction under small perturbations, and it uses Rank-biased Overlap (RBO), which satisfies all the requirements for an explanation similarity measure, as a key component guiding its optimization. Extensive experiments on real-world text datasets demonstrate that XAIFooler outperforms all baselines by large margins in its ability to manipulate LIME's explanations while maintaining high semantic preservability.

Comment: 14 pages, 6 figures. Replacement by the updated version to be published in EMNLP 202
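The abstract names Rank-biased Overlap (RBO) as the measure used to compare explanations. As a rough illustration only, the sketch below computes a truncated RBO between two ranked feature lists, such as the top words of two LIME explanations. It assumes the standard truncated form, (1 - p) * sum over depths d of p^(d-1) * |A[:d] ∩ B[:d]| / d; the function name rbo_truncated, the default p = 0.9, and the example word lists are hypothetical and are not taken from the paper, whose exact RBO variant may differ.

```python
def rbo_truncated(ranking_a, ranking_b, p=0.9):
    """Truncated Rank-biased Overlap between two ranked lists.

    Assumption: each list contains no repeated items. The sum is cut off
    at the shorter list's length, so two identical lists of length k
    score 1 - p**k rather than 1.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")

    depth = min(len(ranking_a), len(ranking_b))
    seen_a, seen_b = set(), set()
    weighted_sum = 0.0

    for d in range(1, depth + 1):
        # Grow both prefixes by one item and measure how much they agree.
        seen_a.add(ranking_a[d - 1])
        seen_b.add(ranking_b[d - 1])
        agreement = len(seen_a & seen_b) / d
        # Agreement at shallow depths receives larger weights.
        weighted_sum += p ** (d - 1) * agreement

    return (1 - p) * weighted_sum


# Hypothetical feature rankings from an explanation before and after a
# small text perturbation.
original = ["terrible", "boring", "plot", "acting"]
top_swapped = ["boring", "terrible", "plot", "acting"]
tail_swapped = ["terrible", "boring", "acting", "plot"]
```

Because the weights decay with depth, swapping the two highest-ranked features lowers the score more than swapping the two lowest-ranked ones. That top-weighted behavior is the usual reason for choosing RBO to compare ranked explanations.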

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2305.12351

Last time updated on 24/05/2023