Many interpretability tools allow practitioners and researchers to explain
Natural Language Processing systems. However, each tool requires a different
configuration and provides explanations in a different form, which makes the
tools hard to assess and compare. A principled, unified evaluation benchmark
can guide users through the central question: which explanation method is
most reliable for my use case? We introduce ferret, an easy-to-use,
extensible Python library to explain Transformer-based models, integrated with
the Hugging Face Hub. It offers a unified benchmarking suite to test and
compare a wide range of state-of-the-art explainers on any text or
interpretability corpus. In addition, ferret provides convenient programming
abstractions that ease the introduction of new explanation methods, datasets,
or evaluation metrics.
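The idea of a common explainer interface can be sketched in plain Python. Under this interface, every method returns one importance score per token, so different methods can be compared side by side on the same input. This is a minimal illustration under stated assumptions, not ferret's actual API. The following are all hypothetical:

- the names `Explainer`, `OcclusionExplainer`, `UniformExplainer`, and `run_benchmark`;
- the toy word-weight classifier that stands in for a Transformer.

```python
import math
from abc import ABC, abstractmethod
from typing import Callable, Dict, List

# Hypothetical toy classifier standing in for a Transformer:
# positive-class probability = sigmoid(sum of per-word weights).
WEIGHTS = {"great": 2.0, "love": 1.5, "boring": -2.0, "bad": -1.5}


def predict_proba(tokens: List[str]) -> float:
    score = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))


class Explainer(ABC):
    """Hypothetical common interface: one importance score per input token."""

    name: str = "base"

    @abstractmethod
    def explain(
        self, tokens: List[str], model: Callable[[List[str]], float]
    ) -> List[float]:
        ...


class OcclusionExplainer(Explainer):
    """Leave-one-out: importance = probability drop when a token is removed."""

    name = "occlusion"

    def explain(self, tokens, model):
        full = model(tokens)
        return [full - model(tokens[:i] + tokens[i + 1:]) for i in range(len(tokens))]


class UniformExplainer(Explainer):
    """Naive baseline: every token receives the same importance."""

    name = "uniform"

    def explain(self, tokens, model):
        return [1.0 / len(tokens)] * len(tokens)


def run_benchmark(
    explainers: List[Explainer],
    tokens: List[str],
    model: Callable[[List[str]], float],
) -> Dict[str, List[float]]:
    """Run every explainer on the same input so outputs share one format."""
    return {e.name: e.explain(tokens, model) for e in explainers}


if __name__ == "__main__":
    results = run_benchmark(
        [OcclusionExplainer(), UniformExplainer()], ["a", "great", "movie"], predict_proba
    )
    for name, values in results.items():
        print(name, [round(v, 3) for v in values])
```

With this design, adding a new method only means subclassing `Explainer`; the benchmarking loop stays unchanged.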

Attanasio, Giuseppe

Di Bonaventura, Chiara

Nozza, Debora

Pastor, Eliana

English

arXiv

As Transformers are increasingly relied upon to solve complex NLP problems, there is a growing need for their decisions to be interpretable by humans. While several explainable AI (XAI) techniques for interpreting the outputs of transformer-based models have been proposed, there is still no easy way to use and compare them. We introduce ferret, a Python library that simplifies the use and comparison of XAI methods on transformer-based classifiers. With ferret, users can visualize and compare the explanations that state-of-the-art XAI methods produce for the outputs of transformer-based models, on any free text or existing XAI corpus. Moreover, users can evaluate explanations with ad-hoc XAI metrics to select the most faithful and plausible ones. To align with the now well-established practice of sharing and using transformer-based models through Hugging Face, ferret interfaces directly with the Hugging Face Python library. In this paper, we showcase ferret by benchmarking XAI methods applied to transformers for sentiment analysis and hate speech detection. We show that specific methods provide consistently better explanations and are preferable in the context of transformer models.
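To make "faithful" and "plausible" concrete, the sketch below computes two widely used faithfulness measures and one plausibility score:

- **Comprehensiveness:** the drop in predicted probability when the top-k tokens are deleted.
- **Sufficiency:** the drop when only those tokens are kept.
- **Token-level F1:** overlap with a human-annotated rationale, used as a plausibility score.

This is a self-contained illustration, not ferret's implementation. The toy classifier, the helper names, and the two example explanations are all hypothetical.

```python
import math
from typing import List

# Hypothetical toy classifier: positive-class probability from word weights.
WEIGHTS = {"great": 2.0, "love": 1.5, "boring": -2.0, "bad": -1.5}


def predict_proba(tokens: List[str]) -> float:
    score = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def top_k_indices(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest-scoring tokens."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def comprehensiveness(tokens, scores, model, k):
    """Probability drop after deleting the top-k tokens (higher = more faithful)."""
    top = set(top_k_indices(scores, k))
    kept = [t for i, t in enumerate(tokens) if i not in top]
    return model(tokens) - model(kept)


def sufficiency(tokens, scores, model, k):
    """Probability drop when keeping only the top-k tokens (lower = more faithful)."""
    top = set(top_k_indices(scores, k))
    rationale = [t for i, t in enumerate(tokens) if i in top]
    return model(tokens) - model(rationale)


def token_f1(scores, human_rationale, k):
    """Plausibility: F1 between the top-k tokens and a human rationale mask."""
    predicted = set(top_k_indices(scores, k))
    gold = {i for i, flag in enumerate(human_rationale) if flag}
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    tokens = ["a", "great", "movie"]
    human = [0, 1, 0]           # hypothetical human rationale
    good = [0.0, 0.9, 0.1]      # hypothetical explanation highlighting "great"
    bad = [0.9, 0.0, 0.1]       # hypothetical explanation highlighting "a"
    for name, expl in [("good", good), ("bad", bad)]:
        print(
            name,
            round(comprehensiveness(tokens, expl, predict_proba, 1), 3),
            round(sufficiency(tokens, expl, predict_proba, 1), 3),
            token_f1(expl, human, 1),
        )
```

Under these assumptions, the explanation that highlights "great" outperforms the one highlighting "a" on all three measures. It has higher comprehensiveness, lower sufficiency, and higher plausibility. This is the kind of ranking a benchmark uses to choose between explainers.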

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

ferret: a Framework for Benchmarking Explainers on Transformers

Comment: 11 pages, 3 figures. Accepted to EACL 2023 (System Demonstration).
More details at https://github.com/g8a9/ferre
