Interpretability and Explainability: A Machine Learning Zoo Mini-tour
In this review, we examine the problem of designing interpretable and
explainable machine learning models. Interpretability and explainability lie at
the core of many machine learning and statistical applications in medicine,
economics, law, and natural sciences. Although interpretability and
explainability have escaped a clear universal definition, many techniques
motivated by these properties have been developed over the past 30 years, with
the focus currently shifting towards deep learning methods. In this review, we
emphasise the divide between interpretability and explainability and illustrate
these two different research directions with concrete examples of the
state-of-the-art. The review is intended for a general machine learning
audience with an interest in exploring the problems of interpretation and
explanation beyond logistic regression or random forest variable importance.
This work is not an exhaustive literature survey, but rather a primer focusing
selectively on certain lines of research which the authors found interesting or
informative.
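As a point of reference for the baseline the review points beyond, here is a minimal sketch of random forest variable importance with scikit-learn; the dataset is an illustrative choice, not one used in the review.

```python
# Sketch: impurity-based random forest variable importance with
# scikit-learn. The dataset is an illustrative choice.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(data.data, data.target)

# Mean decrease in impurity, averaged over trees; higher = more important.
ranked = sorted(zip(data.feature_names, forest.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked[:5]:
    print(f"{name}: {score:.3f}")
```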
REFER: An End-to-end Rationale Extraction Framework for Explanation Regularization
Human-annotated textual explanations are becoming increasingly important in
Explainable Natural Language Processing. Rationale extraction aims to provide
faithful (i.e., reflective of the behavior of the model) and plausible (i.e.,
convincing to humans) explanations by highlighting the inputs that had the
largest impact on the prediction without compromising the performance of the
task model. In recent work, rationale extractors have primarily been trained to
optimize for plausibility using human highlights, while the task model has been
trained to jointly optimize for predictive accuracy and faithfulness. We
propose REFER, a framework that employs a differentiable rationale extractor,
allowing back-propagation through the rationale extraction process. We analyze
the impact of using human highlights during
training by jointly training the task model and the rationale extractor. In our
experiments, REFER yields significantly better results in terms of
faithfulness, plausibility, and downstream task accuracy on both
in-distribution and out-of-distribution data. On both e-SNLI and CoS-E, our
best setting outperforms the previous baselines in composite normalized
relative gain by 11% and 3%, respectively.
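The abstract does not spell out how REFER's extractor is made differentiable. One common device for back-propagating through a discrete token-selection step is a straight-through Gumbel-softmax relaxation of the binary mask, sketched below in PyTorch; the module, the stand-in task head, and the sparsity penalty are illustrative assumptions, not REFER's published design.

```python
# Sketch: differentiable rationale extraction via straight-through
# Gumbel-softmax. Illustrative only -- NOT REFER's published architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RationaleExtractor(nn.Module):
    """Scores tokens and samples a binary keep/drop mask that stays
    differentiable, so task-loss gradients reach the extractor."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.scorer = nn.Linear(hidden_dim, 1)  # per-token keep logit

    def forward(self, token_embeds: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
        logits = self.scorer(token_embeds).squeeze(-1)    # (batch, seq)
        two_way = torch.stack([logits, -logits], dim=-1)  # keep vs. drop
        # hard=True yields a discrete {0,1} mask whose gradient follows
        # the soft sample (straight-through estimator).
        return F.gumbel_softmax(two_way, tau=tau, hard=True)[..., 0]

# Tiny joint-training step with random tensors standing in for a real
# encoder's output; extractor and task head are updated together.
extractor = RationaleExtractor(hidden_dim=16)
task_head = nn.Linear(16, 2)  # stand-in task model
optimizer = torch.optim.Adam(
    list(extractor.parameters()) + list(task_head.parameters()), lr=1e-3)

embeds = torch.randn(4, 10, 16)          # (batch, seq, hidden)
labels = torch.randint(0, 2, (4,))

mask = extractor(embeds)                 # (batch, seq), values in {0, 1}
pooled = (embeds * mask.unsqueeze(-1)).mean(dim=1)
loss = F.cross_entropy(task_head(pooled), labels) + 0.1 * mask.mean()
loss.backward()                          # gradients flow into the extractor
optimizer.step()
```

Because the task model only ever sees the masked embeddings, the task loss itself pushes the extractor towards faithful highlights, which is the property the paper's joint-training analysis targets.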
Towards Faithful Model Explanation in NLP: A Survey
End-to-end neural NLP architectures are notoriously difficult to understand,
which has given rise to numerous efforts towards model explainability in recent
years. An essential principle of model explanation is Faithfulness, i.e., an
explanation should accurately represent the reasoning process behind the
model's prediction. This survey first discusses the definition and evaluation
of Faithfulness, as well as its significance for explainability. We then
introduce recent advances in faithful explanation by grouping approaches
into five categories: similarity methods, analysis of model-internal
structures, backpropagation-based methods, counterfactual intervention, and
self-explanatory models. Each category will be illustrated with its
representative studies, advantages, and shortcomings. Finally, we discuss all
the above methods in terms of their common virtues and limitations, and reflect
on directions for future work towards faithful explainability. For researchers
interested in studying interpretability, this survey will offer an accessible
and comprehensive overview of the area, laying the groundwork for further
exploration. For users hoping to better understand their own models, it will
serve as an introductory manual for choosing the most suitable explanation
method(s).
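To make the survey's taxonomy concrete, below is a minimal sketch of one representative of its "backpropagation-based methods" category: gradient-times-input saliency in PyTorch. The toy classifier is a placeholder assumption; the attribution rule itself (gradient of the target logit, multiplied elementwise by the input and summed over the hidden dimension) is standard.

```python
# Sketch: gradient x input, a representative backpropagation-based
# explanation method. The classifier is a toy placeholder.
import torch
import torch.nn as nn

class ToyClassifier(nn.Module):
    def __init__(self, hidden_dim: int, n_classes: int):
        super().__init__()
        self.head = nn.Linear(hidden_dim, n_classes)

    def forward(self, embeds: torch.Tensor) -> torch.Tensor:
        return self.head(embeds.mean(dim=1))  # mean-pool tokens, classify

def grad_times_input(model: nn.Module, embeds: torch.Tensor,
                     target: int) -> torch.Tensor:
    """One attribution score per token: d(logit)/d(embed) * embed,
    summed over the hidden dimension."""
    embeds = embeds.clone().detach().requires_grad_(True)
    model(embeds)[0, target].backward()        # scalar logit for target class
    return (embeds.grad * embeds).sum(dim=-1)  # (batch, seq)

model = ToyClassifier(hidden_dim=16, n_classes=2)
saliency = grad_times_input(model, torch.randn(1, 7, 16), target=1)
print(saliency)  # higher magnitude = larger influence on the target logit
```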