6 research outputs found
Weakly- and Semi-supervised Evidence Extraction
For many prediction tasks, stakeholders desire not only predictions but also
supporting evidence that a human can use to verify its correctness. However, in
practice, additional annotations marking supporting evidence may only be
available for a minority of training examples (if available at all). In this
paper, we propose new methods to combine few evidence annotations (strong
semi-supervision) with abundant document-level labels (weak supervision) for
the task of evidence extraction. Evaluating on two classification tasks that
feature evidence annotations, we find that our methods outperform baselines
adapted from the interpretability literature to our task. Our approach yields
substantial gains with as few as a hundred evidence annotations. Code and
datasets to reproduce our work are available at
https://github.com/danishpruthi/evidence-extraction.
Comment: Accepted to the Findings of EMNLP 2020, to be presented at
BlackboxNLP.
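A minimal sketch of how such a combination could look in practice, assuming a
PyTorch model that produces both document-level logits and per-token evidence
scores (the function, tensor names, and the weighting coefficient alpha are
illustrative assumptions, not the authors' implementation): the document-level
classification loss covers every example, while the token-level evidence loss
is applied only to the small annotated subset.

import torch.nn.functional as F

def joint_loss(doc_logits, doc_labels, token_logits, evidence_mask,
               has_evidence, alpha=0.5):
    """doc_logits: (batch, n_classes) document-level predictions.
    token_logits: (batch, seq_len) per-token evidence scores.
    evidence_mask: (batch, seq_len) 1 for annotated evidence tokens, else 0.
    has_evidence: (batch,) 1 if the example carries evidence annotations."""
    # Weak supervision: every example has a document-level label.
    cls_loss = F.cross_entropy(doc_logits, doc_labels)
    # Strong semi-supervision: token-level loss, only on annotated examples.
    tok_loss = F.binary_cross_entropy_with_logits(
        token_logits, evidence_mask.float(), reduction="none").mean(dim=1)
    tok_loss = (tok_loss * has_evidence.float()).sum() \
        / has_evidence.float().sum().clamp(min=1)
    return cls_loss + alpha * tok_loss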
Explain and Predict, and then Predict Again
A desirable property of learning systems is to be both effective and
interpretable. Towards this goal, recent models have been proposed that first
generate an extractive explanation from the input text and then make a
prediction using only that explanation; these are called explain-then-predict
models. These models primarily treat the task input as the supervision signal
for learning an extractive explanation and do not effectively integrate
rationale data as an additional inductive bias to improve task performance. We
propose a novel yet simple approach, ExPred, which uses multi-task learning in
the explanation-generation phase to effectively trade off explanation and
prediction losses. We then use a second prediction network on just the
extracted explanations to optimize task performance. We conduct an extensive
evaluation of our
approach on three diverse language datasets -- fact verification, sentiment
classification, and QA -- and find that we substantially outperform existing
approaches.
Comment: Accepted at WSDM 2021.
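A small sketch of the two-phase idea described above, with assumed tensor
shapes and names (lam, pad_id, and the function names are illustrative, not
ExPred's actual implementation): the explanation-generation phase is trained
with a weighted sum of prediction and explanation losses, and the selected
tokens are then fed to a separate prediction network.

import torch
import torch.nn.functional as F

def explain_phase_loss(task_logits, task_labels, expl_logits, rationale_mask,
                       lam=1.0):
    # Multi-task objective: trade off the downstream task loss against the
    # rationale (explanation) loss during explanation generation.
    task_loss = F.cross_entropy(task_logits, task_labels)
    expl_loss = F.binary_cross_entropy_with_logits(expl_logits,
                                                   rationale_mask.float())
    return task_loss + lam * expl_loss

def mask_to_explanation(token_ids, expl_logits, pad_id=0):
    # Keep only tokens selected as the explanation; everything else becomes
    # padding before the sequence is passed to the second prediction network.
    keep = (torch.sigmoid(expl_logits) > 0.5).long()
    return token_ids * keep + pad_id * (1 - keep)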
Detecting and Reducing Bias in a High Stakes Domain
Gang-involved youth in cities such as Chicago sometimes post on social media
to express their aggression towards rival gangs and previous research has
demonstrated that a deep learning approach can predict aggression and loss in
posts. To address the possibility of bias in this sensitive application, we
developed an approach to systematically interpret the state of the art model.
We found, surprisingly, that it frequently bases its predictions on stop words
such as "a" or "on", an approach that could harm social media users who have no
aggressive intentions. To tackle this bias, domain experts annotated the
rationales, highlighting words that explain why a tweet is labeled as
"aggression". These new annotations enable us to quantitatively measure how
justified the model predictions are, and build models that drastically reduce
bias. Our study shows that in high-stakes scenarios, accuracy alone cannot
guarantee a good system, and we need new evaluation methods.
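One way to make "how justified the model predictions are" concrete, sketched
here as an assumption rather than the paper's actual metric: take the model's
top-weighted tokens under any per-token attribution and measure what share of
them fall inside the expert-annotated rationale versus on stop words.

STOP_WORDS = {"a", "an", "the", "on", "in", "of", "to"}

def justification_scores(tokens, importance, rationale_tokens, k=5):
    # Indices of the k tokens the model weights most heavily.
    top = sorted(range(len(tokens)), key=lambda i: importance[i],
                 reverse=True)[:k]
    in_rationale = sum(tokens[i] in rationale_tokens for i in top) / len(top)
    on_stop_words = sum(tokens[i].lower() in STOP_WORDS for i in top) / len(top)
    return in_rationale, on_stop_words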
BERTology Meets Biology: Interpreting Attention in Protein Language Models
Transformer architectures have proven to learn useful representations for
protein classification and generation tasks. However, these representations
present challenges in interpretability. In this work, we demonstrate a set of
methods for analyzing protein Transformer models through the lens of attention.
We show that attention: (1) captures the folding structure of proteins,
connecting amino acids that are far apart in the underlying sequence, but
spatially close in the three-dimensional structure, (2) targets binding sites,
a key functional component of proteins, and (3) focuses on progressively more
complex biophysical properties with increasing layer depth. We find this
behavior to be consistent across three Transformer architectures (BERT, ALBERT,
XLNet) and two distinct protein datasets. We also present a three-dimensional
visualization of the interaction between attention and protein structure. Code
for visualization and analysis is available at
https://github.com/salesforce/provis.
Comment: To appear in ICLR 2021.
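Finding (1) can be probed with a simple statistic, sketched below under
assumptions (plain NumPy, with attention weights and a contact map supplied as
arrays; this is not the authors' analysis code): the fraction of an attention
head's mass that lands on residue pairs that are far apart in the sequence but
spatially close in the folded structure.

import numpy as np

def attention_to_contacts(attn, contact_map, min_seq_sep=6):
    """attn: (L, L) attention weights from one head, for a protein of length L.
    contact_map: (L, L) boolean, True where two residues are close in 3D.
    min_seq_sep: ignore pairs that are near each other in the sequence."""
    L = attn.shape[0]
    i, j = np.indices((L, L))
    long_range = np.abs(i - j) >= min_seq_sep
    # Share of long-range attention mass that falls on structural contacts.
    return attn[long_range & contact_map].sum() / max(attn[long_range].sum(), 1e-9)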
The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations
Feature importance (FI) estimates are a popular form of explanation, and they
are commonly created and evaluated by computing the change in model confidence
caused by removing certain input features at test time. For example, in the
standard Sufficiency metric, only the top-k most important tokens are kept. In
this paper, we study several under-explored dimensions of FI explanations,
providing conceptual and empirical improvements for this form of explanation.
First, we advance a new argument for why it can be problematic to remove
features from an input when creating or evaluating explanations: the fact that
these counterfactual inputs are out-of-distribution (OOD) to models implies
that the resulting explanations are socially misaligned. The crux of the
problem is that the model prior and random weight initialization influence the
explanations (and explanation metrics) in unintended ways. To resolve this
issue, we propose a simple alteration to the model training process, which
results in more socially aligned explanations and metrics. Second, we compare
among five approaches for removing features from model inputs. We find that
some methods produce more OOD counterfactuals than others, and we make
recommendations for selecting a feature-replacement function. Finally, we
introduce four search-based methods for identifying FI explanations and compare
them to strong baselines, including LIME, Anchors, and Integrated Gradients.
Through experiments with six diverse text classification datasets, we find that
the only method that consistently outperforms random search is a Parallel Local
Search (PLS) that we introduce. Improvements over the second-best method are as
large as 5.4 points for Sufficiency and 17 points for Comprehensiveness. All
supporting code for experiments in this paper is publicly available at
https://github.com/peterbhase/ExplanationSearch.
Comment: NeurIPS 2021 (25 pages).
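As a concrete illustration of the Sufficiency-style evaluation described above
(a hedged sketch, not the paper's code: predict_proba, the mask-token
replacement, and the argument names are assumptions), the top-k most important
tokens are kept, the rest are replaced, and the change in the model's
confidence in the label is recorded. The choice of replacement function is
exactly the kind of design decision the paper examines, since it determines
how far the counterfactual input drifts out of distribution.

import numpy as np

def sufficiency(predict_proba, tokens, importance, label, k,
                mask_token="[MASK]"):
    # Keep only the k highest-importance tokens; replace everything else.
    order = np.argsort(importance)[::-1]
    keep = set(order[:k].tolist())
    reduced = [t if i in keep else mask_token for i, t in enumerate(tokens)]
    # Drop in confidence for `label` when only the "important" tokens remain.
    return predict_proba(tokens)[label] - predict_proba(reduced)[label]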
ERASER: A Benchmark to Evaluate Rationalized NLP Models
State-of-the-art models in NLP are now predominantly based on deep neural
networks that are opaque in terms of how they come to make predictions. This
limitation has increased interest in designing more interpretable deep models
for NLP that reveal the `reasoning' behind model outputs. But work in this
direction has been conducted on different datasets and tasks with
correspondingly unique aims and metrics; this makes it difficult to track
progress. We propose the Evaluating Rationales And Simple English Reasoning
(ERASER) benchmark to advance research on interpretable models in NLP. This
benchmark comprises multiple datasets and tasks for which human annotations of
"rationales" (supporting evidence) have been collected. We propose several
metrics that aim to capture how well the rationales provided by models align
with human rationales, and also how faithful these rationales are (i.e., the
degree to which provided rationales influenced the corresponding predictions).
Our hope is that releasing this benchmark facilitates progress on designing
more interpretable NLP systems. The benchmark, code, and documentation are
available at https://www.eraserbenchmark.com/
Comment: Accepted as a long paper at ACL 2020. Website and leaderboard
available at http://www.eraserbenchmark.com/ Code available at
https://github.com/jayded/eraserbenchmar
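One of the agreement-style measurements the abstract alludes to can be
sketched as token-level overlap between a model's rationale and the human
rationale (an illustrative F1 computation over token positions; the
benchmark's exact metric definitions live in the paper and codebase).

def rationale_token_f1(predicted_positions, human_positions):
    # Token-level precision/recall/F1 between predicted and human rationales.
    pred, gold = set(predicted_positions), set(human_positions)
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)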