e-SNLI: Natural Language Inference with Natural Language Explanations
In order for machine learning to garner widespread public adoption, models
must be able to provide interpretable and robust explanations for their
decisions, as well as learn from human-provided explanations at train time. In
this work, we extend the Stanford Natural Language Inference dataset with an
additional layer of human-annotated natural language explanations of the
entailment relations. We further implement models that incorporate these
explanations into their training process and output them at test time. We show
how our corpus of explanations, which we call e-SNLI, can be used for various
goals, such as obtaining full sentence justifications of a model's decisions,
improving universal sentence representations and transferring to out-of-domain
NLI datasets. Our dataset thus opens up a range of research directions for
using natural language explanations, both for improving models and for
establishing trust in their decisions.
Comment: NeurIPS 2018
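As a rough illustration of how such explanation-annotated data can feed a model that both predicts and explains, the sketch below formats a hypothetical e-SNLI-style record into a single sequence-to-sequence training pair; the field names and target format are illustrative assumptions, not the official schema.

```python
# Minimal sketch: turning an e-SNLI-style record into a joint
# "predict the label + generate the explanation" seq2seq training pair.
# Field names and the target format are illustrative assumptions.

example = {
    "premise": "A man inspects the uniform of a figure in some East Asian country.",
    "hypothesis": "The man is sleeping.",
    "label": "contradiction",
    "explanation": "A man cannot be inspecting a uniform while he is sleeping.",
}

def to_seq2seq_pair(record):
    """Build (input, target) strings for a joint label-and-explanation model."""
    source = f"premise: {record['premise']} hypothesis: {record['hypothesis']}"
    target = f"label: {record['label']} explanation: {record['explanation']}"
    return source, target

src, tgt = to_seq2seq_pair(example)
print(src)
print(tgt)
```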
SPARSEFIT: Few-shot Prompting with Sparse Fine-tuning for Jointly Generating Predictions and Natural Language Explanations
Explaining the decisions of neural models is crucial for ensuring their
trustworthiness at deployment time. Using Natural Language Explanations (NLEs)
to justify a model's predictions has recently gained increasing interest.
However, this approach usually demands large datasets of human-written NLEs for
the ground-truth answers, which are expensive and potentially infeasible for
some applications. To enable models to generate high-quality NLEs when only a
few NLEs are available, fine-tuning pre-trained language models (PLMs) in
conjunction with prompt-based learning has recently emerged as a promising
approach. However, PLMs
typically have billions of parameters, making fine-tuning expensive. We propose
SparseFit, a sparse few-shot fine-tuning strategy that leverages discrete
prompts to jointly generate predictions and NLEs. We experiment with SparseFit
on the T5 model and four datasets and compare it against state-of-the-art
parameter-efficient fine-tuning techniques. We perform automatic and human
evaluations to assess the quality of the model-generated NLEs, finding that
fine-tuning only 6.8% of the model parameters leads to competitive results in
both task performance and the quality of the NLEs.
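The sketch below illustrates, in plain PyTorch, the general idea behind this kind of sparse fine-tuning: freeze the model and mark only a small, named subset of parameters as trainable. The parameter patterns and the toy model are assumptions for illustration, not SparseFit's actual parameter selection or the T5 setup used in the paper.

```python
import torch.nn as nn

def sparse_finetune(model, trainable_patterns):
    """Freeze all parameters except those whose names contain one of the given
    substrings, and return the fraction of parameters left trainable.
    The pattern choice (layer norms and biases) is an assumed, illustrative
    configuration, not the paper's exact recipe."""
    total, trainable = 0, 0
    for name, param in model.named_parameters():
        param.requires_grad = any(pat in name for pat in trainable_patterns)
        total += param.numel()
        trainable += param.numel() if param.requires_grad else 0
    return trainable / total

# Toy stand-in for a seq2seq PLM; in practice this would be e.g. a T5 checkpoint.
toy_model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4), num_layers=2
)
frac = sparse_finetune(toy_model, trainable_patterns=["norm", "bias"])
print(f"trainable parameters: {frac:.1%}")
```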
Identifying Linear Relational Concepts in Large Language Models
Transformer language models (LMs) have been shown to represent concepts as
directions in the latent space of hidden activations. However, for any given
human-interpretable concept, how can we find its direction in the latent space?
We present a technique called linear relational concepts (LRC) for finding
concept directions corresponding to human-interpretable concepts at a given
hidden layer in a transformer LM by first modeling the relation between subject
and object as a linear relational embedding (LRE). While the LRE work was
mainly presented as an exercise in understanding model representations, we find
that inverting the LRE while using earlier object layers yields a powerful
technique for finding concept directions that both serve as accurate
classifiers and causally influence model outputs.
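A rough NumPy sketch of the overall recipe is given below: fit a linear relational embedding o ≈ Ws + b from paired subject and object hidden states, then pseudo-invert it to obtain a direction in subject-activation space that can be used as a linear classifier. The synthetic data, dimensions, and the use of a plain pseudo-inverse are simplifying assumptions, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for hidden states: rows are (subject, object) activation
# pairs collected at chosen layers; in practice these would come from a
# transformer LM. Dimensions and data are purely illustrative.
d_subj, d_obj, n = 32, 32, 200
W_true = rng.normal(size=(d_obj, d_subj))
S = rng.normal(size=(n, d_subj))                        # subject hidden states
O = S @ W_true.T + 0.01 * rng.normal(size=(n, d_obj))   # object hidden states

# 1) Fit the linear relational embedding  o ≈ W s + b  by least squares.
S_aug = np.hstack([S, np.ones((n, 1))])
coef, *_ = np.linalg.lstsq(S_aug, O, rcond=None)
W, b = coef[:-1].T, coef[-1]

# 2) "Invert" the LRE for a target object representation o_star to obtain a
#    concept direction in subject-activation space (a plain pseudo-inverse,
#    used here as a simplification of the paper's procedure).
o_star = O[0]
direction = np.linalg.pinv(W) @ (o_star - b)
direction /= np.linalg.norm(direction)

# 3) Use the direction as a linear classifier over subject activations.
scores = S @ direction
print("top-scoring subject index:", int(np.argmax(scores)))
```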
The Gap on GAP: Tackling the Problem of Differing Data Distributions in Bias-Measuring Datasets
Diagnostic datasets that can detect biased models are an important
prerequisite for bias reduction within natural language processing. However,
undesired patterns in the collected data can make such tests incorrect. For
example, if the feminine subset of a gender-bias-measuring coreference
resolution dataset contains sentences with a longer average distance between
the pronoun and the correct candidate, an RNN-based model may perform worse on
this subset due to long-term dependencies. In this work, we introduce a
theoretically grounded method for weighting test samples to cope with such
patterns in the test data. We demonstrate the method on the GAP dataset for
coreference resolution. We annotate GAP with spans of all personal names and
show that examples in the female subset contain more personal names and a
longer distance between pronouns and their referents, potentially affecting the
bias score in an undesired way. Using our weighting method, we compute the
test-instance weights needed to compensate for these correlations, and we
re-evaluate 16 recently released coreference models.
Comment: Accepted to the AAAI 2021 conference and the AFCI workshop at NeurIPS 2020
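The sketch below illustrates the underlying idea with a toy example: importance-weight one subset so that a confounding covariate (here, pronoun-referent distance) matches the other subset's distribution before comparing accuracies. The histogram-ratio weights and synthetic data are simplifications assumed for illustration, not the paper's theoretically grounded derivation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a GAP-like test set: a confounding covariate (token distance
# between pronoun and referent) and per-example accuracy of some model.
dist_f = rng.poisson(12, size=500)   # feminine subset: longer average distance
dist_m = rng.poisson(8, size=500)    # masculine subset
acc_f = (rng.random(500) > 0.05 * dist_f / 12).astype(float)
acc_m = (rng.random(500) > 0.05 * dist_m / 12).astype(float)

# Weight feminine examples so their distance distribution matches the masculine
# one (simple histogram-ratio importance weights, for illustration only).
bins = np.arange(0, 30)
p_m, _ = np.histogram(dist_m, bins=bins, density=True)
p_f, _ = np.histogram(dist_f, bins=bins, density=True)
idx = np.clip(np.digitize(dist_f, bins) - 1, 0, len(p_m) - 1)
weights = p_m[idx] / np.maximum(p_f[idx], 1e-9)

naive_gap = acc_m.mean() - acc_f.mean()
weighted_gap = acc_m.mean() - np.average(acc_f, weights=weights)
print(f"naive bias score: {naive_gap:.3f}, distance-adjusted: {weighted_gap:.3f}")
```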
Cyclotomic coefficients: gaps and jumps
We improve several recent results by Hong, Lee, Lee and Park (2012) on gaps
and by Bzdęga (2014) on jumps amongst the coefficients of cyclotomic
polynomials. Besides direct improvements, we also introduce several new
techniques that have never been used in this area.
Comment: 25 pages
Does the Objective Matter? Comparing Training Objectives for Pronoun Resolution
Hard cases of pronoun resolution have been used as a long-standing benchmark
for commonsense reasoning. In the recent literature, pre-trained language
models have been used to obtain state-of-the-art results on pronoun resolution.
Overall, four categories of training and evaluation objectives have been
introduced. The variety of training datasets and pre-trained language models
used in these works makes it unclear whether the choice of training objective
is critical. In this work, we make a fair comparison of the performance and
seed-wise stability of four models that represent the four categories of
objectives. Our experiments show that the objective of sequence ranking
performs the best in-domain, while the objective of semantic similarity between
candidates and the pronoun performs the best out-of-domain. We also observe
seed-wise instability in the model trained with sequence ranking, which does
not occur with the other objectives.
Comment: Accepted to the EMNLP 2020 conference
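To make the contrast concrete, the sketch below shows toy PyTorch versions of two of these objective families: a sequence-ranking (margin) loss over candidate scores and a semantic-similarity loss between pronoun and candidate representations. The exact formulations and all values are assumptions for illustration, not the losses used in the compared systems.

```python
import torch
import torch.nn.functional as F

# Toy scores for two candidate antecedents of a pronoun, e.g. the LM likelihood
# of the sentence with the pronoun replaced by each candidate. Values are made
# up; in practice they would come from a pre-trained language model.
score_correct = torch.tensor([2.3], requires_grad=True)
score_wrong = torch.tensor([1.9], requires_grad=True)

# Sequence-ranking style objective: push the correct candidate's score above
# the wrong one's by a margin (one common formulation, assumed here).
rank_loss = F.margin_ranking_loss(
    score_correct, score_wrong, target=torch.ones(1), margin=0.5
)

# Semantic-similarity style objective: maximise cosine similarity between the
# pronoun representation and the correct candidate's representation.
pronoun_vec = torch.randn(1, 128)
candidate_vec = torch.randn(1, 128)
sim_loss = 1.0 - F.cosine_similarity(pronoun_vec, candidate_vec).mean()

print(float(rank_loss), float(sim_loss))
```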
Using Natural Language Explanations to Improve Robustness of In-context Learning for Natural Language Inference
Recent studies have demonstrated that large language models (LLMs) excel in
diverse tasks through in-context learning (ICL) facilitated by task-specific
prompts and examples. However, the existing literature shows that ICL
encounters performance deterioration when exposed to adversarial inputs.
Enhanced performance has been observed when ICL is augmented with natural
language explanations (NLEs), a setup we refer to as X-ICL. Thus, this work
investigates whether X-ICL can improve the robustness of LLMs on a suite of
seven adversarial and challenging natural language inference datasets.
Moreover, we introduce a new approach to X-ICL in which an LLM (ChatGPT in our
case) is prompted with a few human-written NLEs to produce further NLEs (we
call this ChatGPT few-shot), which we show to be superior to both ChatGPT
zero-shot and human-written NLEs alone. We evaluate five popular LLMs
(GPT-3.5-turbo, LLaMA 2, Vicuna, Zephyr, Mistral) and show that X-ICL with
ChatGPT few-shot yields an improvement of over 6% compared to ICL. Furthermore,
while prompt selection strategies were previously shown to significantly
improve ICL on in-distribution test sets, we show that these strategies do not
match the efficacy of the X-ICL paradigm in robustness-oriented evaluations.
Comment: preprint
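As a minimal sketch of what an X-ICL prompt might look like, the snippet below builds an in-context prompt in which each demonstration carries an NLE alongside its label; the demonstrations and the template wording are illustrative assumptions, not the prompts used in the paper.

```python
# Minimal sketch of an X-ICL prompt: each in-context demonstration carries a
# natural language explanation before its label. Demonstrations and template
# are illustrative assumptions.

demos = [
    {
        "premise": "A soccer game with multiple males playing.",
        "hypothesis": "Some men are playing a sport.",
        "explanation": "Soccer is a sport and the players are male.",
        "label": "entailment",
    },
    {
        "premise": "A man inspects a uniform.",
        "hypothesis": "The man is sleeping.",
        "explanation": "One cannot inspect a uniform while sleeping.",
        "label": "contradiction",
    },
]

def build_x_icl_prompt(demos, premise, hypothesis):
    """Concatenate NLE-augmented demonstrations with the test instance."""
    parts = []
    for d in demos:
        parts.append(
            f"Premise: {d['premise']}\nHypothesis: {d['hypothesis']}\n"
            f"Explanation: {d['explanation']}\nLabel: {d['label']}\n"
        )
    parts.append(f"Premise: {premise}\nHypothesis: {hypothesis}\nExplanation:")
    return "\n".join(parts)

print(build_x_icl_prompt(demos, "Two dogs run across a field.", "Animals are outside."))
```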
Faithfulness Tests for Natural Language Explanations
Explanations of neural models aim to reveal a model's decision-making process
for its predictions. However, recent work shows that current explanation
methods, such as saliency maps or counterfactuals, can be misleading, as they
are prone to presenting reasons that are unfaithful to the model's inner
workings. This work explores the challenging question of evaluating the
faithfulness of natural language explanations (NLEs). To this end, we present
two tests. First, we propose a counterfactual input editor for inserting
reasons that lead to counterfactual predictions but are not reflected by the
NLEs. Second, we reconstruct inputs from the reasons stated in the generated
NLEs and check how often they lead to the same predictions. Our tests can
evaluate emerging NLE models, providing a fundamental tool for the development
of faithful NLEs.
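A minimal sketch of the second test's logic is shown below: rebuild an input from the reason stated in the NLE and check whether the model's prediction is preserved. The stub classifier and the naive reconstruction are stand-ins assumed for illustration, not the paper's actual editor or models.

```python
# Minimal sketch of the reconstruction-based faithfulness check: rebuild an
# input from the reasons stated in a generated NLE and test whether the
# model's prediction is preserved. `predict` is a stand-in; real experiments
# would query the trained NLE model.

def predict(premise, hypothesis):
    # Toy stand-in classifier for illustration only.
    return "contradiction" if "sleeping" in hypothesis else "entailment"

def reconstruction_test(premise, hypothesis, nle, predict_fn):
    """Rebuild the hypothesis from the NLE and compare predictions."""
    original = predict_fn(premise, hypothesis)
    # Naive reconstruction: treat the NLE itself as the new hypothesis.
    # The paper's reconstruction is more careful; this only shows the check.
    reconstructed = predict_fn(premise, nle)
    return original == reconstructed

consistent = reconstruction_test(
    premise="A man inspects a uniform.",
    hypothesis="The man is sleeping.",
    nle="The man cannot be sleeping because he is inspecting a uniform.",
    predict_fn=predict,
)
print("prediction preserved after reconstruction:", consistent)
```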