512 research outputs found
Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation
We present a large-scale collection of diverse natural language inference
(NLI) datasets that help provide insight into how well a sentence
representation captures distinct types of reasoning. The collection results
from recasting 13 existing datasets from 7 semantic phenomena into a common NLI
structure, resulting in over half a million labeled context-hypothesis pairs in
total. We refer to our collection as the DNC: Diverse Natural Language
Inference Collection. The DNC is available online at https://www.decomp.net,
and will grow over time as additional resources are recast and added from novel
sources.
Comment: To be presented at EMNLP 2018. 15 page
Infusing Knowledge into the Textual Entailment Task Using Graph Convolutional Networks
Textual entailment is a fundamental task in natural language processing. Most
approaches for solving the problem use only the textual content present in
training data. A few approaches have shown that information from external
knowledge sources like knowledge graphs (KGs) can add value, in addition to the
textual content, by providing background knowledge that may be critical for a
task. However, the proposed models do not fully exploit the information in the
usually large and noisy KGs, and it is not clear how this information can be effectively
encoded to be useful for entailment. We present an approach that complements
text-based entailment models with information from KGs by (1) using
Personalized PageRank to generate contextual subgraphs with reduced noise and
(2) encoding these subgraphs using graph convolutional networks to capture KG
structure. Our technique extends the capability of text models by exploiting
structural and semantic information found in KGs. We evaluate our approach on
multiple textual entailment datasets and show that the use of external
knowledge helps improve prediction accuracy. This is particularly evident in
the challenging BreakingNLI dataset, where we see an absolute improvement of
5-20% over multiple text-based entailment models.
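Step (1) of the abstract above, selecting a contextual subgraph with Personalized PageRank, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the toy graph `toy_kg`, the function names `personalized_pagerank` and `contextual_subgraph`, the damping factor `alpha`, and the `top_k` cutoff. The graph convolutional encoding of step (2) is not shown.

```python
def personalized_pagerank(graph, seeds, alpha=0.85, iterations=50):
    """Power iteration for Personalized PageRank.

    graph: dict mapping each node to a list of neighbour nodes
           (every neighbour must also appear as a key).
    seeds: nodes that receive the restart probability mass.
    """
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(restart)
    for _ in range(iterations):
        # With probability (1 - alpha) the walk jumps back to a seed node.
        updated = {n: (1 - alpha) * restart[n] for n in nodes}
        for n in nodes:
            neighbours = graph[n]
            if not neighbours:
                # Dangling node: hand its mass back to the seed distribution.
                for m in nodes:
                    updated[m] += alpha * scores[n] * restart[m]
                continue
            share = alpha * scores[n] / len(neighbours)
            for m in neighbours:
                updated[m] += share
        scores = updated
    return scores


def contextual_subgraph(graph, seeds, top_k):
    """Keep the top_k highest-scoring nodes (plus the seeds) and their edges."""
    scores = personalized_pagerank(graph, seeds)
    keep = set(sorted(scores, key=scores.get, reverse=True)[:top_k]) | set(seeds)
    return {n: [m for m in graph[n] if m in keep] for n in graph if n in keep}


# Hypothetical toy knowledge graph; "car" and "vehicle" are unrelated noise.
toy_kg = {
    "dog": ["animal", "pet"],
    "animal": ["dog", "cat"],
    "pet": ["dog", "cat"],
    "cat": ["animal", "pet"],
    "car": ["vehicle"],
    "vehicle": ["car"],
}
toy_scores = personalized_pagerank(toy_kg, {"dog"})
toy_subgraph = contextual_subgraph(toy_kg, {"dog"}, top_k=4)
```

In this toy setting the nodes unreachable from the seed keep a score of zero, so they fall out of the subgraph; a real KG would need a tuned cutoff rather than a fixed `top_k`.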
The Impact of Debiasing on the Performance of Language Models in Downstream Tasks is Underestimated
Pre-trained language models trained on large-scale data have learned serious
levels of social biases. Consequently, various methods have been proposed to
debias pre-trained models. Debiasing methods need to mitigate only
discriminatory bias information from the pre-trained models, while retaining
information that is useful for the downstream tasks. In previous research,
whether useful information is retained has been confirmed by the performance of
downstream tasks in debiased pre-trained models. On the other hand, it is not
clear whether these benchmarks consist of data pertaining to social biases and
are appropriate for investigating the impact of debiasing. For example, for
gender-related social biases, data containing female words (e.g. "she, female,
woman"), male words (e.g. "he, male, man"), and stereotypical words (e.g.
"nurse, doctor, professor") are considered to be the most affected by
debiasing. If there is not much data containing these words in a benchmark
dataset for a target task, there is the possibility of erroneously evaluating
the effects of debiasing. In this study, we compare the impact of debiasing on
performance across multiple downstream tasks using a wide range of benchmark
datasets containing female, male, and stereotypical words. Experiments
show that the effects of debiasing are consistently underestimated
across all tasks. Moreover, the effects of debiasing can be evaluated more
reliably by separately considering instances containing female, male, and
stereotypical words than by considering all of the instances in a benchmark
dataset.
Comment: IJCNLP-AACL 202
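The evaluation idea in the abstract above, scoring a model separately on instances that contain female, male, or stereotypical words, might be sketched as follows. This is only an illustration under stated assumptions: the word lists reuse the three examples quoted in the abstract and are not the full lists used in the study, and the names `WORD_LISTS`, `accuracy_by_subset`, and the toy `examples` are hypothetical.

```python
import re

# Assumption: tiny word lists taken from the examples in the abstract only.
WORD_LISTS = {
    "female": {"she", "female", "woman"},
    "male": {"he", "male", "man"},
    "stereotypical": {"nurse", "doctor", "professor"},
}


def tokens(text):
    """Lowercase word tokens, so that 'the' does not match 'he'."""
    return set(re.findall(r"[a-z]+", text.lower()))


def accuracy_by_subset(instances):
    """instances: iterable of (text, gold_label, predicted_label) triples.

    Returns accuracy over all instances and over each word-list subset;
    a subset with no matching instances maps to None.
    """
    hits = {"all": []}
    for name in WORD_LISTS:
        hits[name] = []
    for text, gold, predicted in instances:
        correct = gold == predicted
        hits["all"].append(correct)
        words_in_text = tokens(text)
        for name, words in WORD_LISTS.items():
            if words_in_text & words:
                hits[name].append(correct)
    return {name: (sum(h) / len(h) if h else None) for name, h in hits.items()}


# Hypothetical predictions from some classifier on four toy sentences.
examples = [
    ("She is a doctor.", 1, 1),
    ("He fixed the car.", 1, 0),
    ("The weather is nice.", 0, 0),
    ("The nurse arrived early.", 0, 1),
]
subset_accuracy = accuracy_by_subset(examples)
```

Running the same function on predictions from the original and the debiased model, then comparing the per-subset numbers, is one way to see a change that the aggregate `"all"` score could hide.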