Lexicosyntactic Inference in Neural Models
We investigate neural models' ability to capture lexicosyntactic inferences:
inferences triggered by the interaction of lexical and syntactic information.
We take the task of event factuality prediction as a case study and build a
factuality judgment dataset for all English clause-embedding verbs in various
syntactic contexts. We use this dataset, which we make publicly available, to
probe the behavior of current state-of-the-art neural systems, showing that
these systems make certain systematic errors that are clearly visible through
the lens of factuality prediction.
Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation
We present a large-scale collection of diverse natural language inference
(NLI) datasets that help provide insight into how well a sentence
representation captures distinct types of reasoning. The collection results
from recasting 13 existing datasets from 7 semantic phenomena into a common NLI
structure, resulting in over half a million labeled context-hypothesis pairs in
total. We refer to our collection as the DNC: Diverse Natural Language
Inference Collection. The DNC is available online at https://www.decomp.net,
and will grow over time as additional resources are recast and added from novel
sources.
Comment: To be presented at EMNLP 2018. 15 page
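As a rough illustration of what "recasting" an existing labeled dataset into a common NLI structure can look like, the sketch below turns one binary sentiment example into two context-hypothesis pairs. Everything in it is an assumption made for illustration: the `NLIPair` type, the `recast_sentiment` helper, the hypothesis templates, and the two-way label names are hypothetical and are not taken from the DNC release.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NLIPair:
    """One labeled context-hypothesis pair (hypothetical schema)."""
    context: str
    hypothesis: str
    label: str  # assumed label set: "entailed" / "not-entailed"


def recast_sentiment(text: str, is_positive: bool) -> List[NLIPair]:
    """Recast one binary sentiment example into two NLI pairs.

    The hypothesis templates are invented for this sketch.
    """
    templates = (
        ("The reviewer liked the product.", is_positive),
        ("The reviewer did not like the product.", not is_positive),
    )
    pairs = []
    for hypothesis, holds in templates:
        label = "entailed" if holds else "not-entailed"
        pairs.append(NLIPair(context=text, hypothesis=hypothesis, label=label))
    return pairs


if __name__ == "__main__":
    for pair in recast_sentiment("Great phone, would buy again.", True):
        print(pair)
```

Each source annotation yields one entailed and one not-entailed pair here, which keeps the recast labels balanced; a real recasting pipeline would need phenomenon-specific templates.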
Zero-shot Entailment of Leaderboards for Empirical AI Research
We present a large-scale empirical investigation of the zero-shot learning
phenomenon in a specific recognizing textual entailment (RTE) task category,
i.e., the automated mining of leaderboards for Empirical AI Research. Previously
reported state-of-the-art models for leaderboard extraction, formulated as an
RTE task in a non-zero-shot setting, are promising, with reported performance
above 90%. However, a central research question remains unexamined: did the
models actually learn entailment? Thus, for the experiments in this paper, two
previously reported state-of-the-art models are tested out-of-the-box for their
ability to generalize or their capacity for entailment, given leaderboard
labels that were unseen during training. We hypothesize that if the models
learned entailment, their zero-shot performances can be expected to be
moderately high as well--perhaps, concretely, better than chance. As a result
of this work, a zero-shot labeled dataset is created via distant labeling
formulating the leaderboard extraction RTE task.
Comment: 5 pages, 1 figure. Accepted for publication at JCDL 2023 - Late
Breaking Results and Datasets track
(https://2023.jcdl.org/calls/papers/#paper_types), official citation
forthcomin
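The "better than chance" criterion in the abstract above can be checked with a few lines of code. The sketch below is only one possible reading: it assumes "chance" means the majority-class baseline on the zero-shot test labels, which the abstract does not specify, and the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equal in length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def majority_baseline(gold: Sequence[str]) -> float:
    """Accuracy of always predicting the most frequent gold label.

    Assumption: this stands in for 'chance'; a uniform-guess
    baseline (1 / number of labels) is another common choice.
    """
    if not gold:
        raise ValueError("gold must be non-empty")
    return Counter(gold).most_common(1)[0][1] / len(gold)


def beats_chance(gold: Sequence[str], pred: Sequence[str]) -> bool:
    """True if zero-shot accuracy is strictly above the baseline."""
    return accuracy(gold, pred) > majority_baseline(gold)


if __name__ == "__main__":
    gold = ["entail", "not", "entail", "not"]
    pred = ["entail", "not", "entail", "entail"]
    print(accuracy(gold, pred), majority_baseline(gold), beats_chance(gold, pred))
```

On skewed label distributions the majority-class baseline is the stricter of the two common definitions of chance, which is why it is used in this sketch.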
Are Natural Language Inference Models IMPPRESsive? Learning IMPlicature and PRESupposition
Natural language inference (NLI) is an increasingly important task for
natural language understanding, which requires one to infer whether a sentence
entails another. However, the ability of NLI models to make pragmatic
inferences remains understudied. We create an IMPlicature and PRESupposition
diagnostic dataset (IMPPRES), consisting of >25k semiautomatically generated
sentence pairs illustrating well-studied pragmatic inference types. We use
IMPPRES to evaluate whether BERT, InferSent, and BOW NLI models trained on
MultiNLI (Williams et al., 2018) learn to make pragmatic inferences. Although
MultiNLI appears to contain very few pairs illustrating these inference types,
we find that BERT learns to draw pragmatic inferences. It reliably treats
scalar implicatures triggered by "some" as entailments. For some presupposition
triggers like "only", BERT reliably recognizes the presupposition as an
entailment, even when the trigger is embedded under an entailment-canceling
operator like negation. BOW and InferSent show weaker evidence of pragmatic
reasoning. We conclude that NLI training encourages models to learn some, but
not all, pragmatic inferences.
Comment: to appear in Proceedings of ACL 202
Automatically Neutralizing Subjective Bias in Text
Texts like news, encyclopedias, and some social media strive for objectivity.
Yet bias in the form of inappropriate subjectivity - introducing attitudes via
framing, presupposing truth, and casting doubt - remains ubiquitous. This kind
of bias erodes our collective trust and fuels social conflict. To address this
issue, we introduce a novel testbed for natural language generation:
automatically bringing inappropriately subjective text into a neutral point of
view ("neutralizing" biased text). We also offer the first parallel corpus of
biased language. The corpus contains 180,000 sentence pairs and originates from
Wikipedia edits that removed various framings, presuppositions, and attitudes
from biased sentences. Last, we propose two strong encoder-decoder baselines
for the task. A straightforward yet opaque CONCURRENT system uses a BERT
encoder to identify subjective words as part of the generation process. An
interpretable and controllable MODULAR algorithm separates these steps, using
(1) a BERT-based classifier to identify problematic words and (2) a novel join
embedding through which the classifier can edit the hidden states of the
encoder. Large-scale human evaluation across four domains (encyclopedias, news
headlines, books, and political speeches) suggests that these algorithms are a
first step towards the automatic identification and reduction of bias.
Comment: To appear at AAAI 202