Learning to Rank for Plausible Plausibility
Researchers illustrate improvements in contextual encoding strategies via
resultant performance on a battery of shared Natural Language Understanding
(NLU) tasks. Many of these tasks are of a categorical prediction variety: given
a conditioning context (e.g., an NLI premise), provide a label based on an
associated prompt (e.g., an NLI hypothesis). The categorical nature of these
tasks has led to common use of a cross entropy log-loss objective during
training. We suggest this loss is intuitively wrong when applied to
plausibility tasks, where the prompt by design is neither categorically
entailed nor contradictory given the context. Log-loss naturally drives models
to assign scores near 0.0 or 1.0, in contrast to our proposed use of a
margin-based loss. Following a discussion of our intuition, we describe a
confirmation study based on an extreme, synthetically curated task derived from
MultiNLI. We find that a margin-based loss leads to a more plausible model of
plausibility. Finally, we illustrate improvements on the Choice Of Plausible
Alternative (COPA) task through this change in loss.
Comment: To appear in ACL 201
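The contrast between the two objectives can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the scores and margin value are invented for the example:

```python
# Sketch contrasting log-loss with a margin-based ranking loss for
# plausibility scoring. Scores and margin are illustrative only.
import math

def log_loss(score_correct: float) -> float:
    """Cross-entropy on a sigmoid score: minimized only as the score
    is pushed toward 1.0, encouraging saturated, extreme outputs."""
    p = 1.0 / (1.0 + math.exp(-score_correct))
    return -math.log(p)

def margin_loss(score_correct: float, score_alternative: float,
                margin: float = 1.0) -> float:
    """Margin ranking loss: zero once the correct alternative outscores
    the distractor by `margin`, so scores need not saturate at 0 or 1."""
    return max(0.0, margin - (score_correct - score_alternative))
```

A modestly separated score pair already satisfies the margin objective (`margin_loss(1.2, 0.1)` is zero), while log-loss keeps penalizing the same moderate score, which matches the intuition that plausibility judgments should not be driven to the extremes.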
The Magic of IF: Investigating Causal Reasoning Abilities in Large Language Models of Code
Causal reasoning, the ability to identify cause-and-effect relationships, is
crucial in human thinking. Although large language models (LLMs) succeed in
many NLP tasks, it is still challenging for them to conduct complex causal
reasoning such as abductive and counterfactual reasoning. Given that
programming code may express causal relations more often and more explicitly
than text, through conditional statements like "if", we explore whether Code-LLMs
acquire better causal reasoning abilities. Our experiments show that compared
to text-only LLMs, Code-LLMs with code prompts are significantly better in
causal reasoning. We further intervene on the prompts from different aspects,
and discover that the programming structure is crucial in code prompt design,
while Code-LLMs are robust to format perturbations.
Comment: Findings of ACL 2023. Code and data are available at
https://github.com/xxxiaol/magic-i
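As a hypothetical illustration of recasting a causal question as a code prompt, one might wrap the query in program structure so the causal dependency runs through an explicit `if`. The `make_code_prompt` helper and its template are invented here; the paper's actual prompt designs are in its repository:

```python
# Hypothetical sketch of a code-style prompt for counterfactual
# reasoning; the template is illustrative, not the paper's design.
def make_code_prompt(premise: str, counterfactual: str) -> str:
    """Wrap a counterfactual query in program structure, making the
    causal dependency explicit through an `if` statement."""
    return (
        f'premise = "{premise}"\n'
        f'if "{counterfactual}":\n'
        f'    # What outcome would follow?\n'
        f'    outcome = '
    )

prompt = make_code_prompt(
    "The ground is wet.",
    "it had not rained last night",
)
```

The point of the structure is that a Code-LLM completing `outcome =` must condition on the counterfactual branch, rather than on the premise alone.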
Improving Commonsense Causal Reasoning by Adversarial Training and Data Augmentation
Determining the plausibility of causal relations between clauses is a
commonsense reasoning task that requires complex inference ability. The general
approach to this task is to train a large pretrained language model on a
specific dataset. However, the available training data for the task is often
scarce, which leads to instability of model training or reliance on the shallow
features of the dataset. This paper presents a number of techniques for making
models more robust in the domain of causal reasoning. Firstly, we perform
adversarial training by generating perturbed inputs through synonym
substitution. Secondly, based on a linguistic theory of discourse connectives,
we perform data augmentation using a discourse parser for detecting causally
linked clauses in large text corpora, and a generative language model for generating
distractors. Both methods boost model performance on the Choice of Plausible
Alternatives (COPA) dataset, as well as on a Balanced COPA dataset, which is a
modified version of the original data that has been developed to avoid
superficial cues, leading to a more challenging benchmark. We show a
statistically significant improvement in performance and robustness on both
datasets, even with only a small number of additionally generated data points.
Comment: 7 pages plus references, 4 figures, 3 tables, paper accepted at
AAAI202
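A minimal sketch of the synonym-substitution perturbation described above, assuming a toy synonym lexicon; `SYNONYMS` and `perturb` are illustrative names, not the paper's implementation, which would draw on a full lexical resource:

```python
# Toy sketch of adversarial input perturbation via synonym
# substitution; the lexicon below is a placeholder.
import random

SYNONYMS = {
    "big": ["large", "huge"],
    "fast": ["quick", "rapid"],
    "sad": ["unhappy", "gloomy"],
}

def perturb(sentence: str, rng: random.Random) -> str:
    """Replace each word covered by the lexicon with a randomly chosen
    synonym, yielding a perturbed input for adversarial training."""
    tokens = sentence.split()
    out = [rng.choice(SYNONYMS[t]) if t in SYNONYMS else t for t in tokens]
    return " ".join(out)

rng = random.Random(0)
perturbed = perturb("the big dog ran fast", rng)
```

Training on such perturbed inputs alongside the originals pushes the model away from relying on the surface form of individual words.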
Ranking and Retrieval under Semantic Relevance
This thesis presents a series of conceptual and empirical developments on the ranking and retrieval of candidates under semantic relevance. Part I of the thesis introduces the concept of uncertainty in various semantic tasks (such as recognizing textual entailment) in natural language processing, and the machine learning techniques commonly employed to model these semantic phenomena. A unified view of ranking and retrieval will be presented, and the trade-off between model expressiveness, performance, and scalability in model design will be discussed.
Part II of the thesis focuses on applying these ranking and retrieval techniques to text: Chapter 3 examines the feasibility of ranking hypotheses given a premise with respect to a human's subjective probability of the hypothesis happening, effectively extending the traditional categorical task of natural language inference. Chapter 4 focuses on detecting situation frames for documents using ranking methods. Then we extend the ranking notion to retrieval, and develop both sparse (Chapter 5) and dense (Chapter 6) vector-based methods to facilitate scalable retrieval for potential answer paragraphs in question answering.
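A toy sketch of the dense retrieval setting: rank candidate paragraphs by cosine similarity to a query vector. The hand-made two-dimensional embeddings below stand in for learned encoders, and the candidate names are invented for the example:

```python
# Toy dense retrieval: score candidates by cosine similarity to a
# query embedding and return them best-first. Vectors are hand-made
# stand-ins for encoder outputs.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank(query_vec, candidates):
    """Return candidate ids sorted by descending similarity."""
    scored = [(cosine(query_vec, vec), cid) for cid, vec in candidates.items()]
    return [cid for _, cid in sorted(scored, reverse=True)]

candidates = {"p1": [1.0, 0.0], "p2": [0.7, 0.7], "p3": [0.0, 1.0]}
order = rank([0.9, 0.1], candidates)  # "p1" is most similar to the query
```

Scalability in practice comes from replacing the exhaustive loop with an approximate nearest-neighbor index, but the ranking criterion is the same.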
Part III turns the focus to mentions and entities in text, while continuing the theme of ranking and retrieval: Chapter 7 discusses the ranking of fine-grained types that an entity mention could belong to, leading to state-of-the-art performance on hierarchical multi-label fine-grained entity typing. Chapter 8 extends the semantic relation of coreference to a cross-document setting, enabling models to retrieve from a large corpus, instead of a single document, when resolving coreferent entity mentions.