The Sensitivity of Language Models and Humans to Winograd Schema Perturbations
Large-scale pretrained language models are the major driving force behind
recent improvements in performance on the Winograd Schema Challenge, a widely
employed test of common sense reasoning ability. We show, however, with a new
diagnostic dataset, that these models are sensitive to linguistic perturbations
of the Winograd examples that minimally affect human understanding. Our results
highlight interesting differences between humans and language models: language
models are more sensitive to number or gender alternations and synonym
replacements than humans, and humans are more stable and consistent in their
predictions, maintain a much higher absolute performance, and perform better on
non-associative instances than associative ones. Overall, humans are correct
more often than out-of-the-box models, and the models are sometimes right for
the wrong reasons. Finally, we show that fine-tuning on a large, task-specific
dataset can offer a solution to these issues.
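To make the evaluation setup concrete, the following sketch scores both candidate antecedents of a Winograd instance with an off-the-shelf masked language model and checks whether the preferred answer flips under a small perturbation. It is an illustrative approximation, not the paper's diagnostic dataset or code; the model choice (bert-base-uncased), the pseudo-log-likelihood scoring, and the example sentences are assumptions.

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def pseudo_log_likelihood(sentence):
    # Sum of each token's log-probability when it is masked in turn.
    ids = tok(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tok.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

def preferred_antecedent(template, candidates):
    # Fill the pronoun slot with each candidate and keep the higher-scoring one.
    return max(candidates, key=lambda c: pseudo_log_likelihood(template.format(c)))

original = "The trophy didn't fit in the suitcase because {} was too big."
perturbed = "The trophy didn't fit in the bag because {} was too big."  # synonym swap
print(preferred_antecedent(original, ["the trophy", "the suitcase"]))
print(preferred_antecedent(perturbed, ["the trophy", "the bag"]))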
Causal interventions expose implicit situation models for commonsense language understanding
Accounts of human language processing have long appealed to implicit
``situation models'' that enrich comprehension with relevant but unstated world
knowledge. Here, we apply causal intervention techniques to recent transformer
models to analyze performance on the Winograd Schema Challenge (WSC), where a
single context cue shifts interpretation of an ambiguous pronoun. We identify a
relatively small circuit of attention heads responsible for propagating
information from the context word and thereby guiding which of the
candidate noun phrases the pronoun ultimately attends to. We then compare how
this circuit behaves in a closely matched ``syntactic'' control where the
situation model is not strictly necessary. These analyses suggest distinct
pathways through which implicit situation models are constructed to guide
pronoun resolution.
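The following is a minimal stand-in for the kind of causal intervention the abstract describes: it zero-ablates a single attention head via the head_mask argument in Hugging Face transformers and measures how much GPT-2's preference between the two candidate antecedents shifts. The specific model, the head index, and the prompt are illustrative assumptions, not the paper's circuit analysis.

import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def continuation_logprob(prompt, continuation, head_mask=None):
    # Log-probability of the continuation tokens given the prompt.
    ids = tok(prompt + continuation, return_tensors="pt")["input_ids"]
    n_prompt = len(tok(prompt)["input_ids"])
    with torch.no_grad():
        logits = model(ids, head_mask=head_mask).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    idx = torch.arange(n_prompt - 1, len(targets))
    return logprobs[idx, targets[n_prompt - 1:]].sum().item()

def preference(prompt, a, b, head_mask=None):
    # Positive if the model prefers antecedent a over b.
    return continuation_logprob(prompt, a, head_mask) - continuation_logprob(prompt, b, head_mask)

prompt = ("The city councilmen refused the demonstrators a permit because "
          "they feared violence. 'They' refers to the")
base = preference(prompt, " councilmen", " demonstrators")

head_mask = torch.ones(model.config.n_layer, model.config.n_head)
head_mask[5, 3] = 0.0  # hypothetical head to ablate
ablated = preference(prompt, " councilmen", " demonstrators", head_mask=head_mask)
print(f"preference shift after ablating head (5, 3): {base - ablated:+.3f}")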
BRAINTEASER: Lateral Thinking Puzzles for Large Language Models
The success of language models has inspired the NLP community to attend to
tasks that require implicit and complex reasoning, relying on human-like
commonsense mechanisms. While such vertical thinking tasks have been relatively
popular, lateral thinking puzzles have received little attention. To bridge
this gap, we devise BRAINTEASER: a multiple-choice Question Answering task
designed to test the model's ability to exhibit lateral thinking and defy
default commonsense associations. We design a three-step procedure for creating
the first lateral thinking benchmark, consisting of data collection, distractor
generation, and generation of adversarial examples, leading to 1,100 puzzles
with high-quality annotations. To assess the consistency of lateral reasoning
by models, we enrich BRAINTEASER based on a semantic and contextual
reconstruction of its questions. Our experiments with state-of-the-art
instruction- and commonsense language models reveal a significant gap between
human and model performance, which is further widened when consistency across
adversarial formats is considered. We make all of our code and data available
to stimulate work on developing and evaluating lateral thinking models.
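The consistency criterion described above can be made concrete with a small scoring sketch: a puzzle group (the original question plus its semantic and contextual reconstructions) counts as solved only if every variant is answered correctly. The function and toy data below are hypothetical, not the released BRAINTEASER evaluation code.

def group_accuracy(predictions, gold, groups):
    # predictions/gold: question id -> chosen/correct option index.
    # groups: group id -> list of question ids (original + reconstructions).
    instance_acc = sum(predictions[q] == gold[q] for q in gold) / len(gold)
    group_acc = sum(all(predictions[q] == gold[q] for q in qs)
                    for qs in groups.values()) / len(groups)
    return instance_acc, group_acc

# Toy example: one puzzle with two reconstructions, one of which the model misses.
preds = {"q1": 0, "q1_semantic": 0, "q1_context": 2}
gold = {"q1": 0, "q1_semantic": 0, "q1_context": 0}
groups = {"q1": ["q1", "q1_semantic", "q1_context"]}
print(group_accuracy(preds, gold, groups))  # roughly (0.67, 0.0)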
Event knowledge in large language models: the gap between the impossible and the unlikely
Word co-occurrence patterns in language corpora contain a surprising amount
of conceptual knowledge. Large language models (LLMs), trained to predict words
in context, leverage these patterns to achieve impressive performance on
diverse semantic tasks requiring world knowledge. An important but understudied
question about LLMs' semantic abilities is whether they acquire generalized
knowledge of common events. Here, we test whether five pre-trained LLMs (from
2018's BERT to 2023's MPT) assign higher likelihood to plausible descriptions
of agent-patient interactions than to minimally different implausible versions
of the same event. Using three curated sets of minimal sentence pairs (total
n=1,215), we found that pre-trained LLMs possess substantial event knowledge,
outperforming other distributional language models. In particular, they almost
always assign higher likelihood to possible vs. impossible events (The teacher
bought the laptop vs. The laptop bought the teacher). However, LLMs show less
consistent preferences for likely vs. unlikely events (The nanny tutored the
boy vs. The boy tutored the nanny). In follow-up analyses, we show that (i) LLM
scores are driven by both plausibility and surface-level sentence features,
(ii) LLM scores generalize well across syntactic variants (active vs. passive
constructions) but less well across semantic variants (synonymous sentences),
(iii) some LLM errors mirror human judgment ambiguity, and (iv) sentence
plausibility serves as an organizing dimension in internal LLM representations.
Overall, our results show that important aspects of event knowledge naturally
emerge from distributional linguistic patterns, but also highlight a gap
between representations of possible/impossible and likely/unlikely events.
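A minimal version of this minimal-pair comparison can be sketched as follows, using GPT-2 as a stand-in for the five LLMs tested: each sentence gets a summed token log-probability, and the plausible member of the pair should score higher. The model choice and scoring details are assumptions for illustration.

import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sentence_logprob(sentence):
    # Summed log-probability of the sentence under the language model.
    ids = tok(sentence, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    return logprobs[torch.arange(ids.size(1) - 1), ids[0, 1:]].sum().item()

pairs = [
    ("The teacher bought the laptop.", "The laptop bought the teacher."),  # possible vs. impossible
    ("The nanny tutored the boy.", "The boy tutored the nanny."),          # likely vs. unlikely
]
for plausible, implausible in pairs:
    print(plausible, "preferred:", sentence_logprob(plausible) > sentence_logprob(implausible))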
Evaluating and improving lexical language understanding in neural machine translation
Lexical understanding is an inalienable component of the translation process. In order to correctly map the meaning of a linguistic unit to the appropriate target language expression, the meaning of its constituent words has first to be identified and disambiguated, followed by the application of compositional operations. This thesis examines the competency of contemporary neural machine translation (NMT) models on two core aspects of lexical understanding – word sense disambiguation (WSD) and coreference resolution (CoR), both of which are well-established and much-studied natural language processing (NLP) tasks. Certain linguistic properties that are under-specified in a source language (e.g. the grammatical gender of a noun in English) may need to be stated explicitly in the chosen target language (e.g. German). Doing so correctly requires the accurate resolution of the associated ambiguities.
While recent modeling advances appear to suggest that both WSD and CoR are largely solved challenges in machine translation, the work conducted within the scope of this thesis demonstrates that this is not yet the case. In particular, we show that NMT systems are prone to relying on surface-level heuristics and data biases to guide their lexical disambiguation decisions, rather than engaging in deep language understanding by correctly recognizing and leveraging contextual disambiguation triggers. As part of our investigation, we introduce a novel methodology for predicting the WSD errors a translation model is likely to make and use this knowledge to craft adversarial attacks aimed at eliciting disambiguation errors in model translations. Additionally, we create a set of challenging CoR benchmarks that uncover the inability of translation systems to identify the referents of pronouns in contexts that presuppose commonsense reasoning, a failure caused by their pathological over-reliance on data biases.
At the same time, we develop initial solutions for the identified model deficiencies. Specifically, we show that fine-tuning on de-biased data and modifying a model's learning objective can significantly improve disambiguation performance by counteracting the harmful impact of data biases. We furthermore propose a novel extension to the popular transformer architecture that strengthens its WSD capabilities and its robustness to adversarial WSD attacks by making lexical features accessible across all layers of the model and increasing the extent to which contextual information is encapsulated within its latent representations. Despite these improvements, both WSD and CoR remain far from solved, posing a veritable challenge for the current generation of NMT models, as well as for the large language models that have risen to prominence within NLP in recent years.
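One common way to probe such phenomena is contrastive evaluation in the ContraPro style: the NMT model is credited with resolving a pronoun only if it assigns higher probability to the reference translation containing the correct pronoun than to the same translation with an incorrect one. The sketch below illustrates this with a MarianMT English-German model; the checkpoint, the example pair, and the use of the text_target tokenizer argument (recent transformers versions) are assumptions, not the thesis's benchmarks.

import torch
from transformers import MarianMTModel, MarianTokenizer

name = "Helsinki-NLP/opus-mt-en-de"
tok = MarianTokenizer.from_pretrained(name)
model = MarianMTModel.from_pretrained(name).eval()

def target_logprob(src, tgt):
    # Probability the model assigns to a given reference translation (teacher forcing).
    batch = tok(src, text_target=tgt, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    logprobs = torch.log_softmax(out.logits[0], dim=-1)
    labels = batch["labels"][0]
    return logprobs[torch.arange(len(labels)), labels].sum().item()

src = "The suitcase would not close because it was too full."
correct = "Der Koffer ließ sich nicht schließen, weil er zu voll war."
contrastive = "Der Koffer ließ sich nicht schließen, weil sie zu voll war."  # wrong pronoun gender
print("pronoun resolved correctly:", target_logprob(src, correct) > target_logprob(src, contrastive))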
Survey on Sociodemographic Bias in Natural Language Processing
Deep neural networks often learn unintended biases during training, which
might have harmful effects when deployed in real-world settings. This paper
surveys 209 papers on bias in NLP models, most of which address
sociodemographic bias. To better understand the distinction between bias and
real-world harm, we turn to ideas from psychology and behavioral economics to
propose a definition for sociodemographic bias. We identify three main
categories of NLP bias research: types of bias, quantifying bias, and
debiasing. We conclude that current approaches to quantifying bias face
reliability issues, that many of the bias metrics do not relate to real-world
biases, and that current debiasing techniques are superficial, hiding bias
rather than removing it. Finally, we provide recommendations for future work.
Improving BERT with Self-Supervised Attention
One of the most popular paradigms of applying large pre-trained NLP models
such as BERT is to fine-tune them on a smaller dataset. However, one challenge
remains: the fine-tuned model often overfits on small datasets. A symptom
of this phenomenon is that irrelevant or misleading words in the sentence,
which are easy to understand for human beings, can substantially degrade the
performance of these fine-tuned BERT models. In this paper, we propose a novel
technique, called Self-Supervised Attention (SSA), to help address this
generalization challenge. Specifically, SSA automatically generates weak,
token-level attention labels iteratively by probing the fine-tuned model from
the previous iteration. We investigate two different ways of integrating SSA
into BERT and propose a hybrid approach to combine their benefits. Empirically,
on a variety of public datasets, we demonstrate significant performance
improvements with our SSA-enhanced BERT model.
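One plausible reading of the SSA label-generation step is a token-deletion probe: a token receives a weak attention label of 1 if removing it flips the fine-tuned classifier's prediction, and 0 otherwise; those labels would then supervise an auxiliary attention loss in the next training round. The sketch below follows that reading; the checkpoint name and the probing heuristic are assumptions rather than the authors' implementation.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Any fine-tuned BERT classifier works here; this checkpoint name is an assumption.
name = "textattack/bert-base-uncased-SST-2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

def predict(text):
    with torch.no_grad():
        return model(**tok(text, return_tensors="pt")).logits.argmax(-1).item()

def weak_attention_labels(text):
    # Label a word 1 if deleting it changes the classifier's prediction, else 0.
    words = text.split()
    base = predict(text)
    return [(w, int(predict(" ".join(words[:i] + words[i + 1:])) != base))
            for i, w in enumerate(words)]

print(weak_attention_labels("the movie was surprisingly good despite the slow start"))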