The Sensitivity of Language Models and Humans to Winograd Schema Perturbations

Authors: Abdou Mostafa; Barrett Maria; Belinkov Yonatan; Elliott Desmond; Ravishankar Vinit; Søgaard Anders
Publication date: 01/01/2020

Large-scale pretrained language models are the major driving force behind recent improvements in performance on the Winograd Schema Challenge, a widely employed test of common sense reasoning ability. We show, however, with a new diagnostic dataset, that these models are sensitive to linguistic perturbations of the Winograd examples that minimally affect human understanding. Our results highlight interesting differences between humans and language models: language models are more sensitive to number or gender alternations and synonym replacements than humans, and humans are more stable and consistent in their predictions, maintain a much higher absolute performance, and perform better on non-associative instances than associative ones. Overall, humans are correct more often than out-of-the-box models, and the models are sometimes right for the wrong reasons. Finally, we show that fine-tuning on a large, task-specific dataset can offer a solution to these issues.

Comment: ACL 202

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System
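
As a rough illustration of the sensitivity described in the abstract above, one simple way to quantify it is the fraction of examples whose predicted referent changes after a perturbation. This is a minimal sketch, not the paper's own metric: the function name `prediction_flip_rate` and the input format are hypothetical, and it assumes the perturbation leaves the correct referent unchanged.

```python
def prediction_flip_rate(original_preds, perturbed_preds):
    """Fraction of examples whose prediction changes under a perturbation.

    Hypothetical helper for illustration only. Both arguments are
    sequences of predicted referents, aligned by example.
    """
    if len(original_preds) != len(perturbed_preds):
        raise ValueError("prediction lists must have the same length")
    if not original_preds:
        return 0.0
    # Count the examples where the perturbed prediction differs.
    flips = sum(o != p for o, p in zip(original_preds, perturbed_preds))
    return flips / len(original_preds)
```

A lower value would indicate more stable predictions; comparing the value for a model with the value for human annotators would mirror the kind of contrast the abstract draws.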

Tackling Domain-Specific Winograd Schemas with Knowledge-Based Reasoning and Machine Learning

Authors: Bennett Brandon; Hong Suk Joon
Publication venue: OASIcs - OpenAccess Series in Informatics. 3rd Conference on Language, Data and Knowledge (LDK 2021)
Publication date: 24/11/2020

The Winograd Schema Challenge (WSC) is a commonsense reasoning task that requires background knowledge. In this paper, we contribute to tackling WSC in four ways. Firstly, we suggest a keyword method to define a restricted domain where distinctive high-level semantic patterns can be found. A thanking domain was defined by keywords, and the data set in this domain is used in our experiments. Secondly, we develop a high-level knowledge-based reasoning method using semantic roles which is based on the method of Sharma [Sharma, 2019]. Thirdly, we propose an ensemble method to combine knowledge-based reasoning and machine learning which shows the best performance in our experiments. As a machine learning method, we used Bidirectional Encoder Representations from Transformers (BERT) [Jacob Devlin et al., 2018; Vid Kocijan et al., 2019]. Lastly, in terms of evaluation, we suggest a "robust" accuracy measurement by modifying that of Trichelair et al. [Trichelair et al., 2018]. As with their switching method, we evaluate a model by considering its performance on trivial variants of each sentence in the test set.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server
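
The "robust" accuracy idea in the abstract above can be sketched as follows. The abstract does not give the exact definition, so this sketch assumes one plausible reading: a test sentence counts as correct only if the model is right on the original and on every trivial variant of it. The function name `robust_accuracy` and the input layout are hypothetical.

```python
def robust_accuracy(groups):
    """Share of test sentences answered correctly on ALL of their variants.

    Assumed definition, for illustration only. `groups` is a list with one
    entry per test sentence; each entry is a non-empty list of booleans
    giving the model's correctness on the original sentence and on each
    of its trivial variants.
    """
    if not groups:
        return 0.0
    if any(len(group) == 0 for group in groups):
        raise ValueError("each group needs at least one result")
    # A group scores only when every variant in it is answered correctly.
    robust_hits = sum(all(group) for group in groups)
    return robust_hits / len(groups)
```

Under this reading, robust accuracy can never exceed plain accuracy on the original sentences, which is what makes it a stricter measure.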

Attention Is (not) All You Need for Commonsense Reasoning

Authors: Klein Tassilo; Nabi Moin
Publication date: 01/01/2019

The recently introduced BERT model exhibits strong performance on several language understanding benchmarks. In this paper, we describe a simple re-implementation of BERT for commonsense reasoning. We show that the attentions produced by BERT can be directly utilized for tasks such as the Pronoun Disambiguation Problem and Winograd Schema Challenge. Our proposed attention-guided commonsense reasoning method is conceptually simple yet empirically powerful. Experimental analysis on multiple datasets demonstrates that our proposed system performs remarkably well on all cases while outperforming the previously reported state of the art by a margin. While results suggest that BERT seems to implicitly learn to establish complex relationships between entities, solving commonsense reasoning tasks might require more than unsupervised models learned from huge text corpora.

Comment: to appear at ACL 201

arXiv.org e-Print Archive

Crossref
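
To make the attention-guided idea in the abstract above more concrete, here is a simplified sketch that picks the candidate referent receiving the largest attention weight from the pronoun across all heads. It is an assumption-laden illustration, not the authors' exact scoring procedure: the function name `pick_referent`, the nested-list attention layout, and the use of a plain maximum are all hypothetical choices.

```python
def pick_referent(attention, pronoun_idx, candidates):
    """Choose the candidate most strongly attended to from the pronoun.

    Illustrative sketch only. `attention` is a list of per-head matrices,
    where attention[h][i][j] is the weight from token i to token j.
    `candidates` maps a candidate name to the token indices it spans.
    Returns the winning candidate name and the per-candidate scores.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    scores = {}
    for name, token_indices in candidates.items():
        # Score a candidate by its best weight over all heads and tokens.
        scores[name] = max(
            head[pronoun_idx][j] for head in attention for j in token_indices
        )
    best = max(scores, key=scores.get)
    return best, scores
```

In practice the attention matrices would come from a pretrained model; here they are treated as given inputs so the selection step can be read on its own.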