Wino-X: Multilingual Winograd Schemas for Commonsense Reasoning and Coreference Resolution
Winograd schemas are a well-established tool for evaluating the coreference resolution (CoR) and commonsense reasoning (CSR) capabilities of computational models. So far, schemas have remained largely confined to English, limiting their utility in multilingual settings. This work presents Wino-X, a parallel dataset of German, French, and Russian schemas, aligned with their English counterparts. We use this resource to investigate whether neural machine translation (NMT) models can perform CoR that requires commonsense knowledge and whether multilingual language models (MLLMs) are capable of CSR across multiple languages. Our findings show Wino-X to be exceptionally challenging for NMT systems, which are prone to undesirable biases and unable to detect disambiguating information. We quantify these biases using established statistical methods and define ways to address both issues. We furthermore present evidence of active cross-lingual knowledge transfer in MLLMs, whereby fine-tuning models on English schemas yields CSR improvements in other languages.
CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks
Recent efforts in natural language processing (NLP) commonsense reasoning
research have yielded a considerable number of new datasets and benchmarks.
However, most of these datasets formulate commonsense reasoning challenges in
artificial scenarios that are not reflective of the tasks which real-world NLP
systems are designed to solve. In this work, we present CRoW, a
manually-curated, multi-task benchmark that evaluates the ability of models to
apply commonsense reasoning in the context of six real-world NLP tasks. CRoW is
constructed using a multi-stage data collection pipeline that rewrites examples
from existing datasets using commonsense-violating perturbations. We use CRoW
to study how NLP systems perform across different dimensions of commonsense
knowledge, such as physical, temporal, and social reasoning. We find a
significant performance gap when NLP systems are evaluated on CRoW compared to
humans, showcasing that commonsense reasoning is far from being solved in
real-world task settings. We make our dataset and leaderboard available to the
research community at https://github.com/mismayil/crow.
Comment: 37 pages, camera-ready for EMNLP 202
Experience and Prediction: A Metric of Hardness for a Novel Litmus Test
In the last decade, the Winograd Schema Challenge (WSC) has become a central
aspect of the research community as a novel litmus test. Consequently, the WSC
has spurred research interest because it can be seen as the means to understand
human behavior. In this regard, the development of new techniques has made
possible the usage of Winograd schemas in various fields, such as the design of
novel forms of CAPTCHAs.
Work in the literature that established a baseline for human adult performance on the WSC has shown that not all schemas are alike, meaning that they could potentially be categorized according to their perceived hardness for humans. This hardness metric could then be used in future challenges, or in a WSC-based CAPTCHA service, to differentiate between Winograd schemas.
Our recent work has shown that this could be achieved by designing an automated system able to output the hardness indexes of Winograd schemas, albeit with limitations on the number of schemas it could be applied to. This paper builds on that research by presenting a new system, based on Machine Learning (ML), that outputs the hardness of any Winograd schema faster and more accurately than any previously used method. Our system, which implements two different approaches, namely a random forest and a deep learning (LSTM-based) model, is ready to be used as an extension of any system that aims to differentiate between Winograd schemas according to their perceived hardness for humans. Alongside the system itself, we extend previous work by presenting the results of a large-scale experiment that shows how human performance varies across Winograd schemas.
Comment: 33 pages, 10 figures
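The abstract above describes a random-forest approach to predicting a schema's hardness index. As a minimal sketch of that idea, the following regressor maps a few surface features of a schema to a hardness score; the feature set, training examples, and hardness labels here are invented for illustration and are not the paper's actual data or features.

```python
# Hypothetical sketch: a random-forest regressor mapping simple surface
# features of a Winograd schema to a human-perceived hardness index.
# Features, examples, and labels below are invented for illustration.
from sklearn.ensemble import RandomForestRegressor


def featurize(schema: str, candidates: list) -> list:
    """Toy features: word count, number of answer candidates, mean word length."""
    words = schema.split()
    return [
        float(len(words)),
        float(len(candidates)),
        sum(len(w) for w in words) / max(len(words), 1),
    ]


# Invented training triples: (schema text, answer candidates, hardness in [0, 1]).
train = [
    ("The trophy didn't fit in the suitcase because it was too big.",
     ["trophy", "suitcase"], 0.2),
    ("The council refused the demonstrators a permit because they feared violence.",
     ["council", "demonstrators"], 0.7),
    ("Joan thanked Susan for the help she had given.",
     ["Joan", "Susan"], 0.4),
]

X = [featurize(s, c) for s, c, _ in train]
y = [h for _, _, h in train]

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Score an unseen schema; the output is a hardness index, not a resolution.
new_schema = ("The man couldn't lift his son because he was so weak.",
              ["man", "son"])
hardness = model.predict([featurize(*new_schema)])[0]
print(f"predicted hardness: {hardness:.2f}")
```

A deployed system would presumably use richer linguistic features (or the LSTM approach the abstract mentions), but the interface is the same: schema in, hardness index out.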
The Theory of Correlation Formulas and Their Application to Discourse Coherence
The Winograd Schema Challenge (WSC) was proposed as a measure of machine intelligence. It boils down to anaphora resolution, a task familiar from computational linguistics. Research in linguistics and AI has coalesced around discourse coherence as the critical factor in solving this task, and the process of establishing discourse coherence relies fundamentally on world and commonsense knowledge.
In this thesis, we build on an approach to establishing coherence on the basis of correlation. The utility of this approach lies in its conceptual clarity and its ability to flexibly represent commonsense knowledge. We work to fill some conceptual holes in the Correlation Calculus approach. First, understanding the calculus in a vacuum is not straightforward unless it has a precise semantics. Second, existing demonstrations of the Correlation Calculus on Winograd Schema Challenge problems have not been linguistically credible.
We hope to ameliorate some, but by no means all, of the outstanding issues with the Correlation Calculus. We do so first by providing a precise semantics for the calculus, which relates our intuitive understanding of correlation to a precise notion involving probabilities. Second, we formulate the establishment of discourse coherence by correlation formulas within the framework of Discourse Representation Theory. This provides a more complete and linguistically credible account of the relationship between the Correlation Calculus, discourse coherence, and Winograd Schema Challenge problems.
Interset: A natural language interface for teleoperated robotic assembly of the EASE space structure
A teleoperated robot was used to assemble the Experimental Assembly of Structures in Extra-vehicular activity (EASE) space structure under neutral buoyancy conditions, simulating a telerobot performing structural assembly in the zero gravity of space. This previous work used a manually controlled teleoperator as a test bed for system performance evaluations. From these results, several Artificial Intelligence options were proposed, one of which was further developed into a real-time assembly planner. The interface for this system is effective in assembling EASE structures using windowed graphics and a set of networked menus. As the problem space becomes more complex, and hence the set of control options grows, a natural language interface may prove beneficial as a supplement to the menu-based control strategy. Such an interface could be beneficial in situations such as describing the local environment, maintaining a database of task event histories, modifying a plan or a heuristic dynamically, summarizing a task in English, or operating in a novel situation.
Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation
We present a large-scale collection of diverse natural language inference
(NLI) datasets that help provide insight into how well a sentence
representation captures distinct types of reasoning. The collection results
from recasting 13 existing datasets from 7 semantic phenomena into a common NLI
structure, resulting in over half a million labeled context-hypothesis pairs in
total. We refer to our collection as the DNC: Diverse Natural Language
Inference Collection. The DNC is available online at https://www.decomp.net,
and will grow over time as additional resources are recast and added from novel
sources.
Comment: To be presented at EMNLP 2018. 15 page
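The recasting idea described above, converting labeled examples from existing datasets into a common NLI structure, can be sketched as follows. The `recast_pronoun` function and the example are illustrative assumptions, not the DNC's actual conversion code or data.

```python
# Hypothetical sketch of "recasting": turning a labeled pronoun-resolution
# example into the common NLI (context, hypothesis, label) format.
# The function and example are illustrative, not the DNC's actual pipeline.

def recast_pronoun(sentence, pronoun, referent, correct):
    """Recast a pronoun-resolution example as an NLI context-hypothesis pair."""
    # Substitute the candidate referent for the first pronoun occurrence.
    hypothesis = sentence.replace(pronoun, referent, 1)
    return {
        "context": sentence,
        "hypothesis": hypothesis,
        "label": "entailed" if correct else "not-entailed",
    }


pair = recast_pronoun(
    "The city council denied the marchers a permit because they feared violence.",
    "they", "the city council", correct=True)
print(pair["label"])
print(pair["hypothesis"])
```

Applying a recipe like this across many source datasets yields uniformly structured context-hypothesis pairs, which is what lets a single NLI model be probed on many distinct semantic phenomena.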