JECC: Commonsense Reasoning Tasks Derived from Interactive Fictions
Commonsense reasoning simulates the human ability to make presumptions about
our physical world, and it is an essential cornerstone in building general AI
systems. We propose a new commonsense reasoning dataset based on human players'
Interactive Fiction (IF) gameplay walkthroughs, as human players demonstrate
plentiful and diverse commonsense reasoning. The new dataset provides a natural
mixture of various reasoning types and requires multi-hop reasoning. Moreover,
the IF game-based construction procedure requires much less human intervention
than previous ones. Experiments show that the introduced dataset is challenging
for previous machine reading models, with a significant 20% performance gap
compared to human experts.
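To make the task construction concrete, below is a minimal sketch of how a walkthrough-derived example might be framed as a multiple-choice commonsense question: given recent IF transcript context and the player's action, pick the most plausible next observation. The instance fields, the example text, and the trivial score_candidate baseline are illustrative assumptions, not the paper's exact format.

# Assumed structure of one walkthrough-derived instance (illustrative only).
example = {
    "context": "You are in the kitchen. A locked cabinet hangs above the sink.",
    "action": "unlock cabinet with small key",
    "candidates": [
        "The cabinet swings open, revealing a dusty bottle.",   # commonsense-consistent
        "The cabinet melts into a puddle on the floor.",        # violates physical commonsense
        "You are now outside the house.",                       # inconsistent with the action
    ],
    "label": 0,
}

def score_candidate(context: str, action: str, candidate: str) -> float:
    """Placeholder scorer; a real system would use a trained reading model."""
    # Toy heuristic: favour candidates sharing words with the action.
    overlap = set(action.lower().split()) & set(candidate.lower().split())
    return float(len(overlap))

scores = [score_candidate(example["context"], example["action"], c)
          for c in example["candidates"]]
prediction = max(range(len(scores)), key=scores.__getitem__)
print("predicted candidate:", prediction, "| gold:", example["label"])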
Advancing Transformer's Capabilities in Commonsense Reasoning
Recent advances in general purpose pre-trained language models have shown
great potential in commonsense reasoning. However, current works still perform
poorly on standard commonsense reasoning benchmarks including the Com2Sense
Dataset. We argue that this is due to a disconnect from current cutting-edge
machine learning methods. In this work, we aim to bridge the gap by introducing
current ML-based methods to improve general purpose pre-trained language models
in the task of commonsense reasoning. Specifically, we experiment with and
systematically evaluate methods including knowledge transfer, model ensemble,
and introducing an additional pairwise contrastive objective. Our best model
outperforms the strongest previous works by ~15% absolute gains in Pairwise
Accuracy and ~8.7% absolute gains in Standard Accuracy.
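Com2Sense pairs each sensible statement with a complementary counterpart, and Pairwise Accuracy typically credits a pair only when both statements are judged correctly. One plausible reading of the "additional pairwise contrastive objective" is therefore a margin loss that pushes the plausibility score of the sensible statement above that of its counterpart; the sketch below assumes this margin-based form with a generic scoring head and is not the paper's exact formulation.

import torch
import torch.nn.functional as F

def pairwise_contrastive_loss(pos_scores: torch.Tensor,
                              neg_scores: torch.Tensor,
                              margin: float = 0.5) -> torch.Tensor:
    """Assumed margin loss over complementary statement pairs.

    pos_scores: plausibility scores for the commonsense-consistent statements
    neg_scores: scores for their complementary, commonsense-violating counterparts
    """
    target = torch.ones_like(pos_scores)  # require pos > neg by at least `margin`
    return F.margin_ranking_loss(pos_scores, neg_scores, target, margin=margin)

# Toy usage with made-up scores from a hypothetical scoring head:
pos = torch.tensor([0.8, 0.3, 0.6])
neg = torch.tensor([0.1, 0.5, 0.2])
print(float(pairwise_contrastive_loss(pos, neg)))

In practice such a term would presumably be added to the standard classification objective and trained jointly over the paired examples.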
CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks
Recent efforts in natural language processing (NLP) commonsense reasoning
research have yielded a considerable number of new datasets and benchmarks.
However, most of these datasets formulate commonsense reasoning challenges in
artificial scenarios that do not reflect the tasks that real-world NLP
systems are designed to solve. In this work, we present CRoW, a
manually-curated, multi-task benchmark that evaluates the ability of models to
apply commonsense reasoning in the context of six real-world NLP tasks. CRoW is
constructed using a multi-stage data collection pipeline that rewrites examples
from existing datasets using commonsense-violating perturbations. We use CRoW
to study how NLP systems perform across different dimensions of commonsense
knowledge, such as physical, temporal, and social reasoning. We find a
significant performance gap when NLP systems are evaluated on CRoW compared to
humans, showcasing that commonsense reasoning is far from being solved in
real-world task settings. We make our dataset and leaderboard available to the
research community at https://github.com/mismayil/crow.
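As an illustration of what a commonsense-violating perturbation and its pairwise evaluation could look like, here is a minimal sketch; the example texts, the plausibility stub, and the preference-based scoring rule are assumptions for exposition rather than CRoW's actual pipeline or data.

# Each toy item pairs an original, commonsense-consistent target with a minimally
# edited version that violates one commonsense dimension (assumed format).
items = [
    {
        "context": "A: My flight lands at 11 pm, can we still meet?",
        "valid": "B: Sure, let's grab a late dinner after you land.",
        "violated": "B: Sure, let's grab breakfast right after you land.",  # temporal violation
        "dimension": "temporal",
    },
    {
        "context": "The mover carried the couch up three flights of stairs.",
        "valid": "He was exhausted afterwards.",
        "violated": "He was perfectly rested afterwards.",  # physical/causal violation
        "dimension": "physical",
    },
]

def plausibility(context: str, target: str) -> float:
    """Stub for a model's plausibility score; replace with a real NLP system."""
    return -abs(len(target) - len(context))  # meaningless toy score

correct = sum(
    plausibility(it["context"], it["valid"]) > plausibility(it["context"], it["violated"])
    for it in items
)
print(f"pairwise preference accuracy: {correct}/{len(items)}")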