Search CORE

6,962 research outputs found

Understanding Task Design Trade-offs in Crowdsourced Paraphrase Collection

Author: Jiang Youxuan
Kummerfeld Jonathan K.
Lasecki Walter S.
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2017
Field of study

Linguistically diverse datasets are critical for training and evaluating robust machine learning systems, but data collection is a costly process that often requires experts. Crowdsourcing the process of paraphrase generation is an effective means of expanding natural language datasets, but there has been limited analysis of the trade-offs that arise when designing tasks. In this paper, we present the first systematic study of the key factors in crowdsourcing paraphrase collection. We consider variations in instructions, incentives, data domains, and workflows. We manually analyzed paraphrases for correctness, grammaticality, and linguistic diversity. Our observations provide new insight into the trade-offs between accuracy and diversity in crowd responses that arise as a result of task design, providing guidance for future paraphrase generation procedures.Comment: Published at ACL 201

arXiv.org e-Print Archive

A Theme-Rewriting Approach for Generating Algebra Word Problems

Author: Hajishirzi Hannaneh
Koncel-Kedziorski Rik
Konstas Ioannis
Zettlemoyer Luke
Publication venue
Publication date: 01/01/2016
Field of study

Texts present coherent stories that have a particular theme or overall setting, for example science fiction or western. In this paper, we present a text generation method called {\it rewriting} that edits existing human-authored narratives to change their theme without changing the underlying story. We apply the approach to math word problems, where it might help students stay more engaged by quickly transforming all of their homework assignments to the theme of their favorite movie without changing the math concepts that are being taught. Our rewriting method uses a two-stage decoding process, which proposes new words from the target theme and scores the resulting stories according to a number of factors defining aspects of syntactic, semantic, and thematic coherence. Experiments demonstrate that the final stories typically represent the new theme well while still testing the original math concepts, outperforming a number of baselines. We also release a new dataset of human-authored rewrites of math word problems in several themes.Comment: To appear EMNLP 201

arXiv.org e-Print Archive

ParaPhraser: Russian paraphrase corpus and shared task

Author: Pivovarova Lidia
Pronoza Anton
Pronoza Ekaterina
Yagunova Elena
Publication venue: Springer
Publication date: 01/01/2017
Field of study

The paper describes the results of the First Russian Paraphrase Detection Shared Task held in St.-Petersburg, Russia, in October 2016. Research in the area of paraphrase extraction, detection and generation has been successfully developing for a long time while there has been only a recent surge of interest towards the problem in the Russian community of computational linguistics. We try to overcome this gap by introducing the project ParaPhraser.ru dedicated to the collection of Russian paraphrase corpus and organizing a Paraphrase Detection Shared Task, which uses the corpus as the training data. The participants of the task applied a wide variety of techniques to the problem of paraphrase detection, from rule-based approaches to deep learning, and results of the task reflect the following tendencies: the best scores are obtained by the strategy of using traditional classifiers combined with fine-grained linguistic features, however, complex neural networks, shallow methods and purely technical methods also demonstrate competitive results.Peer reviewe

Helsingin yliopiston digitaalinen arkisto