Understanding Task Design Trade-offs in Crowdsourced Paraphrase Collection
Linguistically diverse datasets are critical for training and evaluating
robust machine learning systems, but data collection is a costly process that
often requires experts. Crowdsourcing the process of paraphrase generation is
an effective means of expanding natural language datasets, but there has been
limited analysis of the trade-offs that arise when designing tasks. In this
paper, we present the first systematic study of the key factors in
crowdsourcing paraphrase collection. We consider variations in instructions,
incentives, data domains, and workflows. We manually analyzed paraphrases for
correctness, grammaticality, and linguistic diversity. Our observations provide
new insight into the trade-offs between accuracy and diversity in crowd
responses that arise as a result of task design, providing guidance for future
paraphrase generation procedures.
Comment: Published at ACL 201
A Theme-Rewriting Approach for Generating Algebra Word Problems
Texts present coherent stories that have a particular theme or overall
setting, for example science fiction or western. In this paper, we present a
text generation method called rewriting that edits existing
human-authored narratives to change their theme without changing the underlying
story. We apply the approach to math word problems, where it might help
students stay more engaged by quickly transforming all of their homework
assignments to the theme of their favorite movie without changing the math
concepts that are being taught. Our rewriting method uses a two-stage decoding
process, which proposes new words from the target theme and scores the
resulting stories according to a number of factors defining aspects of
syntactic, semantic, and thematic coherence. Experiments demonstrate that the
final stories typically represent the new theme well while still testing the
original math concepts, outperforming a number of baselines. We also release a
new dataset of human-authored rewrites of math word problems in several themes.
Comment: To appear EMNLP 201
ParaPhraser: Russian paraphrase corpus and shared task
The paper describes the results of the First Russian Paraphrase Detection Shared Task, held in St. Petersburg, Russia, in October 2016. Research on paraphrase extraction, detection, and generation has been developing successfully for a long time, but the Russian computational linguistics community has only recently shown a surge of interest in the problem. We try to close this gap by introducing the ParaPhraser.ru project, dedicated to collecting a Russian paraphrase corpus and organizing a Paraphrase Detection Shared Task that uses the corpus as training data. Participants applied a wide variety of techniques to paraphrase detection, from rule-based approaches to deep learning. The results reflect the following tendencies: the best scores are obtained by combining traditional classifiers with fine-grained linguistic features; however, complex neural networks, shallow methods, and purely technical methods also demonstrate competitive results.
Peer reviewed