Crowdsourcing Multiple Choice Science Questions
We present a novel method for obtaining high-quality, domain-targeted
multiple choice questions from crowd workers. Generating these questions can be
difficult without trading away originality, relevance or diversity in the
answer options. Our method addresses these problems by leveraging a large
corpus of domain-specific text and a small set of existing questions. It
produces model suggestions for document selection and answer distractor choice
which aid the human question generation process. With this method we have
assembled SciQ, a dataset of 13.7K multiple choice science exam questions
(Dataset available at http://allenai.org/data.html). We demonstrate that the
method produces in-domain questions by providing an analysis of this new
dataset and by showing that humans cannot distinguish the crowdsourced
questions from original questions. When using SciQ as additional training data
to existing questions, we observe accuracy improvements on real science exams.
Comment: accepted for the Workshop on Noisy User-generated Text (W-NUT) 201
Knowledge-Driven Distractor Generation for Cloze-style Multiple Choice Questions
In this paper, we propose a novel configurable framework to automatically
generate distractive choices for open-domain cloze-style multiple-choice
questions, which incorporates a general-purpose knowledge base to effectively
create a small distractor candidate set, and a feature-rich learning-to-rank
model to select distractors that are both plausible and reliable. Experimental
results on datasets across four domains show that our framework yields
distractors that are more plausible and reliable than previous methods. This
dataset can also be used as a benchmark for distractor generation in the
future.
Comment: To appear at AAAI 202
Let's play with proverbs? NLP tools and resources for iCALL applications around proverbs for PFL
Proverbs are an important form of cultural expression of a society and are related to various areas of
knowledge and human experience (González Rey, 2002). Although they are linguistic elements in widespread
use, proverbs are very rich structures both from a cultural and from a linguistic point of view and can
therefore contribute significantly to the teaching of languages, both native and foreign (Council of
Europe, 2001). However, though there are extensive collections of Portuguese proverbs with tens of
thousands of forms and their variants (Reis, in preparation), their automatic identification in texts is quite
difficult, given their formal variation, both lexical and syntactic (Chacoto, 1994). Nevertheless, using
real examples, where proverbs are used in a natural or spontaneous discourse context, is a more natural
way to learn and teach the complex conditions and communicative situations that determine the
use and meaning of these expressions. On the other hand, frequency indices associated with proverbs
and their variants would allow one to select the most common expressions. These are precisely the
most interesting forms from the point of view of their teaching/learning and could serve as a basis for
the construction of educational games, particularly for learning Portuguese autonomously as a foreign
language (PFL) assisted by computer. To make this possible, it is necessary, first of all, to be able
to recognize the occurrence of proverbs in texts (Rassi et al. 2014), including the instances where
these expressions are presented in a truncated or creatively modified form, for example, to better suit
the communicative situation or to produce new and more expressive meanings. In this paper, we present
an on-going project, which aims at automatic identification of proverbs in texts. In this interdisciplinary
study, we combine natural language processing tools with questionnaire construction
techniques for teaching purposes (Hoshino and Nakagawa 2005, Correia et al. 2010). This is illustrated
here with different sets of formats that can be built based on the knowledge of the form and
variation of proverbs, as well as their frequency in corpora.
A Survey of Natural Language Generation
This paper offers a comprehensive review of the research on Natural Language
Generation (NLG) over the past two decades, especially in relation to
data-to-text generation and text-to-text generation deep learning methods, as
well as new applications of NLG technology. This survey aims to (a) give the
latest synthesis of deep learning research on the NLG core tasks, as well as
the architectures adopted in the field; (b) detail meticulously and
comprehensively various NLG tasks and datasets, and draw attention to the
challenges in NLG evaluation, focusing on different evaluation methods and
their relationships; (c) highlight some future emphasis and relatively recent
research issues that arise due to the increasing synergy between NLG and other
artificial intelligence areas, such as computer vision, text and computational
creativity.
Comment: Accepted by ACM Computing Survey (CSUR) 202