Comparing knowledge sources for nominal anaphora resolution
We compare two ways of obtaining lexical knowledge for antecedent selection in other-anaphora
and definite noun phrase coreference. Specifically, we compare an algorithm that relies on links
encoded in the manually created lexical hierarchy WordNet and an algorithm that mines corpora
by means of shallow lexico-semantic patterns. As corpora we use the British National
Corpus (BNC), as well as the Web, which has not been previously used for this task. Our
results show that (a) the knowledge encoded in WordNet is often insufficient, especially for
anaphor-antecedent relations that exploit subjective or context-dependent knowledge; (b) for
other-anaphora, the Web-based method outperforms the WordNet-based method; (c) for definite
NP coreference, the Web-based method yields results comparable to those obtained using
WordNet over the whole dataset and outperforms the WordNet-based method on subsets of the
dataset; (d) in both case studies, the BNC-based method is worse than the other methods because
of data sparseness. Thus, in our studies, the Web-based method alleviated the lexical knowledge
gap often encountered in anaphora resolution, and handled examples with context-dependent relations
between anaphor and antecedent. Because it is inexpensive and needs no hand-modelling
of lexical knowledge, it is a promising knowledge source to integrate into anaphora resolution systems.
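The shallow lexico-semantic pattern method described above lends itself to a compact sketch. The pattern template, the naive "+s" pluralization, and the toy count table below are illustrative assumptions, not the authors' exact implementation:

```python
def instantiate_pattern(candidate: str, anaphor_head: str) -> str:
    """Build a lexico-semantic query string such as 'dogs and other animals'."""
    return f"{candidate}s and other {anaphor_head}s"

def rank_antecedents(candidates, anaphor_head, hit_count):
    """Rank candidate antecedents by pattern frequency, highest first.
    `hit_count` stands in for a BNC or Web frequency lookup."""
    scored = sorted(((hit_count(instantiate_pattern(c, anaphor_head)), c)
                     for c in candidates), reverse=True)
    return [c for _, c in scored]

# Toy frequency table standing in for corpus or Web hit counts (made up).
toy_counts = {
    "dogs and other animals": 120000,
    "tables and other animals": 3,
}
ranking = rank_antecedents(["table", "dog"], "animal",
                           lambda q: toy_counts.get(q, 0))
print(ranking)  # candidate with the higher pattern count comes first
```

The key design point is that the lexical relation is never modelled explicitly: frequency of the instantiated pattern in a large corpus serves as a proxy for it, which is why the method can also capture context-dependent relations absent from WordNet.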
Crowdsourcing Multiple Choice Science Questions
We present a novel method for obtaining high-quality, domain-targeted
multiple choice questions from crowd workers. Generating these questions can be
difficult without trading away originality, relevance or diversity in the
answer options. Our method addresses these problems by leveraging a large
corpus of domain-specific text and a small set of existing questions. It
produces model suggestions for document selection and answer distractor choice
which aid the human question generation process. With this method we have
assembled SciQ, a dataset of 13.7K multiple choice science exam questions
(Dataset available at http://allenai.org/data.html). We demonstrate that the
method produces in-domain questions by providing an analysis of this new
dataset and by showing that humans cannot distinguish the crowdsourced
questions from original questions. When using SciQ as additional training data
alongside existing questions, we observe accuracy improvements on real science exams.
Comment: accepted for the Workshop on Noisy User-generated Text (W-NUT) 201
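The paper does not spell out here how its "model suggestions for answer distractor choice" are computed; a hedged sketch of one common approach is nearest-neighbour selection over word vectors, where terms close to the correct answer make plausible-but-wrong options. The 2-d toy vectors and the similarity criterion are assumptions for illustration, not the paper's actual model:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def suggest_distractors(answer, vectors, k=3):
    """Return the k terms most similar to the answer, excluding the answer."""
    scored = sorted(((cosine(vectors[answer], vectors[w]), w)
                     for w in vectors if w != answer), reverse=True)
    return [w for _, w in scored[:k]]

# Toy 2-d embeddings (made up) for a biology question.
vecs = {"mitosis": [0.9, 0.1], "meiosis": [0.85, 0.2],
        "osmosis": [0.6, 0.4], "gravity": [0.0, 1.0]}
print(suggest_distractors("mitosis", vecs, k=2))  # the two nearest terms
```

In practice the candidate pool would come from the domain-specific corpus the abstract mentions, so suggestions stay in-domain.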
Boundaries of Semantic Distraction: Dominance and Lexicality Act at Retrieval
Three experiments investigated memory for semantic information with the goal of determining boundary conditions for the manifestation of semantic auditory distraction. Irrelevant speech disrupted the free recall of semantic category-exemplars to an equal degree regardless of whether the speech coincided with presentation or test phases of the task (Experiment 1) and occurred regardless of whether it comprised random words or coherent sentences (Experiment 2). The effects of background speech were greater when the irrelevant speech was semantically related to the to-be-remembered material, but only when the irrelevant words were high in output dominance (Experiment 3). The implications of these findings in relation to the processing of task material and the processing of background speech are discussed.
Co-Attention Hierarchical Network: Generating Coherent Long Distractors for Reading Comprehension
In reading comprehension, generating sentence-level distractors is a
significant task, which requires a deep understanding of the article and
question. The traditional entity-centered methods can only generate word-level
or phrase-level distractors. Although recently proposed neural-based methods
like sequence-to-sequence (Seq2Seq) model show great potential in generating
creative text, the previous neural methods for distractor generation ignore two
important aspects. First, they do not model the interactions between the
article and the question, so the generated distractors tend to be too general
or irrelevant to the question context. Second, they do not emphasize the
relationship between the distractor and the article, so the generated
distractors are not semantically relevant to the article and thus fail to form
a set of meaningful options. To solve the first problem, we propose a
co-attention-enhanced hierarchical architecture that better captures the
interactions between the article and question, thus guiding the decoder to
generate more coherent distractors. To alleviate the second problem, we add an
additional semantic similarity loss to push the generated distractors more
relevant to the article. Experimental results show that our model outperforms
several strong baselines on automatic metrics, achieving state-of-the-art
performance. Further human evaluation indicates that our generated distractors
are more coherent and more educative compared with those distractors generated
by baselines.
Comment: 8 pages, 3 figures. Accepted by AAAI202
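The "additional semantic similarity loss" admits a minimal sketch: the usual generation loss is augmented with a penalty that grows as the distractor representation drifts away from the article representation. The cosine formulation and the weight `lam` below are illustrative assumptions, not the paper's exact objective:

```python
import math

def cos_sim(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def total_loss(generation_nll, distractor_vec, article_vec, lam=0.5):
    """Generation loss plus lam * (1 - cosine similarity):
    identical representations add no penalty, dissimilar ones add up to lam."""
    return generation_nll + lam * (1.0 - cos_sim(distractor_vec, article_vec))

print(total_loss(2.0, [1.0, 0.0], [1.0, 0.0]))  # aligned: 2.0, no penalty
print(total_loss(2.0, [1.0, 0.0], [0.0, 1.0]))  # orthogonal: 2.5
```

Because the penalty term is differentiable in the representations, it can be minimized jointly with the Seq2Seq loss, pushing generated distractors toward the article's semantics.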
InDEX: Indonesian Idiom and Expression Dataset for Cloze Test
We propose InDEX, an Indonesian Idiom and Expression dataset for cloze test.
The dataset contains 10438 unique sentences for 289 idioms and expressions for
which we generate 15 different types of distractors, resulting in a large
cloze-style corpus. Many baseline models for cloze-test reading comprehension
apply BERT with random initialization to learn embedding representations. But
idioms and fixed expressions differ in that the literal meaning of a phrase
may or may not be consistent with its contextual meaning. Therefore, we
explore different ways to combine static and contextual representations for a
stronger baseline model. Experiments show that combining definition and random
initialization better supports cloze-test model performance for idioms,
whether they occur independently or mixed with fixed expressions, while for
fixed expressions with no special meaning, a static embedding with random
initialization is sufficient for the cloze-test model.
Comment: Accepted to "2022 International Conference on Asian Language
Processing (IALP)"
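Combining static and contextual representations can be sketched with two simple fusion schemes, concatenation and element-wise averaging. The scheme names and vector values are illustrative; the paper's actual combinations (e.g. with definition embeddings) are more elaborate:

```python
def combine(static_vec, contextual_vec, mode="concat"):
    """Fuse a static phrase embedding with a contextual (BERT-style) one."""
    if mode == "concat":
        return static_vec + contextual_vec          # doubles dimensionality
    if mode == "avg":                               # requires equal lengths
        return [(s + c) / 2 for s, c in zip(static_vec, contextual_vec)]
    raise ValueError(f"unknown mode: {mode}")

print(combine([1.0, 2.0], [3.0, 4.0]))           # [1.0, 2.0, 3.0, 4.0]
print(combine([1.0, 2.0], [3.0, 4.0], "avg"))    # [2.0, 3.0]
```

Concatenation preserves both views at the cost of a wider input layer, while averaging keeps the dimensionality fixed but can blur the idiom's non-literal signal; which fusion wins is an empirical question like the one the abstract investigates.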
Automatic Distractor Generation for Multiple Choice Questions in Standard Tests
To assess the knowledge proficiency of a learner, the multiple choice question
is an efficient and widespread form in standard tests. However, composing a
multiple choice question, especially constructing its distractors, is quite
challenging. The distractors are required to be both incorrect and plausible
enough to confuse learners who have not mastered the knowledge. Currently,
distractors are written by domain experts, which is both expensive and
time-consuming. This motivates automatic distractor generation, which can
benefit various standard tests in a wide range of domains. In this
paper, we propose a question and answer guided distractor generation (EDGE)
framework to automate distractor generation. EDGE consists of three major
modules: (1) the Reforming Question Module and (2) the Reforming Passage
Module apply gate layers to guarantee the inherent incorrectness of the
generated distractors, while (3) the Distractor Generator Module applies an
attention mechanism to control the level of plausibility. Experimental results
on a large-scale public
dataset demonstrate that our model significantly outperforms existing models
and achieves a new state-of-the-art.
Comment: accepted by COLING202
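A gate layer of the kind the Reforming modules are said to apply can be sketched as a sigmoid gate, computed from the hidden state and a control vector, that scales the hidden state and so can suppress (gate toward zero) content that would make a distractor correct. The scalar-gate parameterization here is an illustrative assumption, not EDGE's exact layer:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gate(hidden, control, w_h, w_c, b):
    """Scale `hidden` by g = sigmoid(w_h . hidden + w_c . control + b),
    where `control` might encode the question or the correct answer."""
    g = sigmoid(sum(a * h for a, h in zip(w_h, hidden)) +
                sum(a * c for a, c in zip(w_c, control)) + b)
    return [g * h for h in hidden]

# With zero weights and zero bias the gate is 0.5, halving the hidden state;
# a strongly negative pre-activation would drive the gate toward 0,
# effectively erasing answer-revealing content.
print(gate([1.0, 2.0], [0.0], [0.0, 0.0], [0.0], 0.0))  # [0.5, 1.0]
```

In a trained model the weights would be learned so the gate closes exactly when the hidden content overlaps with the correct answer.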