Asking Questions the Human Way: Scalable Question-Answer Generation from Text Corpus
The ability to ask questions is important in both human and machine
intelligence. Learning to ask questions helps knowledge acquisition, improves
question-answering and machine reading comprehension tasks, and helps a chatbot
to keep the conversation flowing with a human. Existing question generation
models are ineffective at generating large numbers of high-quality
question-answer pairs from unstructured text, since, given an answer and an
input passage, question generation is inherently a one-to-many mapping. In this
paper, we propose Answer-Clue-Style-aware Question Generation (ACS-QG), which
aims at automatically generating high-quality and diverse question-answer pairs
from unlabeled text corpus at scale by imitating the way a human asks
questions. Our system consists of: i) an information extractor, which samples
from the text multiple types of assistive information to guide question
generation; ii) neural question generators, which generate diverse and
controllable questions, leveraging the extracted assistive information; and
iii) a neural quality controller, which removes low-quality generated data
based on text entailment. We compare our question generation models with
existing approaches and resort to voluntary human evaluation to assess the
quality of the generated question-answer pairs. The evaluation results suggest
that our system dramatically outperforms state-of-the-art neural question
generation models in terms of generation quality while remaining scalable.
With models trained on a relatively small amount of data, we can generate 2.8
million quality-assured question-answer pairs from a million sentences found
in Wikipedia.
Comment: Accepted by The Web Conference 2020 (WWW 2020) as a full paper (oral presentation).
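The abstract outlines a three-stage pipeline: extract assistive information, generate questions conditioned on it, and filter by text entailment. The sketch below mirrors only that shape; the heuristic extractor and the two public checkpoints (valhalla/t5-base-qg-hl for generation, roberta-large-mnli for entailment) are stand-ins, not the ACS-QG models or their learned answer/clue/style sampling.

```python
# A minimal sketch of the three-stage shape described in the abstract.
# All names and checkpoints are illustrative stand-ins, NOT the ACS-QG
# models: the real system samples (answer, clue, style) triples, while
# this toy extractor just picks capitalized tokens as candidate answers.
from transformers import pipeline

qg = pipeline("text2text-generation", model="valhalla/t5-base-qg-hl")
nli = pipeline("text-classification", model="roberta-large-mnli")

def extract_info(sentence):
    # Stage i (toy): capitalized tokens as candidate answer spans.
    return [tok.strip(".,") for tok in sentence.split() if tok[0].isupper()]

def generate_qa_pairs(sentence, min_entail=0.8):
    pairs = []
    for answer in extract_info(sentence):
        # Stage ii: condition generation on the chosen answer span
        # (the "<hl>" highlight format this T5 checkpoint expects).
        hl = sentence.replace(answer, f"<hl> {answer} <hl>", 1)
        question = qg(f"generate question: {hl}")[0]["generated_text"]
        # Stage iii: quality control -- keep the pair only if the
        # passage entails the question-answer statement.
        verdict = nli({"text": sentence, "text_pair": f"{question} {answer}"})[0]
        if verdict["label"] == "ENTAILMENT" and verdict["score"] >= min_entail:
            pairs.append((question, answer))
    return pairs

print(generate_qa_pairs("Marie Curie won the Nobel Prize in Physics in 1903."))
```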
Do Text Simplification Systems Preserve Meaning? A Human Evaluation via Reading Comprehension
Automatic text simplification (TS) aims to automate the process of rewriting
text to make it easier for people to read. A pre-requisite for TS to be useful
is that it should convey information that is consistent with the meaning of the
original text. However, current TS evaluation protocols assess system outputs
for simplicity and meaning preservation without regard for the document context
in which output sentences occur and for how people understand them. In this
work, we introduce a human evaluation framework to assess whether simplified
texts preserve meaning using reading comprehension questions. With this
framework, we conduct a thorough human evaluation of texts simplified by humans and by
nine automatic systems. Supervised systems that leverage pre-training knowledge
achieve the highest scores on the reading comprehension (RC) tasks amongst the
automatic controllable TS systems. However, even the best-performing supervised
system struggles with at least 14% of the questions, marking them as
"unanswerable'' based on simplified content. We further investigate how
existing TS evaluation metrics and automatic question-answering systems
approximate the human judgments we obtained.
Comment: Accepted at TACL (a pre-MIT Press publication version).
Automatic Generation and Evaluation of Reading Comprehension Test Items with Large Language Models
Reading comprehension tests are used in a variety of applications, ranging from education to assessing the comprehensibility of simplified texts. However, creating such tests manually and ensuring their quality is difficult and time-consuming. In this paper, we explore how large language models (LLMs) can be used to generate and evaluate multiple-choice reading comprehension items. To this end, we compiled a dataset of German reading comprehension items and developed a new protocol for human and automatic evaluation, including a metric we call text informativity, which is based on guessability and answerability. We then used this protocol and the dataset to evaluate the quality of items generated by Llama 2 and GPT-4. Our results suggest that both models are capable of generating items of acceptable quality in a zero-shot setting, but GPT-4 clearly outperforms Llama 2. We also show that LLMs can be used for automatic evaluation by eliciting item responses from them. In this scenario, evaluation results with GPT-4 were the most similar to those of human annotators. Overall, zero-shot generation with LLMs is a promising approach for generating and evaluating reading comprehension test items, in particular for languages without large amounts of available data.
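As a concrete (hypothetical) reading of the guessability/answerability idea: if answerability is response accuracy when readers see the text and guessability is accuracy when they see only the item, informativity can be scored as their difference. This is an illustrative formalization only, not necessarily the paper's exact definition.

```python
# One plausible formalization of "text informativity":
#   answerability = accuracy when the reader sees the text,
#   guessability  = accuracy when the reader sees only the item.
# An illustrative reading of the abstract, not the paper's exact formula.

def accuracy(responses, keys):
    """Fraction of multiple-choice responses matching the answer key."""
    return sum(r == k for r, k in zip(responses, keys)) / len(keys)

def text_informativity(with_text, without_text, keys):
    """High when the text (not prior knowledge or item cues) enables answering."""
    return accuracy(with_text, keys) - accuracy(without_text, keys)

keys = ["B", "A", "D", "C"]
print(text_informativity(["B", "A", "D", "A"],   # responses given the text
                         ["B", "C", "A", "A"],   # responses without the text
                         keys))                  # 0.75 - 0.25 = 0.5
```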
Instructional strategies and tactics for the design of introductory computer programming courses in high school
This article offers an examination of instructional strategies and tactics for the design of introductory computer programming courses in high school. We distinguish the Expert, Spiral and Reading approach as groups of instructional strategies that mainly differ in their general design plan to control students' processing load. In order, they emphasize top-down program design, incremental learning, and program modification and amplification. In contrast, tactics are specific design plans that prescribe methods to reach desired learning outcomes under given circumstances. Based on ACT* (Anderson, 1983) and relevant research, we distinguish between declarative and procedural instruction and present six tactics that can be used both to design courses and to evaluate strategies. Three tactics for declarative instruction involve concrete computer models, programming plans and design diagrams; three tactics for procedural instruction involve worked-out examples, practice of basic cognitive skills and task variation. In our evaluation of groups of instructional strategies, the Reading approach has been found to be superior to the Expert and Spiral approaches.
Automatic generation of audio content for open learning resources
This paper describes how digital talking books (DTBs) with embedded functionality for learners can be generated from content structured according to the OU OpenLearn schema. It includes examples showing how a software transformation developed from open-source components can be used to remix OpenLearn content, and discusses issues concerning the generation of synthesised speech for educational purposes. Factors that may affect the quality of a learner's experience with open educational audio resources are identified, and, in conclusion, plans for testing the effect of these factors are outlined.
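As a rough illustration of this kind of transformation, the sketch below pulls section text out of schema-structured XML and synthesizes one audio file per section with an open-source TTS engine. The element names and file layout are assumptions for illustration; the paper's actual toolchain, the OpenLearn schema details, and DTB packaging (e.g., navigation structures) are not reproduced here.

```python
# A minimal sketch: extract section text from schema-structured XML and
# synthesize one audio file per section. The element names ("Section",
# "Title", "Paragraph") are assumptions, not the actual OpenLearn schema.
import xml.etree.ElementTree as ET
import pyttsx3  # open-source, offline text-to-speech engine

engine = pyttsx3.init()
tree = ET.parse("openlearn_unit.xml")

for i, section in enumerate(tree.iter("Section")):
    title = section.findtext("Title", default=f"Section {i + 1}")
    body = " ".join(p.text or "" for p in section.iter("Paragraph"))
    # Prepend the title so learners can navigate the talking book by ear.
    engine.save_to_file(f"{title}. {body}", f"section_{i + 1:02d}.wav")

engine.runAndWait()  # flushes all queued utterances to disk
```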
Diversity Enhanced Narrative Question Generation for Storybooks
Question generation (QG) from a given context can enhance comprehension,
engagement, assessment, and overall efficacy in learning or conversational
environments. Despite recent advancements in QG, the challenge of enhancing or
measuring the diversity of generated questions often remains unaddressed. In
this paper, we introduce a multi-question generation model (mQG), which is
capable of generating multiple, diverse, and answerable questions by focusing
on context and questions. To validate the answerability of the generated
questions, we employ a SQuAD2.0 fine-tuned question answering model,
classifying the questions as answerable or not. We train and evaluate mQG on
the FairytaleQA dataset, a well-structured QA dataset based on storybooks, with
narrative questions. We further apply a zero-shot adaptation on the TellMeWhy
and SQuAD1.1 datasets. mQG shows promising results across various evaluation
metrics compared with strong baselines.
Comment: Accepted to EMNLP 2023.
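The answerability check described in the abstract can be approximated with any SQuAD2.0-style extractive QA model, since such models may abstain by returning the null answer. A minimal sketch follows, with deepset/roberta-base-squad2 as a stand-in checkpoint (not necessarily the model used for mQG):

```python
# Sketch of the answerability filter: a SQuAD2.0-style QA model can
# abstain, so questions whose best answer is the null answer are
# classified as unanswerable. Checkpoint is a public stand-in.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

def is_answerable(question, context, threshold=0.5):
    result = qa(question=question, context=context,
                handle_impossible_answer=True)  # permits the null answer
    # An empty answer string means the model preferred "no answer".
    return bool(result["answer"].strip()) and result["score"] >= threshold

story = "The fox hid its acorns under the old oak tree before winter came."
print(is_answerable("Where did the fox hide its acorns?", story))  # likely True
print(is_answerable("What is the fox's name?", story))             # likely False
```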
Contrastive Learning for Inference in Dialogue
Inference, especially inference derived from inductive processes, is a crucial
component of conversation, complementing the information implicitly or
explicitly conveyed by a speaker. While recent large language models show
remarkable advances in inference tasks, their performance in inductive
reasoning, where not all information is present in the context, is far behind
deductive reasoning. In this paper, we analyze the behavior of the models based
on the task difficulty defined by the semantic information gap -- which
distinguishes inductive and deductive reasoning (Johnson-Laird, 1988, 1993).
Our analysis reveals that the disparity in information between dialogue
contexts and desired inferences poses a significant challenge to the inductive
inference process. To mitigate this information gap, we investigate a
contrastive learning approach by feeding negative samples. Our experiments
suggest that negative samples help models understand what is wrong and improve
their generated inferences.
Comment: Accepted to EMNLP 2023.
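As a generic illustration of feeding negative samples, the sketch below scores the gold inference against negatives with an InfoNCE-style contrastive loss over sequence log-probabilities. The function names and the loss form are assumptions for illustration, not the paper's exact objective.

```python
# Generic contrastive objective over sequence log-probabilities: push the
# model's likelihood of the gold inference above mismatched (negative)
# inferences. Illustrative only; not the paper's exact loss.
import torch
import torch.nn.functional as F

def contrastive_loss(pos_logprob, neg_logprobs, temperature=1.0):
    """pos_logprob: scalar log p(gold inference | dialogue);
    neg_logprobs: 1-D tensor of log-probs for negative inferences."""
    logits = torch.cat([pos_logprob.view(1), neg_logprobs]) / temperature
    # Cross-entropy with the positive at index 0 == -log softmax(gold).
    return F.cross_entropy(logits.unsqueeze(0), torch.zeros(1, dtype=torch.long))

pos = torch.tensor(-2.0)                  # gold inference, higher log-prob
negs = torch.tensor([-5.0, -4.5, -6.0])   # negatives, lower log-probs
print(contrastive_loss(pos, negs))        # small loss: gold already preferred
```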
Persisting Preschoolers: Using Storybooks to Increase Persistence on Difficult Tasks
Persistence is a critical component of problem-solving and is predictive of academic achievement. Despite the crucial importance of fostering persistence during early childhood, most researchers have developed interventions for school settings (e.g., elementary and middle school) rather than formulating strategies and tools to increase persistence in early education settings (e.g., preschools and nursery centers). This dissertation investigates whether and how storybooks can be used to increase preschoolers' perseverance (assessed via time spent attempting to complete challenging tasks). The researcher-developed books in this study demonstrate how sustained effort towards a difficult goal and the use of multiple problem-solving strategies are essential to goal achievement despite moments of setback or failure (struggle-stories). To evaluate the effectiveness of the intervention on general perseverance, the amount of time spent on two transfer tasks (a puzzle and a search-and-find task) was used to measure persistence. Findings did not detect statistically significant differences in persistence between children who simply heard struggle-stories and those who heard non-struggle narratives. Given that a reading-only intervention is quite subtle for this young age group, this dissertation also explores two additional strategies used to complement the struggle-stories: roleplaying and praise. Results indicate that child-led roleplaying after reading the struggle-stories was not an effective approach; however, children who heard researchers praise characters throughout each reading of the struggle narratives demonstrated statistically significantly greater persistence on the transfer tasks. The implications for struggle-story development and the use of additional strategies to increase persistence at home or in the classroom are discussed.