
    Crowdsourcing Multiple Choice Science Questions

    We present a novel method for obtaining high-quality, domain-targeted multiple choice questions from crowd workers. Generating these questions can be difficult without trading away originality, relevance or diversity in the answer options. Our method addresses these problems by leveraging a large corpus of domain-specific text and a small set of existing questions. It produces model suggestions for document selection and answer distractor choice which aid the human question generation process. With this method we have assembled SciQ, a dataset of 13.7K multiple choice science exam questions (dataset available at http://allenai.org/data.html). We demonstrate that the method produces in-domain questions by providing an analysis of this new dataset and by showing that humans cannot distinguish the crowdsourced questions from original questions. When using SciQ as additional training data alongside existing questions, we observe accuracy improvements on real science exams. Comment: accepted for the Workshop on Noisy User-generated Text (W-NUT) 2017.
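
    As a minimal sketch, the released dataset can be inspected with the Hugging Face datasets library; the hub name "sciq" and the field names below are assumptions about a common mirror rather than details given in the abstract, which only points to allenai.org.

        # Illustrative sketch only: the dataset name and field names are
        # assumptions about a common Hugging Face mirror of SciQ.
        from datasets import load_dataset

        sciq = load_dataset("sciq")            # splits: train / validation / test
        example = sciq["train"][0]

        print(example["question"])             # crowd-written question stem
        print(example["correct_answer"])       # gold answer option
        print(example["distractor1"],          # distractors chosen with model help
              example["distractor2"],
              example["distractor3"])
        print(example["support"][:200])        # supporting passage from the corpus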

    Automatic generation of audio content for open learning resources

    This paper describes how digital talking books (DTBs) with embedded functionality for learners can be generated from content structured according to the OU OpenLearn schema. It includes examples showing how a software transformation developed from open source components can be used to remix OpenLearn content, and discusses issues concerning the generation of synthesised speech for educational purposes. Factors which may affect the quality of a learner's experience with open educational audio resources are identified, and in conclusion, plans for testing the effect of these factors are outlined.
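
    Purely as an illustrative sketch, the kind of transformation described above (structured course content remixed into synthesized audio) could look roughly like the following; pyttsx3 stands in for the open source text-to-speech components, and the XML element names are hypothetical placeholders, not the actual OpenLearn schema.

        # Hypothetical sketch: structured unit content -> spoken audio file.
        # pyttsx3 is a stand-in TTS component; element names are placeholders.
        import xml.etree.ElementTree as ET
        import pyttsx3

        def unit_to_audio(xml_path: str, out_path: str) -> None:
            """Read titles and paragraphs from a course unit and save speech."""
            root = ET.parse(xml_path).getroot()
            # Real OpenLearn content would need richer handling (lists,
            # activities, figure alt text, pronunciation of symbols, ...).
            chunks = [el.text.strip() for el in root.iter()
                      if el.tag in ("Title", "Paragraph") and el.text]
            engine = pyttsx3.init()
            engine.setProperty("rate", 160)    # speaking rate, tune for learners
            engine.save_to_file(" ".join(chunks), out_path)
            engine.runAndWait()

        unit_to_audio("openlearn_unit.xml", "unit_audio.wav")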

    A misleading answer generation system for exam questions

    University professors are responsible for teaching and grading their students each semester. Normally, in order to evaluate the students' progress, professors create exams composed of questions about the subjects taught during the teaching period. Each year professors need to develop new questions for their exams, since students are free to discuss and record the correct answers to the questions on prior exams, and professors want to grade students on their knowledge rather than on their memorization skills. As discovered by our research, professors spend roughly two and a half hours each year, for a single course, on the multiple-choice question sections alone. This solution has at its core a misleading answer generator that reduces the time and effort of creating fill-the-gap questions by merging a lexical model highly biased towards a specific subject with a generalist model. To reach as many professors as possible, a web server was implemented that gives access to an exam creation interface with the misleading answer generation feature. Implementing this feature required several accessory programs as well as manually editing textbooks pertaining to each question's base topic. To evaluate the effectiveness of our implementation, several evaluation methods were proposed, combining objective measurements of the misleading answer generator with subjective evaluation based on expert input. Developing the misleading answer suggestion function required building a lexical model from a corpus highly biased towards a specific curricular subject. Such a highly biased model is likely to give good in-context misleading answers, but their variance would most likely be limited; to counteract this, the model was merged with a generalist model in the hope of improving its overall performance. With the custom lexical model and the server, professors can receive misleading answer suggestions for newly formed questions, reducing the time spent each year on creating new exam questions to assess students' knowledge.
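
    The core blending idea can be sketched under clearly labelled assumptions: gensim word vectors stand in for the thesis' lexical models, and the model paths, candidate pool size and blend weight are illustrative choices rather than the actual implementation.

        # Hedged sketch: rank candidate misleading answers for a fill-the-gap
        # question by blending similarities from a domain-biased word-vector
        # model and a generalist one. Paths, alpha and topn are assumptions.
        from gensim.models import KeyedVectors

        domain_kv = KeyedVectors.load("subject_biased_vectors.kv")    # biased model
        general_kv = KeyedVectors.load("general_corpus_vectors.kv")   # generalist model

        def suggest_distractors(answer: str, k: int = 3, alpha: float = 0.7):
            """Return k words similar to the answer, favouring the domain model."""
            scores = {}
            for word, sim in domain_kv.most_similar(answer, topn=20):
                scores[word] = alpha * sim
            for word, sim in general_kv.most_similar(answer, topn=20):
                scores[word] = scores.get(word, 0.0) + (1 - alpha) * sim
            scores.pop(answer, None)            # never suggest the answer itself
            ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
            return [word for word, _ in ranked[:k]]

        print(suggest_distractors("mitochondria"))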

    RACE: Large-scale ReAding Comprehension Dataset From Examinations

    We present RACE, a new dataset for benchmark evaluation of methods in the reading comprehension task. Collected from English exams for Chinese middle and high school students aged 12 to 18, RACE consists of nearly 28,000 passages and nearly 100,000 questions generated by human experts (English instructors), and covers a variety of topics carefully designed to evaluate students' ability in understanding and reasoning. In particular, the proportion of questions that require reasoning is much larger in RACE than in other benchmark datasets for reading comprehension, and there is a significant gap between the performance of state-of-the-art models (43%) and ceiling human performance (95%). We hope this new dataset can serve as a valuable resource for research and evaluation in machine comprehension. The dataset is freely available at http://www.cs.cmu.edu/~glai1/data/race/ and the code is available at https://github.com/qizhex/RACE_AR_baselines. Comment: EMNLP 2017.
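
    A minimal loading sketch with the Hugging Face datasets library follows; the hub name "race", the "all" configuration and the field names are assumptions about a common mirror, while the authors themselves distribute the data at the CMU URL above.

        # Illustrative sketch only: hub name, config and field names are
        # assumptions about a common mirror of RACE.
        from datasets import load_dataset

        race = load_dataset("race", "all")     # configs: "high", "middle", "all"
        ex = race["train"][0]

        print(ex["article"][:200])             # reading passage
        print(ex["question"])                  # question written by an instructor
        print(ex["options"])                   # four answer options
        print(ex["answer"])                    # gold label, e.g. "A"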