
    Crowdsourcing Multiple Choice Science Questions

    We present a novel method for obtaining high-quality, domain-targeted multiple choice questions from crowd workers. Generating these questions can be difficult without trading away originality, relevance or diversity in the answer options. Our method addresses these problems by leveraging a large corpus of domain-specific text and a small set of existing questions. It produces model suggestions for document selection and answer distractor choice which aid the human question generation process. With this method we have assembled SciQ, a dataset of 13.7K multiple choice science exam questions (dataset available at http://allenai.org/data.html). We demonstrate that the method produces in-domain questions by providing an analysis of this new dataset and by showing that humans cannot distinguish the crowdsourced questions from original questions. When using SciQ as additional training data alongside existing questions, we observe accuracy improvements on real science exams. (Accepted at the Workshop on Noisy User-generated Text, W-NUT 2017.)
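    As a concrete illustration of the distractor-suggestion step, here is a minimal sketch that ranks candidate distractors by embedding similarity to the correct answer, so suggested options are related but not identical to it. It assumes pre-trained word vectors loadable with gensim; the vector file and candidate list are hypothetical, and this is not the authors' actual model.

```python
# Hypothetical sketch: rank distractor candidates by similarity to the
# correct answer using pre-trained word vectors (gensim KeyedVectors).
from gensim.models import KeyedVectors

# Assumes a word2vec-format vector file is available locally.
vectors = KeyedVectors.load_word2vec_format("vectors.w2v.txt")

def rank_distractors(answer, candidates, top_k=3):
    """Return the top_k candidates most similar to the answer,
    excluding the answer itself, as plausible-but-wrong options."""
    scored = [
        (vectors.similarity(answer, c), c)
        for c in candidates
        if c != answer and c in vectors and answer in vectors
    ]
    scored.sort(reverse=True)
    return [c for _, c in scored[:top_k]]

print(rank_distractors("mitochondria",
                       ["ribosome", "chloroplast", "table", "nucleus"]))
```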

    Measure for Measure: A Critical Consumers' Guide to Reading Comprehension Assessments for Adolescents

    A companion report to Carnegie's Time to Act, this guide analyzes and rates commonly used reading comprehension tests for various elements and purposes. It outlines trends in the types of questions asked, the stress placed on critical thinking, and screening or diagnostic functions.

    An Intelligent Approach to Automatic Query Formation from Plain Text using Artificial Intelligence

    Humans have always been inherently curious creatures; they ask questions to satisfy that curiosity. Children ask questions to learn more from their teachers, teachers ask questions to help evaluate student performance, and we all ask questions in our daily lives. Numerous learning exchanges, ranging from one-on-one tutoring sessions to thorough exams, as well as real-life debates, rely heavily on questions. Notably, because of their inconsistency in particular contexts, humans are often inept at asking appropriate questions, and most people have been found to have difficulty identifying their own knowledge gaps. This is our primary motivation for automating question generation, in the hope that an automated Question Generation (QG) system will help humans meet their inquiry needs. QG and Information Extraction (IE) have become two major concerns for language processing communities, and QG has recently become an important component of learning environments and information-seeking systems, among other applications. The Text-to-Question generation task has attracted the interest of the Natural Language Processing (NLP), Natural Language Generation (NLG), Intelligent Tutoring System (ITS), and Information Retrieval (IR) communities as a candidate shared task. In the Text-to-Question generation task, a text is submitted to a QG system, whose purpose is to generate a set of questions for which the text contains answers (such as a word, a set of words, a single sentence, a text, a set of texts, a stretch of conversational dialogue, an inadequate query, and so on).
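    To make the Text-to-Question task concrete, here is a minimal rule-based sketch: for each named entity in a sentence, substitute a wh-word to form a question whose answer is that entity. It assumes spaCy with the en_core_web_sm model installed; the templates are illustrative only, and real QG systems use far more sophisticated syntactic and neural methods.

```python
# Minimal rule-based Text-to-Question sketch: turn each named entity
# into the answer of a templated wh-question. Illustrative only.
import spacy

nlp = spacy.load("en_core_web_sm")

WH = {"PERSON": "Who", "GPE": "Where", "DATE": "When",
      "ORG": "What organization"}

def generate_questions(text):
    questions = []
    for sent in nlp(text).sents:
        for ent in sent.ents:
            wh = WH.get(ent.label_)
            if wh is None:
                continue
            # Replace the answer span with the wh-word to form a question.
            q = sent.text.replace(ent.text, wh, 1).rstrip(".") + "?"
            questions.append((q, ent.text))
    return questions

for q, a in generate_questions("Alan Turing published the paper in 1936."):
    print(q, "->", a)
```

    Naive span substitution like this yields awkward wording whenever the answer is not the sentence's subject; real systems apply syntactic transformations or learned models to smooth the output.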

    The Task Matters: A Scoping Review on Reading Comprehension Abilities in ADHD

    Objective: A broad range of tasks has been used to assess reading comprehension difficulties in individuals with ADHD. However, the inconsistency in the literature warrants a scoping review of current knowledge about the relationship between ADHD diagnosis and reading comprehension ability. Method: A comprehensive search strategy was performed to identify relevant articles on the topic. Thirty-four articles met the inclusion criteria for the current review. Results: The evidence as a whole suggests that reading comprehension is impaired in ADHD. The most prominent effect was found in studies where participants retell or pick out the central ideas in stories; on these tasks, participants with ADHD performed consistently worse than typically developing controls. However, some studies found that performance in ADHD improved when the demands of the reading comprehension task were low. Conclusion: Results suggest that performance in ADHD depends on how reading comprehension is measured, and they can guide future work clarifying why findings are so discrepant across studies.

    The Cloze Procedure as a Testing and Teaching Device

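    Although no abstract is shown for this entry, the cloze procedure itself is mechanical: delete every nth word of a passage (traditionally every fifth) and score the reader's ability to restore them. A toy sketch under that classic every-fifth-word assumption:

```python
# Toy sketch of the classic cloze procedure: blank out every nth word
# (traditionally every fifth) and keep the deletions as the answer key.
def make_cloze(text, n=5):
    words = text.split()
    blanked, answers = [], []
    for i, word in enumerate(words, start=1):
        if i % n == 0:
            answers.append(word)
            blanked.append("_____")
        else:
            blanked.append(word)
    return " ".join(blanked), answers

passage, key = make_cloze(
    "The quick brown fox jumps over the lazy dog near the old river bank.")
print(passage)  # every fifth word replaced by a blank
print(key)      # the deleted words, for scoring
```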

    Measures of comprehension for Czech first- to fourth-grade pupils

    The present findings are drawn from the project Reading Comprehension – Typical Development and its Risks, which aims to map the developmental dynamics of the reading comprehension skills of Czech children. In this paper, we focus on the issue of comprehension measures. Because of the lack of such measures in the Czech Republic, the research team designed three new tools: an oral reading comprehension test (Rabbits), a silent reading comprehension test (Going on a trip), and a listening comprehension test (Little Star). All of these tests have an identical structure and similar content features, and assess the literal and inferential comprehension of a story. First, we report on the reliability and validity of these tests using data from a study involving 467 first- to fourth-graders. Second, we compare the comprehension scores (global, implicit, and explicit) and investigate the differences between grades in the patterns of these comprehension skills. We discuss the possibility of applying these measures as diagnostic tools in education and research.
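    Reliability reporting of the kind mentioned above typically rests on an internal-consistency coefficient such as Cronbach's alpha. The following is a minimal NumPy sketch of that statistic on toy item-level data; it is illustrative only, not the study's actual analysis.

```python
# Minimal sketch of Cronbach's alpha, a standard internal-consistency
# reliability coefficient. Toy data; not the study's actual analysis.
import numpy as np

def cronbach_alpha(scores):
    """scores: 2-D array, rows = pupils, columns = test items."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy data: 6 pupils answering 4 comprehension items (0 = wrong, 1 = right).
items = [[1, 1, 1, 0], [1, 0, 1, 1], [0, 0, 1, 0],
         [1, 1, 1, 1], [0, 1, 0, 0], [1, 1, 0, 1]]
print(round(cronbach_alpha(items), 3))
```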

    The reliability and validity of screening measures in reading

    National educational groups have recommended universal screening to assist in the early identification of reading problems. One of the most widely used measures for the universal screening of reading is oral reading fluency (ORF) (Fewster & Macmillan, 2002). However, ORF is somewhat time-consuming to administer and has been reported to lack “face validity” with teachers (Fuchs, Fuchs & Maxwell, 1988). The purpose of this study was to investigate maze and other group-administered reading assessments because of their potential as time-efficient assessments that are as psychometrically valid as ORF. In this study, maze and a variation of maze known as sentence maze, both group-administered measures of basic reading performance and comprehension, were studied. A third assessment, picture word fluency, which measures a combination of sight-word reading and simple vocabulary, was also evaluated. The study consisted of two experiments. In the first experiment, these assessments were evaluated for their psychometric adequacy, as well as their utility and accuracy for decision-making in the context of the requirements for universal screening. The purpose of the second experiment was to examine the generality of the results in another state with different criterion measures. A total of 789 regular education first-, third- and fifth-grade students in two states participated in the two experiments. Students were administered CBM assessments and a criterion achievement measure. Two groups of validity analyses were reported: (a) those pertaining to concurrent/predictive validity, and (b) those pertaining to classification accuracy. These analyses revealed validity estimates for the two maze assessments similar to those reported in previous research. The validity analyses for picture word fluency were likewise promising. Most germane to the evaluation of the screening measures were the classification-accuracy analyses. Although the results varied somewhat by grade, they indicated a moderate to high degree of concordance between the students identified as at risk by the group-administered CBM measures and by the criterion measures used in this study, including ORF and the state accountability tests. The limitations of the study are discussed with suggestions for future research.
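    The classification-accuracy analyses described above reduce, at their simplest, to flagging students below a screening cutoff and comparing the flags against the criterion measure. Here is a minimal sketch of that computation; the cutoff and data are hypothetical, and the study's analyses are far more extensive.

```python
# Sketch of a screening classification-accuracy check: flag students whose
# maze score falls below a cutoff, then compare against the criterion.
# Hypothetical cutoff and data; illustrative only.
def screening_accuracy(screen_scores, at_risk, cutoff):
    flagged = [s < cutoff for s in screen_scores]
    tp = sum(f and t for f, t in zip(flagged, at_risk))
    fn = sum(not f and t for f, t in zip(flagged, at_risk))
    tn = sum(not f and not t for f, t in zip(flagged, at_risk))
    fp = sum(f and not t for f, t in zip(flagged, at_risk))
    sensitivity = tp / (tp + fn)  # at-risk students correctly flagged
    specificity = tn / (tn + fp)  # not-at-risk students correctly passed
    return sensitivity, specificity

maze_scores = [12, 25, 8, 30, 16, 22, 9, 13]
at_risk = [True, False, True, False, True, False, True, False]
print(screening_accuracy(maze_scores, at_risk, cutoff=15))  # (0.75, 0.75)
```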