
    Introducing a framework to assess newly created questions with Natural Language Processing

    Statistical models such as those derived from Item Response Theory (IRT) enable the assessment of students on a specific subject, which is useful for several purposes (e.g., learning path customization, drop-out prediction). However, the questions have to be assessed as well, and although IRT can estimate the characteristics of questions that have already been answered by many students, it cannot be applied to newly generated questions. In this paper, we propose a framework to train and evaluate models that estimate the difficulty and discrimination of newly created Multiple Choice Questions by extracting meaningful features from the text of the question and of the possible choices. We implement one model using this framework and test it on a real-world dataset provided by CloudAcademy, showing that it outperforms previously proposed models, reducing the RMSE by 6.7% for difficulty estimation and by 10.8% for discrimination estimation. We also present the results of an ablation study performed to support our choice of features and to show the effects of different characteristics of the questions' text on difficulty and discrimination. Comment: Accepted at the International Conference of Artificial Intelligence in Education
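    To make "difficulty", "discrimination", and "reducing the RMSE by 6.7%" concrete, here is a minimal sketch. It assumes the two-parameter logistic (2PL) IRT model, which is the standard model with exactly these two item parameters; the abstract does not name the IRT variant used. The function names are hypothetical, and the text-feature regression itself is not shown.

```python
import math

def p_correct(theta: float, difficulty: float, discrimination: float) -> float:
    """2PL IRT (assumed variant): probability that a student with skill `theta`
    answers correctly an item with the given difficulty and discrimination."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

def rmse(predicted, target) -> float:
    """Root mean squared error between predicted and IRT-estimated parameters."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted))

def relative_reduction(baseline_rmse: float, new_rmse: float) -> float:
    """Percentage by which `new_rmse` improves on `baseline_rmse`."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

# A student whose skill equals the item difficulty has a 50% chance of success;
# a higher discrimination makes the curve steeper around that point.
print(p_correct(theta=0.0, difficulty=0.0, discrimination=1.0))  # 0.5
```

    Under this reading, a 6.7% reduction means the new model's RMSE is 0.933 times the baseline's, whatever the absolute values are.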

    Effect of Tuned Parameters on a LSA MCQ Answering Model

    This paper presents the current state of a work in progress whose objective is to better understand the effects of the factors that significantly influence the performance of Latent Semantic Analysis (LSA). A difficult task, answering (French) biology Multiple Choice Questions, is used to test the semantic properties of the truncated singular space and to study the relative influence of the main parameters. Dedicated software was designed to fine-tune the LSA semantic space for the Multiple Choice Questions task. With optimal parameters, the performance of our simple model is, quite surprisingly, equal or superior to that of 7th- and 8th-grade students. This indicates that the semantic spaces were quite good despite their low dimensionality and the small size of the training data sets. Besides, we present an original entropy-based global weighting of the answer terms of each Multiple Choice Question, which was necessary to achieve the model's success. Comment: 9 pages
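    The LSA pipeline described here (weight a term-by-document matrix, truncate its SVD, compare texts in the reduced space) can be sketched as follows. This is an illustration under stated assumptions, not the authors' model: it uses the standard log-entropy weighting over corpus documents rather than the paper's original weighting of answer terms, which the abstract does not specify. It also assumes that a text is represented as the sum of its term vectors and that the choice with the highest cosine similarity to the question is selected. All function names are hypothetical.

```python
import numpy as np

def log_entropy(counts) -> np.ndarray:
    """Standard log-entropy weighting of a term-by-document count matrix
    (rows = terms, columns = documents); needs at least two documents."""
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]
    gf = counts.sum(axis=1, keepdims=True)  # global frequency of each term
    # p_ij = tf_ij / gf_i, left at 0 for terms that never occur
    p = np.divide(counts, gf, out=np.zeros_like(counts), where=gf > 0)
    plogp = p * np.log(np.where(p > 0, p, 1.0))  # treats 0 * log(0) as 0
    g = 1.0 + plogp.sum(axis=1, keepdims=True) / np.log(n_docs)  # global weight in [0, 1]
    return np.log1p(counts) * g  # local weight log(1 + tf) times global weight

def lsa_space(weighted: np.ndarray, k: int) -> np.ndarray:
    """Truncated SVD: k-dimensional term vectors U_k * S_k."""
    u, s, _ = np.linalg.svd(weighted, full_matrices=False)
    return u[:, :k] * s[:k]

def embed(term_ids, term_vectors: np.ndarray) -> np.ndarray:
    """Assumed composition: a text is the sum of its term vectors."""
    return term_vectors[term_ids].sum(axis=0)

def best_choice(question_ids, choices_ids, term_vectors: np.ndarray) -> int:
    """Index of the choice whose vector is closest (cosine) to the question."""
    q = embed(question_ids, term_vectors)
    scores = []
    for ids in choices_ids:
        c = embed(ids, term_vectors)
        denom = np.linalg.norm(q) * np.linalg.norm(c)
        scores.append(float(q @ c / denom) if denom > 0 else -1.0)
    return int(np.argmax(scores))
```

    The number of retained dimensions `k` is one of the parameters whose tuning the paper studies. A term spread evenly across all documents receives a global weight of 0, so it does not influence the space.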

    An Empirical Evaluation of Visual Question Answering for Novel Objects

    We study the problem of answering questions about images in a harder setting, where the test questions and corresponding images contain novel objects that were not queried about in the training data. Such a setting is inevitable in the real world: owing to the heavy-tailed distribution of visual categories, some objects will not be annotated in the training set. We show that the performance of two popular existing methods drops significantly (up to 28%) when evaluated on novel objects compared with known objects. We propose methods that use large existing external corpora of (i) unlabeled text, i.e., books, and (ii) images tagged with classes, to achieve novel-object-based visual question answering. We conduct systematic empirical studies, both for an oracle case where the novel objects are known textually and for a fully automatic case without any explicit knowledge of the novel objects, but with the minimal assumption that the novel objects are semantically related to the existing objects in training. The proposed methods for novel-object-based visual question answering are modular and can potentially be used with many visual question answering architectures. We show consistent improvements with the two popular architectures and give a qualitative analysis of the cases where the model does well and of those where it fails to bring improvements. Comment: 11 pages, 4 figures, accepted in CVPR 2017 (poster)
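    The abstract's "minimal assumption that the novel objects are semantically related to the existing objects in training" can be illustrated with a toy example. This is a hypothetical sketch, not the paper's method. It assumes word embeddings learned from an external text corpus are available and maps each out-of-vocabulary question word to its most similar known word by cosine similarity. The names `nearest_known` and `map_question` are illustrative.

```python
import math
from typing import Dict, Sequence, Set

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def nearest_known(word: str, embeddings: Dict[str, Sequence[float]], known: Set[str]) -> str:
    """Replace a word unseen in training with its closest known word (hypothetical)."""
    if word in known or word not in embeddings:
        return word  # already known, or no external vector: leave unchanged
    candidates = [w for w in known if w in embeddings]
    if not candidates:
        return word
    return max(candidates, key=lambda w: cosine(embeddings[word], embeddings[w]))

def map_question(question: str, embeddings: Dict[str, Sequence[float]], known: Set[str]) -> str:
    """Lower-case, whitespace-tokenize, and map every token into the training vocabulary."""
    return " ".join(nearest_known(t, embeddings, known) for t in question.lower().split())

emb = {"zebra": [0.9, 0.1, 0.0], "horse": [1.0, 0.0, 0.0], "car": [0.0, 0.0, 1.0]}
print(map_question("What color is the zebra", emb, {"what", "color", "is", "the", "horse", "car"}))
# what color is the horse
```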

    “I USE MULTIPLE-CHOICE QUESTION IN MOST ASSESSMENT I PREPARED”: EFL TEACHERS’ VOICE ON SUMMATIVE ASSESSMENT

    The study aimed to investigate Senior High School English teachers’ views on the drawbacks and strengths of employing Multiple-Choice Questions as a summative assessment. Rooted in the qualitative research paradigm, the study employed a descriptive qualitative design. The data were collected through in-depth interviews with three experienced EFL teachers at a prominent state senior high school in Banjar, West Java. The interview results indicated three strengths of using Multiple-Choice Questions (MCQ) as a summative assessment: in the teachers’ view, MCQ could result in quick and easy scoring, facilitate the assessment of varied language skills, and encourage students to answer the questions carefully. The results also indicated three drawbacks: in the teachers’ view, MCQ could only target lower-order thinking, had low positive washback, and required a lot of time in its design phase. Interestingly, two of the three participants thought that MCQ was a mandatory type of summative assessment recommended by the government. In fact, however, no government policy recommends a particular type of summative assessment. Therefore, examining the strengths and drawbacks of MCQ can help teachers make a better-informed decision before choosing MCQ as a summative assessment.

    Learning to Reuse Distractors to support Multiple Choice Question Generation in Education

    Multiple choice questions (MCQs) are widely used in digital learning systems, as they allow for automating the assessment process. However, due to the increased digital literacy of students and the advent of social media platforms, MCQ tests are widely shared online, and teachers are continuously challenged to create new questions, which is an expensive and time-consuming task. A particularly sensitive aspect of MCQ creation is devising relevant distractors, i.e., wrong answers that are not easily identifiable as being wrong. This paper studies how a large existing set of manually created answers and distractors for questions over a variety of domains, subjects, and languages can be leveraged to help teachers create new MCQs through the smart reuse of existing distractors. We built several data-driven models based on context-aware question and distractor representations and compared them with static feature-based models. The proposed models are evaluated with automated metrics and in a realistic user test with teachers. Both automatic and human evaluations indicate that context-aware models consistently outperform a static feature-based approach. For our best-performing context-aware model, on average 3 of the 10 distractors shown to teachers were rated as high-quality distractors. We create a performance benchmark and make it public to enable comparison between different approaches and to introduce a more standardized evaluation of the task. The benchmark contains a test set of 298 educational questions covering multiple subjects and languages and a 77k multilingual pool of distractor vocabulary for future research. Comment: 24 pages and 4 figures. Accepted for publication in IEEE Transactions on Learning Technologies
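    The reuse task can be framed as ranking a distractor pool against a new question and showing the top 10 candidates to a teacher. The sketch below is a deliberately simple, static-vector ranking of that kind. It is not the paper's context-aware model, and it assumes that the question and every pool entry already have fixed embedding vectors. It also computes the share of shown candidates rated high quality, which is 3/10 = 0.3 for the average reported above. All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_distractors(question_vec: Sequence[float], answer: str,
                     pool: Dict[str, Sequence[float]], k: int = 10) -> List[str]:
    """Top-k pool entries most similar to the question, excluding the correct answer."""
    candidates = [d for d in pool if d != answer]
    candidates.sort(key=lambda d: cosine(question_vec, pool[d]), reverse=True)
    return candidates[:k]

def share_high_quality(shown: List[str], rated_good: Set[str]) -> float:
    """Fraction of the shown distractors that teachers rated as high quality."""
    return sum(d in rated_good for d in shown) / len(shown) if shown else 0.0

pool = {"Paris": [1.0, 0.0], "Lyon": [0.9, 0.1], "Berlin": [0.8, 0.3], "banana": [0.0, 1.0]}
print(rank_distractors([1.0, 0.0], answer="Paris", pool=pool, k=2))  # ['Lyon', 'Berlin']
```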