Evaluation of Multiple-Choice Items from the Mid-Semester Assessment in Fifth-Grade Thematic Learning at SDN Gladak Anyar 4 Pamekasan
This research evaluated the validity, difficulty level, discriminating power, distractor effectiveness, and reliability of the multiple-choice questions on the Mid-Semester Assessment in fifth-grade thematic learning at SDN Gladak Anyar 4 Pamekasan. The research method was a quantitative descriptive approach. The examination covered two themes: theme 6 with 20 questions and theme 7 with 19 questions. Validity, difficulty level, discriminating power, distractor effectiveness, and reliability were computed in Microsoft Excel 2010. The subjects were fifth-grade students, and data were collected through documentation. The results indicate that the quality of the questions is high. (1) Validity: 19 questions (95%) in theme 6 and 18 questions (94.74%) in theme 7 were declared valid. (2) Difficulty level: theme 6 contained 13 questions (68.42%) categorized as easy and 2 questions (10.53%) categorized as difficult; theme 7 contained 11 questions (61.11%) categorized as easy, so some questions have difficulty levels that do not meet good quality. (3) Discriminating power: theme 6 had 10 items (52.63%) categorized as poor and 1 item (5.26%) categorized as good; theme 7 had 6 items (33.33%) categorized as poor and 3 items (16.67%) categorized as good, placing the questions in the moderate discriminating-power category. (4) Distractor effectiveness: theme 6 had 1 item (5.26%) categorized as very good, 8 items (42.11%) as good, and 6 items (31.58%) as poor; theme 7 had 4 items (22.22%) categorized as very good, 4 items (22.22%) as good, and 3 items (16.67%) as poor, so the distractors fall into the good effectiveness category. (5) Reliability: 0.9592 for theme 6 and 0.8950 for theme 7, indicating that the questions have high reliability and high quality.
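The classical item statistics this abstract computes in Excel (difficulty index, upper-lower discriminating power, and reliability) can be sketched in Python. This is only an illustration on hypothetical 0/1-scored data: the 27% upper-lower split is a common convention assumed here, and KR-20 is assumed as the reliability coefficient since the abstract does not name one.

```python
import numpy as np

def item_analysis(responses):
    """Classical item analysis for 0/1-scored multiple-choice responses.

    responses: 2-D array of shape (n_students, n_items), 1 = correct.
    Returns (difficulty P, upper-lower discrimination D, KR-20 reliability).
    """
    responses = np.asarray(responses, dtype=float)
    n_students, n_items = responses.shape
    totals = responses.sum(axis=1)

    # Difficulty index: proportion of students answering each item correctly.
    p = responses.mean(axis=0)

    # Discrimination: proportion correct in the top 27% of total scorers
    # minus the proportion correct in the bottom 27% (assumed split).
    order = np.argsort(totals)
    k = max(1, int(round(0.27 * n_students)))
    d = responses[order[-k:]].mean(axis=0) - responses[order[:k]].mean(axis=0)

    # KR-20 reliability coefficient for dichotomous items.
    q = 1.0 - p
    var_total = totals.var(ddof=1)
    kr20 = (n_items / (n_items - 1)) * (1.0 - (p * q).sum() / var_total)
    return p, d, kr20
```

With real response matrices, items with very high p (easy), low d (poor discrimination), or low KR-20 would be flagged, mirroring the categories reported above.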
Development of a Two-Tier Multiple-Choice Mathematics Learning Evaluation Tool Using iSpring Suite 9
In the era of Industry 4.0 and of the pandemic, innovation is needed in the development of technology-based evaluation tools; one such tool is iSpring Suite 9. An evaluation tool is important because it helps educators obtain information on the achievement of results during the learning process. Beyond learning outcomes, evaluation can also determine students' ability to understand concepts; one instrument for this is the two-tier multiple-choice test, an evaluation test in which each item has two tiers. The purpose of this study is to develop a learning evaluation tool that can reveal students' understanding of concepts and can be used online. The research and development model used is 4D. The research instruments were interview sheets, validation sheets, test instruments, and questionnaires. Data analysis techniques were qualitative and quantitative. The result of this study is a two-tier multiple-choice mathematics evaluation tool built with iSpring Suite 9, assessed as follows: (1) the validation percentages from media experts (90.5%) and material experts (96.5%) both fall into the very feasible category; (2) item quality, judged by validity, yielded 8 valid items with a reliability of 0.815; by difficulty level, 10% of items are difficult, 80% moderate, and 10% easy; by discriminating power, 5 questions fall into the good category, 3 quite good, 1 very good, and 1 poor; and for distractor effectiveness, 9 distractors were selected by more than 5% of all students; (3) the percentage of students' conceptual understanding after the two-tier multiple-choice evaluation is 50.5%, in the sufficient category, and the students' response score is 82%, in the very interesting category.
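The distractor-effectiveness criterion used in this abstract (a distractor counts as functioning when more than 5% of examinees select it) can be sketched as follows; the option labels and data are hypothetical.

```python
from collections import Counter

def distractor_effectiveness(answers, key, options=("A", "B", "C", "D"),
                             threshold=0.05):
    """Flag whether each distractor is functional, i.e. chosen by more than
    `threshold` (default 5%) of examinees.

    answers: list of chosen option letters for one item.
    key: the correct option letter (excluded from the check).
    """
    counts = Counter(answers)
    n = len(answers)
    return {opt: counts[opt] / n > threshold
            for opt in options if opt != key}
```

Counting how many distractors pass this check across all items yields the "9 distractors selected by more than 5% of all students" figure reported above.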
Crowdsourcing Multiple Choice Science Questions
We present a novel method for obtaining high-quality, domain-targeted
multiple choice questions from crowd workers. Generating these questions can be
difficult without trading away originality, relevance or diversity in the
answer options. Our method addresses these problems by leveraging a large
corpus of domain-specific text and a small set of existing questions. It
produces model suggestions for document selection and answer distractor choice
which aid the human question generation process. With this method we have
assembled SciQ, a dataset of 13.7K multiple choice science exam questions
(Dataset available at http://allenai.org/data.html). We demonstrate that the
method produces in-domain questions by providing an analysis of this new
dataset and by showing that humans cannot distinguish the crowdsourced
questions from original questions. When using SciQ as additional training data
to existing questions, we observe accuracy improvements on real science exams.
Comment: accepted for the Workshop on Noisy User-generated Text (W-NUT) 2017
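The distractor-choice suggestions described above come from learned models in the paper; as a purely illustrative stand-in, one could rank terms from a domain corpus by surface similarity to the correct answer. Everything here (function name, term list) is hypothetical and much simpler than the paper's method.

```python
import difflib

def suggest_distractors(answer, corpus_terms, k=3):
    """Toy distractor suggestion: rank domain-corpus terms by character-level
    similarity to the correct answer and return the top k candidates.
    (A stand-in only; the paper uses learned model suggestions.)
    """
    candidates = [t for t in corpus_terms if t.lower() != answer.lower()]
    candidates.sort(
        key=lambda t: difflib.SequenceMatcher(None, answer.lower(),
                                              t.lower()).ratio(),
        reverse=True,
    )
    return candidates[:k]
```

For a biology answer such as "mitosis", this surfaces look-alike terms like "meiosis" ahead of unrelated vocabulary, which is the intuition behind corpus-driven distractor candidates.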
STARC: Structured Annotations for Reading Comprehension
We present STARC (Structured Annotations for Reading Comprehension), a new
annotation framework for assessing reading comprehension with multiple choice
questions. Our framework introduces a principled structure for the answer
choices and ties them to textual span annotations. The framework is implemented
in OneStopQA, a new high-quality dataset for evaluation and analysis of reading
comprehension in English. We use this dataset to demonstrate that STARC can be
leveraged for a key new application for the development of SAT-like reading
comprehension materials: automatic annotation quality probing via span ablation
experiments. We further show that it enables in-depth analyses and comparisons
between machine and human reading comprehension behavior, including error
distributions and guessing ability. Our experiments also reveal that the
standard multiple choice dataset in NLP, RACE, is limited in its ability to
measure reading comprehension. 47% of its questions can be guessed by machines
without accessing the passage, and 18% are unanimously judged by humans as not
having a unique correct answer. OneStopQA provides an alternative test set for
reading comprehension which alleviates these shortcomings and has a
substantially higher human ceiling performance.
Comment: ACL 2020. OneStopQA dataset, STARC guidelines and human experiments data are available at https://github.com/berzak/onestop-q
Using item response theory to explore the psychometric properties of extended matching questions examination in undergraduate medical education
BACKGROUND:
As assessment has been shown to direct learning, it is critical that the examinations developed to test clinical competence in medical undergraduates are valid and reliable. The use of extended matching questions (EMQ) has been advocated to overcome some of the criticisms of using multiple-choice questions to test factual and applied knowledge.
METHODS:
We analysed the results from the Extended Matching Questions Examination taken by 4th year undergraduate medical students in the academic year 2001 to 2002. Rasch analysis was used to examine whether the set of questions used in the examination mapped onto a unidimensional scale, to gauge the degree of difficulty of questions within and between the various medical and surgical specialties, and to assess the pattern of responses within individual questions for the impact of the distractor options.
RESULTS:
Analysis of a subset of items and of the full examination demonstrated internal construct validity and the absence of bias on the majority of questions. Three main patterns of response selection were identified.
CONCLUSION:
Modern psychometric methods based upon the work of Rasch provide a useful approach to the calibration and analysis of EMQ undergraduate medical assessments. The approach allows for a formal test of the unidimensionality of the questions and thus of the validity of the summed score. Given the metric calibration which follows fit to the model, it also allows for the establishment of item banks to facilitate continuity and equity in exam standards.
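The Rasch model underlying this analysis gives the probability of a correct response as a logistic function of the difference between person ability and item difficulty. A minimal sketch of that item characteristic function:

```python
import math

def rasch_probability(theta, b):
    """Rasch model: probability that a person of ability `theta` answers an
    item of difficulty `b` correctly,
    P(correct) = exp(theta - b) / (1 + exp(theta - b)).
    Both parameters live on the same logit scale, which is what makes the
    metric calibration and item banking described above possible.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))
```

When ability equals difficulty the probability is exactly 0.5; items fitting this model jointly is the formal unidimensionality test the abstract refers to.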
Learning to Reuse Distractors to support Multiple Choice Question Generation in Education
Multiple choice questions (MCQs) are widely used in digital learning systems,
as they allow for automating the assessment process. However, due to the
increased digital literacy of students and the advent of social media
platforms, MCQ tests are widely shared online, and teachers are continuously
challenged to create new questions, which is an expensive and time-consuming
task. A particularly sensitive aspect of MCQ creation is to devise relevant
distractors, i.e., wrong answers that are not easily identifiable as being
wrong. This paper studies how a large existing set of manually created answers
and distractors for questions over a variety of domains, subjects, and
languages can be leveraged to help teachers in creating new MCQs, by the smart
reuse of existing distractors. We built several data-driven models based on
context-aware question and distractor representations, and compared them with
static feature-based models. The proposed models are evaluated with automated
metrics and in a realistic user test with teachers. Both automatic and human
evaluations indicate that context-aware models consistently outperform a static
feature-based approach. For our best-performing context-aware model, on average
3 distractors out of the 10 shown to teachers were rated as high-quality
distractors. We create a performance benchmark, and make it public, to enable
comparison between different approaches and to introduce a more standardized
evaluation of the task. The benchmark contains a test of 298 educational
questions covering multiple subjects and languages and a 77k multilingual pool of
distractor vocabulary for future research.
Comment: 24 pages and 4 figures. Accepted for publication in IEEE Transactions on Learning Technologies
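The paper's context-aware models are learned representations; as a toy stand-in for the reuse idea, one can rank pooled distractors by the similarity between the new question and the question each distractor was originally written for. The bag-of-words cosine used here, and all names and data, are hypothetical simplifications.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_reusable_distractors(new_question, bank):
    """Rank existing distractors for reuse with a new question.

    bank: list of (original_question_text, distractor) pairs.
    Distractors written for questions most similar to the new question
    come first.  (The paper uses context-aware learned representations;
    bag-of-words cosine is only a stand-in.)
    """
    q_vec = Counter(new_question.lower().split())
    scored = sorted(
        bank,
        key=lambda pair: cosine(q_vec, Counter(pair[0].lower().split())),
        reverse=True,
    )
    return [distractor for _, distractor in scored]
```

A teacher-facing tool would show only the top few ranked candidates, which is how the "3 high-quality distractors out of 10 shown" evaluation above was framed.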
Item statistics derived from three-option versions of multiple-choice questions are usually as robust as four- or five-option versions: implications for exam design.
Different versions of multiple-choice exams were administered to an undergraduate class in human physiology as part of normal testing in the classroom. The goal was to evaluate whether the number of options (possible answers) per question influenced the effectiveness of this assessment. Three exams (each with three versions) were given to each of two sections during an academic quarter. All versions were equally long, with 30 questions: 10 questions with 3 options, 10 questions with 4, and 10 questions with 5 (always one correct answer plus distractors). Each question appeared in all three versions of an exam, with a different number of options in each version (three, four, or five). Discrimination (point biserial and upper-lower discrimination indexes) and difficulty were evaluated for each question. There was a small increase in difficulty (a lower average score on a question) when more options were provided. The upper-lower discrimination index indicated a small improvement in assessment of student learning with more options, although the point biserial did not. The total length of a question (number of words) was associated with a small increase in discrimination and difficulty, independent of the number of options. Quantitative questions were more likely to show an increase in discrimination with more options than nonquantitative questions, but this effect was very small. Therefore, for these testing conditions, there appears to be little advantage in providing more than three options per multiple-choice question, and there are disadvantages, such as needing more time for an exam
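The point-biserial index used in this study correlates success on a single item with examinees' total scores. A minimal sketch, assuming 0/1 item scoring and hypothetical data:

```python
import math

def point_biserial(item, totals):
    """Point-biserial correlation between a dichotomously scored item
    (1 = correct, 0 = incorrect) and examinees' total test scores:
    r_pb = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the mean
    totals of those who got the item right and wrong, s is the
    population SD of totals, p the item difficulty, and q = 1 - p.
    """
    n = len(item)
    p = sum(item) / n
    q = 1.0 - p
    m1 = sum(t for s, t in zip(item, totals) if s == 1) / (p * n)
    m0 = sum(t for s, t in zip(item, totals) if s == 0) / (q * n)
    mean = sum(totals) / n
    sd = math.sqrt(sum((t - mean) ** 2 for t in totals) / n)
    return (m1 - m0) / sd * math.sqrt(p * q)
```

Higher values mean stronger students are more likely to answer the item correctly; the study's finding was that adding a fourth or fifth option changed this index only marginally.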