Multiple Choice Questions (MCQs) have long been the backbone of standardized
testing in academia and industry. Correspondingly, there is a constant need for
MCQ authors to write and refine new questions for new versions of standardized
tests, as well as to support the measurement of performance in emerging massive
open online courses (MOOCs). Research that explores what makes a question
difficult, or which questions distinguish higher-performing students from
lower-performing ones, can aid in the creation of the next generation of
teaching and evaluation tools.
In the automated MCQ answering component of this thesis, algorithms query for
definitions of scientific terms, process the returned web results, and compare the
retrieved definitions to the original definition in the MCQ. This automated
answering method is then augmented with a model, based on human performance
data from crowdsourced question sets, for analyzing question difficulty as well as
the discriminating power of the non-answer alternatives. The crowdsourced question
sets come from PeerWise, an open-source online college-level question authoring and
answering environment.
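
As a rough illustration of this pipeline, the following sketch (in Python) scores
each answer alternative by how closely retrieved definitions of that term match
the definition given in the question stem. The retrieval step is stubbed with a
toy lookup (fetch_web_definitions is a hypothetical placeholder, not the system's
actual web-query component), and similarity is plain bag-of-words cosine rather
than whatever weighting the real method uses.

import math
import re
from collections import Counter

def tokenize(text: str) -> Counter:
    # Lowercase word counts, ignoring punctuation.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bags of words.
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def fetch_web_definitions(term: str) -> list[str]:
    # Hypothetical placeholder for the web-query step; a toy lookup
    # stands in for real search results.
    toy_web = {
        "mitochondrion": ["organelle that produces ATP, the cell's energy"],
        "ribosome": ["structure where proteins are synthesized from mRNA"],
        "lysosome": ["membrane-bound organelle containing digestive enzymes"],
    }
    return toy_web.get(term.lower(), [])

def answer_inverse_definition(stem: str, alternatives: list[str]) -> str:
    # Score each alternative by the best match between its retrieved
    # definitions and the definition given in the question stem.
    stem_vec = tokenize(stem)
    def best_score(term: str) -> float:
        defs = fetch_web_definitions(term)
        return max((cosine(stem_vec, tokenize(d)) for d in defs), default=0.0)
    return max(alternatives, key=best_score)

stem = "The organelle responsible for producing the cell's energy (ATP) is:"
print(answer_inverse_definition(stem, ["ribosome", "mitochondrion", "lysosome"]))
# -> mitochondrion
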
The goal of this research is to create an automated method to both answer and
assess the difficulty of multiple choice inverse definition questions in the domain of
introductory biology. The results of this work suggest that human-authored question
banks provide useful data for building gold standard human performance models. The
methodology for building these performance models has value in other domains that
assess the difficulty of questions and the ability of the exam takers.
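
To make the performance-model methodology concrete, the sketch below computes
standard classical-test-theory statistics of the kind such a model could draw
on: an item difficulty index (proportion correct), an upper/lower-group
discrimination index, and a per-alternative measure of how strongly each
distractor pulls lower-scoring respondents. These are textbook measures shown
under an assumed response format, offered as an illustration rather than the
exact model built in this work.

def item_difficulty(correct_flags: list[bool]) -> float:
    # Proportion of respondents who answered the item correctly
    # (higher values mean an easier item).
    return sum(correct_flags) / len(correct_flags)

def _upper_lower(total_scores: list[float], frac: float):
    # Indices of the bottom and top `frac` of respondents by total score.
    n = len(total_scores)
    k = max(1, int(n * frac))
    ranked = sorted(range(n), key=lambda i: total_scores[i])
    return ranked[:k], ranked[-k:]

def discrimination_index(correct_flags: list[bool], total_scores: list[float],
                         frac: float = 0.27) -> float:
    # Difference in the item's pass rate between the top and bottom groups;
    # well-discriminating items are passed far more often by high scorers.
    lower, upper = _upper_lower(total_scores, frac)
    k = len(lower)
    return (sum(correct_flags[i] for i in upper)
            - sum(correct_flags[i] for i in lower)) / k

def distractor_pull(choices: list[str], total_scores: list[float],
                    frac: float = 0.27) -> dict[str, float]:
    # Per-alternative difference between its selection rate in the bottom
    # and top groups; effective distractors draw more responses from the
    # lower-scoring group.
    lower, upper = _upper_lower(total_scores, frac)
    k = len(lower)
    return {alt: (sum(choices[i] == alt for i in lower)
                  - sum(choices[i] == alt for i in upper)) / k
            for alt in set(choices)}

The 27% upper/lower split is a common convention in classical item analysis; any
fraction that yields reasonably sized extreme groups would serve the same purpose.
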