This dissertation argues that automated assessment systems can be useful for both students and educators provided that the results correspond well with human markers. Thus, evaluating such a system is crucial. I present an evaluation framework and show how and why it can be useful for both producers and consumers of automated assessment systems. The framework is a refinement of a research taxonomy that came out of the effort to analyse the literature review of systems based on Latent Semantic Analysis (LSA), a statistical natural language processing technique that has been used for automated assessment of essays. The evaluation framework can help developers publish their results in a format that is comprehensive, relatively compact, and useful to other researchers.\ud \ud The thesis claims that, in order to see a complete picture of an automated assessment system, certain pieces must be emphasised. It presents the framework as a jigsaw puzzle whose pieces join together to form the whole picture.\ud \ud The dissertation uses the framework to compare the accuracy of human markers and EMMA, the LSA-based assessment system I wrote as part of this dissertation. EMMA marks short, free text answers in the domain of computer science. I conducted a study of five human markers and then used the results as a benchmark against which to evaluate EMMA. An integral part of the evaluation was the success metric. The standard inter-rater reliability statistic was not useful; I located a new statistic and applied it to the domain of computer assisted assessment for the first time, as far as I know.\ud \ud Although EMMA exceeds human markers on a few questions, overall it does not achieve the same level of agreement with humans as humans do with each other. The last chapter maps out a plan for further research to improve EMMA
To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.