
    How effective can simple ordinal peer grading be?

    Ordinal peer grading has been proposed as a simple and scalable solution for computing reliable information about student performance in massive open online courses. The idea is to outsource the grading task to the students themselves as follows. After the end of an exam, each student is asked to rank — in terms of quality — a bundle of exam papers by fellow students. An aggregation rule then combines the individual rankings into a global one that contains all students. We define a broad class of simple aggregation rules, which we call type-ordering aggregation rules, and present a theoretical framework for assessing their effectiveness. When statistical information about the grading behaviour of students is available (in terms of a noise matrix that characterizes the grading behaviour of the average student from a student population), the framework can be used to compute the optimal rule from this class with respect to a series of performance objectives that compare the ranking returned by the aggregation rule to the underlying ground truth ranking. For example, a natural rule known as Borda is proved to be optimal when students grade correctly. In addition, we present extensive simulations that validate our theory and prove it to be extremely accurate in predicting the performance of aggregation rules even when only rough information about grading behaviour (i.e., an approximation of the noise matrix) is available. Both in the application of our theoretical framework and in our simulations, we exploit data about the grading behaviour of students that have been extracted from two field experiments at the University of Patras.
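The abstract singles out the Borda rule as the optimal aggregation rule when students grade correctly. A minimal sketch of how Borda combines partial bundle rankings into a global ranking is shown below; the function names and the bundle data are illustrative, not from the paper, and the paper's type-ordering rules generalise well beyond this.

```python
from collections import defaultdict

def borda_aggregate(rankings):
    """Combine partial rankings (best-to-worst lists of paper ids) into one
    global ranking. In each bundle of size k, the paper at position p earns
    k - p - 1 points; papers are ordered by total score (ties broken by id)."""
    scores = defaultdict(float)
    for bundle in rankings:
        k = len(bundle)
        for pos, paper in enumerate(bundle):
            scores[paper] += k - pos - 1
    return sorted(scores, key=lambda p: (-scores[p], p))

# Three graders each rank a bundle of three exam papers, best first.
rankings = [["A", "B", "C"], ["B", "A", "D"], ["A", "D", "C"]]
print(borda_aggregate(rankings))  # -> ['A', 'B', 'D', 'C']
```

Here paper A collects 2 + 1 + 2 = 5 points and tops the global ranking even though no single grader saw all four papers, which is the point of aggregating overlapping bundles.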


    PeerNomination: a novel peer selection algorithm to handle strategic and noisy assessments

    In peer selection a group of agents must choose a subset of themselves, as winners for, e.g., peer-reviewed grants or prizes. We take a Condorcet view of this aggregation problem, assuming that there is an objective ground-truth ordering over the agents. We study agents that have a noisy perception of this ground truth and give assessments that, even when truthful, can be inaccurate. Our goal is to select the best set of agents according to the underlying ground truth by looking at the potentially unreliable assessments of the peers. Besides being potentially unreliable, we also allow agents to be self-interested, attempting to influence the outcome of the decision in their favour. Hence, we are focused on tackling the problem of impartial (or strategyproof) peer selection -- how do we prevent agents from manipulating their reviews while still selecting the most deserving individuals, all in the presence of noisy evaluations? We propose a novel impartial peer selection algorithm, PeerNomination, that aims to fulfil the above desiderata. We provide a comprehensive theoretical analysis of the recall of PeerNomination and prove various properties, including impartiality and monotonicity. We also provide empirical results based on computer simulations to show its effectiveness compared to the state-of-the-art impartial peer selection algorithms. We then investigate the robustness of PeerNomination to various levels of noise in the reviews. In order to maintain good performance under such conditions, we extend PeerNomination by using weights for reviewers which, informally, capture some notion of reliability of the reviewer. We show, theoretically, that the new algorithm preserves strategyproofness and, empirically, that the weights help identify the noisy reviewers and hence increase selection performance.
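To make the nomination idea concrete, here is a heavily simplified sketch in the spirit of the abstract; it is NOT the exact PeerNomination algorithm (whose quota handling is randomized and more careful). Assumptions in this sketch: each agent scores a fixed bundle of peers, never itself (which is what keeps the rule impartial); a reviewer implicitly "nominates" the top slice of its bundle, sized so that nominations roughly match the k target winners; an agent is selected when at least half of its reviewers nominate it. All names and the example data are illustrative.

```python
import math
from collections import defaultdict

def nomination_select(scores, k, n):
    """scores: dict reviewer -> {reviewee: score}; k: target number of
    winners; n: total number of agents. Returns the set of agents nominated
    by at least half of their reviewers."""
    nominated = defaultdict(int)   # reviewee -> nominations received
    reviewed = defaultdict(int)    # reviewee -> number of reviews received
    for reviewer, bundle in scores.items():
        quota = math.ceil(k * len(bundle) / n)  # top slice each reviewer endorses
        top = sorted(bundle, key=bundle.get, reverse=True)[:quota]
        for reviewee in bundle:
            reviewed[reviewee] += 1
        for reviewee in top:
            nominated[reviewee] += 1
    return {a for a in reviewed if nominated[a] >= reviewed[a] / 2}

# Four agents, target k = 2, each reviews two peers (never itself).
scores = {
    "a": {"b": 5, "c": 2},
    "b": {"c": 4, "d": 1},
    "c": {"d": 3, "a": 5},
    "d": {"a": 4, "b": 2},
}
print(nomination_select(scores, k=2, n=4))
```

Note that, unlike the real algorithm, this simplification gives no guarantee that exactly k agents are selected; the abstract's weighted extension would further scale each reviewer's nominations by an estimated reliability.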

    Evaluating The Effectiveness of Live Peer Assessment as a Vehicle for The Development of Higher Order Practice in Computer Science Education

    This thesis concerns a longitudinal study of the practice of Live Peer Assessment on two University courses in Computer Science. By Live Peer Assessment I mean a practice of whole-class collective marking, using electronic devices, of student artefacts demonstrated in a class or lecture theatre, with aggregated results displayed on screen immediately after each grading decision. This is radically different from historical peer assessment in universities, which has primarily been an asynchronous process of marking of students’ work by small subsets of the cohort (e.g. one student artefact is marked by fewer than three fellow students). Live Peer Assessment takes place in public, is marked by (as far as practically possible) the whole cohort, and results are instantaneous. This study observes this practice, first on a level 4 course in E-Media Design where students’ main assignment is a multimedia CV (or resume), and secondly on a level 7 course in Multimedia Specification Design and Production where students produce a multimedia information artefact in both prototype and final versions. In both cases, students learned about these assignments by reviewing works done by previous students in Live Peer Evaluation events, where they were asked to collectively and publicly mark those works according to the same rubrics that the tutors would be using. In the level 4 course, this was used to help students get a better understanding of the marking criteria. In the level 7 course, this goal was also pursued, but the practice was additionally used for the peer marking of students’ own work.
Among the major findings of this study are:
In the level 4 course, student attainment in the final assessment improved on average by 13% over 4 iterations of the course, with a very marked increase among students in the lower percentiles.
The effectiveness of Live Peer Assessment in improving student work comes from:
o Raising the profile of the marking rubric
o Establishing a repertoire of example work
o Modelling the “noticing” of salient features (of quality or defect), enabling students to self-monitor more effectively
On the major accepted measure of peer-assessment reliability (correlation between student-awarded marks and tutor-awarded marks), Live Peer Assessment is superior to traditional peer assessment. That is to say, students mark more like tutors when using Live Peer Assessment.
On the second major measure (effect size), which calculates whether students are more strict or generous than tutors (where the ideal would be no difference), Live Peer Assessment is broadly comparable with traditional peer assessment, but this is susceptible to the conditions under which it takes place.
The greater alignment of student and tutor marks comes from the training sessions, but also from the public nature of the marking, where individuals can compare their marking practice with that of the rest of the class on a criterion-by-criterion basis.
New measures proposed in this thesis to measure the health of peer assessment events comprise: Krippendorff’s Alpha, Magin’s Reciprocity Matrix, the median pairwise tutor-student marks correlation, and the skewness and kurtosis of the distribution of pairwise tutor-student marking correlations.
Recommendations for practice comprise:
o Summative peer assessment should not take place under conditions of anonymity, but very light conditions of marking competence should be enforced on student markers (e.g. >0.2 correlation between individual student marking and that of tutors).
o Rubrics can be more suggestive and colloquial under the conditions of Live Peer Assessment, because the marking criteria can be instantiated in specific examples of student attainment; the criteria may therefore be less legalistically drafted, because a more holistic understanding of quality can be communicated.
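The reliability measures above can be sketched in a few lines: for each student marker, compute the Pearson correlation between their marks and the tutor's marks across the same artefacts, take the median of those pairwise correlations, and flag markers below the suggested 0.2 competence threshold. The function names, data, and threshold default are illustrative (only the 0.2 figure comes from the thesis).

```python
from statistics import median

def pearson(xs, ys):
    """Pearson correlation of two equal-length mark sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def marker_health(student_marks, tutor_marks, threshold=0.2):
    """student_marks: dict marker -> list of marks, in the same artefact
    order as tutor_marks. Returns (median pairwise correlation, set of
    markers falling below the competence threshold)."""
    corrs = {s: pearson(marks, tutor_marks) for s, marks in student_marks.items()}
    flagged = {s for s, r in corrs.items() if r < threshold}
    return median(corrs.values()), flagged

tutor = [70, 55, 80, 40]
students = {
    "s1": [68, 50, 82, 45],  # marks much like the tutor
    "s2": [40, 80, 45, 78],  # roughly inverts the tutor's ordering
}
med, flagged = marker_health(students, tutor)
print(med, flagged)  # s2 is flagged as below the 0.2 threshold
```

A fuller health check along the thesis's lines would add Krippendorff's Alpha and the skewness/kurtosis of the correlation distribution, which need more than a few artefacts to be meaningful.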