4 research outputs found
Measurement Integrity in Peer Prediction: A Peer Assessment Case Study
We propose measurement integrity, a property related to ex post reward
fairness, as a novel desideratum for peer prediction mechanisms in many natural
applications. Like robustness against strategic reporting, the property that
has been the primary focus of the peer prediction literature, measurement
integrity is an important consideration for understanding the practical
performance of peer prediction mechanisms. We perform computational
experiments, both with an agent-based model and with real data, to empirically
evaluate peer prediction mechanisms according to both of these important
properties. Our evaluations simulate the application of peer prediction
mechanisms to peer assessment -- a setting in which ex post fairness concerns
are particularly salient. We find that peer prediction mechanisms, as proposed
in the literature, largely fail to demonstrate significant measurement
integrity in our experiments. We also find that theoretical properties
concerning robustness against strategic reporting are somewhat noisy predictors
of empirical performance. Further, there is an apparent trade-off between our
two dimensions of analysis. The best-performing mechanisms in terms of
measurement integrity are highly susceptible to strategic reporting.
Ultimately, however, we show that supplementing mechanisms with realistic
parametric statistical models can, in some cases, improve performance along
both dimensions of our analysis and result in mechanisms that strike the best
balance between them.
Comment: The code for our experiments is hosted in the following GitHub
repository:
https://github.com/burrelln/Measurement-Integrity-and-Peer-Assessment.
Version 2 (uploaded on 9/22/22) introduces experiments with real peer grading
data alongside significant changes to the framing of the paper and
presentation of the results.
A conceptual integrity model for improving the quality of peer assessment
... in a given assignment. However, peer assessment is rarely practised in higher education institutions because its quality is still in doubt, particularly with regard to the integrity of students as assessors. Accordingly, this study proposes an integrity model for improving the quality of peer assessment. The study used a multiphase design consisting of three (3) phases. In Phase I (document analysis and interviews), documents from 2010 to 2018 and interviews with six (6) experts in Technical and Vocational Education (Pendidikan Teknikal dan Vokasional, PTV) yielded three (3) elements of integrity: Personal Integrity (self-motivation, courage, self-discipline, and transparency), Social Interaction (honesty, fairness, consistency, trustworthiness, and unity), and Work Commitment (effort, responsibility, and ethics). In Phase II (instrument development), the Modified Delphi (MD) technique was used to obtain expert consensus on the constructed items. Following the MD experts' agreement, 90 items were used in pilot study I and analysed with Winsteps; 19 items were removed. Pilot study II was then conducted to establish the validity and reliability of the items using Exploratory Factor Analysis (EFA). The EFA found 3 overlapping items, which were removed before the final phase. In Phase III (model development), 543 questionnaires were analysed using Structural Equation Modelling (SEM-AMOS). The Maximum Likelihood Estimates show that the regression coefficient between integrity and assessment quality is positive and significant, with a critical ratio beyond ±1.96 (β = 0.85, C.R = 12.558, p < 0.001). This indicates that integrity influences assessment quality. Meanwhile, willingness to allocate time is a partial mediator (rc = 0.23) of moderate importance that links integrity and peer assessment quality indirectly.
However, gender is a full moderator of the effect of integrity on peer assessment quality: for female students Δχ² = 16.754, which exceeds 3.84, whereas for male students Δχ² = 3.218. The integrity model can serve as a guide for lecturers to apply peer assessment systematically and to anticipate what may occur while the assessment activity is carried out.
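The 3.84 cut-off quoted in the abstract above matches the 0.05 critical value of a chi-square distribution with one degree of freedom. The following minimal Python sketch shows how the two reported Δχ² values compare against that threshold. It assumes a one-degree-of-freedom difference test at the 0.05 level, which the abstract implies but does not state; the function names are hypothetical and this is not the authors' SEM-AMOS procedure.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For Z ~ N(0, 1), P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


def exceeds_threshold(delta_chi2, alpha=0.05):
    """True when the chi-square difference is significant at level alpha (1 df assumed)."""
    return chi2_sf_1df(delta_chi2) < alpha


# Values taken from the abstract; the group labels are only for printing.
for group, delta_chi2 in [("female", 16.754), ("male", 3.218)]:
    print(group, delta_chi2, exceeds_threshold(delta_chi2))
```

Under these assumptions, only the value reported for female students passes the threshold, which is consistent with the comparison against 3.84 made in the abstract.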
On Connections Between Machine Learning And Information Elicitation, Choice Modeling, And Theoretical Computer Science
Machine learning, which has its origins at the intersection of computer science and statistics, is now a rapidly growing area of research that is being integrated into almost every discipline in science and business, such as economics, marketing and information retrieval. As a consequence of this integration, it is necessary to understand how machine learning interacts with these disciplines and to understand fundamental questions that arise at the resulting interfaces. The goal of my thesis research is to study these interdisciplinary questions at the interface of machine learning and other disciplines including mechanism design/information elicitation, preference/choice modeling, and theoretical computer science.
Practical Peer Prediction for Peer Assessment
We provide an empirical analysis of peer prediction mechanisms, which reward participants for information in settings when there is no ground truth against which to score reports. We simulate the mechanisms on a dataset of three million peer assessments from the edX MOOC platform. We evaluate different mechanisms on score variability, which is connected to fairness, risk aversion, and participant learning. We also assess the magnitude of the incentives to invest effort, and study the effect of participant coordination on low-information signals. We find that the correlated agreement mechanism has lower variation in reward than other mechanisms. A concern is that the gain from exerting effort is relatively low across all mechanisms, due to frequent disagreement between peers. Our conclusions are relevant for crowdsourcing in education as well as other domains.
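To make the correlated agreement mechanism named in this abstract more concrete, here is a minimal Python sketch. It is not the code used in the study. It assumes the common formulation in which two participants are rewarded for a "correlated" pair of reports on a shared bonus task and penalized for a correlated pair of reports on two different penalty tasks, where a pair of signals counts as correlated if it co-occurs on shared tasks more often than independence would predict. The function names and the toy reports are hypothetical.

```python
from collections import Counter
from itertools import product


def delta_sign_matrix(paired_reports, signals):
    """Estimate which signal pairs are positively correlated.

    paired_reports: list of (report_a, report_b) given by two participants
    on the same task. Returns a dict mapping (x, y) to 1 when the empirical
    joint frequency exceeds the product of the marginals, else 0.
    """
    n = len(paired_reports)
    joint = Counter(paired_reports)
    left = Counter(x for x, _ in paired_reports)
    right = Counter(y for _, y in paired_reports)
    return {
        (x, y): 1 if joint[(x, y)] / n > (left[x] / n) * (right[y] / n) else 0
        for x, y in product(signals, repeat=2)
    }


def ca_score(sign_matrix, bonus_pair, penalty_pair):
    """Score for one participant pair.

    bonus_pair: both participants' reports on the same (bonus) task.
    penalty_pair: participant A's report on one task and participant B's
    report on a different task.
    """
    return sign_matrix[bonus_pair] - sign_matrix[penalty_pair]


# Toy data (hypothetical): ten tasks graded by the same two peers.
reports = (
    [("good", "good")] * 4
    + [("bad", "bad")] * 3
    + [("good", "bad")] * 2
    + [("bad", "good")] * 1
)
S = delta_sign_matrix(reports, ["good", "bad"])
print(ca_score(S, ("good", "good"), ("good", "bad")))
```

In this toy data, matching reports are the only positively correlated pairs, so agreement on the bonus task earns a point and agreement across unrelated penalty tasks costs a point. Subtracting the penalty term is what removes the payoff from always reporting the same signal regardless of the submission.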