Mechanical Turk-based Experiment vs Laboratory-based Experiment: A Case Study on the Comparison of Semantic Transparency Rating Data

Abstract

In this paper, we conducted semantic transparency rating experiments using both the traditional laboratory-based method and the crowdsourcing-based method, and then compared the rating data obtained from the two experiments. We observed very strong correlation coefficients for both overall semantic transparency rating data and constituent semantic transparency rating data (ρ > 0.9), which means the two experiments may yield comparable data and the crowdsourcing-based experiment is a feasible alternative to the laboratory-based experiment in linguistic studies. We also observed a scale shrinkage phenomenon in both experiments: the actual scale of the rating results cannot cover the ideal scale [0, 1]; both ends of the actual scale shrink towards the center. However, the scale shrinkage in the crowdsourcing-based experiment is stronger than that in the laboratory-based experiment, which makes the rating results obtained in the two experiments not directly comparable. In order to make the results directly comparable, we explored two data transformation algorithms, z-score transformation and adjusted normalization, to unify the scales. We also investigated the uncertainty of semantic transparency judgments among raters and found that it has a regular relation to semantic transparency magnitude, which may further reveal a general cognitive mechanism of human judgment.
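As a concrete illustration of the scale-unification step described above, the sketch below shows a per-rater z-score transformation of rating data and the Spearman correlation between the two experiments' per-item means. This is a minimal sketch, not the paper's actual pipeline: the per-rater normalization choice, the (n_raters, n_items) array layout, the simulated data, and all names are our assumptions, and the paper's "adjusted normalization" algorithm is not reproduced here.

    import numpy as np
    from scipy.stats import spearmanr

    def zscore_per_rater(ratings):
        """Z-score each rater's ratings so that rater-specific scale
        differences (e.g. scale shrinkage) are removed.

        ratings: array of shape (n_raters, n_items); np.nan marks
        items a rater did not judge (assumed layout).
        """
        means = np.nanmean(ratings, axis=1, keepdims=True)
        stds = np.nanstd(ratings, axis=1, keepdims=True)
        return (ratings - means) / stds

    # Hypothetical rating matrices for the two experiments. The MTurk
    # ratings are simulated with a compressed range to mimic the
    # stronger scale shrinkage reported for the crowdsourced data.
    rng = np.random.default_rng(0)
    lab = rng.uniform(0.1, 0.9, size=(20, 50))
    mturk = 0.2 + 0.6 * lab + rng.normal(0, 0.05, size=lab.shape)

    # Per-item means after z-scoring each rater; on a common scale,
    # the two experiments can be compared directly.
    lab_means = np.nanmean(zscore_per_rater(lab), axis=0)
    mturk_means = np.nanmean(zscore_per_rater(mturk), axis=0)

    rho, p = spearmanr(lab_means, mturk_means)
    print(f"Spearman rho = {rho:.3f} (p = {p:.3g})")

Because z-scoring is invariant to the linear compression used in the simulation, the resulting rank correlation stays high even though the raw scales differ, which is the motivation for unifying scales before comparison.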