The ITU-T Recommendation P.808 provides a crowdsourcing approach for
conducting a subjective assessment of speech quality using the Absolute
Category Rating (ACR) method. We provide an open-source implementation of the
ITU-T Rec. P.808 that runs on the Amazon Mechanical Turk platform. We extended
our implementation to include Degradation Category Ratings (DCR) and Comparison
Category Ratings (CCR) test methods. We also significantly speed up the test
process by integrating the participant qualification step into the main rating
task compared to a two-stage qualification and rating solution. We provide
program scripts for creating and executing the subjective test, and data
cleansing and analyzing the answers to avoid operational errors. To validate
the implementation, we compare the Mean Opinion Scores (MOS) collected through
our implementation with MOS values from a standard laboratory experiment
conducted based on the ITU-T Rec. P.800. We also evaluate the reproducibility
of the result of the subjective speech quality assessment through crowdsourcing
using our implementation. Finally, we quantify the impact of parts of the
system designed to improve the reliability: environmental tests, gold and
trapping questions, rating patterns, and a headset usage test