An Open source Implementation of ITU-T Recommendation P.808 with Validation
The ITU-T Recommendation P.808 provides a crowdsourcing approach for
conducting a subjective assessment of speech quality using the Absolute
Category Rating (ACR) method. We provide an open-source implementation of the
ITU-T Rec. P.808 that runs on the Amazon Mechanical Turk platform. We extended
our implementation to include Degradation Category Ratings (DCR) and Comparison
Category Ratings (CCR) test methods. We also significantly speed up the test
process by integrating the participant qualification step into the main rating
task, in contrast to a two-stage qualification and rating solution. We provide
scripts for creating and executing the subjective test and for cleansing and
analyzing the collected answers, which helps avoid operational errors. To validate
the implementation, we compare the Mean Opinion Scores (MOS) collected through
our implementation with MOS values from a standard laboratory experiment
conducted based on the ITU-T Rec. P.800. We also evaluate the reproducibility
of the results of subjective speech quality assessment through crowdsourcing
using our implementation. Finally, we quantify the impact of the parts of the
system designed to improve reliability: environmental tests, gold and
trapping questions, rating patterns, and a headset usage test.
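The validation step described in this abstract, comparing crowdsourced MOS against laboratory MOS, can be illustrated with a minimal sketch. It is not taken from the authors' scripts: the function names, the condition labels, and all rating values below are hypothetical, and the normal-approximation confidence interval and Pearson correlation are assumed choices of statistic rather than ones the abstract confirms.

```python
from math import sqrt
from statistics import mean, stdev


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% CI) for one condition.

    Uses a normal approximation; this is an assumption of the sketch.
    """
    n = len(ratings)
    m = mean(ratings)
    half_width = z * stdev(ratings) / sqrt(n) if n > 1 else 0.0
    return m, half_width


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical ACR ratings (1 to 5) per test condition from crowd workers.
crowd_ratings = {
    "cond_a": [4, 5, 4, 4],
    "cond_b": [2, 3, 2, 1],
    "cond_c": [3, 3, 4, 3],
}
# Hypothetical laboratory MOS values for the same conditions.
lab_mos = {"cond_a": 4.3, "cond_b": 2.1, "cond_c": 3.2}

crowd_mos = {cond: mos_with_ci(r)[0] for cond, r in crowd_ratings.items()}
conditions = sorted(lab_mos)
r = pearson([crowd_mos[c] for c in conditions], [lab_mos[c] for c in conditions])
print(crowd_mos, round(r, 3))
```

A correlation close to 1 on per-condition MOS would suggest that the crowdsourced ratings rank conditions in the same way as the laboratory test; the abstract does not state which agreement measure the authors report.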
A study into annotation ranking metrics in geo-tagged image corpora
Community-contributed datasets are becoming increasingly common in automated image annotation systems. One important issue with community image data is that there is no guarantee that the associated metadata is relevant. A method is required that can accurately rank the semantic relevance of community annotations. This should enable the extraction of relevant subsets from potentially noisy collections of these annotations. Having relevant, non-heterogeneous tags assigned to images should improve community image retrieval systems, such as Flickr, which are based on text retrieval methods. In the literature, the current state-of-the-art approach to ranking the semantic relevance of Flickr tags is based on the widely used tf-idf metric. In the case of datasets containing landmark images, however, this metric is inefficient due to the high frequency of common landmark tags within the dataset and can be improved upon. In this paper, we present a landmark recognition framework that provides end-to-end automated recognition and annotation. In our study into automated annotation, we evaluate five alternative approaches to tf-idf for ranking tag relevance in community-contributed landmark image corpora. We carry out a thorough evaluation of each of these ranking metrics, and the results demonstrate that four of the proposed techniques outperform the commonly used tf-idf approach for this task.
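For readers unfamiliar with the tf-idf baseline mentioned in this abstract, the following is a minimal sketch of ranking tags by tf-idf. It is not the authors' implementation: treating each landmark's pooled tags as one "document", the function name `tfidf_rank`, and the example tags are all assumptions made for illustration.

```python
from collections import Counter
from math import log


def tfidf_rank(clusters):
    """Rank the tags of each cluster by tf-idf, highest score first.

    clusters maps a (hypothetical) landmark name to the list of tags
    pooled from its images; each cluster is treated as one document.
    """
    n = len(clusters)
    # Document frequency: number of clusters in which each tag appears.
    df = Counter(tag for tags in clusters.values() for tag in set(tags))
    ranked = {}
    for name, tags in clusters.items():
        tf = Counter(tags)
        total = len(tags)
        scores = {t: (c / total) * log(n / df[t]) for t, c in tf.items()}
        # Sort by descending score, breaking ties alphabetically.
        ranked[name] = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked


# Hypothetical tag collections for three landmarks.
clusters = {
    "eiffel": ["paris", "eiffel", "tower", "eiffel", "france"],
    "louvre": ["paris", "louvre", "museum", "france"],
    "bigben": ["london", "bigben", "clock", "tower"],
}

for name, tags in tfidf_rank(clusters).items():
    print(name, tags)
```

In this toy data, a tag such as "paris" that appears across several landmarks receives a lower idf weight than a tag unique to one landmark, which hints at why frequent, shared landmark tags can be problematic for this metric.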