    Machine learning from crowds using candidate set-based labelling

    Crowdsourcing is a popular, cheap alternative to expert labelling for gathering annotations in machine learning. Learning from crowd-labelled data involves dealing with its inherent uncertainty and inconsistencies. In the classical framework, each annotator provides a single label per example, which fails to capture the annotators' complete knowledge. We propose candidate labelling, that is, allowing annotators to provide a set of candidate labels for each example and thus express their doubts. We propose an appropriate model for the annotators and present two novel learning methods that deal with the two basic steps (label aggregation and model learning) sequentially or jointly. Our empirical study shows the advantage of candidate labelling and the proposed methods with respect to the classical framework.
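
    The abstract does not spell out how candidate sets are aggregated; as a minimal sketch of the labelling scheme itself, the snippet below splits each annotator's vote evenly across their candidate set and picks the label with the highest total weight. The function name and the fractional-vote rule are illustrative assumptions, not the paper's actual methods.

```python
from collections import Counter

def aggregate_candidate_sets(candidate_sets):
    # One example's annotations: each annotator gives a *set* of candidate
    # labels rather than a single label. Split each annotator's vote evenly
    # across their set and pick the label with the highest total weight.
    # (Assumed aggregation rule, only meant to illustrate candidate labelling.)
    weights = Counter()
    for cand in candidate_sets:
        for label in cand:
            weights[label] += 1.0 / len(cand)  # spread the vote over the set
    return weights.most_common(1)[0][0]

# Three annotators label one example; two express doubt via candidate sets.
print(aggregate_candidate_sets([{"cat"}, {"cat", "dog"}, {"dog", "fox"}]))
# -> "cat" (1.0 + 0.5 weight vs. 0.5 + 0.5 for "dog")
```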

    On the use of the descriptive variable for enhancing the aggregation of crowdsourced labels

    The use of crowdsourcing for annotating data has become a popular and cheap alternative to expert labelling. As a consequence, an aggregation task is required to combine the different labels provided and agree on a single one per example. Most aggregation techniques, including simple and robust majority voting, which selects the label with the largest number of votes, disregard the descriptive information provided by the explanatory variable. In this paper, we propose domain-aware voting, an extension of majority voting which incorporates the descriptive variable and the rest of the instances of the dataset for aggregating the label of every instance. The experimental results with simulated and real-world crowdsourced data suggest that domain-aware voting is a competitive alternative to majority voting, especially when a part of the dataset is unlabelled. We elaborate on practical criteria for the use of domain-aware voting.
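
    The abstract does not give domain-aware voting's exact formulation; the sketch below contrasts plain majority voting with one plausible way to use the descriptive variable: pooling an instance's crowd votes with those of its nearest neighbours in feature space. The function names, the k-nearest-neighbour pooling, and the toy data are assumptions for illustration, not the paper's scheme.

```python
import numpy as np
from collections import Counter

def majority_vote(votes):
    # Baseline: the label with the largest number of votes wins.
    return Counter(votes).most_common(1)[0][0]

def domain_aware_vote(X, crowd_votes, i, k=2):
    # Illustrative sketch: pool instance i's crowd votes with those of its
    # k nearest neighbours in feature space, so the descriptive variable X
    # influences the aggregated label. (Assumed scheme, not the paper's.)
    dists = np.linalg.norm(X - X[i], axis=1)
    neighbours = np.argsort(dists)[: k + 1]  # k nearest plus i itself
    pooled = [v for j in neighbours for v in crowd_votes[j]]
    return Counter(pooled).most_common(1)[0][0]

# Toy data: 4 instances with 2-d features and per-instance crowd labels.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.9, 1.0], [1.0, 1.0]])
crowd_votes = [["a", "b"], ["a", "a"], ["b"], ["b", "b"]]
print(majority_vote(crowd_votes[0]))         # 1-1 tie; Counter returns "a" (first seen)
print(domain_aware_vote(X, crowd_votes, 0))  # "a": neighbour 1's votes break the tie
```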