Association for the Advancement of Artificial Intelligence (AAAI)
Abstract
Although supervised learning requires a labeled dataset, obtaining labels from experts is generally expensive. For this reason, crowdsourcing services are attracting attention in the field of machine learning as a way to collect labels at relatively low cost. However, the labels obtained by crowdsourcing, i.e., from non-expert workers, are often noisy. A number of methods have thus been devised for inferring true labels, and several methods have been proposed for learning classifiers directly from crowdsourced labels, referred to as learning from crowds. A more practical problem is learning from crowdsourced labeled data and unlabeled data, i.e., semi-supervised learning from crowds. This paper presents a novel generative model of the labeling process in crowdsourcing. It leverages unlabeled data effectively by introducing latent features and a data distribution. Because the data distribution can be complicated, we use a deep neural network to model it. Therefore, our model can be regarded as a kind of deep generative model. The problem caused by the intractability of the latent variable posteriors is solved by introducing an inference model. The experiments show that it outperforms four existing models, including a baseline model, on the MNIST dataset with simulated workers and the Rotten Tomatoes movie review dataset with Amazon Mechanical Turk workers.