81,729 research outputs found

    Active Learning With Complementary Sampling for Instructing Class-Biased Multi-Label Text Emotion Classification

    Get PDF
    High-quality corpora have been very scarce for the text emotion research. Existing corpora with multi-label emotion annotations have been either too small or too class-biased to properly support a supervised emotion learning. In this paper, we propose a novel active learning method for efficiently instructing the human annotations for a less-biased and high-quality multi-label emotion corpus. Specifically, to compensate annotation for the minority-class examples, we propose a complementary sampling strategy based on unlabeled resources by measuring a probabilistic distance between the expected emotion label distribution in a temporary corpus and an uniform distribution. Qualitative evaluations are also given to the unlabeled examples, in which we evaluate the model uncertainties for multi-label emotion predictions, their syntactic representativeness for the other unlabeled examples, and their diverseness to the labeled examples, for a high-quality sampling. Through active learning, a supervised emotion classifier gets progressively improved by learning from these new examples. Experiment results suggest that by following these sampling strategies we can develop a corpus of high-quality examples with significantly relieved bias for emotion classes. Compared to the learning procedures based on traditional active learning algorithms, our learning procedure indicates the most efficient learning curve and estimates the best multi-label emotion predictions

    A Full Probabilistic Model for Yes/No Type Crowdsourcing in Multi-Class Classification

    Full text link
    Crowdsourcing has become widely used in supervised scenarios where training sets are scarce and difficult to obtain. Most crowdsourcing models in the literature assume labelers can provide answers to full questions. In classification contexts, full questions require a labeler to discern among all possible classes. Unfortunately, discernment is not always easy in realistic scenarios. Labelers may not be experts in differentiating all classes. In this work, we provide a full probabilistic model for a shorter type of queries. Our shorter queries only require "yes" or "no" responses. Our model estimates a joint posterior distribution of matrices related to labelers' confusions and the posterior probability of the class of every object. We developed an approximate inference approach, using Monte Carlo Sampling and Black Box Variational Inference, which provides the derivation of the necessary gradients. We built two realistic crowdsourcing scenarios to test our model. The first scenario queries for irregular astronomical time-series. The second scenario relies on the image classification of animals. We achieved results that are comparable with those of full query crowdsourcing. Furthermore, we show that modeling labelers' failures plays an important role in estimating true classes. Finally, we provide the community with two real datasets obtained from our crowdsourcing experiments. All our code is publicly available.Comment: SIAM International Conference on Data Mining (SDM19), 9 official pages, 5 supplementary page
    • …
    corecore