Active Learning from Imperfect Labelers
We study active learning where the labeler can not only return incorrect
labels but also abstain from labeling. We consider different noise and
abstention conditions of the labeler. We propose an algorithm which utilizes
abstention responses, and analyze its statistical consistency and query
complexity under fairly natural assumptions on the noise and abstention rate of
the labeler. This algorithm is adaptive in the sense that it automatically
requests fewer queries from a more informed or less noisy labeler. We couple our
algorithm with lower bounds to show that, under some technical conditions, it
achieves nearly optimal query complexity.
Comment: To appear in NIPS 201
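As an illustrative sketch (not the paper's algorithm), the idea that abstentions are themselves informative can be seen in a toy one-dimensional threshold search: an abstention signals that the queried point lies near the decision boundary, so the learner narrows its search interval instead of discarding the response. The function names and the abstention model below are assumptions for illustration only.

```python
def locate_threshold(labeler, lo=0.0, hi=1.0, budget=30):
    """Binary search for a 1-D decision threshold with a labeler that may
    abstain. An abstention (None) says the queried point lies near the
    boundary, so we shrink the interval around it rather than discarding
    the response. The final error is roughly bounded by the width of the
    labeler's abstention band."""
    for _ in range(budget):
        mid = (lo + hi) / 2.0
        y = labeler(mid)              # 0, 1, or None (abstain)
        if y is None:
            width = (hi - lo) / 4.0   # narrow symmetrically around mid
            lo, hi = mid - width, mid + width
        elif y == 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical labeler: true threshold at 0.6, abstains within 0.02 of it.
def toy_labeler(x):
    return None if abs(x - 0.6) < 0.02 else int(x > 0.6)

estimate = locate_threshold(toy_labeler)  # close to 0.6
```

Note how the abstention branch converges faster than ignoring abstentions would: each abstention quarters the interval instead of merely halving it, which mirrors the paper's point that an adaptive learner can issue fewer queries when the labeler is more informative.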
A Full Probabilistic Model for Yes/No Type Crowdsourcing in Multi-Class Classification
Crowdsourcing has become widely used in supervised scenarios where training
sets are scarce and difficult to obtain. Most crowdsourcing models in the
literature assume labelers can provide answers to full questions. In
classification contexts, full questions require a labeler to discern among all
possible classes. Unfortunately, discernment is not always easy in realistic
scenarios. Labelers may not be experts in differentiating all classes. In this
work, we provide a full probabilistic model for a shorter type of query. Our
shorter queries require only "yes" or "no" responses. Our model estimates a
joint posterior distribution over matrices describing labelers' confusions,
together with the posterior probability of the class of every object. We
developed an approximate inference approach that combines Monte Carlo sampling
with Black Box Variational Inference, deriving the necessary gradients. We
built two realistic crowdsourcing scenarios to test our model. The first
scenario involves queries about irregular astronomical time series. The second
scenario relies on the image classification of animals. We achieved results
that are comparable with those of full-query crowdsourcing. Furthermore, we
show that modeling labelers' failures plays an important role in estimating
true classes. Finally, we provide the community with two real datasets obtained
from our crowdsourcing experiments. All our code is publicly available.
Comment: SIAM International Conference on Data Mining (SDM19), 9 official pages, 5 supplementary pages
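The yes/no aggregation step can be sketched as follows. This is a deliberate simplification, not the paper's full joint model: instead of inferring the labelers' confusion parameters with Black Box Variational Inference, it fixes each labeler's true- and false-positive rates and computes the class posterior by Bayes' rule. All function and parameter names are hypothetical.

```python
import numpy as np

def class_posterior(prior, answers, tpr, fpr):
    """Posterior over K classes from yes/no crowd answers.

    prior   : (K,) prior class probabilities
    answers : list of (labeler, asked_class, response), response in {0, 1}
    tpr[l]  : P(labeler l answers "yes" | asked class IS the true class)
    fpr[l]  : P(labeler l answers "yes" | asked class is NOT the true class)
    """
    log_post = np.log(np.asarray(prior, dtype=float))
    for l, k, r in answers:
        for c in range(len(log_post)):
            p_yes = tpr[l] if c == k else fpr[l]
            log_post[c] += np.log(p_yes if r == 1 else 1.0 - p_yes)
    post = np.exp(log_post - log_post.max())  # stable normalization
    return post / post.sum()

# One fairly reliable labeler (hypothetical rates), three classes.
post = class_posterior(prior=[1/3, 1/3, 1/3],
                       answers=[(0, 1, 1),   # asked "class 1?", said yes
                                (0, 0, 0)],  # asked "class 0?", said no
                       tpr=[0.9], fpr=[0.1])
```

Two short answers already concentrate most of the posterior mass on class 1, which illustrates why modeling each labeler's failure rates matters when aggregating partial yes/no evidence.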
Repeated Labeling Using Multiple Noisy Labelers
This paper addresses the repeated acquisition of labels for data items
when the labeling is imperfect. We examine the improvement (or lack
thereof) in data quality via repeated labeling, and focus especially on
the improvement of training labels for supervised induction. With the
outsourcing of small tasks becoming easier, for example via Amazon's
Mechanical Turk, it often is possible to obtain less-than-expert
labeling at low cost. With low-cost labeling, preparing the unlabeled
part of the data can become considerably more expensive than labeling.
We present repeated-labeling strategies of increasing complexity, and
show several main results. (i) Repeated-labeling can improve label
quality and model quality, but not always. (ii) When labels are noisy,
repeated labeling can be preferable to single labeling even in the
traditional setting where labels are not particularly cheap. (iii) As
soon as the cost of processing the unlabeled data is not free, even the
simple strategy of labeling everything multiple times can give
considerable advantage. (iv) Repeatedly labeling a carefully chosen set
of points is generally preferable, and we present a set of robust
techniques that combine different notions of uncertainty to select data
points for which quality should be improved. The bottom line: the
results show clearly that when labeling is not perfect, selective
acquisition of multiple labels is a strategy that data miners should
have in their repertoire. For certain label-quality/cost regimes, the
benefit is substantial.
This work was supported by the National Science Foundation under Grant
No. IIS-0643846, by an NSERC Postdoctoral Fellowship, and by an NEC
Faculty Fellowship.
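The core trade-off behind result (i) — repeated labeling improves quality, "but not always" — has a simple closed form for binary labels under the idealized assumption of independent, equally accurate labelers (an assumption for illustration, not the paper's empirical setting): the probability that a majority vote over n labels is correct.

```python
from math import comb

def majority_correct(p, n):
    """Probability that a majority vote over n independent binary labels,
    each correct with probability p, returns the correct label (n odd)."""
    need = n // 2 + 1  # votes needed for a majority
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(need, n + 1))
```

For p = 0.7, five labels lift per-item accuracy to about 0.84; for p = 0.4, majority voting over five labels is worse than a single label. This is exactly why repeated labeling can help or hurt depending on the label-quality regime.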
Active Learning with Noisy Labelers for Improving Classification Accuracy of Connected Vehicles
Machine learning has emerged as a promising paradigm for enabling connected, automated vehicles to autonomously cruise the streets and react to unexpected situations. Reacting to such situations requires accurate classification of uncommon events, which in turn depends on the selection of large, diverse, and high-quality training data. In fact, the data available at a vehicle (e.g., photos of road signs) may be affected by errors or have different levels of resolution and freshness. To tackle this challenge, we propose an active learning framework that, leveraging the information collected through onboard sensors as well as received from other vehicles, effectively deals with scarce and noisy data. Given the information received from neighboring vehicles, our solution: (i) selects which vehicles can reliably generate high-quality training data, and (ii) obtains a reliable subset of data to add to the training set by trading off between two essential features, i.e., quality and diversity. The results, obtained with different real-world datasets, demonstrate that our framework significantly outperforms state-of-the-art solutions, providing high classification accuracy with a limited bandwidth requirement for the data exchange between vehicles.
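A minimal sketch of the quality/diversity trade-off in step (ii), assuming a simple additive score (the abstract does not specify the actual selection rule, so this is an illustrative stand-in; `features`, `quality`, and `alpha` are hypothetical names):

```python
import numpy as np

def select_batch(features, quality, k, alpha=0.5):
    """Greedily pick k samples, scoring each candidate as a weighted sum of
    its estimated label quality and its distance to the closest sample
    already selected (a diversity bonus)."""
    selected = []
    for _ in range(k):
        best_i, best_score = None, -np.inf
        for i in range(len(features)):
            if i in selected:
                continue
            div = (min(np.linalg.norm(features[i] - features[j])
                       for j in selected)
                   if selected else 1.0)
            score = alpha * quality[i] + (1 - alpha) * div
            if score > best_score:
                best_i, best_score = i, score
        selected.append(best_i)
    return selected

# Two near-duplicate points near the origin, two far away.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1]])
qual = [0.9, 0.8, 0.7, 0.6]
batch = select_batch(feats, qual, k=2)
```

Quality alone would pick the two near-duplicates (indices 0 and 1); the diversity term instead trades a little quality for coverage of the feature space, which is the essence of selecting a small, informative subset under a bandwidth budget.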