Semi-verified PAC Learning from the Crowd with Pairwise Comparisons
We study the problem of crowdsourced PAC learning of threshold functions with
pairwise comparisons. This is a challenging problem and only recently have
query-efficient algorithms been established in the scenario where the majority
of the crowd are perfect. In this work, we investigate the significantly more
challenging case where the majority are incorrect, which in general renders
learning impossible. We show that under the semi-verified model of
Charikar~et~al.~(2017), where we have (limited) access to a trusted oracle who
always returns the correct annotation, it is possible to PAC learn the
underlying hypothesis class while drastically mitigating the labeling cost via
the more easily obtained comparison queries. Orthogonal to recent developments
in semi-verified or list-decodable learning that crucially rely on data
distributional assumptions, our PAC guarantee holds by exploring the wisdom of
the crowd.
Comment: v2 incorporates a simpler Filter algorithm, thus the technical
assumption (in v1) is no longer needed. v2 also reorganizes and emphasizes
the new algorithm component.
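To make the role of comparison queries concrete, here is a minimal sketch of the standard idea behind learning a threshold function from comparisons: sort the instances with a comparison oracle, then binary-search for the label boundary so that only logarithmically many label queries are spent. It assumes both oracles are always correct, which is far simpler than the crowd setting studied in the abstract, and it is not the paper's Filter algorithm. The names `learn_threshold`, `compare`, and `label` are hypothetical.

```python
from functools import cmp_to_key


def learn_threshold(items, compare, label):
    """Locate a threshold boundary using comparisons plus few label queries.

    compare(a, b) -> negative, zero, or positive (assumed noiseless oracle).
    label(x)      -> 0 or 1, monotone in the order induced by compare
                     (assumed noiseless oracle).
    Returns (ordered_items, boundary_index, label_queries_used), where
    ordered_items[boundary_index:] are the instances labeled 1.
    """
    # Step 1: order the instances using comparison queries only.
    ordered = sorted(items, key=cmp_to_key(compare))

    # Step 2: binary search for the first positively labeled instance.
    # Invariant: everything before lo is labeled 0, everything from hi on is 1.
    lo, hi = 0, len(ordered)
    label_queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        label_queries += 1
        if label(ordered[mid]) == 1:
            hi = mid
        else:
            lo = mid + 1
    return ordered, lo, label_queries
```

With n instances, the sort uses O(n log n) comparison queries while the search uses only O(log n) label queries, which is the sense in which cheaper comparisons can reduce labeling cost in this idealized setting.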
SALIC: Social Active Learning for Image Classification
In this paper, we present SALIC, an active learning method for selecting the most appropriate user-tagged images to expand the training set of a binary classifier. The process of active learning can be fully automated in this social context by replacing the human oracle with the images' tags. However, the noisy nature of these tags adds further complexity to the sample selection process: apart from the images' informativeness (i.e., how much they are expected to inform the classifier if we knew their label), our confidence about their actual label should also be maximized (i.e., how certain the oracle is about the images' true contents). The main contribution of this work is a probabilistic approach for jointly maximizing these two quantities. In the examined noisy context, the oracle's confidence is necessary to provide a context-based indication of the images' true contents, while the samples' informativeness is required to reduce the computational complexity and minimize the mistakes of the unreliable oracle. To support this, we first show that SALIC allows us to select training data as effectively as typical active learning, without the cost of manual annotation. We then argue that the speed-up achieved when learning actively in this social context (where labels can be obtained without the cost of human annotation) is necessary to cope with the continuously growing requirements of large-scale applications. In this respect, we demonstrate that SALIC requires ten times less training data to reach the same performance as a straightforward informativeness-agnostic learning approach.
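The joint selection criterion described above can be illustrated with a small sketch. The abstract does not give SALIC's actual formulas, so everything here is an assumption: informativeness is modeled as closeness to the classifier's decision boundary, oracle confidence is a number in [0, 1] derived from the tags, and the two are combined by a simple product. The field names `margin` and `tag_confidence` and the functions below are hypothetical.

```python
import math


def informativeness(margin):
    """Map a signed classifier margin to (0, 1]; values near the boundary score highest.

    Assumption: exp(-|margin|) is just one convenient monotone choice.
    """
    return math.exp(-abs(margin))


def select_samples(candidates, k):
    """Return the ids of the k candidates with the highest joint score.

    Each candidate is a dict with:
      "id"             : any identifier
      "margin"         : signed distance to the decision boundary
      "tag_confidence" : assumed probability that the tag-derived label is right
    The joint score is the product of informativeness and tag confidence
    (an illustrative assumption, not the method from the paper).
    """
    def joint_score(c):
        return informativeness(c["margin"]) * c["tag_confidence"]

    ranked = sorted(candidates, key=joint_score, reverse=True)
    return [c["id"] for c in ranked[:k]]
```

Under a product rule like this, an image that sits near the boundary but has an unreliable tag is ranked below one that is slightly less informative but confidently labeled, which is the trade-off the abstract describes.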