14 research outputs found
Selective Sampling with Drift
Recently there has been much work on selective sampling, an online active
learning setting, in which algorithms work in rounds. On each round an
algorithm receives an input and makes a prediction. Then, it can decide whether
to query a label, and if so to update its model, otherwise the input is
discarded. Most of this work is focused on the stationary case, where it is
assumed that there is a fixed target model, and the performance of the
algorithm is compared to a fixed model. However, in many real-world
applications, such as spam prediction, the best target function may drift over
time, or have shifts from time to time. We develop a novel selective sampling
algorithm for the drifting setting, analyze it under no assumptions on the
mechanism generating the sequence of instances, and derive new mistake bounds
that depend on the amount of drift in the problem. Simulations on synthetic and
real-world datasets demonstrate the superiority of our algorithms as a
selective sampling algorithm in the drifting setting
Online Clustering of Bandits
We introduce a novel algorithmic approach to content recommendation based on
adaptive clustering of exploration-exploitation ("bandit") strategies. We
provide a sharp regret analysis of this algorithm in a standard stochastic
noise setting, demonstrate its scalability properties, and prove its
effectiveness on a number of artificial and real-world datasets. Our
experiments show a significant increase in prediction performance over
state-of-the-art methods for bandit problems.Comment: In E. Xing and T. Jebara (Eds.), Proceedings of 31st International
Conference on Machine Learning, Journal of Machine Learning Research Workshop
and Conference Proceedings, Vol.32 (JMLR W&CP-32), Beijing, China, Jun.
21-26, 2014 (ICML 2014), Submitted by Shuai Li
(https://sites.google.com/site/shuailidotsli
Wisely Using a Budget for Crowdsourcing
The problem of “approximating the crowd” is that of estimating the crowd’s majority opinion by querying only a subset of it. Algorithms that approximate the crowd can intelligently stretch a limited budget for a crowdsourcing task. We present an algorithm, “CrowdSense,” that works in an online fashion where examples come one at a time. Crowd-Sense dynamically samples subsets of labelers based on an exploration/exploitation criterion. The algorithm produces a weighted combination of a subset of the labelers’ votes that approximates the crowd’s opinion. We then introduce two variations of CrowdSense that make various distributional assumptions to handle distinct crowd characteristics. In particular, the first algorithm makes a statistical independence assumption of the probabilities for large crowds, whereas the second algorithm finds a lower bound on how often the current sub-crowd agrees with the crowd majority vote. Our experiments on CrowdSense and several baselines demonstrate that we can reliably approximate the entire crowd’s vote by collecting opinions from a representative subset of the crowd