246 research outputs found
On Kernelized Multi-armed Bandits
We consider the stochastic bandit problem with a continuous set of arms, with
the expected reward function over the arms assumed to be fixed but unknown. We
provide two new Gaussian process-based algorithms for continuous bandit
optimization-Improved GP-UCB (IGP-UCB) and GP-Thomson sampling (GP-TS), and
derive corresponding regret bounds. Specifically, the bounds hold when the
expected reward function belongs to the reproducing kernel Hilbert space (RKHS)
that naturally corresponds to a Gaussian process kernel used as input by the
algorithms. Along the way, we derive a new self-normalized concentration
inequality for vector- valued martingales of arbitrary, possibly infinite,
dimension. Finally, experimental evaluation and comparisons to existing
algorithms on synthetic and real-world environments are carried out that
highlight the favorable gains of the proposed strategies in many cases
Adaptive Contract Design for Crowdsourcing Markets: Bandit Algorithms for Repeated Principal-Agent Problems
Crowdsourcing markets have emerged as a popular platform for matching
available workers with tasks to complete. The payment for a particular task is
typically set by the task's requester, and may be adjusted based on the quality
of the completed work, for example, through the use of "bonus" payments. In
this paper, we study the requester's problem of dynamically adjusting
quality-contingent payments for tasks. We consider a multi-round version of the
well-known principal-agent model, whereby in each round a worker makes a
strategic choice of the effort level which is not directly observable by the
requester. In particular, our formulation significantly generalizes the
budget-free online task pricing problems studied in prior work.
We treat this problem as a multi-armed bandit problem, with each "arm"
representing a potential contract. To cope with the large (and in fact,
infinite) number of arms, we propose a new algorithm, AgnosticZooming, which
discretizes the contract space into a finite number of regions, effectively
treating each region as a single arm. This discretization is adaptively
refined, so that more promising regions of the contract space are eventually
discretized more finely. We analyze this algorithm, showing that it achieves
regret sublinear in the time horizon and substantially improves over
non-adaptive discretization (which is the only competing approach in the
literature).
Our results advance the state of art on several different topics: the theory
of crowdsourcing markets, principal-agent problems, multi-armed bandits, and
dynamic pricing.Comment: This is the full version of a paper in the ACM Conference on
Economics and Computation (ACM-EC), 201
- …