Adaptive Contract Design for Crowdsourcing Markets: Bandit Algorithms for Repeated Principal-Agent Problems
Crowdsourcing markets have emerged as a popular platform for matching
available workers with tasks to complete. The payment for a particular task is
typically set by the task's requester, and may be adjusted based on the quality
of the completed work, for example, through the use of "bonus" payments. In
this paper, we study the requester's problem of dynamically adjusting
quality-contingent payments for tasks. We consider a multi-round version of the
well-known principal-agent model, whereby in each round a worker makes a
strategic choice of the effort level which is not directly observable by the
requester. In particular, our formulation significantly generalizes the
budget-free online task pricing problems studied in prior work.
We treat this problem as a multi-armed bandit problem, with each "arm"
representing a potential contract. To cope with the large (and in fact,
infinite) number of arms, we propose a new algorithm, AgnosticZooming, which
discretizes the contract space into a finite number of regions, effectively
treating each region as a single arm. This discretization is adaptively
refined, so that more promising regions of the contract space are eventually
discretized more finely. We analyze this algorithm, showing that it achieves
regret sublinear in the time horizon and substantially improves over
non-adaptive discretization (which is the only competing approach in the
literature).
Our results advance the state of art on several different topics: the theory
of crowdsourcing markets, principal-agent problems, multi-armed bandits, and
dynamic pricing.
Comment: This is the full version of a paper in the ACM Conference on
Economics and Computation (ACM-EC), 201
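The adaptive-discretization idea described in this abstract can be sketched in a few lines. The following is a toy illustration, not the paper's AgnosticZooming algorithm: the 1-D contract space, the peaked requester utility, and the fixed split-after-k-pulls rule are all assumptions made for the example.

```python
import math
import random

class AdaptiveDiscretization:
    """Toy adaptive-discretization bandit over a 1-D contract space [0, 1].

    Each region of the contract space is played as a single arm, and a
    region that accumulates enough pulls is split in half, so promising
    parts of the space end up on a finer grid.
    """

    def __init__(self, split_after=20):
        self.split_after = split_after
        # Each region is [lo, hi, pulls, reward_sum].
        self.regions = [[0.0, 1.0, 0, 0.0]]

    def select(self):
        total = sum(r[2] for r in self.regions) + 1

        def ucb(r):
            if r[2] == 0:
                return float("inf")  # try unexplored regions first
            return r[3] / r[2] + math.sqrt(2 * math.log(total) / r[2])

        return max(self.regions, key=ucb)

    def update(self, region, reward):
        region[2] += 1
        region[3] += reward
        if region[2] >= self.split_after:  # refine a well-sampled region
            lo, hi = region[0], region[1]
            mid = (lo + hi) / 2.0
            self.regions.remove(region)
            self.regions += [[lo, mid, 0, 0.0], [mid, hi, 0, 0.0]]

def run(horizon=500, seed=0):
    rng = random.Random(seed)
    algo = AdaptiveDiscretization()
    for _ in range(horizon):
        region = algo.select()
        contract = rng.uniform(region[0], region[1])
        # Hypothetical requester utility, peaked at contract = 0.7.
        reward = max(0.0, 1.0 - abs(contract - 0.7)) + rng.gauss(0.0, 0.1)
        algo.update(region, reward)
    return algo
```

After a few hundred rounds the regions near the utility peak have been split repeatedly, which is the qualitative behavior the abstract describes.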
Revisiting MAB based approaches to recursive delegation
In this paper we examine the effectiveness of several multi-arm bandit
algorithms when used as a trust system to select agents to delegate tasks to.
In contrast to existing work, we allow for recursive delegation to occur. That
is, a task delegated to one agent can be delegated onwards by that agent, with
further delegation possible until some agent finally executes the task. We show
that modifications to the standard multi-arm bandit algorithms can provide
improvements in performance in such recursive delegation settings.
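The recursive-delegation setting can be sketched as follows. This is a hypothetical illustration under assumed agent names and skill levels, using plain UCB1 at every agent rather than the modified algorithms the abstract refers to.

```python
import math
import random

def ucb_pick(stats, t):
    """UCB1 index over a dict {option: [pulls, reward_sum]}."""
    def index(item):
        pulls, total = item[1]
        if pulls == 0:
            return float("inf")
        return total / pulls + math.sqrt(2 * math.log(t + 1) / pulls)
    return max(stats.items(), key=index)[0]

def delegate(agent, agents, t, rng, depth=0, max_depth=5):
    """One task: each agent runs its own UCB1 over executing the task
    itself ("self") vs. the other agents it can delegate to, and
    delegation recurses until some agent executes."""
    stats = agents[agent]["stats"]
    choice = "self" if depth >= max_depth else ucb_pick(stats, t)
    if choice == "self":
        reward = 1.0 if rng.random() < agents[agent]["skill"] else 0.0
    else:
        reward = delegate(choice, agents, t, rng, depth + 1, max_depth)
    stats[choice][0] += 1
    stats[choice][1] += reward
    return reward

def simulate(rounds=2000, seed=0):
    rng = random.Random(seed)
    skills = {"a": 0.2, "b": 0.5, "c": 0.9}  # hypothetical success rates
    agents = {
        name: {
            "skill": skill,
            "stats": {opt: [0, 0.0]
                      for opt in ["self"] + [m for m in skills if m != name]},
        }
        for name, skill in skills.items()
    }
    total = sum(delegate("a", agents, t, rng) for t in range(rounds))
    return total / rounds
```

Note that each agent only observes the eventual task outcome, not who finally executed it, which is what makes the recursive setting harder than flat delegation.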
Learning User Preferences to Incentivize Exploration in the Sharing Economy
We study platforms in the sharing economy and discuss the need for
incentivizing users to explore options that otherwise would not be chosen. For
instance, rental platforms such as Airbnb typically rely on customer reviews to
provide users with relevant information about different options. Yet, often a
large fraction of options does not have any reviews available. Such options are
frequently neglected as viable choices, and in turn are unlikely to be
evaluated, creating a vicious cycle. Platforms can engage users to deviate from
their preferred choice by offering monetary incentives for choosing a different
option instead. To efficiently learn the optimal incentives to offer, we
consider structural information in user preferences and introduce a novel
algorithm - Coordinated Online Learning (CoOL) - for learning with structural
information modeled as convex constraints. We provide formal guarantees on the
performance of our algorithm and test the viability of our approach in a user
study with data of apartments on Airbnb. Our findings suggest that our approach
is well-suited to learn appropriate incentives and increase exploration on the
investigated platform.
Comment: Longer version of AAAI'18 paper. arXiv admin note: text overlap with
arXiv:1702.0284
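The notion of learning with structural information modeled as convex constraints can be illustrated with projected online gradient descent. This is a generic sketch, not the CoOL algorithm itself; the box constraint and quadratic loss are assumptions standing in for the richer constraint sets and losses in the paper.

```python
def project_box(x, lo=0.0, hi=1.0):
    """Componentwise projection onto the box [lo, hi] -- a stand-in for
    the convex structural constraints on incentives."""
    return [min(hi, max(lo, xi)) for xi in x]

def learn_incentives(targets, rounds=100, eta=0.1):
    """Projected online gradient descent on a toy quadratic loss
    f_t(x) = sum_i (x_i - target_i)^2: take a gradient step, then
    project back onto the constraint set."""
    x = [0.0] * len(targets)
    for _ in range(rounds):
        grad = [2.0 * (xi - ti) for xi, ti in zip(x, targets)]
        x = project_box([xi - eta * gi for xi, gi in zip(x, grad)])
    return x
```

With step size 0.1 the iterates contract geometrically toward the (feasible) targets, so a hundred rounds suffice for convergence in this toy setting.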
Incentive mechanism design for citizen reporting application using Stackelberg game
The growing use of smartphones equipped with various sensors to collect and analyze information around us highlights a paradigm called mobile crowdsensing. To motivate citizens to participate in crowdsensing and compensate them for their resources, it is necessary to incentivize participants for their sensing service. Several studies have used the Stackelberg game to model the incentive mechanism; however, those studies did not include a budget constraint for the limited-budget case. Another challenge is to optimize the crowdsourcer’s (government’s) profit in conducting crowdsensing under a limited budget and then allocate the budget to several regional working units that are responsible for specific city problems. We propose an incentive mechanism for mobile crowdsensing based on several identified incentive parameters, using the Stackelberg game model, and apply a MOOP (multi-objective optimization problem) formulation to the incentive model in which participant reputation is taken into account. The proposed incentive model is evaluated through simulations, which indicate that the results correspond appropriately to the theoretical properties of the model.
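The leader-follower structure of a budget-constrained Stackelberg incentive game can be sketched by backward induction. The utility functions, the quadratic effort cost, and the grid search below are all hypothetical choices for illustration, not the paper's mechanism.

```python
def follower_effort(reward_rate, cost):
    """Follower best response: maximize reward_rate*e - cost*e^2 over
    e >= 0, which has the closed form e* = reward_rate / (2*cost)."""
    return reward_rate / (2.0 * cost)

def leader_best_rate(cost, value_per_effort, budget, grid=1000):
    """Leader (crowdsourcer) anticipates the follower's best response
    and searches a grid of per-effort payment rates, discarding rates
    whose induced total payment exceeds the budget."""
    best_rate, best_profit = 0.0, float("-inf")
    for k in range(grid + 1):
        r = k / grid * value_per_effort   # candidate payment rate
        e = follower_effort(r, cost)
        payment = r * e
        if payment > budget:
            continue                       # budget constraint binds
        profit = value_per_effort * e - payment
        if profit > best_profit:
            best_rate, best_profit = r, profit
    return best_rate, best_profit
```

With cost 1 and value 2 per unit effort, the leader's profit r(2 - r)/2 is maximized at rate 1, yielding profit 0.5 while spending only 0.5 of the budget.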
Competing Bandits: The Perils of Exploration Under Competition
Most online platforms strive to learn from interactions with users, and many
engage in exploration: making potentially suboptimal choices for the sake of
acquiring new information. We study the interplay between exploration and
competition: how such platforms balance the exploration for learning and the
competition for users. Here users play three distinct roles: they are customers
that generate revenue, they are sources of data for learning, and they are
self-interested agents which choose among the competing platforms.
We consider a stylized duopoly model in which two firms face the same
multi-armed bandit problem. Users arrive one by one and choose between the two
firms, so that each firm makes progress on its bandit problem only if it is
chosen. Through a mix of theoretical results and numerical simulations, we
study whether and to what extent competition incentivizes the adoption of
better bandit algorithms, and whether it leads to welfare increases for users.
We find that stark competition induces firms to commit to a "greedy" bandit
algorithm that leads to low welfare. However, weakening competition by
providing firms with some "free" users incentivizes better exploration
strategies and increases welfare. We investigate two channels for weakening the
competition: relaxing the rationality of users and giving one firm a
first-mover advantage. Our findings are closely related to the "competition vs.
innovation" relationship, and elucidate the first-mover advantage in the
digital economy.
Comment: merged and extended version of arXiv:1702.08533 and arXiv:1902.0559
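The duopoly setup in the abstract can be simulated in miniature: users arrive one by one, choose the firm with the higher average reward so far, and only the chosen firm advances its bandit problem. The two-arm instance, the arm means, and the reputation rule below are assumptions for this toy sketch, not the paper's model specification.

```python
import math
import random

class Bandit:
    """A firm's two-arm bandit; policy is 'greedy' or 'ucb'."""
    def __init__(self, policy):
        self.policy = policy
        self.n = [0, 0]          # pulls per arm
        self.s = [0.0, 0.0]      # reward sums per arm

    def pick(self):
        for a in (0, 1):
            if self.n[a] == 0:
                return a         # pull each arm once first
        means = [self.s[a] / self.n[a] for a in (0, 1)]
        if self.policy == "greedy":
            return max((0, 1), key=lambda a: means[a])
        t = sum(self.n)
        return max((0, 1),
                   key=lambda a: means[a] + math.sqrt(2 * math.log(t) / self.n[a]))

    def update(self, a, r):
        self.n[a] += 1
        self.s[a] += r

    def reputation(self):
        pulls = sum(self.n)
        return (self.s[0] + self.s[1]) / pulls if pulls else 0.5

def duopoly(rounds=3000, means=(0.4, 0.6), seed=0):
    """Users pick the firm with the higher empirical average reward;
    the chosen firm plays its bandit and collects the reward."""
    rng = random.Random(seed)
    firms = {"greedy": Bandit("greedy"), "ucb": Bandit("ucb")}
    welfare = {"greedy": 0.0, "ucb": 0.0}
    for _ in range(rounds):
        name = max(firms, key=lambda f: firms[f].reputation())
        arm = firms[name].pick()
        r = 1.0 if rng.random() < means[arm] else 0.0
        firms[name].update(arm, r)
        welfare[name] += r
    return welfare
```

Runs of this kind make the abstract's tension visible: a firm that explores can lose its early reputation lead and, with it, the entire stream of future users.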
Dynamic project selection
We study a normative model of an internal capital market that a company uses to choose between its two divisions’ projects. Each project’s value is initially unknown to all, but can be dynamically learned by the corresponding division. Learning can be suspended or resumed at any time and is costly. We characterize an internal capital market that maximizes the company’s expected cash flow.
Ballooning Multi-Armed Bandits
In this paper, we introduce Ballooning Multi-Armed Bandits (BL-MAB), a novel
extension of the classical stochastic MAB model. In the BL-MAB model, the set
of available arms grows (or balloons) over time. In contrast to the classical
MAB setting where the regret is computed with respect to the best arm overall,
the regret in a BL-MAB setting is computed with respect to the best available
arm at each time. We first observe that the existing stochastic MAB algorithms
result in linear regret for the BL-MAB model. We prove that, if the best arm is
equally likely to arrive at any time instant, a sub-linear regret cannot be
achieved. Next, we show that if the best arm is more likely to arrive in the
early rounds, one can achieve sub-linear regret. Our proposed algorithm
determines (1) the fraction of the time horizon for which the newly arriving
arms should be explored and (2) the sequence of arm pulls in the exploitation
phase from among the explored arms. Making reasonable assumptions on the
arrival distribution of the best arm in terms of the thinness of the
distribution's tail, we prove that the proposed algorithm achieves sub-linear
instance-independent regret. We further quantify explicit dependence of regret
on the arrival distribution parameters. We reinforce our theoretical findings
with extensive simulation results. We conclude by showing that our algorithm
would achieve sub-linear regret even if (a) the distributional parameters are
not exactly known, but are obtained using a reasonable learning mechanism or
(b) the best arm is not more likely to arrive early, but a large fraction of
arms is likely to arrive relatively early.
Comment: A full version of this paper is accepted in the Journal of Artificial
Intelligence (AIJ) of Elsevier. A preliminary version is published as an
extended abstract in AAMAS 2020. Proceedings of the 19th International
Conference on Autonomous Agents and MultiAgent Systems. 202
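The explore-then-exploit structure the abstract describes for a growing arm set can be sketched as follows. This is a simplified illustration, not the paper's algorithm: the one-arm-per-round arrival process, the early-biased arm means, and the fixed exploration fraction are all assumptions.

```python
import random

def ballooning_mab(horizon=5000, explore_frac=0.3, seed=0):
    """Toy ballooning-bandit run: a new arm arrives every round, newly
    arriving arms are explored only during the first explore_frac of
    the horizon, and afterwards the learner exploits the empirically
    best explored arm. Early arrivals have higher means on average
    (a hypothetical early-biased arrival model)."""
    rng = random.Random(seed)
    arms = []          # each arm is [true_mean, pulls, reward_sum]
    total = 0.0
    for t in range(horizon):
        # New arm arrives; its mean shrinks as t grows.
        arms.append([rng.random() * (1.0 - t / horizon), 0, 0.0])
        if t < explore_frac * horizon:
            a = arms[-1]                              # explore the newest arm
        else:
            explored = [x for x in arms if x[1] > 0]
            a = max(explored, key=lambda x: x[2] / x[1])  # exploit
        r = 1.0 if rng.random() < a[0] else 0.0
        a[1] += 1
        a[2] += r
        total += r
    return total / horizon
```

Because good arms tend to arrive early under this arrival model, ignoring late arrivals after the exploration phase costs little, which mirrors the sub-linear-regret intuition in the abstract.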