An Incentive Compatible Multi-Armed-Bandit Crowdsourcing Mechanism with Quality Assurance
Consider a requester who wishes to crowdsource a series of identical binary
labeling tasks to a pool of workers so as to achieve an assured accuracy for
each task, in a cost optimal way. The workers are heterogeneous with unknown
but fixed qualities and their costs are private. The problem is to select for
each task an optimal subset of workers so that the outcome obtained from the
selected workers guarantees a target accuracy level. The problem is
challenging even in a non-strategic setting, since the accuracy of the
aggregated label depends on the unknown qualities. We develop a novel multi-armed
bandit (MAB) mechanism for solving this problem. First, we propose a framework,
Assured Accuracy Bandit (AAB), which leads to an MAB algorithm, Constrained
Confidence Bound for a Non Strategic setting (CCB-NS). We derive an upper
bound, depending on the target accuracy level and the true qualities, on the
number of time steps for which the algorithm chooses a sub-optimal set. A more challenging
situation arises when the requester not only has to learn the qualities of the
workers but also elicit their true costs. We modify the CCB-NS algorithm to
obtain an adaptive, exploration-separated algorithm which we call
Constrained Confidence Bound for a Strategic setting (CCB-S). The CCB-S algorithm
produces an ex-post monotone allocation rule and thus can be transformed into
an ex-post incentive compatible and ex-post individually rational mechanism
that learns the qualities of the workers and guarantees a given target accuracy
level in a cost optimal way. We provide a lower bound on the number of times
any algorithm must select a sub-optimal set, and we show that this lower bound
matches our upper bound up to a constant factor. We provide insights into the
practical implementation of this framework through an illustrative example, and
we demonstrate the efficacy of our algorithms through simulations.
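The cost-optimal subset selection under optimistic quality estimates can be sketched as follows. This is an illustration only: the Hoeffding-style confidence radius, the worker data, and especially the toy aggregation rule (1 minus the product of the subsets' error probabilities) are assumptions for readability, not the paper's CCB-NS algorithm or its majority-vote aggregation.

```python
import itertools
import math

def ucb_quality(successes, pulls, t):
    # Optimistic (upper-confidence) quality estimate; untried workers
    # get the most optimistic value 1.0. Radius is a Hoeffding-style
    # choice made for this sketch, not the paper's exact bonus.
    if pulls == 0:
        return 1.0
    return min(1.0, successes / pulls + math.sqrt(math.log(t) / (2 * pulls)))

def select_set(workers, target, t):
    """Pick the cheapest subset whose optimistic aggregate accuracy meets
    the target. Aggregate accuracy uses the toy proxy 1 - prod(1 - q_i),
    i.e. 'at least one selected worker is correct' (an assumption)."""
    best, best_cost = None, float("inf")
    for r in range(1, len(workers) + 1):
        for subset in itertools.combinations(workers, r):
            acc = 1 - math.prod(1 - ucb_quality(w["s"], w["n"], t)
                                for w in subset)
            cost = sum(w["cost"] for w in subset)
            if acc >= target and cost < best_cost:
                best, best_cost = subset, cost
    return best

# Hypothetical workers: per-task cost, observed correct labels s out of n.
workers = [{"cost": 1.0, "s": 40, "n": 50},
           {"cost": 0.5, "s": 15, "n": 50},
           {"cost": 2.0, "s": 45, "n": 50}]
chosen = select_set(workers, target=0.95, t=30)
```

With these numbers the cheap low-quality worker alone cannot meet the 0.95 target, so the sketch settles on the mid-cost high-quality worker; the brute-force subset enumeration is exponential and stands in for the paper's more careful selection rule.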
Estimating and Incentivizing Imperfect-Knowledge Agents with Hidden Rewards
In practice, incentive providers (i.e., principals) often cannot observe the
reward realizations of incentivized agents, which is in contrast to many
principal-agent models that have been previously studied. This information
asymmetry challenges the principal to consistently estimate the agent's unknown
rewards by solely watching the agent's decisions, which becomes even more
challenging when the agent has to learn its own rewards. This complex setting
is observed in various real-life scenarios ranging from renewable energy
storage contracts to personalized healthcare incentives. Hence, it offers not
only interesting theoretical questions but also wide practical relevance. This
paper explores a repeated adverse selection game between a self-interested
learning agent and a learning principal. The agent tackles a multi-armed bandit
(MAB) problem to maximize their expected reward plus incentive. On top of the
agent's learning, the principal trains a parallel algorithm and faces a
trade-off between consistently estimating the agent's unknown rewards and
maximizing their own utility by offering adaptive incentives to lead the agent.
For a non-parametric model, we introduce an estimator whose only inputs are the
history of the principal's incentives and the agent's choices. We combine this
estimator with a proposed data-driven incentive policy within an MAB framework. Without
restricting the type of the agent's algorithm, we prove finite-sample
consistency of the estimator and a rigorous regret bound for the principal by
considering the sequential externality imposed by the agent. Lastly, our
theoretical results are reinforced by simulations justifying the applicability
of our framework to green energy aggregator contracts.
Comment: 72 pages, 6 figures. arXiv admin note: text overlap with
arXiv:2304.0740
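The core information asymmetry, inferring hidden rewards purely from incentives offered and choices observed, can be illustrated with a deliberately simplified sketch. Here the agent is a non-learning best-responder with known-to-itself mean rewards, and the principal sweeps an incentive until the agent switches arms; the switching point reveals the reward gap. All names and the sweep procedure are assumptions for illustration, not the paper's estimator.

```python
def agent_choice(rewards, incentives):
    # Toy agent: best-responds to reward + incentive (ties go to arm 0).
    return max(range(len(rewards)), key=lambda a: rewards[a] + incentives[a])

def estimate_gap(rewards, step=0.01, max_inc=5.0):
    """Sweep the incentive on arm 1 upward until the agent switches away
    from arm 0; the switching point estimates rewards[0] - rewards[1].
    The principal observes only incentives and choices, never rewards."""
    inc = 0.0
    while inc <= max_inc:
        if agent_choice(rewards, [0.0, inc]) == 1:
            return inc  # estimated gap, up to the discretization `step`
        inc += step
    return None

true_rewards = [0.7, 0.4]   # hidden from the principal
gap_hat = estimate_gap(true_rewards)
```

The recovered gap is accurate to the sweep resolution. The paper's setting is much harder, since the agent is itself learning its rewards while the principal estimates them, but the sketch shows why choices under varying incentives carry enough information for consistent estimation.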
An optimal bidimensional multi-armed bandit auction for multi-unit procurement
We study the problem of a buyer who gains stochastic rewards by procuring, through an auction, multiple units of a service or item from a pool of heterogeneous agents who are strategic on two dimensions, namely cost and capacity. The reward obtained for a single unit from an allocated agent depends on the inherent quality of the agent; the agent's quality is fixed but unknown. Each agent can only supply a limited number of units (the capacity of the agent). The cost incurred per unit and the capacity (maximum number of units that can be supplied) are private information of each agent. The auctioneer is required to elicit from the agents their costs as well as capacities (making the mechanism design bidimensional) and, further, to learn the qualities of the agents, with a view to maximizing her utility. Motivated by this, we design a bidimensional multi-armed bandit procurement auction that seeks to maximize the expected utility of the auctioneer subject to incentive compatibility and individual rationality, while simultaneously learning the unknown qualities of the agents. We first work under the assumption that the qualities are known, and propose an optimal, truthful mechanism, 2D-OPT, for the auctioneer to elicit costs and capacities. Next, in order to learn the qualities of the agents as well, we provide sufficient conditions for a learning algorithm to be Bayesian incentive compatible and individually rational. We finally design a novel learning mechanism, 2D-UCB, that is stochastic Bayesian incentive compatible and individually rational.
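The interplay of reported costs, reported capacities, and learned qualities can be sketched with a simple capacity-respecting allocation. This is an illustrative greedy heuristic under assumed data, ranking agents by optimistic quality per unit cost and filling demand up to each reported capacity; it is not the paper's 2D-OPT mechanism and makes no incentive-compatibility claim.

```python
import math

def allocate(bids, demand, t):
    """Greedy multi-unit procurement sketch: rank agents by an optimistic
    (UCB) quality estimate divided by reported unit cost, then fill the
    demand while respecting each agent's reported capacity."""
    def score(bid):
        if bid["n"] == 0:
            return 1.0 / bid["cost"]  # untried agents are fully optimistic
        q_hat = bid["succ"] / bid["n"]
        bonus = math.sqrt(2 * math.log(max(t, 2)) / bid["n"])
        return min(1.0, q_hat + bonus) / bid["cost"]

    alloc = {}
    for bid in sorted(bids, key=score, reverse=True):
        take = min(bid["capacity"], demand)
        if take > 0:
            alloc[bid["name"]] = take
            demand -= take
        if demand == 0:
            break
    return alloc

# Hypothetical bids: unit cost and capacity are the agents' reports,
# succ/n are observed quality samples so far.
bids = [{"name": "A", "cost": 2.0, "capacity": 3, "succ": 18, "n": 20},
        {"name": "B", "cost": 1.0, "capacity": 2, "succ": 10, "n": 20},
        {"name": "C", "cost": 3.0, "capacity": 5, "succ": 19, "n": 20}]
alloc = allocate(bids, demand=4, t=40)
```

Note that a greedy rule like this scored on *reported* costs is exactly where strategic manipulation enters: an agent can distort its cost or capacity report to change its rank, which is why the paper must design payments making truthful bidding optimal rather than simply allocating greedily.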
CIMODE 2016: 3rd International Congress on Fashion and Design: proceedings
CIMODE 2016 is the third International Congress on Fashion and Design (Congresso Internacional de Moda e Design), held from 9 to 12 May 2016 in the city of Buenos Aires, under the theme EM-TRAMAS. This edition is organized by the Faculdade de Arquitetura, Desenho e Urbanismo of the Universidade de Buenos Aires, together with the Department of Textile Engineering of the Universidade do Minho and with ABEPEM (Associação Brasileira de Estudos e Pesquisa em Moda, the Brazilian Association for Fashion Studies and Research).