Dynamic Ad Allocation: Bandits with Budgets
We consider an application of multi-armed bandits to internet advertising
(specifically, to dynamic ad allocation in the pay-per-click model, with
uncertainty on the click probabilities). We focus on an important practical
issue that advertisers are constrained in how much money they can spend on
their ad campaigns. This issue has not been considered in the prior work on
bandit-based approaches for ad allocation, to the best of our knowledge.
We define a simple, stylized model where an algorithm picks one ad to display
in each round, and each ad has a \emph{budget}: the maximal amount of money
that can be spent on this ad. This model admits a natural variant of UCB1, a
well-known algorithm for multi-armed bandits with stochastic rewards. We derive
strong provable guarantees for this algorithm.
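As a rough illustration of the model described in this abstract, the following is a minimal sketch of a UCB1-style rule with per-ad budgets. It is not the paper's exact algorithm. The function name `budgeted_ucb1`, the fixed per-click `prices`, the simulated `click_probs`, and the rule of retiring an ad once it can no longer afford a click are all assumptions made for the example.

```python
import math
import random


def budgeted_ucb1(click_probs, prices, budgets, rounds, seed=0):
    """Simulate a UCB1-style allocation with per-ad budgets (illustrative only).

    click_probs: assumed true click probabilities, unknown to the algorithm.
    prices: assumed amount charged to an ad for each click.
    budgets: maximal total amount that may be spent on each ad.
    """
    rng = random.Random(seed)
    n = len(click_probs)
    shows = [0] * n      # times each ad was displayed
    clicks = [0] * n     # clicks each ad received
    spent = [0.0] * n    # money charged to each ad so far
    revenue = 0.0

    for t in range(1, rounds + 1):
        # Only ads that can still pay for one more click are eligible.
        active = [i for i in range(n) if spent[i] + prices[i] <= budgets[i]]
        if not active:
            break

        untried = [i for i in active if shows[i] == 0]
        if untried:
            arm = untried[0]
        else:
            def index(i):
                # Price times an upper confidence bound on the click probability.
                mean = clicks[i] / shows[i]
                bonus = math.sqrt(2 * math.log(t) / shows[i])
                return prices[i] * min(1.0, mean + bonus)

            arm = max(active, key=index)

        shows[arm] += 1
        if rng.random() < click_probs[arm]:
            clicks[arm] += 1
            spent[arm] += prices[arm]
            revenue += prices[arm]

    return revenue, spent, shows
```

Calling `budgeted_ucb1([0.9, 0.5], [1.0, 2.0], [3.0, 4.0], 200)` returns the total revenue, the amount spent per ad, and the display counts; by construction no ad is ever charged more than its budget.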
Learning Prices for Repeated Auctions with Strategic Buyers
Inspired by real-time ad exchanges for online display advertising, we
consider the problem of inferring a buyer's value distribution for a good when
the buyer is repeatedly interacting with a seller through a posted-price
mechanism. We model the buyer as a strategic agent, whose goal is to maximize
her long-term surplus, and we are interested in mechanisms that maximize the
seller's long-term revenue. We define the natural notion of strategic regret
--- the lost revenue as measured against a truthful (non-strategic) buyer. We
present seller algorithms that are no-(strategic)-regret when the buyer
discounts her future surplus --- i.e. the buyer prefers showing advertisements
to users sooner rather than later. We also give a lower bound on strategic
regret that increases as the buyer's discounting weakens and shows, in
particular, that any seller algorithm will suffer linear strategic regret if
there is no discounting.
Comment: Neural Information Processing Systems (NIPS 2013)
Online learning in repeated auctions
Motivated by online advertising auctions, we consider repeated Vickrey
auctions where goods of unknown value are sold sequentially and bidders only
learn (potentially noisy) information about a good's value once it is
purchased. We adopt an online learning approach with bandit feedback to model
this problem and derive bidding strategies for two models: stochastic and
adversarial. In the stochastic model, the observed values of the goods are
random variables centered around the true value of the good. In this case,
logarithmic regret is achievable when competing against well-behaved
adversaries. In the adversarial model, the goods need not be identical and we
simply compare our performance against that of the best fixed bid in hindsight.
We show that sublinear regret is also achievable in this case and prove
matching minimax lower bounds. To our knowledge, this is the first complete set
of strategies for bidders participating in auctions of this type.
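To make the stochastic setting concrete, here is a minimal sketch of one natural optimism-based bidding rule in a repeated Vickrey (second-price) auction. It is an illustration under stated assumptions, not necessarily the strategy analyzed in the paper: values and bids are assumed to lie in [0, 1], the noise is assumed uniform, and the name `optimistic_bidder` and the confidence-bonus constant are hypothetical choices.

```python
import math
import random


def optimistic_bidder(true_value, competing_bids, noise=0.1, seed=0):
    """Bid an upper confidence bound on the good's value (illustrative only).

    true_value: assumed mean value of the good, unknown to the bidder.
    competing_bids: highest competing bid in each round.
    The bidder sees a noisy value only in rounds where it wins.
    """
    rng = random.Random(seed)
    wins = 0          # number of auctions won so far
    total = 0.0       # sum of noisy values observed after winning
    utility = 0.0     # expected surplus: true value minus price paid

    for t, highest_other in enumerate(competing_bids, start=1):
        if wins == 0:
            bid = 1.0  # no observations yet, so bid the maximum possible value
        else:
            bonus = math.sqrt(3 * math.log(t) / (2 * wins))
            bid = min(1.0, total / wins + bonus)

        if bid > highest_other:
            # Second-price rule: the winner pays the highest competing bid.
            wins += 1
            total += true_value + rng.uniform(-noise, noise)
            utility += true_value - highest_other

    return utility, wins
```

Because feedback arrives only after a purchase, the sketch bids the maximum until it has at least one observation; bidding too low early on would leave it with nothing to learn from.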
Algorithms as Mechanisms: The Price of Anarchy of Relax-and-Round
Many algorithms that are originally designed without explicitly considering
incentive properties are later combined with simple pricing rules and used as
mechanisms. The resulting mechanisms are often natural and simple to
understand. But how good are these algorithms as mechanisms? Truthful reporting
of valuations is typically not a dominant strategy (certainly not with a
pay-your-bid, first-price rule, but it is likely not a good strategy even with
a critical-value or second-price style rule). Our goal is to show that
a wide class of approximation algorithms yields, in this way, mechanisms with low
Price of Anarchy.
The seminal result of Lucier and Borodin [SODA 2010] shows that combining a
greedy algorithm that is an $\alpha$-approximation algorithm with a
pay-your-bid payment rule yields a mechanism whose Price of Anarchy is
$O(\alpha)$. In this paper we significantly extend the class of algorithms for
which such a result is available by showing that this close connection between
approximation ratio on the one hand and Price of Anarchy on the other also
holds for the design principle of relaxation and rounding provided that the
relaxation is smooth and the rounding is oblivious.
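The idea of pairing an approximation algorithm with a pay-your-bid rule can be sketched as follows. This is only an illustration of the greedy case, not of the relax-and-round mechanisms studied in the paper: it assumes single-minded bidders who each want one bundle, ranks bids by the well-known bid-over-square-root-of-bundle-size rule, and uses the hypothetical name `greedy_pay_your_bid`.

```python
import math


def greedy_pay_your_bid(bids):
    """Greedy allocation with first-price payments (illustrative only).

    bids: list of (bidder, bundle, amount), where bundle is a set of items.
    Returns a dict mapping each winning bidder to the amount it pays.
    """
    # Ignore empty bundles so the ranking score is well defined.
    valid = [b for b in bids if b[1]]
    # Rank by bid divided by the square root of the bundle size.
    order = sorted(valid, key=lambda b: b[2] / math.sqrt(len(b[1])), reverse=True)

    taken = set()
    payments = {}
    for bidder, bundle, amount in order:
        # Accept a bid only if none of its items is already allocated.
        if taken.isdisjoint(bundle):
            taken |= bundle
            payments[bidder] = amount  # pay-your-bid: the winner pays its own bid
    return payments
```

For example, with bids `[("a", {1, 2}, 10), ("b", {1}, 8), ("c", {2}, 3)]` the ranking places `b` ahead of `a`, so `b` and `c` win and each pays its own bid. Since payments equal bids, bidders have an incentive to shade their bids, which is why the quality of such mechanisms is measured by the Price of Anarchy.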
We demonstrate the far-reaching consequences of our result by showing its
implications for sparse packing integer programs, such as multi-unit auctions
and generalized matching, for the maximum traveling salesman problem, for
combinatorial auctions, and for single source unsplittable flow problems. In
all these problems our approach leads to novel, simple, near-optimal mechanisms
whose Price of Anarchy either matches or beats the performance guarantees of
known mechanisms.
Comment: Extended abstract appeared in Proc. of 16th ACM Conference on
Economics and Computation (EC'15)