Advancing Ad Auction Realism: Practical Insights & Modeling Implications
This paper proposes a learning model of online ad auctions that allows for
the following four key realistic characteristics of contemporary online
auctions: (1) ad slots can have different values and click-through rates
depending on users' search queries, (2) the number and identity of competing
advertisers are unobserved and change with each auction, (3) advertisers only
receive partial, aggregated feedback, and (4) payment rules are only partially
specified. We model advertisers as agents governed by an adversarial bandit
algorithm, independent of auction mechanism intricacies. Our objective is to
simulate the behavior of advertisers for counterfactual analysis, prediction,
and inference purposes. Our findings reveal that, in such richer environments,
"soft floors" can enhance key performance metrics even when bidders are drawn
from the same population. We further demonstrate how to infer advertiser value
distributions from observed bids, thereby affirming the practical efficacy of
our approach even in a more realistic auction setting.
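The abstract above models advertisers as adversarial bandit learners without naming a specific algorithm. As a rough illustration only, the sketch below uses EXP3, one standard adversarial bandit algorithm, over a fixed grid of candidate bids. The class name `Exp3Bidder`, the bid grid, and the `gamma` value are assumptions made for this example and are not taken from the paper.

```python
import math
import random


class Exp3Bidder:
    """Minimal EXP3 learner over a fixed grid of candidate bids (illustrative)."""

    def __init__(self, bids, gamma=0.1, seed=0):
        self.bids = list(bids)
        self.gamma = gamma  # exploration rate in (0, 1]; an assumed value
        self.weights = [1.0] * len(self.bids)
        self.rng = random.Random(seed)

    def probabilities(self):
        # Mix the normalized weights with uniform exploration.
        total = sum(self.weights)
        k = len(self.bids)
        return [(1 - self.gamma) * w / total + self.gamma / k for w in self.weights]

    def choose(self):
        # Sample the index of the bid to submit in this auction.
        probs = self.probabilities()
        return self.rng.choices(range(len(self.bids)), weights=probs)[0]

    def update(self, arm, reward):
        # Only the chosen arm's reward (scaled to [0, 1]) is observed,
        # so it is importance-weighted by the probability of choosing it.
        probs = self.probabilities()
        estimate = reward / probs[arm]
        self.weights[arm] *= math.exp(self.gamma * estimate / len(self.bids))
```

The learner only needs a reward signal per auction, which is why it can be run without knowing the payment rule or the set of competitors.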
Truthful Learning Mechanisms for Multi-Slot Sponsored Search Auctions with Externalities
Sponsored search auctions constitute one of the most successful applications
of microeconomic mechanisms. In mechanism design, auctions are usually designed
to incentivize advertisers to bid their truthful valuations and to assure both
the advertisers and the auctioneer a non-negative utility. Nonetheless, in
sponsored search auctions, the click-through-rates (CTRs) of the advertisers
are often unknown to the auctioneer and thus standard truthful mechanisms
cannot be directly applied and must be paired with an effective learning
algorithm for the estimation of the CTRs. This introduces the critical problem
of designing a learning mechanism able to estimate the CTRs at the same time as
implementing a truthful mechanism with a revenue loss as small as possible
compared to an optimal mechanism designed with the true CTRs. Previous work
showed that, when dominant-strategy truthfulness is adopted, in single-slot
auctions the problem can be solved using suitable exploration-exploitation
mechanisms able to achieve a per-step regret (over the auctioneer's revenue) of
order (where T is the number of times the auction is repeated).
It is also known that, when truthfulness in expectation is adopted, a per-step
regret (over the social welfare) of order can be obtained. In
this paper we extend the results known in the literature to the case of
multi-slot auctions. In this case, a model of the user is needed to
characterize how the advertisers' valuations change over the slots. We adopt
the cascade model, which is the most famous model in the literature for sponsored
search auctions. We prove a number of novel upper and lower bounds both
on the auctioneer's revenue loss and social welfare w.r.t. the VCG auction,
and we report numerical simulations investigating the accuracy of the bounds in
predicting the dependency of the regret on the auction parameters.
Dispersion for Data-Driven Algorithm Design, Online Learning, and Private Optimization
Data-driven algorithm design, that is, choosing the best algorithm for a
specific application, is a crucial problem in modern data science.
Practitioners often optimize over a parameterized algorithm family, tuning
parameters based on problems from their domain. These procedures have
historically come with no guarantees, though a recent line of work studies
algorithm selection from a theoretical perspective. We advance the foundations
of this field in several directions: we analyze online algorithm selection,
where problems arrive one-by-one and the goal is to minimize regret, and
private algorithm selection, where the goal is to find good parameters over a
set of problems without revealing sensitive information contained therein. We
study important algorithm families, including SDP-rounding schemes for problems
formulated as integer quadratic programs, and greedy techniques for canonical
subset selection problems. In these cases, the algorithm's performance is a
volatile and piecewise Lipschitz function of its parameters, since tweaking the
parameters can completely change the algorithm's behavior. We give a sufficient
and general condition, dispersion, defining a family of piecewise Lipschitz
functions that can be optimized online and privately, which includes the
functions measuring the performance of the algorithms we study. Intuitively, a
set of piecewise Lipschitz functions is dispersed if no small region contains
many of the functions' discontinuities. We present general techniques for
online and private optimization of the sum of dispersed piecewise Lipschitz
functions. We improve over the best-known regret bounds for a variety of
problems, prove regret bounds for problems not previously studied, and give
matching lower bounds. We also give matching upper and lower bounds on the
utility loss due to privacy. Moreover, we uncover dispersion in auction design
and pricing problems.
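The intuition that "no small region contains many of the functions' discontinuities" can be checked numerically in a simplified setting. The sketch below assumes a one-dimensional parameter and that each function contributes a single discontinuity location; it reports the largest number of discontinuities falling in any interval of a given width. The function name and this one-point-per-function simplification are assumptions for illustration, not the paper's formal definition.

```python
def max_discontinuities_in_window(points, width):
    """Largest number of points lying in any closed interval of the given width."""
    pts = sorted(points)
    best = 0
    left = 0
    for right in range(len(pts)):
        # Shrink the window from the left until it spans at most `width`.
        while pts[right] - pts[left] > width:
            left += 1
        best = max(best, right - left + 1)
    return best
```

A small return value relative to the number of functions, for a given width, suggests the discontinuities are spread out in the sense described above.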
An Incentive Compatible Multi-Armed-Bandit Crowdsourcing Mechanism with Quality Assurance
Consider a requester who wishes to crowdsource a series of identical binary
labeling tasks to a pool of workers so as to achieve an assured accuracy for
each task, in a cost optimal way. The workers are heterogeneous with unknown
but fixed qualities and their costs are private. The problem is to select for
each task an optimal subset of workers so that the outcome obtained from the
selected workers guarantees a target accuracy level. The problem is a
challenging one even in a non-strategic setting since the accuracy of the
aggregated label depends on unknown qualities. We develop a novel multi-armed
bandit (MAB) mechanism for solving this problem. First, we propose a framework,
Assured Accuracy Bandit (AAB), which leads to an MAB algorithm, Constrained
Confidence Bound for a Non Strategic setting (CCB-NS). We derive an upper bound
on the number of time steps the algorithm chooses a sub-optimal set that
depends on the target accuracy level and true qualities. A more challenging
situation arises when the requester not only has to learn the qualities of the
workers but also elicit their true costs. We modify the CCB-NS algorithm to
obtain an adaptive exploration-separated algorithm, which we call
Constrained Confidence Bound for a Strategic setting (CCB-S). The CCB-S algorithm
produces an ex-post monotone allocation rule and thus can be transformed into
an ex-post incentive compatible and ex-post individually rational mechanism
that learns the qualities of the workers and guarantees a given target accuracy
level in a cost optimal way. We provide a lower bound on the number of times
any algorithm should select a sub-optimal set and we see that the lower bound
matches our upper bound up to a constant factor. We provide insights on the
practical implementation of this framework through an illustrative example and
we show the efficacy of our algorithms through simulations.
Optimal No-regret Learning in Repeated First-price Auctions
We study online learning in repeated first-price auctions with censored
feedback, where a bidder, only observing the winning bid at the end of each
auction, learns to adaptively bid in order to maximize her cumulative payoff.
To achieve this goal, the bidder faces a challenging dilemma: if she wins the
bid--the only way to achieve positive payoffs--then she is not able to observe
the highest bid of the other bidders, which we assume is iid drawn from an
unknown distribution. This dilemma, despite being reminiscent of the
exploration-exploitation trade-off in contextual bandits, cannot directly be
addressed by the existing UCB or Thompson sampling algorithms in that
literature, mainly because contrary to the standard bandits setting, when a
positive reward is obtained here, nothing about the environment can be learned.
In this paper, by exploiting the structural properties of first-price
auctions, we develop the first learning algorithm that achieves
regret bound when the bidder's private values are
stochastically generated. We do so by providing an algorithm on a general class
of problems, which we call monotone group contextual bandits, where the same
regret bound is established under stochastically generated contexts. Further,
by a novel lower bound argument, we characterize an lower
bound for the case where the contexts are adversarially generated, thus
highlighting the impact of the contexts generation mechanism on the fundamental
learning limit. Despite this, we further exploit the structure of first-price
auctions and develop a learning algorithm that operates sample-efficiently (and
computationally efficiently) in the presence of adversarially generated private
values. We establish an regret bound for this algorithm,
hence providing a complete characterization of optimal learning guarantees for
this problem.
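The censored-feedback dilemma described above can be made concrete with a single-round sketch. It assumes ties are broken in the learner's favor and that the only feedback is the winning bid; the function name and the dictionary layout are hypothetical choices for this example.

```python
def first_price_round(value, bid, highest_other_bid):
    """Return (payoff, feedback) for one first-price auction round."""
    if bid >= highest_other_bid:  # assumption: ties go to the learner
        # Winning pays the bidder's own bid; the winning bid is her own,
        # so the competitors' highest bid stays hidden.
        return value - bid, {"won": True, "winning_bid": bid}
    # Losing yields zero payoff but reveals the competitors' highest bid.
    return 0, {"won": False, "winning_bid": highest_other_bid}
```

In this sketch the informative rounds are exactly the losing ones, which is the tension between earning payoff and learning the competing-bid distribution that the abstract points to.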
Selling to a No-Regret Buyer
We consider the problem of a single seller repeatedly selling a single item
to a single buyer (specifically, the buyer has a value drawn fresh from a known
distribution in every round). Prior work assumes that the buyer is fully
rational and will perfectly reason about how their bids today affect the
seller's decisions tomorrow. In this work we initiate a different direction:
the buyer simply runs a no-regret learning algorithm over possible bids. We
provide a fairly complete characterization of optimal auctions for the seller
in this domain. Specifically:
- If the buyer bids according to EXP3 (or any "mean-based" learning
algorithm), then the seller can extract expected revenue arbitrarily close to
the expected welfare. This auction is independent of the buyer's valuation
distribution, but somewhat unnatural, as it is sometimes in the buyer's
interest to overbid.
- There exists a learning algorithm such that, if the buyer bids according to
it, then the optimal strategy for the seller is simply to post the Myerson
reserve for every round.
- If the buyer bids according to EXP3 (or any "mean-based" learning
algorithm), but the seller is restricted to "natural" auction formats where
overbidding is dominated (e.g. Generalized First-Price or Generalized
Second-Price), then the optimal strategy for the seller is a pay-your-bid
format with decreasing reserves over time. Moreover, the seller's optimal
achievable revenue is characterized by a linear program, and can be
unboundedly better than the best truthful auction yet simultaneously
unboundedly worse than the expected welfare.
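For readers unfamiliar with the Myerson reserve mentioned above, the sketch below computes the revenue-maximizing posted price for a single buyer whose value follows a known discrete distribution, which is the single-buyer benchmark usually meant by that term. The function name, the discrete support, and the restriction of candidate prices to the support are assumptions made for illustration.

```python
def monopoly_price(values, probs):
    """Return (price, expected_revenue) maximizing price * P(value >= price)."""
    best_price, best_revenue = None, -1.0
    for p in sorted(set(values)):
        # Probability that the buyer's value is at least the posted price.
        sale_prob = sum(q for v, q in zip(values, probs) if v >= p)
        revenue = p * sale_prob
        if revenue > best_revenue:
            best_price, best_revenue = p, revenue
    return best_price, best_revenue
```

For example, with values 1, 2 and 3 each having probability 1/3, the prices 1 and 3 both earn expected revenue 1, while price 2 earns 4/3 and is selected.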
Learning Prices for Repeated Auctions with Strategic Buyers
Inspired by real-time ad exchanges for online display advertising, we
consider the problem of inferring a buyer's value distribution for a good when
the buyer is repeatedly interacting with a seller through a posted-price
mechanism. We model the buyer as a strategic agent, whose goal is to maximize
her long-term surplus, and we are interested in mechanisms that maximize the
seller's long-term revenue. We define the natural notion of strategic regret
--- the lost revenue as measured against a truthful (non-strategic) buyer. We
present seller algorithms that are no-(strategic)-regret when the buyer
discounts her future surplus --- i.e. the buyer prefers showing advertisements
to users sooner rather than later. We also give a lower bound on strategic
regret that increases as the buyer's discounting weakens and shows, in
particular, that any seller algorithm will suffer linear strategic regret if
there is no discounting.
Comment: Neural Information Processing Systems (NIPS 2013)
Online learning in repeated auctions
Motivated by online advertising auctions, we consider repeated Vickrey
auctions where goods of unknown value are sold sequentially and bidders only
learn (potentially noisy) information about a good's value once it is
purchased. We adopt an online learning approach with bandit feedback to model
this problem and derive bidding strategies for two models: stochastic and
adversarial. In the stochastic model, the observed values of the goods are
random variables centered around the true value of the good. In this case,
logarithmic regret is achievable when competing against well-behaved
adversaries. In the adversarial model, the goods need not be identical and we
simply compare our performance against that of the best fixed bid in hindsight.
We show that sublinear regret is also achievable in this case and prove
matching minimax lower bounds. To our knowledge, this is the first complete set
of strategies for bidders participating in auctions of this type.
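The bandit feedback in this setting can be sketched for a single Vickrey (second-price) round. The example assumes that ties are lost by the learner and that the observation on a win is the exact value, leaving out the noise the abstract allows for; the function name and return layout are hypothetical.

```python
def vickrey_round(bid, highest_other_bid, true_value):
    """Return (payoff, observed_value) for one second-price auction round."""
    if bid > highest_other_bid:  # assumption: ties are lost
        # The winner pays the highest competing bid and learns the value.
        return true_value - highest_other_bid, true_value
    # A losing bidder pays nothing and observes nothing about the value.
    return 0, None
```

Because the value is observed only after a purchase, a bidder who bids too low never learns whether the good was worth more, which is the exploration problem the abstract addresses.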