Online learning in repeated auctions
Motivated by online advertising auctions, we consider repeated Vickrey
auctions where goods of unknown value are sold sequentially and bidders only
learn (potentially noisy) information about a good's value once it is
purchased. We adopt an online learning approach with bandit feedback to model
this problem and derive bidding strategies for two models: stochastic and
adversarial. In the stochastic model, the observed values of the goods are
random variables centered around the true value of the good. In this case,
logarithmic regret is achievable when competing against well-behaved
adversaries. In the adversarial model, the goods need not be identical and we
simply compare our performance against that of the best fixed bid in hindsight.
We show that sublinear regret is also achievable in this case and prove
matching minimax lower bounds. To our knowledge, this is the first complete set
of strategies for bidders participating in auctions of this type.
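The stochastic model above can be illustrated with a minimal optimistic-bidding sketch (not the paper's algorithm): the bidder bids her current value estimate plus a UCB-style exploration bonus, and updates the estimate only in rounds she wins, since that is the only time a noisy value observation arrives. The true value, the noise level, and the uniform competing bids are all illustrative assumptions.

```python
import math
import random

def repeated_vickrey_bidder(rounds, true_value=0.6, noise=0.1, seed=0):
    """Optimistic bidder for repeated Vickrey auctions with bandit feedback:
    bid the running value estimate plus an exploration bonus; a (noisy) value
    is observed only when the good is actually won."""
    rng = random.Random(seed)
    est, n = 1.0, 0                       # optimistic initial estimate, #observations
    for t in range(1, rounds + 1):
        bonus = math.sqrt(2 * math.log(t + 1) / max(n, 1))
        bid = est + bonus if n else 1.0   # force exploration until the first win
        competing = rng.random()          # stand-in for the highest rival bid
        if bid > competing:               # win: pay second price, observe noisy value
            obs = true_value + rng.gauss(0, noise)
            n += 1
            est += (obs - est) / n        # running-mean update of the value estimate
    return est, n
```

As the observation count grows the bonus shrinks, so the bid converges toward the estimated value, mirroring the logarithmic-regret intuition for well-behaved competition.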
Learning to Bid in Repeated First-Price Auctions with Budgets
Budget management strategies in repeated auctions have received growing
attention in online advertising markets. However, previous work on budget
management in online bidding mainly focused on second-price auctions. The rapid
shift from second-price auctions to first-price auctions for online ads in
recent years has motivated the challenging question of how to bid in repeated
first-price auctions while controlling budgets.
In this work, we study the problem of learning in repeated first-price
auctions with budgets. We design a dual-based algorithm that can achieve a
near-optimal regret with full information feedback
where the maximum competing bid is always revealed after each auction. We
further consider the setting with one-sided information feedback where only the
winning bid is revealed after each auction. We show that our modified algorithm
can still achieve an regret with mild assumptions on
the bidder's value distribution. Finally, we complement the theoretical results
with numerical experiments to confirm the effectiveness of our budget
management policy.
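The flavor of a dual-based budget-management algorithm can be sketched as follows. A single multiplier mu tracks the budget constraint; each bid is the value shaded by 1/(1+mu), and mu takes a subgradient step on realized spend minus the per-round budget rate. The shading rule and step size here are illustrative assumptions, not the paper's exact algorithm.

```python
def dual_pacing_bids(values, competing, budget, eta=0.05):
    """Dual-based pacing sketch for repeated first-price auctions with a budget.
    values[t]: bidder's value in round t; competing[t]: maximum competing bid
    (full-information feedback); budget: total spend allowed."""
    T = len(values)
    rho = budget / T                      # target spend per auction
    mu, spent, wins = 0.0, 0.0, 0
    for v, d in zip(values, competing):
        bid = v / (1.0 + mu)              # shaded first-price bid
        cost = bid if (bid > d and spent + bid <= budget) else 0.0
        if cost:
            spent += cost
            wins += 1
        mu = max(0.0, mu + eta * (cost - rho))  # dual subgradient step
    return spent, wins, mu
```

When spend runs ahead of the budget rate, mu rises and bids shrink; when the budget is underused, mu decays toward zero and bidding becomes more aggressive.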
Applying Opponent Modeling for Automatic Bidding in Online Repeated Auctions
Online auction scenarios, such as bidding searches on advertising platforms,
often require bidders to participate repeatedly in auctions for the same or
similar items. We design an algorithm for adaptive automatic bidding in
repeated auctions in which the seller and other bidders also update their
strategies. We apply and improve the opponent modeling algorithm to allow
bidders to learn optimal bidding strategies in this multiagent reinforcement
learning environment. The algorithm uses almost no private information about
the opponent or restrictions on the strategy space, so it can be extended to
multiple scenarios. Our algorithm improves the utility compared to both static
bidding strategies and dynamic learning strategies. We hope that the
application of opponent modeling in auctions will promote research on
automatic bidding strategies in online auctions and the design of
non-incentive-compatible auction mechanisms.
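One elementary form of opponent modeling is fictitious play: maintain the empirical distribution of observed opponent bids and best-respond to it. The sketch below is a hypothetical single-opponent, first-price variant over a discretized bid grid, far simpler than the multiagent reinforcement-learning setting of the paper, but it shows the model-then-respond loop.

```python
from collections import Counter

def best_response_bidder(opponent_bids, value, grid):
    """Fictitious-play-style opponent model: estimate P(win at b) from the
    empirical distribution of past opponent bids, then pick the grid bid
    maximizing expected first-price utility (value - b) * P(win at b)."""
    counts = Counter(opponent_bids)
    n = len(opponent_bids)
    def win_prob(b):
        return sum(c for o, c in counts.items() if b > o) / n
    return max(grid, key=lambda b: (value - b) * win_prob(b))
```

Against an opponent who also adapts, the observation history keeps shifting, which is exactly why the multiagent setting calls for the heavier machinery the paper develops.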
Coordinated Dynamic Bidding in Repeated Second-Price Auctions with Budgets
In online ad markets, a rising number of advertisers are employing bidding
agencies to participate in ad auctions. These agencies are specialized in
designing online algorithms and bidding on behalf of their clients. Typically,
an agency has information on multiple advertisers, so she can
potentially coordinate bids to help her clients achieve higher utilities than
those under independent bidding.
In this paper, we study coordinated online bidding algorithms in repeated
second-price auctions with budgets. We propose algorithms that guarantee every
client a higher utility than the best she can get under independent bidding. We
show that these algorithms achieve maximal coalition welfare and discuss
bidders' incentives to misreport their budgets, in symmetric cases. Our proofs
combine the techniques of online learning and equilibrium analysis, overcoming
the difficulty of competing with a multi-dimensional benchmark. The performance
of our algorithms is further evaluated by experiments on both synthetic and
real data. To the best of our knowledge, we are the first to consider bidder
coordination in online repeated auctions with constraints.
Comment: 43 pages, 12 figures
Multi-Platform Budget Management in Ad Markets with Non-IC Auctions
In online advertising markets, budget-constrained advertisers acquire ad
placements through repeated bidding in auctions on various platforms. We
present a strategy for bidding optimally in a set of auctions that may or may
not be incentive-compatible under the presence of budget constraints. Our
strategy maximizes the expected total utility across auctions while satisfying
the advertiser's budget constraints in expectation. Additionally, we
investigate the online setting where the advertiser must submit bids across
platforms while learning about other bidders' bids over time. Our algorithm has
regret under the full-information setting. Finally, we demonstrate
that our algorithms have superior cumulative regret on both synthetic and
real-world datasets of ad placement auctions, compared to existing adaptive
pacing algorithms.
Comment: 34 pages, 5 figures
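The IC versus non-IC distinction changes how a budget-constrained bidder should shade. A toy sketch, under the illustrative assumptions of a single shared budget multiplier mu and a uniform rival-bid distribution F(b) = b on [0, 1] for the non-IC (first-price) platform:

```python
def multi_platform_bid(value, mu, platform):
    """Sketch: with a shared budget multiplier mu coupling all platforms,
    a second-price (IC) platform admits the closed-form shaded bid v/(1+mu),
    while on a first-price (non-IC) platform we grid-search the bid b that
    maximizes (v - (1+mu)*b) * F(b), here with the assumed uniform F(b)=b."""
    if platform == "second_price":
        return value / (1.0 + mu)
    grid = [i / 100 for i in range(101)]
    return max(grid, key=lambda b: (value - (1.0 + mu) * b) * b)
```

With no budget pressure (mu = 0) the first-price rule recovers the classic half-value bid against a uniform rival; as mu grows, bids on both platform types shrink together, which is how a shared multiplier coordinates spend across platforms in expectation.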
Optimal No-regret Learning in Repeated First-price Auctions
We study online learning in repeated first-price auctions with censored
feedback, where a bidder, only observing the winning bid at the end of each
auction, learns to adaptively bid in order to maximize her cumulative payoff.
To achieve this goal, the bidder faces a challenging dilemma: if she wins the
bid--the only way to achieve positive payoffs--then she is not able to observe
the highest bid of the other bidders, which we assume is iid drawn from an
unknown distribution. This dilemma, despite being reminiscent of the
exploration-exploitation trade-off in contextual bandits, cannot directly be
addressed by the existing UCB or Thompson sampling algorithms in that
literature, mainly because contrary to the standard bandits setting, when a
positive reward is obtained here, nothing about the environment can be learned.
In this paper, by exploiting the structural properties of first-price
auctions, we develop the first learning algorithm that achieves
regret bound when the bidder's private values are
stochastically generated. We do so by providing an algorithm on a general class
of problems, which we call monotone group contextual bandits, where the same
regret bound is established under stochastically generated contexts. Further,
by a novel lower bound argument, we characterize an lower
bound for the case where the contexts are adversarially generated, thus
highlighting the impact of the contexts generation mechanism on the fundamental
learning limit. Despite this, we further exploit the structure of first-price
auctions and develop a learning algorithm that operates sample-efficiently (and
computationally efficiently) in the presence of adversarially generated private
values. We establish an regret bound for this algorithm,
hence providing a complete characterization of optimal learning guarantees for
this problem.
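The censored-feedback structure can be made concrete with a small estimator sketch. On a losing round the observed winning bid is exactly the rival maximum; on a winning round the bidder learns only that the rival maximum was below her own bid. So a win at a bid below b still certifies the event {rival max < b}, while a win at a higher bid is uninformative for b and is dropped here, a selection-bias caveat that the paper's monotone group contextual bandit machinery is designed to handle properly. The data format is an illustrative assumption.

```python
def censored_cdf_estimate(history, b):
    """Estimate P(rival max < b) from censored first-price feedback.
    history: list of (my_bid, won, observed_max), with observed_max = None
    on winning rounds (the rival max is then unobserved)."""
    informative, below = 0, 0
    for my_bid, won, m in history:
        if won:
            if my_bid <= b:          # win certifies rival max < my_bid <= b
                informative += 1
                below += 1
            # win at a bid above b: no information about {rival max < b}
        else:                        # loss: the winning bid m IS the rival max
            informative += 1
            below += int(m < b)
    return below / informative if informative else None
```

The estimator is exactly where the exploration-exploitation dilemma bites: winning yields payoff but degrades the information available for calibrating future bids.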
Learning in Repeated Multi-Unit Pay-As-Bid Auctions
Motivated by Carbon Emissions Trading Schemes, Treasury Auctions, and
Procurement Auctions, which all involve the auctioning of homogeneous multiple
units, we consider the problem of learning how to bid in repeated multi-unit
pay-as-bid auctions. In each of these auctions, a large number of (identical)
items are to be allocated to the largest submitted bids, where the price of
each of the winning bids is equal to the bid itself. The problem of learning
how to bid in pay-as-bid auctions is challenging due to the combinatorial
nature of the action space. We overcome this challenge by focusing on the
offline setting, where the bidder optimizes their vector of bids while only
having access to the past submitted bids by other bidders. We show that the
optimal solution to the offline problem can be obtained using a polynomial time
dynamic programming (DP) scheme. We leverage the structure of the DP scheme to
design online learning algorithms with polynomial time and space complexity
under full information and bandit feedback settings. We achieve an upper bound
on regret of and respectively, where is the number of units demanded by the
bidder, is the total number of auctions, and is the size of
the discretized bid space. We accompany these results with a regret lower
bound, which match the linear dependency in . Our numerical results suggest
that when all agents behave according to our proposed no regret learning
algorithms, the resulting market dynamics mainly converge to a welfare
maximizing equilibrium where bidders submit uniform bids. Lastly, our
experiments demonstrate that the pay-as-bid auction consistently generates
significantly higher revenue compared to its popular alternative, the uniform
price auction.
Comment: 51 pages, 12 figures
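The offline dynamic program over a non-increasing bid vector can be sketched as follows. This is a simplified O(M G^2) recursion under illustrative assumptions: values[j] is the marginal value of a (j+1)-th unit, and thresholds[j] is a single competitor bid the (j+1)-th own bid must beat, as if derived from past submitted bids; the paper's actual DP scheme and its complexity may differ.

```python
def optimal_pab_bids(values, thresholds, grid):
    """DP sketch for the offline pay-as-bid problem with M demanded units and a
    sorted discretized bid grid. dp[j][i] = best utility from units j..M-1 when
    bid j may use only grid indices < i (this enforces non-increasing bids)."""
    M, G = len(values), len(grid)
    dp = [[0.0] * (G + 1) for _ in range(M + 1)]
    choice = [[-1] * (G + 1) for _ in range(M + 1)]
    for j in range(M - 1, -1, -1):
        for i in range(G + 1):
            best, arg = dp[j + 1][i], -1      # option: do not bid for unit j
            for k in range(i):                # candidate bid grid[k] under the cap
                if grid[k] > thresholds[j]:   # bid must beat its threshold to win
                    u = values[j] - grid[k] + dp[j + 1][k + 1]
                    if u > best:
                        best, arg = u, k
            dp[j][i], choice[j][i] = best, arg
    bids, cap = [], G                         # recover the optimal bid vector
    for j in range(M):
        k = choice[j][cap]
        if k >= 0:
            bids.append(grid[k])
            cap = k + 1                       # later bids must stay <= grid[k]
    return dp[0][G], bids
```

The monotonicity cap is what tames the combinatorial action space: each stage only needs to know the index bound inherited from the previous bid, not the whole vector.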