Optimal No-regret Learning in Repeated First-price Auctions
We study online learning in repeated first-price auctions with censored
feedback, where a bidder, only observing the winning bid at the end of each
auction, learns to adaptively bid in order to maximize her cumulative payoff.
To achieve this goal, the bidder faces a challenging dilemma: if she wins the
bid--the only way to achieve positive payoffs--then she is not able to observe
the highest bid of the other bidders, which we assume is iid drawn from an
unknown distribution. This dilemma, despite being reminiscent of the
exploration-exploitation trade-off in contextual bandits, cannot directly be
addressed by the existing UCB or Thompson sampling algorithms in that
literature, mainly because, contrary to the standard bandit setting, when a
positive reward is obtained here, nothing about the environment can be learned.
In this paper, by exploiting the structural properties of first-price
auctions, we develop the first learning algorithm that achieves a near-optimal
$\widetilde{O}(\sqrt{T})$ regret bound when the bidder's private values are
stochastically generated. We do so by providing an algorithm for a general class
of problems, which we call monotone group contextual bandits, where the same
regret bound is established under stochastically generated contexts. Further,
by a novel lower bound argument, we characterize an $\Omega(T^{2/3})$ lower
bound for the case where the contexts are adversarially generated, thus
highlighting the impact of the context generation mechanism on the fundamental
learning limit. Despite this, we further exploit the structure of first-price
auctions and develop a learning algorithm that operates sample-efficiently (and
computationally efficiently) in the presence of adversarially generated private
values. We establish an $\widetilde{O}(T^{2/3})$ regret bound for this algorithm,
hence providing a complete characterization of optimal learning guarantees for
this problem.
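The monotone feedback structure the abstract describes, where winning at a bid settles the outcome for every higher bid while losing reveals the highest competing bid exactly, can be illustrated with a small simulation. This is a hedged sketch, not the paper's algorithm: the bid grid, the competing-bid distribution, and the UCB-style bonus are all illustrative assumptions.

```python
import random

random.seed(0)
GRID = [i / 20 for i in range(21)]   # candidate bids in [0, 1] (assumption)
n_bids = len(GRID)
wins = [0] * n_bids                  # per-bid counts of "would have won"
trials = [0] * n_bids                # per-bid counts of settled outcomes
VALUE = 0.8                          # bidder's private value, fixed here

def highest_competing_bid():
    # iid draw from an unknown distribution (illustrative choice)
    return 0.9 * max(random.random() for _ in range(3))

total_payoff = 0.0
for t in range(2000):
    def index(j):
        # expected-margin index with a crude UCB-style bonus (assumption)
        if trials[j] == 0:
            return float("inf")
        p_hat = wins[j] / trials[j]
        bonus = 0.1 * (1.0 / trials[j]) ** 0.5
        return (VALUE - GRID[j]) * min(1.0, p_hat + bonus)
    j = max(range(n_bids), key=index)
    b, m = GRID[j], highest_competing_bid()
    if b >= m:
        # win: we never observe m, but every bid >= b would also have won
        total_payoff += VALUE - b
        for k in range(j, n_bids):
            trials[k] += 1
            wins[k] += 1
    else:
        # loss: m itself is revealed, settling every bid on the grid
        for k in range(n_bids):
            trials[k] += 1
            if GRID[k] >= m:
                wins[k] += 1
```

Note how the update rule encodes the censoring: a win only informs the suffix of the grid above the submitted bid, while a loss is fully informative.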
Advancing Ad Auction Realism: Practical Insights & Modeling Implications
This paper proposes a learning model of online ad auctions that allows for
the following four key realistic characteristics of contemporary online
auctions: (1) ad slots can have different values and click-through rates
depending on users' search queries, (2) the number and identity of competing
advertisers are unobserved and change with each auction, (3) advertisers only
receive partial, aggregated feedback, and (4) payment rules are only partially
specified. We model advertisers as agents governed by an adversarial bandit
algorithm, independent of auction mechanism intricacies. Our objective is to
simulate the behavior of advertisers for counterfactual analysis, prediction,
and inference purposes. Our findings reveal that, in such richer environments,
"soft floors" can enhance key performance metrics even when bidders are drawn
from the same population. We further demonstrate how to infer advertiser value
distributions from observed bids, thereby affirming the practical efficacy of
our approach even in a more realistic auction setting.
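The advertiser model here, a mechanism-agnostic adversarial-bandit agent, can be sketched with a textbook Exp3 learner over a discretized bid grid. The grid, the exploration parameter, and the toy environment below are illustrative assumptions, not details from the paper.

```python
import math
import random

class Exp3Bidder:
    """Textbook Exp3 over a fixed bid grid; rewards must lie in [0, 1]."""
    def __init__(self, bids, gamma=0.1):
        self.bids = bids
        self.gamma = gamma
        self.weights = [1.0] * len(bids)

    def probs(self):
        total = sum(self.weights)
        k = len(self.bids)
        return [(1 - self.gamma) * w / total + self.gamma / k
                for w in self.weights]

    def choose_bid(self):
        p = self.probs()
        self.last = random.choices(range(len(self.bids)), weights=p)[0]
        return self.bids[self.last]

    def observe(self, reward):
        # importance-weighted estimate keeps the update unbiased
        p = self.probs()[self.last]
        est = reward / p
        self.weights[self.last] *= math.exp(
            self.gamma * est / len(self.bids))

# Toy environment: value 1.0, a fixed competing bid of 0.4 (assumption).
random.seed(1)
bidder = Exp3Bidder([0.2, 0.5, 0.8])
for _ in range(3000):
    b = bidder.choose_bid()
    reward = (1.0 - b) if b >= 0.4 else 0.0   # first-price payoff in [0, 1]
    bidder.observe(reward)
```

Because the update depends only on the realized reward, the same agent can be dropped into any auction format, which is exactly what makes it useful for counterfactual simulation.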
Multi-Platform Budget Management in Ad Markets with Non-IC Auctions
In online advertising markets, budget-constrained advertisers acquire ad
placements through repeated bidding in auctions on various platforms. We
present a strategy for bidding optimally in a set of auctions that may or may
not be incentive-compatible under the presence of budget constraints. Our
strategy maximizes the expected total utility across auctions while satisfying
the advertiser's budget constraints in expectation. Additionally, we
investigate the online setting where the advertiser must submit bids across
platforms while learning about other bidders' bids over time. Our algorithm
attains sublinear regret in the full-information setting. Finally, we demonstrate
that our algorithms have superior cumulative regret on both synthetic and
real-world datasets of ad placement auctions, compared to existing adaptive
pacing algorithms.
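For context, the adaptive-pacing baseline mentioned above can be sketched in a few lines: bids are shaded by a dual multiplier that tracks how fast the budget is being consumed. Every constant below (values, prices, step size) is an illustrative assumption.

```python
import random

random.seed(2)
T, BUDGET = 1000, 100.0
rho = BUDGET / T                 # target spend per round
mu, eta = 0.0, 0.05              # dual (pacing) multiplier and its step size
spent = 0.0

for t in range(T):
    value = random.random()                  # private value this round
    bid = value / (1.0 + mu)                 # pacing-shaded bid
    price = random.uniform(0.0, 0.8)         # highest competing bid
    won = bid >= price and spent + bid <= BUDGET
    cost = bid if won else 0.0               # first-price payment
    spent += cost
    # spending above the per-round target pushes mu up, shading future bids
    mu = max(0.0, mu + eta * (cost - rho))
```

The multiplier acts as a self-tuning exchange rate between money and value; the paper's contribution is handling several platforms with possibly non-incentive-compatible rules, which this single-platform sketch does not capture.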
Online Learning under Budget and ROI Constraints via Weak Adaptivity
We study online learning problems in which a decision maker has to make a
sequence of costly decisions, with the goal of maximizing their expected reward
while adhering to budget and return-on-investment (ROI) constraints. Existing
primal-dual algorithms designed for constrained online learning problems under
adversarial inputs rely on two fundamental assumptions. First, the decision
maker must know beforehand the value of parameters related to the degree of
strict feasibility of the problem (i.e. Slater parameters). Second, a strictly
feasible solution to the offline optimization problem must exist at each round.
Both requirements are unrealistic for practical applications such as bidding in
online ad auctions. In this paper, we show how such assumptions can be
circumvented by endowing standard primal-dual templates with weakly adaptive
regret minimizers. This results in a "dual-balancing" framework which ensures
that dual variables stay sufficiently small, even in the absence of knowledge
about Slater's parameter. We prove the first best-of-both-worlds no-regret
guarantees that hold in the absence of the two aforementioned assumptions, under
stochastic and adversarial inputs. Finally, we show how to instantiate the
framework to optimally bid in various mechanisms of practical relevance, such
as first- and second-price auctions.
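In the truthful (second-price) instantiation, a primal-dual template of this kind reduces to bidding a multiplier-adjusted value. The sketch below uses one budget multiplier mu and one ROI multiplier lam, both updated by projected gradient steps; the bid formula and all constants are illustrative assumptions, not the paper's algorithm.

```python
import random

random.seed(3)
T, BUDGET, TAU = 2000, 300.0, 1.2      # rounds, budget, ROI target (assumed)
rho = BUDGET / T
lam = mu = 0.0                          # ROI and budget multipliers
eta = 0.02
spend = won_value = 0.0

for t in range(T):
    v = random.random()                          # private value
    # Lagrangian best response in a second-price auction (sketch):
    # a tighter ROI constraint (larger lam, with TAU > 1) lowers the bid
    bid = v * (1 + lam) / (1 + lam * TAU + mu)
    d = random.uniform(0.0, 1.0)                 # highest competing bid
    if bid >= d and spend + d <= BUDGET:
        spend += d                               # second-price payment
        won_value += v
    # projected gradient steps on the two dual variables
    mu = max(0.0, mu + eta * (spend / (t + 1) - rho))
    lam = max(0.0, lam + eta * (TAU * spend - won_value) / (t + 1))
```

The point of the paper's weakly adaptive machinery is precisely to keep lam and mu well behaved without knowing Slater parameters in advance; the fixed step size here is a simplification.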
Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism
We consider undiscounted reinforcement learning (RL) in Markov decision
processes (MDPs) under drifting non-stationarity, i.e., both the reward and
state transition distributions are allowed to evolve over time, as long as
their respective total variations, quantified by suitable metrics, do not
exceed certain variation budgets. We first develop the Sliding Window
Upper-Confidence bound for Reinforcement Learning with Confidence Widening
(SWUCRL2-CW) algorithm, and establish its dynamic regret bound when the
variation budgets are known. In addition, we propose the
Bandit-over-Reinforcement Learning (BORL) algorithm to adaptively tune the
SWUCRL2-CW algorithm to achieve the same dynamic regret bound, but in a
parameter-free manner, i.e., without knowing the variation budgets. Notably,
learning non-stationary MDPs via the conventional optimistic exploration
technique presents a unique challenge absent in existing (non-stationary)
bandit learning settings. We overcome the challenge by a novel confidence
widening technique that incorporates additional optimism. (To appear in the
proceedings of the 37th International Conference on Machine Learning; shortened
conference version of the journal article available at arXiv:1906.02922.)
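The core sliding-window idea, trusting only recent observations so that drifting rewards age out of the estimates, is easiest to see in a plain bandit sketch. This is a deliberate simplification: the paper's algorithm handles MDPs and adds confidence widening, and the window here covers the last W pulls of each arm rather than the last W rounds. All constants and the drift model are illustrative assumptions.

```python
import math
import random
from collections import deque

random.seed(4)
K, W, T = 2, 200, 2000               # arms, window size, horizon (assumed)
history = [deque(maxlen=W) for _ in range(K)]   # last W rewards per arm

def true_mean(arm, t):
    # drifting environment: arm 0 decays while arm 1 improves
    return 0.8 - 0.6 * t / T if arm == 0 else 0.2 + 0.6 * t / T

picks = []
for t in range(T):
    def sw_ucb(a):
        h = history[a]
        if not h:
            return float("inf")
        bonus = math.sqrt(2 * math.log(t + 1) / len(h))
        return sum(h) / len(h) + bonus
    a = max(range(K), key=sw_ucb)
    picks.append(a)
    reward = 1.0 if random.random() < true_mean(a, t) else 0.0
    history[a].append(reward)
```

A stationary UCB learner would average over the whole history and keep favoring arm 0 long after the drift; the bounded deque is what lets the estimates track the change.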
Online Learning in Multi-unit Auctions
We consider repeated multi-unit auctions with uniform pricing, which are
widely used in practice for allocating goods such as carbon licenses. In each
round, $K$ identical units of a good are sold to a group of buyers that have
valuations with diminishing marginal returns. The buyers submit bids for the
units, and then a price is set per unit so that all the units are sold. We
consider two variants of the auction, where the price is set to the $K$-th
highest bid and the $(K+1)$-st highest bid, respectively.
We analyze the properties of this auction in both the offline and online
settings. In the offline setting, we consider the problem that one player $i$
is facing: given access to a data set that contains the bids submitted by
competitors in past auctions, find a bid vector that maximizes player $i$'s
cumulative utility on the data set. We design a polynomial time algorithm for
this problem, by showing it is equivalent to finding a maximum-weight path on a
carefully constructed directed acyclic graph.
In the online setting, the players run learning algorithms to update their
bids as they participate in the auction over time. Based on our offline
algorithm, we design efficient online learning algorithms for bidding. The
algorithms have sublinear regret, under both full information and bandit
feedback structures. We complement our online learning algorithms with regret
lower bounds.
Finally, we analyze the quality of the equilibria in the worst case through
the lens of the core solution concept in the game among the bidders. We show
that the $(K+1)$-st price format is susceptible to collusion among the bidders;
meanwhile, the $K$-th price format does not have this issue.
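The offline result reduces bid optimization to a maximum-weight path computation on a carefully constructed DAG. That construction is the paper's contribution and is not reproduced here; the sketch below shows only the generic topological-order dynamic program such a reduction bottoms out in, on a toy graph.

```python
def max_weight_path(n, edges):
    """Maximum total weight of a (possibly empty) path in a DAG.

    Nodes are 0..n-1 and every edge (u, v, w) satisfies u < v, so the
    natural order 0..n-1 is already topological.
    """
    best = [0.0] * n                 # best path weight ending at each node
    for u, v, w in sorted(edges):    # sorting by source respects the order
        best[v] = max(best[v], best[u] + w)
    return max(best)

# Example: the heaviest path in this 4-node DAG is 0 -> 1 -> 2 -> 3.
print(max_weight_path(4, [(0, 1, 2.0), (1, 2, 3.0), (0, 2, 1.0), (2, 3, 4.0)]))
# -> 9.0
```

The single pass over edges makes the whole offline step polynomial once the graph is built, which is what powers the online algorithms in the paper.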
Efficient Algorithms for Minimizing Compositions of Convex Functions and Random Functions and Its Applications in Network Revenue Management
In this paper, we study a class of nonconvex stochastic optimization problems
in which the objective is a composition of a convex function $f$ and a random
function $c(x, \xi)$. Leveraging an (implicit) convex reformulation via a
variable transformation, we develop stochastic
gradient-based algorithms and establish their sample and gradient complexities
for achieving an $\epsilon$-global optimal solution. Interestingly, our
proposed Mirror Stochastic Gradient (MSG) method operates only in the original
-space using gradient estimators of the original nonconvex objective and
achieves $\mathcal{O}(\epsilon^{-2})$ sample and gradient complexities,
which match the lower bounds for solving stochastic convex optimization
problems. Under booking limits control, we formulate the air-cargo network
revenue management (NRM) problem with random two-dimensional capacity, random
consumption, and routing flexibility as a special case of the stochastic
nonconvex optimization, where the random function is $c(x, \xi) = x \wedge \xi$,
i.e., the random demand $\xi$ truncates the booking limit decision $x$.
Extensive numerical experiments demonstrate the superior performance of our
proposed MSG algorithm for booking limit control with higher revenue and lower
computation cost than state-of-the-art bid-price-based control policies,
especially when the variance of random capacity is large.
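The single-resource special case gives a feel for the truncation structure: demand $D$ enters only through $\min(x, D)$, so a pathwise stochastic gradient in $x$ is nonzero only on the event $\{D > x\}$. The revenue and penalty numbers and the distributions below are toy assumptions, and this is a plain projected stochastic gradient sketch in the spirit of, not identical to, the MSG method.

```python
import random

random.seed(5)
P, Q = 1.0, 3.0          # unit revenue and unit offloading penalty (assumed)
x, eta = 5.0, 0.1        # booking limit and step size
for t in range(5000):
    D = random.expovariate(1.0 / 10.0)      # random demand, mean 10
    C = random.uniform(6.0, 10.0)           # random capacity
    accepted = min(x, D)                    # demand truncates the limit
    # pathwise gradient of P*min(x, D) - Q*max(min(x, D) - C, 0) in x:
    # nonzero only when demand exceeds the limit (D > x)
    g = (1.0 if D > x else 0.0) * (P - (Q if accepted > C else 0.0))
    x = min(20.0, max(0.0, x + eta * g))    # projected ascent step
```

At stationarity the limit balances marginal revenue against the expected offloading penalty, i.e. $P = Q \cdot \Pr(C < x)$, which for these toy numbers puts $x$ near 7.3; the iterate hovers around that level.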
KEYWORDS: stochastic nonconvex optimization, hidden convexity, air-cargo
network revenue management, gradient-based algorithm