12,234 research outputs found

    Online learning in repeated auctions

    Full text link
    Motivated by online advertising auctions, we consider repeated Vickrey auctions where goods of unknown value are sold sequentially and bidders only learn (potentially noisy) information about a good's value once it is purchased. We adopt an online learning approach with bandit feedback to model this problem and derive bidding strategies for two models: stochastic and adversarial. In the stochastic model, the observed values of the goods are random variables centered around the true value of the good. In this case, logarithmic regret is achievable when competing against well behaved adversaries. In the adversarial model, the goods need not be identical and we simply compare our performance against that of the best fixed bid in hindsight. We show that sublinear regret is also achievable in this case and prove matching minimax lower bounds. To our knowledge, this is the first complete set of strategies for bidders participating in auctions of this type

    Learning to Bid in Repeated First-Price Auctions with Budgets

    Full text link
    Budget management strategies in repeated auctions have received growing attention in online advertising markets. However, previous work on budget management in online bidding mainly focused on second-price auctions. The rapid shift from second-price auctions to first-price auctions for online ads in recent years has motivated the challenging question of how to bid in repeated first-price auctions while controlling budgets. In this work, we study the problem of learning in repeated first-price auctions with budgets. We design a dual-based algorithm that can achieve a near-optimal O~(T)\widetilde{O}(\sqrt{T}) regret with full information feedback where the maximum competing bid is always revealed after each auction. We further consider the setting with one-sided information feedback where only the winning bid is revealed after each auction. We show that our modified algorithm can still achieve an O~(T)\widetilde{O}(\sqrt{T}) regret with mild assumptions on the bidder's value distribution. Finally, we complement the theoretical results with numerical experiments to confirm the effectiveness of our budget management policy

    Applying Opponent Modeling for Automatic bidding in Online Repeated Auctions

    Full text link
    Online auction scenarios, such as bidding searches on advertising platforms, often require bidders to participate repeatedly in auctions for the same or similar items. We design an algorithm for adaptive automatic bidding in repeated auctions in which the seller and other bidders also update their strategies. We apply and improve the opponent modeling algorithm to allow bidders to learn optimal bidding strategies in this multiagent reinforcement learning environment. The algorithm uses almost no private information about the opponent or restrictions on the strategy space, so it can be extended to multiple scenarios. Our algorithm improves the utility compared to both static bidding strategies and dynamic learning strategies. We hope the application of opponent modeling in auctions will promote the research of automatic bidding strategies in online auctions and the design of non-incentive compatible auction mechanisms

    Coordinated Dynamic Bidding in Repeated Second-Price Auctions with Budgets

    Full text link
    In online ad markets, a rising number of advertisers are employing bidding agencies to participate in ad auctions. These agencies are specialized in designing online algorithms and bidding on behalf of their clients. Typically, an agency usually has information on multiple advertisers, so she can potentially coordinate bids to help her clients achieve higher utilities than those under independent bidding. In this paper, we study coordinated online bidding algorithms in repeated second-price auctions with budgets. We propose algorithms that guarantee every client a higher utility than the best she can get under independent bidding. We show that these algorithms achieve maximal coalition welfare and discuss bidders' incentives to misreport their budgets, in symmetric cases. Our proofs combine the techniques of online learning and equilibrium analysis, overcoming the difficulty of competing with a multi-dimensional benchmark. The performance of our algorithms is further evaluated by experiments on both synthetic and real data. To the best of our knowledge, we are the first to consider bidder coordination in online repeated auctions with constraints.Comment: 43 pages, 12 figure

    Multi-Platform Budget Management in Ad Markets with Non-IC Auctions

    Full text link
    In online advertising markets, budget-constrained advertisers acquire ad placements through repeated bidding in auctions on various platforms. We present a strategy for bidding optimally in a set of auctions that may or may not be incentive-compatible under the presence of budget constraints. Our strategy maximizes the expected total utility across auctions while satisfying the advertiser's budget constraints in expectation. Additionally, we investigate the online setting where the advertiser must submit bids across platforms while learning about other bidders' bids over time. Our algorithm has O(T3/4)O(T^{3/4}) regret under the full-information setting. Finally, we demonstrate that our algorithms have superior cumulative regret on both synthetic and real-world datasets of ad placement auctions, compared to existing adaptive pacing algorithms.Comment: 34 pages, 5 figure

    Optimal No-regret Learning in Repeated First-price Auctions

    Full text link
    We study online learning in repeated first-price auctions with censored feedback, where a bidder, only observing the winning bid at the end of each auction, learns to adaptively bid in order to maximize her cumulative payoff. To achieve this goal, the bidder faces a challenging dilemma: if she wins the bid--the only way to achieve positive payoffs--then she is not able to observe the highest bid of the other bidders, which we assume is iid drawn from an unknown distribution. This dilemma, despite being reminiscent of the exploration-exploitation trade-off in contextual bandits, cannot directly be addressed by the existing UCB or Thompson sampling algorithms in that literature, mainly because contrary to the standard bandits setting, when a positive reward is obtained here, nothing about the environment can be learned. In this paper, by exploiting the structural properties of first-price auctions, we develop the first learning algorithm that achieves O(Tlog2T)O(\sqrt{T}\log^2 T) regret bound when the bidder's private values are stochastically generated. We do so by providing an algorithm on a general class of problems, which we call monotone group contextual bandits, where the same regret bound is established under stochastically generated contexts. Further, by a novel lower bound argument, we characterize an Ω(T2/3)\Omega(T^{2/3}) lower bound for the case where the contexts are adversarially generated, thus highlighting the impact of the contexts generation mechanism on the fundamental learning limit. Despite this, we further exploit the structure of first-price auctions and develop a learning algorithm that operates sample-efficiently (and computationally efficiently) in the presence of adversarially generated private values. We establish an O(Tlog3T)O(\sqrt{T}\log^3 T) regret bound for this algorithm, hence providing a complete characterization of optimal learning guarantees for this problem

    Learning in Repeated Multi-Unit Pay-As-Bid Auctions

    Full text link
    Motivated by Carbon Emissions Trading Schemes, Treasury Auctions, and Procurement Auctions, which all involve the auctioning of homogeneous multiple units, we consider the problem of learning how to bid in repeated multi-unit pay-as-bid auctions. In each of these auctions, a large number of (identical) items are to be allocated to the largest submitted bids, where the price of each of the winning bids is equal to the bid itself. The problem of learning how to bid in pay-as-bid auctions is challenging due to the combinatorial nature of the action space. We overcome this challenge by focusing on the offline setting, where the bidder optimizes their vector of bids while only having access to the past submitted bids by other bidders. We show that the optimal solution to the offline problem can be obtained using a polynomial time dynamic programming (DP) scheme. We leverage the structure of the DP scheme to design online learning algorithms with polynomial time and space complexity under full information and bandit feedback settings. We achieve an upper bound on regret of O(MTlogB)O(M\sqrt{T\log |\mathcal{B}|}) and O(MBTlogB)O(M\sqrt{|\mathcal{B}|T\log |\mathcal{B}|}) respectively, where MM is the number of units demanded by the bidder, TT is the total number of auctions, and B|\mathcal{B}| is the size of the discretized bid space. We accompany these results with a regret lower bound, which match the linear dependency in MM. Our numerical results suggest that when all agents behave according to our proposed no regret learning algorithms, the resulting market dynamics mainly converge to a welfare maximizing equilibrium where bidders submit uniform bids. Lastly, our experiments demonstrate that the pay-as-bid auction consistently generates significantly higher revenue compared to its popular alternative, the uniform price auction.Comment: 51 pages, 12 Figure
    corecore