
    Online learning in repeated auctions

    Motivated by online advertising auctions, we consider repeated Vickrey auctions where goods of unknown value are sold sequentially and bidders only learn (potentially noisy) information about a good's value once it is purchased. We adopt an online learning approach with bandit feedback to model this problem and derive bidding strategies for two models: stochastic and adversarial. In the stochastic model, the observed values of the goods are random variables centered around the true value of the good. In this case, logarithmic regret is achievable when competing against well-behaved adversaries. In the adversarial model, the goods need not be identical and we simply compare our performance against that of the best fixed bid in hindsight. We show that sublinear regret is also achievable in this case and prove matching minimax lower bounds. To our knowledge, this is the first complete set of strategies for bidders participating in auctions of this type.
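The stochastic setting in this abstract can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the authors' strategy: it assumes a single rival bidding uniformly at random in [0, 1), Gaussian noise on observed values, and an optimistic (upper-confidence-style) bidding rule. The names `vickrey_round` and `simulate` are hypothetical.

```python
import math
import random


def vickrey_round(our_bid, rival_bid):
    """One second-price round: we win if we outbid the rival and pay the rival's bid."""
    if our_bid > rival_bid:
        return True, rival_bid
    return False, 0.0


def simulate(true_value=0.6, rounds=2000, noise=0.1, seed=0):
    """Bid an optimistic estimate of the good's value; learn only on rounds we win."""
    rng = random.Random(seed)
    wins, value_sum, utility = 0, 0.0, 0.0
    for t in range(1, rounds + 1):
        if wins == 0:
            bid = 1.0  # nothing observed yet: bid the maximum possible value
        else:
            mean = value_sum / wins
            # Assumed confidence bonus; the paper's exact rule may differ.
            bid = min(1.0, mean + math.sqrt(2 * math.log(t) / wins))
        rival_bid = rng.random()  # assumed rival behaviour
        won, price = vickrey_round(bid, rival_bid)
        if won:
            # Bandit feedback: a noisy value is revealed only after purchase.
            observed = true_value + rng.gauss(0, noise)
            wins += 1
            value_sum += observed
            utility += true_value - price
    return wins, value_sum / wins, utility
```

Calling `simulate()` returns the number of rounds won, the bidder's final value estimate, and the accumulated utility; the estimate should settle near `true_value` as wins accumulate.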

    Truthful Learning Mechanisms for Multi-Slot Sponsored Search Auctions with Externalities

    Sponsored search auctions constitute one of the most successful applications of microeconomic mechanisms. In mechanism design, auctions are usually designed to incentivize advertisers to bid their truthful valuations and to assure both the advertisers and the auctioneer a non-negative utility. Nonetheless, in sponsored search auctions, the click-through rates (CTRs) of the advertisers are often unknown to the auctioneer and thus standard truthful mechanisms cannot be directly applied and must be paired with an effective learning algorithm for the estimation of the CTRs. This introduces the critical problem of designing a learning mechanism able to estimate the CTRs at the same time as implementing a truthful mechanism with a revenue loss as small as possible compared to an optimal mechanism designed with the true CTRs. Previous work showed that, when dominant-strategy truthfulness is adopted, in single-slot auctions the problem can be solved using suitable exploration-exploitation mechanisms able to achieve a per-step regret (over the auctioneer's revenue) of order O(T^{-1/3}) (where T is the number of times the auction is repeated). It is also known that, when truthfulness in expectation is adopted, a per-step regret (over the social welfare) of order O(T^{-1/2}) can be obtained. In this paper we extend the results known in the literature to the case of multi-slot auctions. In this case, a model of the user is needed to characterize how the advertisers' valuations change over the slots. We adopt the cascade model, which is the most famous model in the literature for sponsored search auctions. We prove a number of novel upper bounds and lower bounds both on the auctioneer's revenue loss and social welfare w.r.t. the VCG auction and we report numerical simulations investigating the accuracy of the bounds in predicting the dependency of the regret on the auction parameters.
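To make the role of the user model concrete, here is a minimal sketch of a simplified cascade model. It assumes the user scans slots from top to bottom and that the probability of continuing past a slot depends only on the slot; the paper's model with externalities may also let it depend on the ad shown. The function and parameter names are hypothetical.

```python
def cascade_click_probs(qualities, continuations):
    """Click probability per slot under a simplified cascade model.

    qualities[i]: probability the ad in slot i is clicked, given the user looks at it.
    continuations[i]: probability the user keeps scanning after slot i (assumed ad-independent).
    """
    probs = []
    observe = 1.0  # the top slot is always observed
    for quality, cont in zip(qualities, continuations):
        probs.append(observe * quality)
        observe *= cont  # lower slots are seen only if the user continues
    return probs


def social_welfare(values, qualities, continuations):
    """Expected social welfare: each advertiser's per-click value times its click probability."""
    clicks = cascade_click_probs(qualities, continuations)
    return sum(v * p for v, p in zip(values, clicks))
```

With two slots, qualities `[0.5, 0.4]` and continuation probability `0.8` after the first slot, the second ad is clicked with probability 0.8 × 0.4 = 0.32, which shows how an advertiser's expected value shrinks as its ad moves down the page.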

    Decisions, Learning and Games: You've Got To Have Freedom.

    Maintaining a subject's freedom to decide imposes structure and constraints on learning systems that aim to guide those decisions. Two natural sources from which subjects can learn to make good decisions are past experiences and advice from others. Both are affected by the subject's freedom to ultimately act as they wish, giving rise to learning theoretic and game theoretic repercussions respectively. To study the effect of past experiences, we extend the standard bandit setting: after the algorithm chooses an action, the subject may actually carry out a different action. This is then observed along with the reward. Algorithms whose choice of action is mediated by the subject can gain from awareness of the subject's actual actions, which we term compliance awareness. We present algorithms that take advantage of compliance awareness, while maintaining worst-case regret bounds up to multiplicative constants. We study their empirical finite-sample performance on synthetic data and simulations using real data from clinical trials. To study the effect of advice from others, we consider the literature on incentives for multiple experts by a decision maker that will take an action and receive a reward about which the experts may have information. Existing mechanisms for multiple experts are known not to be truthful, even in the limited sense of myopic incentive compatibility, unless the decision maker renounces their ability to always take on the best ex-post action and commits to a randomized strategy with full support. We present a new class of mechanisms based on second price auctions that maintain the subject's freedom. Experts submit their private information, and the algorithm auctions off the rights to a share of the reward of the subject, who then has freedom to pick the action they desire after observing the submitted information. We show several situations in which existing mechanisms fail and this one succeeds. We also consider strategic limitations of this mechanism beyond the myopic setting that arise due to complementary information between experts, and practical considerations in its implementation in real institutions. We conclude by considering a natural hybrid setting, where a sequence of subjects make decisions and each can receive advice from a fixed set of experts that the mechanism seeks to incentivize. The model for this setting is extremely general, having as special cases standard, compliance-aware and contextual bandits, as well as decision markets. We present a novel practical market structure for this setting that incentivizes exploration, information revelation, and aggregation with selfish experts.
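The extended bandit setting in this abstract, where the subject may carry out a different action from the one recommended, can be sketched as follows. This is an illustrative toy under stated assumptions, not one of the thesis's algorithms: it assumes Bernoulli rewards, a subject who complies with a fixed probability and otherwise picks uniformly at random, and an epsilon-greedy recommender that credits each reward to the action actually taken. All names are hypothetical.

```python
import random


def run_compliance_aware_bandit(means, comply_prob=0.7, rounds=5000, eps=0.1, seed=0):
    """Epsilon-greedy recommender that updates on the action the subject actually took."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # how often each action was actually carried out
    sums = [0.0] * k   # total reward observed for each actual action
    for _ in range(rounds):
        if rng.random() < eps or 0 in counts:
            chosen = rng.randrange(k)  # explore
        else:
            chosen = max(range(k), key=lambda a: sums[a] / counts[a])  # exploit
        # The subject keeps their freedom: they may ignore the recommendation.
        actual = chosen if rng.random() < comply_prob else rng.randrange(k)
        reward = 1.0 if rng.random() < means[actual] else 0.0
        # Compliance awareness: the actual action is observed, so update it rather than `chosen`.
        counts[actual] += 1
        sums[actual] += reward
    estimates = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return counts, estimates
```

Updating on `actual` rather than `chosen` keeps the reward estimates tied to the action that generated the reward, which is the kind of information a compliance-unaware algorithm would discard.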