
    Selling to a No-Regret Buyer

    We consider the problem of a single seller repeatedly selling a single item to a single buyer (specifically, the buyer's value is drawn fresh from a known distribution $D$ in every round). Prior work assumes that the buyer is fully rational and will perfectly reason about how their bids today affect the seller's decisions tomorrow. In this work we initiate a different direction: the buyer simply runs a no-regret learning algorithm over possible bids. We provide a fairly complete characterization of optimal auctions for the seller in this domain. Specifically:

    - If the buyer bids according to EXP3 (or any "mean-based" learning algorithm), then the seller can extract expected revenue arbitrarily close to the expected welfare. This auction is independent of the buyer's valuation distribution $D$, but somewhat unnatural as it is sometimes in the buyer's interest to overbid.
    - There exists a learning algorithm $\mathcal{A}$ such that if the buyer bids according to $\mathcal{A}$, then the optimal strategy for the seller is simply to post the Myerson reserve for $D$ every round.
    - If the buyer bids according to EXP3 (or any "mean-based" learning algorithm), but the seller is restricted to "natural" auction formats where overbidding is dominated (e.g. Generalized First-Price or Generalized Second-Price), then the optimal strategy for the seller is a pay-your-bid format with decreasing reserves over time. Moreover, the seller's optimal achievable revenue is characterized by a linear program, and can be unboundedly better than the best truthful auction yet simultaneously unboundedly worse than the expected welfare.
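    To make the buyer side concrete, below is a minimal sketch of a buyer running the standard EXP3 algorithm over a discretized bid grid against a posted-price, pay-your-bid seller. The simulation setup (the bid grid, a fixed posted price, values and bids assumed to lie in [0, 1], and the reward rescaling) is an illustrative assumption and not the paper's construction.

```python
import math
import random

def exp3_bidder(values, prices, bid_grid, gamma=0.1):
    """Sketch: a buyer runs EXP3 over a discretized bid grid in a repeated
    pay-your-bid sale. `values` and `prices` are illustrative per-round
    valuations and posted prices (assumptions, not from the paper); values
    and bids are assumed to lie in [0, 1]. Returns (bid, won) per round."""
    K = len(bid_grid)
    weights = [1.0] * K
    outcomes = []
    for value, price in zip(values, prices):
        total = sum(weights)
        # EXP3 mixing: mostly exploit the weights, explore uniformly with prob. gamma
        probs = [(1.0 - gamma) * w / total + gamma / K for w in weights]
        arm = random.choices(range(K), weights=probs)[0]
        bid = bid_grid[arm]
        won = bid >= price
        utility = (value - bid) if won else 0.0      # pay-your-bid utility
        reward = (utility + 1.0) / 2.0               # rescale from [-1, 1] into [0, 1]
        x_hat = reward / probs[arm]                  # importance-weighted reward estimate
        weights[arm] *= math.exp(gamma * x_hat / K)  # standard EXP3 weight update
        outcomes.append((bid, won))
    return outcomes

if __name__ == "__main__":
    # Example run: fresh values each round, seller posts a fixed price of 0.5.
    T = 10_000
    values = [random.random() for _ in range(T)]
    prices = [0.5] * T
    grid = [i / 10 for i in range(11)]
    results = exp3_bidder(values, prices, grid)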

    Rotting bandits are not harder than stochastic ones

    In stochastic multi-armed bandits, the reward distribution of each arm is assumed to be stationary. This assumption is often violated in practice (e.g., in recommendation systems), where the reward of an arm may change whenever it is selected, i.e., the rested bandit setting. In this paper, we consider the non-parametric rotting bandit setting, where rewards can only decrease. We introduce the filtering on expanding window average (FEWA) algorithm, which constructs moving averages of increasing windows to identify arms that are more likely to return high rewards when pulled once more. We prove that for an unknown horizon $T$, and without any knowledge of the decreasing behavior of the $K$ arms, FEWA achieves a problem-dependent regret bound of $\widetilde{\mathcal{O}}(\log(KT))$ and a problem-independent one of $\widetilde{\mathcal{O}}(\sqrt{KT})$. Our result substantially improves over the algorithm of Levine et al. (2017), which suffers regret $\widetilde{\mathcal{O}}(K^{1/3}T^{2/3})$. FEWA also matches known bounds for the stochastic bandit setting, thus showing that rotting bandits are not harder. Finally, we report simulations confirming the theoretical improvements of FEWA.
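    The following is a rough sketch of one round of the arm-selection rule suggested by the abstract's description of FEWA: compare averages of each arm's most recent rewards over expanding windows, filter out arms whose window average is clearly dominated, and pull a surviving arm that has too few samples for the next window. The confidence width `c(h)`, the constant 2, and the parameters `delta` and `sigma` are assumptions for illustration, not the paper's exact constants.

```python
import math

def fewa_round(reward_histories, delta=0.01, sigma=1.0):
    """Sketch of one FEWA round. `reward_histories` is a list of per-arm
    reward lists (most recent reward last). Returns the index of the arm
    to pull. Constants and confidence width are illustrative assumptions."""
    # Any arm never pulled gets priority.
    for i, hist in enumerate(reward_histories):
        if not hist:
            return i
    active = set(range(len(reward_histories)))
    h = 1
    while True:
        # If some surviving arm has fewer than h samples, it needs more data: pull it.
        short = [i for i in active if len(reward_histories[i]) < h]
        if short:
            return min(short, key=lambda i: len(reward_histories[i]))
        # Average of the most recent h rewards for each surviving arm.
        means = {i: sum(reward_histories[i][-h:]) / h for i in active}
        c = sigma * math.sqrt(2.0 * math.log(1.0 / delta) / h)  # confidence width (assumed form)
        best = max(means.values())
        # Keep only arms whose expanding-window average is not clearly dominated.
        active = {i for i in active if means[i] >= best - 2.0 * c}
        h += 1
```

    The loop always terminates: the arm with the best window average survives every filtering step, so the set of active arms stays non-empty, and once the window exceeds some active arm's sample count that arm is returned.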