Adaptation to Easy Data in Prediction with Limited Advice

Author: Seldin Yevgeny
Thune Tobias Sommer
Publication date: 01/01/2018

We derive an online learning algorithm with improved regret guarantees for 'easy' loss sequences. We consider two types of 'easiness': (a) stochastic loss sequences and (b) adversarial loss sequences with small effective range of the losses. While a number of algorithms have been proposed for exploiting small effective range in the full information setting, Gerchinovitz and Lattimore [2016] have shown the impossibility of regret scaling with the effective range of the losses in the bandit setting. We show that just one additional observation per round is sufficient to circumvent the impossibility result. The proposed Second Order Difference Adjustments (SODA) algorithm requires no prior knowledge of the effective range of the losses, $\varepsilon$, and achieves an $O(\varepsilon \sqrt{KT \ln K}) + \tilde{O}(\varepsilon K \sqrt[4]{T})$ expected regret guarantee, where $T$ is the time horizon and $K$ is the number of actions. The scaling with the effective loss range is achieved under significantly weaker assumptions than those made by Cesa-Bianchi and Shamir [2018] in an earlier attempt to circumvent the impossibility result. We also provide a regret lower bound of $\Omega(\varepsilon\sqrt{T K})$, which almost matches the upper bound. In addition, we show that in the stochastic setting SODA achieves an $O\left(\sum_{a:\Delta_a>0} \frac{K^3 \varepsilon^2}{\Delta_a}\right)$ pseudo-regret bound that holds simultaneously with the adversarial regret guarantee. In other words, SODA is safe against an unrestricted oblivious adversary and provides improved regret guarantees for at least two different types of 'easiness' simultaneously.

Comment: Fixed a mistake in the proof and statement of Theorem
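As a rough illustration of what "almost matches" means in the abstract above, the leading term of the upper bound can be compared with the lower bound. This is only a back-of-the-envelope sketch: it ignores constants and the logarithmic factors hidden in the $\tilde{O}$ notation, and the threshold $T \ge K^2$ is derived here from the stated bounds, not quoted from the paper.

```latex
\[
\frac{\varepsilon\sqrt{KT\ln K}}{\varepsilon\sqrt{TK}} = \sqrt{\ln K},
\qquad
\varepsilon K\sqrt[4]{T} \le \varepsilon\sqrt{KT}
\iff \sqrt{K} \le \sqrt[4]{T}
\iff T \ge K^{2}.
\]
```

Under these simplifications, the gap between the leading upper-bound term and the lower bound is a factor of $\sqrt{\ln K}$, and the second upper-bound term is of lower order once the horizon is at least quadratic in the number of actions.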

arXiv.org e-Print Archive

Copenhagen University Research Information System

PAC-Bayesian Analysis of the Exploration-Exploitation Trade-off

Author: Auer Peter
Cesa-Bianchi Nicolò
Laviolette François
Peters Jan
Seldin Yevgeny
Shawe-Taylor John
Publication date: 01/01/2011

We develop a coherent framework for integrative simultaneous analysis of the exploration-exploitation and model order selection trade-offs. We improve over our preceding results on the same subject (Seldin et al., 2011) by combining PAC-Bayesian analysis with a Bernstein-type inequality for martingales. Such a combination is also of independent interest for studies of multiple simultaneously evolving martingales.

Comment: On-line Trading of Exploration and Exploitation 2 - ICML-2011 workshop. http://explo.cs.ucl.ac.uk/workshop

arXiv.org e-Print Archive

MPG.PuRe

Expertise Trees Resolve Knowledge Limitations in Collective Decision-Making

Author: Abels Axel
Lenaerts Tom
Nowé Ann
Trianni Vito
Publication date: 04/05/2023

Experts advising decision-makers are likely to display expertise which varies as a function of the problem instance. In practice, this may lead to sub-optimal or discriminatory decisions against minority cases. In this work we model such changes in depth and breadth of knowledge as a partitioning of the problem space into regions of differing expertise. We provide here new algorithms that explicitly consider and adapt to the relationship between problem instances and experts' knowledge. We first propose and highlight the drawbacks of a naive approach based on nearest neighbor queries. To address these drawbacks we then introduce a novel algorithm - expertise trees - that constructs decision trees enabling the learner to select appropriate models. We provide theoretical insights and empirically validate the improved performance of our novel approach on a range of problems for which existing methods proved to be inadequate.

Comment: Proceedings of the 40th International Conference on Machine Learning (2023

arXiv.org e-Print Archive

Information Directed Sampling for Stochastic Bandits with Graph Feedback

Author: Buccapatnam Swapna
Liu Fang
Shroff Ness
Publication date: 08/11/2017

We consider stochastic multi-armed bandit problems with graph feedback, where the decision maker is allowed to observe the neighboring actions of the chosen action. We allow the graph structure to vary with time and consider both deterministic and Erdős-Rényi random graph models. For such a graph feedback model, we first present a novel analysis of Thompson sampling that leads to a tighter performance bound than existing work. Next, we propose new Information Directed Sampling based policies that are graph-aware in their decision making. Under the deterministic graph case, we establish a Bayesian regret bound for the proposed policies that scales with the clique cover number of the graph instead of the number of actions. Under the random graph case, we provide a Bayesian regret bound for the proposed policies that scales with the ratio of the number of actions over the expected number of observations per iteration. To the best of our knowledge, this is the first analytical result for stochastic bandits with random graph feedback. Finally, using numerical evaluations, we demonstrate that our proposed IDS policies outperform existing approaches, including adaptations of upper confidence bound, $\epsilon$-greedy and Exp3 algorithms.

Comment: Accepted by AAAI 201
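The graph feedback model in the abstract above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's IDS or Thompson sampling policies. The function names (`observed_arms`, `erdos_renyi_graph`, `expected_observations`) are hypothetical, and the sketch assumes that the chosen arm always observes itself and that each other arm is a neighbor independently with probability `edge_prob`.

```python
import random


def observed_arms(adjacency, chosen):
    """Arms whose rewards are revealed when `chosen` is played.

    Assumption: the chosen arm is always observed, plus its graph neighbors.
    """
    return {chosen} | set(adjacency.get(chosen, ()))


def erdos_renyi_graph(num_arms, edge_prob, rng):
    """Sample an undirected Erdős-Rényi feedback graph as an adjacency dict."""
    adjacency = {a: set() for a in range(num_arms)}
    for a in range(num_arms):
        for b in range(a + 1, num_arms):
            if rng.random() < edge_prob:
                adjacency[a].add(b)
                adjacency[b].add(a)
    return adjacency


def expected_observations(num_arms, edge_prob):
    """Expected number of arms observed per round under the assumptions above."""
    return 1 + (num_arms - 1) * edge_prob


if __name__ == "__main__":
    rng = random.Random(0)
    graph = erdos_renyi_graph(num_arms=6, edge_prob=0.5, rng=rng)
    print(observed_arms(graph, chosen=0))
    # Ratio of the number of actions to the expected observations per round.
    print(6 / expected_observations(6, 0.5))
```

Under these assumptions, the quantity the random-graph bound scales with would be `num_arms / expected_observations(num_arms, edge_prob)`, which shrinks toward 1 as the graph becomes denser.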

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Online Learning and Bandits with Queried Hints

Author: Bhaskara Aditya
Gollapudi Sreenivas
Im Sungjin
Kollias Kostas
Munagala Kamesh
Publication date: 04/11/2022

We consider the classic online learning and stochastic multi-armed bandit (MAB) problems, when at each step, the online policy can probe and find out which of a small number ($k$) of choices has better reward (or loss) before making its choice. In this model, we derive algorithms whose regret bounds have exponentially better dependence on the time horizon compared to the classic regret bounds. In particular, we show that probing with $k=2$ suffices to achieve time-independent regret bounds for online linear and convex optimization. The same number of probes improves the regret bound of stochastic MAB with independent arms from $O(\sqrt{nT})$ to $O(n^2 \log T)$, where $n$ is the number of arms and $T$ is the horizon length. For stochastic MAB, we also consider a stronger model where a probe reveals the reward values of the probed arms, and show that in this case, $k=3$ probes suffice to achieve parameter-independent constant regret, $O(n^2)$. Such regret bounds cannot be achieved even with full feedback after the play, showcasing the power of limited "advice" via probing before making the play. We also present extensions to the setting where the hints can be imperfect, and to the case of stochastic MAB where the rewards of the arms can be correlated.

Comment: To appear in ITCS 202
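The comparison-probe model with $k=2$ described in the abstract above can be illustrated with a small simulation. This is a hedged sketch, not the algorithm from the paper: the candidate rule (probe the two arms with the highest empirical means, trying unplayed arms first), the Bernoulli rewards, and the names `probe_better` and `simulate` are all assumptions made for illustration.

```python
import random


def probe_better(realized, i, j):
    """Comparison probe: return whichever of arms i, j has the larger
    realized reward this round (ties go to i). No reward values are revealed."""
    return i if realized[i] >= realized[j] else j


def simulate(means, horizon, seed=0):
    """Play a Bernoulli bandit with one 2-arm comparison probe per round.

    Returns (reward with probing, reward of always playing the first
    candidate without probing) so the two can be compared.
    """
    rng = random.Random(seed)
    n = len(means)  # requires n >= 2
    counts = [0] * n
    sums = [0.0] * n
    with_probe = 0.0
    without_probe = 0.0
    for _ in range(horizon):
        realized = [1.0 if rng.random() < m else 0.0 for m in means]
        # Hypothetical candidate rule: the two arms with the highest
        # empirical mean, treating unplayed arms as maximally promising.
        ranked = sorted(
            range(n),
            key=lambda a: sums[a] / counts[a] if counts[a] else float("inf"),
            reverse=True,
        )
        i, j = ranked[0], ranked[1]
        played = probe_better(realized, i, j)
        counts[played] += 1
        sums[played] += realized[played]
        with_probe += realized[played]
        without_probe += realized[i]
    return with_probe, without_probe


if __name__ == "__main__":
    print(simulate([0.3, 0.5, 0.7], horizon=1000, seed=1))
```

By construction, the probed policy collects the larger of the two candidates' realized rewards in every round, so its total is never below that of playing the first candidate blindly. How much it gains, and whether it matches the regret bounds quoted above, depends on the candidate rule, which is only a placeholder here.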

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server