Online Regret Bounds for Undiscounted Continuous Reinforcement Learning
We derive sublinear regret bounds for undiscounted reinforcement learning in
continuous state space. The proposed algorithm combines state aggregation with
the use of upper confidence bounds for implementing optimism in the face of
uncertainty. Beside the existence of an optimal policy which satisfies the
Poisson equation, the only assumptions made are Holder continuity of rewards
and transition probabilities
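The two ingredients this abstract combines can be illustrated with a small sketch (our own illustration, not the paper's algorithm; the bin count, confidence level, and Hoeffding-style bonus are assumptions made for the example): continuous states in [0, 1] are aggregated into equal-width intervals, and each interval's empirical mean reward receives an optimistic upper confidence bound, with unvisited intervals kept maximally optimistic.

```python
import math
import numpy as np

def optimistic_reward_estimates(states, rewards, n_bins=10, conf=0.95):
    """Illustrative sketch of state aggregation + upper confidence bounds
    (not the paper's full algorithm): aggregate continuous states in [0, 1]
    into equal-width intervals, then attach a Hoeffding-style upper
    confidence bound to each interval's empirical mean reward. Unvisited
    intervals get an infinite bound ("optimism in the face of uncertainty")."""
    counts = np.zeros(n_bins)
    sums = np.zeros(n_bins)
    for s, r in zip(states, rewards):
        b = min(int(s * n_bins), n_bins - 1)   # aggregation: state -> interval
        counts[b] += 1
        sums[b] += r
    ucb = np.full(n_bins, np.inf)              # optimistic default for unseen bins
    visited = counts > 0
    ucb[visited] = (sums[visited] / counts[visited]
                    + np.sqrt(math.log(1 / (1 - conf)) / (2 * counts[visited])))
    return ucb
```

An optimistic policy would then prefer actions leading to intervals with the highest bound, which drives exploration of rarely visited regions of the state space.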
The socioeconomic dynamics of the shifta conflict in Kenya, c. 1963-8
Using a set of oral testimonies, together with military, intelligence, and administrative reports from the 1960s, this article re-examines the shifta conflict in Kenya. The article moves away from mono-causal, nationalistic interpretations of the event, to focus instead on the underlying socioeconomic dynamics and domestic implications of the conflict. It argues that the nationalist interpretation fails to capture the diversity of participation in shifta, which was not simply made up of militant Somali nationalists, and that it fails to acknowledge the significance of an internal Kenyan conflict between a newly independent state in the process of nation building and a group of ‘dissident’ frontier communities that were seen to defy the new order. Examination of this conflict provides insights into the operation of the early postcolonial Kenyan state.
Funded by the Arts and Humanities Research Council, the Royal Historical Society, and the Martin Lynn Scholarship.
Rotting bandits are not harder than stochastic ones
In stochastic multi-armed bandits, the reward distribution of each arm is
assumed to be stationary. This assumption is often violated in practice (e.g.,
in recommendation systems), where the reward of an arm may change whenever it
is selected, i.e., the rested bandit setting. In this paper, we consider the
non-parametric rotting bandit setting, where rewards can only decrease. We
introduce the filtering on expanding window average (FEWA) algorithm that
constructs moving averages of increasing windows to identify arms that are more
likely to return high rewards when pulled once more. We prove that for an
unknown horizon $T$, and without any knowledge on the decreasing behavior of
the $K$ arms, FEWA achieves a problem-dependent regret bound of
$\widetilde{\mathcal{O}}(\log(KT))$ and a problem-independent one of
$\widetilde{\mathcal{O}}(\sqrt{KT})$. Our result substantially improves over
the algorithm of Levine et al. (2017), which suffers regret
$\widetilde{\mathcal{O}}(K^{1/3} T^{2/3})$. FEWA also matches known bounds for
the stochastic bandit setting, thus showing that the rotting bandits are not
harder. Finally, we report simulations confirming the theoretical improvements
of FEWA.
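The expanding-window idea can be sketched as follows (a simplified illustration of the filtering principle, not the paper's exact FEWA procedure; the doubling window schedule and Hoeffding-style confidence width are our assumptions for the example): for growing windows, compare the average of each arm's most recent rewards and discard arms that fall significantly below the best.

```python
import numpy as np

def fewa_filter(rewards_per_arm, delta=0.05):
    """One FEWA-style filtering pass (illustrative sketch): compare moving
    averages of the h most recent rewards of each arm over expanding windows
    h = 1, 2, 4, ..., keeping only arms whose windowed average stays within
    a confidence margin of the best. Because rewards in the rotting setting
    can only decrease, the most recent pulls are the most informative.
    `rewards_per_arm` maps arm index -> list of observed rewards."""
    active = set(rewards_per_arm)
    n_min = min(len(r) for r in rewards_per_arm.values())
    h = 1
    while h <= n_min and len(active) > 1:
        # average of the h most recent pulls of each still-active arm
        means = {a: np.mean(rewards_per_arm[a][-h:]) for a in active}
        margin = np.sqrt(2 * np.log(1 / delta) / h)   # Hoeffding-style width
        best = max(means.values())
        active = {a for a in active if means[a] >= best - 2 * margin}
        h *= 2
    return active
```

With few samples the margin is wide and every arm survives; as the window grows, clearly inferior arms are filtered out.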
Upper-Confidence Bound for Channel Selection in LPWA Networks with Retransmissions
In this paper, we propose and evaluate different learning strategies based on
Multi-Armed Bandit (MAB) algorithms. They allow Internet of Things (IoT) devices
to improve their access to the network and their autonomy, while taking into
account the impact of encountered radio collisions. To that end, several
heuristics employing Upper-Confidence Bound (UCB) algorithms are examined, to
explore the contextual information provided by the number of retransmissions.
Our results show that approaches based on UCB obtain a significant improvement
in terms of successful transmission probabilities. Furthermore, they also reveal
that a pure UCB channel access is as efficient as more sophisticated learning
strategies.
Comment: The source code (MATLAB or Octave) used for the simulations and the
figures is open-sourced under the MIT License, at Bitbucket.org/scee_ietr/ucb_smart_retran
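A pure UCB channel access of the kind the paper finds competitive can be sketched in a few lines (an illustration of standard UCB1 applied to channel selection, not the paper's retransmission heuristics; the channel success probabilities and horizon below are assumptions for the example):

```python
import math
import random

def ucb1_channel_access(n_channels, success_prob, horizon, seed=0):
    """Minimal UCB1 sketch for IoT channel selection: each round, pick the
    channel with the highest upper confidence bound on its empirical
    transmission-success rate, then observe success/failure.
    `success_prob[c]` is the hidden success probability of channel c."""
    rng = random.Random(seed)
    pulls = [0] * n_channels
    successes = [0] * n_channels
    for t in range(1, horizon + 1):
        if t <= n_channels:            # initialization: try each channel once
            c = t - 1
        else:                          # UCB1 index: mean + exploration bonus
            c = max(range(n_channels),
                    key=lambda i: successes[i] / pulls[i]
                                  + math.sqrt(2 * math.log(t) / pulls[i]))
        pulls[c] += 1
        successes[c] += rng.random() < success_prob[c]
    return pulls   # pull counts: the best channel should dominate over time
```

The exploration bonus shrinks as a channel is used more, so the device converges on the channel with the fewest collisions while still occasionally probing the others.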
Best-Arm Identification in Linear Bandits
We study the best-arm identification problem in linear bandits, where the
rewards of the arms depend linearly on an unknown parameter $\theta^*$ and the
objective is to return the arm with the largest reward. We characterize the
complexity of the problem and introduce sample allocation strategies that pull
arms to identify the best arm with a fixed confidence, while minimizing the
sample budget. In particular, we show the importance of exploiting the global
linear structure to improve the estimate of the reward of near-optimal arms. We
analyze the proposed strategies and compare their empirical performance.
Finally, as a by-product of our analysis, we point out the connection to the
$G$-optimality criterion used in optimal experimental design.
Comment: In Advances in Neural Information Processing Systems 27 (NIPS), 2014
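The linear-bandit ingredient this line of work builds on can be sketched as follows (generic regularized least squares, not the paper's allocation strategies; the arm features, parameter, and regularization constant are assumptions for the example): from the feature vectors of pulled arms and their noisy rewards, estimate the unknown parameter and predict which arm is best.

```python
import numpy as np

def ls_estimate_and_best_arm(arms, pulls, rewards, reg=1e-6):
    """Sketch of best-arm prediction in a linear bandit: rewards follow
    r = x^T theta* + noise, so we form the regularized least-squares
    estimate of theta* from the pulled arms and return the index of the
    arm with the largest predicted reward x^T theta_hat.
    `arms` lists each arm's feature vector; `pulls` lists pulled arm indices."""
    X = np.array([arms[i] for i in pulls])   # design matrix of pulled arms
    y = np.array(rewards)
    d = X.shape[1]
    A = X.T @ X + reg * np.eye(d)            # regularized Gram matrix
    theta_hat = np.linalg.solve(A, X.T @ y)  # theta_hat = A^{-1} X^T y
    preds = np.array(arms) @ theta_hat       # predicted reward of every arm
    return int(np.argmax(preds)), theta_hat
```

The point the abstract emphasizes is that the *allocation* (which arms to pull) matters: because every pull informs the shared estimate of theta*, pulling even suboptimal arms can sharpen the reward estimates of near-optimal ones.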