845 research outputs found
Variational Bayesian Reinforcement Learning with Regret Bounds
We consider the exploration-exploitation trade-off in reinforcement learning
and we show that an agent imbued with an epistemic-risk-seeking utility
function is able to explore efficiently, as measured by regret. The parameter
that controls how risk-seeking the agent is can be optimized to minimize
regret, or annealed according to a schedule. We call the resulting algorithm
K-learning and we show that the K-values that the agent maintains are
optimistic for the expected optimal Q-values at each state-action pair. The
utility function approach induces a natural Boltzmann exploration policy for
which the 'temperature' parameter is equal to the risk-seeking parameter. This
policy achieves a Bayesian regret bound of $\tilde{O}(L^{3/2} \sqrt{S A T})$,
where L is the time horizon, S is the number of states, A is the number of
actions, and T is the total number of elapsed time-steps. K-learning can be
interpreted as mirror descent in the policy space, and it is similar to other
well-known methods in the literature, including Q-learning, soft-Q-learning,
and maximum entropy policy gradient. K-learning is simple to implement, as it
only requires adding a bonus to the reward at each state-action pair and then
solving a Bellman equation. We conclude with a numerical example demonstrating
that K-learning is competitive with other state-of-the-art algorithms in
practice.
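The abstract above says only that K-learning adds a bonus to the reward at each state-action pair, solves a Bellman equation, and acts with a Boltzmann policy whose temperature equals the risk-seeking parameter. A minimal sketch of that recipe, under the extra assumptions of a known finite-horizon tabular MDP (`P`, `R`) and a precomputed `bonus` array (none of which are specified in the abstract), might look like:

```python
import numpy as np

def k_learning_sketch(P, R, bonus, tau, horizon):
    """Hedged sketch of a K-learning-style backup (finite-horizon, tabular).

    Assumptions beyond the abstract: a known transition tensor P[s, a, s'],
    mean rewards R[s, a], and a per-(s, a) exploration bonus are given.
    tau is the risk-seeking parameter, reused as the Boltzmann temperature.
    """
    S, A, _ = P.shape
    K = np.zeros((horizon + 1, S, A))  # K-values; terminal layer is zero
    for h in range(horizon - 1, -1, -1):
        # Soft value of the next step: tau * log-sum-exp over actions.
        V = tau * np.log(np.exp(K[h + 1] / tau).sum(axis=1))  # shape (S,)
        # Bonus-augmented Bellman backup: reward + bonus + expected soft value.
        K[h] = R + bonus + P @ V  # (S, A, S) @ (S,) -> (S, A)
    # Boltzmann exploration policy at the first step, temperature tau.
    logits = K[0] / tau
    policy = np.exp(logits - logits.max(axis=1, keepdims=True))
    policy /= policy.sum(axis=1, keepdims=True)
    return K, policy
```

The log-sum-exp backup is what makes the resulting policy a softmax over K-values, matching the abstract's claim that the utility-function approach induces Boltzmann exploration with temperature equal to the risk parameter.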
Reinforcement Learning, Bit by Bit
Reinforcement learning agents have demonstrated remarkable achievements in
simulated environments. Data efficiency poses an impediment to carrying this
success over to real environments. The design of data-efficient agents calls
for a deeper understanding of information acquisition and representation. We
develop concepts and establish a regret bound that together offer principled
guidance. The bound sheds light on questions of what information to seek, how
to seek that information, and what information to retain. To illustrate these
concepts, we design simple agents that build on them and present computational
results that demonstrate improvements in data efficiency.
- …