5,396 research outputs found

Accessibility and validity of smart meter data

Author: Bilton M
Carmichael R
Dragovic J
Schofield J
Strbac
Whitney A
Woolf M
Publication date: 30/09/2014

Spiral - Imperial College Digital Repository

Making Linear MDPs Practical via Contrastive Representation Learning

Author: Dai Bo
Gonzalez Joseph E.
Ren Tongzheng
Schuurmans Dale
Yang Mengjiao
Zhang Tianjun
Publication date: 14/07/2022

It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations. This motivates much of the recent theoretical study on linear MDPs. However, most approaches require a given representation under unrealistic assumptions about the normalization of the decomposition or introduce unresolved computational challenges in practice. Instead, we consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning via contrastive estimation. The framework also admits confidence-adjusted index algorithms, enabling an efficient and principled approach to incorporating optimism or pessimism in the face of uncertainty. To the best of our knowledge, this provides the first practical representation learning method for linear MDPs that achieves both strong theoretical guarantees and empirical performance. Theoretically, we prove that the proposed algorithm is sample efficient in both the online and offline settings. Empirically, we demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks. Comment: ICML 2022. The first two authors contribute equally.

arXiv.org e-Print Archive

A survey of preference-based reinforcement learning methods

Author: Akrour Riad
Fürnkranz Johannes
Neumann Gerhard
Wirth Christian
Publication venue: Journal of Machine Learning Research / Massachusetts Institute of Technology Press (MIT Press) / Microtome
Publication date: 01/12/2017

Reinforcement learning (RL) techniques optimize the accumulated long-term reward of a suitably chosen reward function. However, designing such a reward function often requires a lot of task-specific prior knowledge. The designer needs to consider different objectives that do not only influence the learned behavior but also the learning progress. To alleviate these issues, preference-based reinforcement learning algorithms (PbRL) have been proposed that can directly learn from an expert's preferences instead of a hand-designed numeric reward. PbRL has gained traction in recent years due to its ability to resolve the reward shaping problem, its ability to learn from non-numeric rewards and the possibility to reduce the dependence on expert knowledge. We provide a unified framework for PbRL that describes the task formally and points out the different design principles that affect the evaluation task for the human as well as the computational complexity. The design principles include the type of feedback that is assumed, the representation that is learned to capture the preferences, the optimization problem that has to be solved as well as how the exploration/exploitation problem is tackled. Furthermore, we point out shortcomings of current algorithms, propose open research questions and briefly survey practical tasks that have been solved using PbRL.

University of Lincoln Institutional Repository

Advancements in Safe Deep Reinforcement Learning for Real-Time Strategy Games and Industry Applications

Author: Andersen Per-Arne
Publication venue: University of Agder
Publication date: 01/01/2022

Agder University Research Archive

Robotics deep reinforcement learning with loose prior knowledge

Author: Botteghi Nicolò
Publication venue: University of Twente
Publication date: 06/10/2021

University of Twente Research Information