Search CORE

9,155 research outputs found

Learning on a Budget Using Distributional RL

Author: Bloembergen D. (Daniel)
Hernandez-Leal P. (Pablo)
Kaisers M. (Michael)
Morales E.F. (Eduardo)
Serrano J. (Jonathan)
Publication venue
Publication date: 01/01/2018
Field of study

Agents acting in real-world scenarios often have constraints such as finite budgets or daily job performance targets. While repeated (episodic) tasks can be solved with existing RL algorithms, methods need to be extended if the repetition depends on performance. Recent work has introduced a distributional perspective on reinforcement learning, providing a model of episodic returns. Inspired by these results we contribute the new budget- and risk-aware distributional reinforcement learning (BRAD-RL) algorithm that bootstraps from the C51 distributional output and then uses value iteration to estimate the value of starting an episode with a certain amount of budget. With this strategy we can make budget-wise action selection within each episode and maximize the return across episodes. Experiments in a grid-world domain highlight the benefits of our algorithm, maximizing discounted future returns when low cumulative performance may terminate repetition

CWI's Institutional Repository

Log-Distributional Approach for Learning Covariate Shift Ratios

Author: Bernárdez Gil Guillermo
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2019
Field of study

Distributional Reinforcement Learning theory suggests that distributional fixed points could play a fundamental role to learning non additive value functions. In particular, we propose a distributional approach for learning Covariate Shift Ratios, whose update rule is originally multiplicative

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

GAN-powered Deep Distributional Reinforcement Learning for Resource Management in Network Slicing

Author: Chen Xianfu
Hua Yuxiu
Li Rongpeng
Zhang Honggang
Zhao Zhifeng
Publication venue
Publication date: 20/11/2019
Field of study

Network slicing is a key technology in 5G communications system. Its purpose is to dynamically and efficiently allocate resources for diversified services with distinct requirements over a common underlying physical infrastructure. Therein, demand-aware resource allocation is of significant importance to network slicing. In this paper, we consider a scenario that contains several slices in a radio access network with base stations that share the same physical resources (e.g., bandwidth or slots). We leverage deep reinforcement learning (DRL) to solve this problem by considering the varying service demands as the environment state and the allocated resources as the environment action. In order to reduce the effects of the annoying randomness and noise embedded in the received service level agreement (SLA) satisfaction ratio (SSR) and spectrum efficiency (SE), we primarily propose generative adversarial network-powered deep distributional Q network (GAN-DDQN) to learn the action-value distribution driven by minimizing the discrepancy between the estimated action-value distribution and the target action-value distribution. We put forward a reward-clipping mechanism to stabilize GAN-DDQN training against the effects of widely-spanning utility values. Moreover, we further develop Dueling GAN-DDQN, which uses a specially designed dueling generator, to learn the action-value distribution by estimating the state-value distribution and the action advantage function. Finally, we verify the performance of the proposed GAN-DDQN and Dueling GAN-DDQN algorithms through extensive simulations

arXiv.org e-Print Archive

VTT Research System