230 research outputs found

Steering approaches to Pareto-optimal multiobjective reinforcement learning

Author: Berry Adam
Creighton Douglas
Dazeley Richard
Foale Cameron
Issabekov Rustam
Moore Tim
Vamplew Peter
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

For reinforcement learning tasks with multiple objectives, it may be advantageous to learn stochastic or non-stationary policies. This paper investigates two novel algorithms for learning non-stationary policies which produce Pareto-optimal behaviour (w-steering and Q-steering), by extending prior work based on the concept of geometric steering. Empirical results demonstrate that both new algorithms offer substantial performance improvements over stationary deterministic policies, while Q-steering significantly outperforms w-steering when the agent has no information about recurrent states within the environment. It is further demonstrated that Q-steering can be used interactively by providing a human decision-maker with a visualisation of the Pareto front and allowing them to adjust the agent’s target point during learning. To demonstrate broader applicability, the use of Q-steering in combination with function approximation is also illustrated on a task involving control of local battery storage for a residential solar power system

Deakin Research Online

Federation ResearchOnline

Softmax exploration strategies for multiobjective reinforcement learning

Author: Dazeley Richard
Foale Cameron
Vamplew Peter
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

Despite growing interest over recent years in applying reinforcement learning to multiobjective problems, there has been little research into the applicability and effectiveness of exploration strategies within the multiobjective context. This work considers several widely-used approaches to exploration from the single-objective reinforcement learning literature, and examines their incorporation into multiobjective Q-learning. In particular this paper proposes two novel approaches which extend the softmax operator to work with vector-valued rewards. The performance of these exploration strategies is evaluated across a set of benchmark environments. Issues arising from the multiobjective formulation of these benchmarks which impact on the performance of the exploration strategies are identified. It is shown that of the techniques considered, the combination of the novel softmax–epsilon exploration with optimistic initialisation provides the most effective trade-off between exploration and exploitation
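
As a point of reference for the abstract above, here is a minimal sketch of softmax (Boltzmann) action selection over vector-valued Q-estimates. It is not either of the paper's two novel operators, which the abstract does not specify. It assumes the simplest possible extension: each action's Q-vector is collapsed with a fixed linear weight vector before the usual softmax is applied. The function name, the weights and the temperature are illustrative assumptions.

```python
import math
from typing import List, Sequence


def softmax_action_probabilities(
    q_vectors: Sequence[Sequence[float]],
    weights: Sequence[float],
    temperature: float = 1.0,
) -> List[float]:
    """Return Boltzmann action probabilities for vector-valued Q-estimates.

    Assumption (not from the paper): vectors are linearly scalarised with
    `weights` before the softmax is taken.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Collapse each action's Q-vector to a scalar via a weighted sum.
    scalar_q = [sum(w * q for w, q in zip(weights, qv)) for qv in q_vectors]
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(scalar_q)
    exps = [math.exp((q - q_max) / temperature) for q in scalar_q]
    total = sum(exps)
    return [e / total for e in exps]


# Three actions, two objectives, equal weights (all values hypothetical).
example_probs = softmax_action_probabilities(
    [(1.0, 0.0), (0.5, 2.0), (0.0, 0.5)], weights=(0.5, 0.5)
)
```

Lower temperatures concentrate probability on the action with the highest scalarised value; higher temperatures move the distribution towards uniform.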

Deakin Research Online

Federation ResearchOnline

The impact of environmental stochasticity on value-based multiobjective reinforcement learning

Author: Dazeley Richard
Foale Cameron
Vamplew Peter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2021

A common approach to addressing multiobjective problems using reinforcement learning methods is to extend model-free, value-based algorithms such as Q-learning to use a vector of Q-values in combination with an appropriate action selection mechanism that is often based on scalarisation. Most prior empirical evaluation of these approaches has focused on deterministic environments. This study examines the impact of stochasticity in rewards and state transitions on the behaviour of multi-objective Q-learning. It shows that the nature of the optimal solution depends on these environmental characteristics, and also on whether we desire to maximise the Expected Scalarised Return (ESR) or the Scalarised Expected Return (SER). We also identify a novel aim which may arise in some applications, namely maximising SER subject to satisfying constraints on the variation in return, and show that this may require different solutions than ESR or conventional SER. The analysis of the interaction between environmental stochasticity and multi-objective Q-learning is supported by empirical evaluations on several simple multiobjective Markov Decision Processes with varying characteristics. This includes a demonstration of a novel approach to learning deterministic SER-optimal policies for environments with stochastic rewards. In addition, we report a previously unidentified issue with model-free, value-based approaches to multiobjective reinforcement learning in the context of environments with stochastic state transitions. Having highlighted the limitations of value-based model-free MORL methods, we discuss several alternative methods that may be more suitable for maximising SER in MOMDPs with stochastic transitions. © 2021, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature
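
The ESR/SER distinction drawn in the abstract above can be made concrete with a small worked example. The sketch below assumes a hypothetical policy whose vector return is (0, 10) or (10, 0) with equal probability, and a hypothetical nonlinear utility that takes the minimum over objectives. None of these values come from the paper. ESR applies the utility to each outcome and then averages, whereas SER averages the vector return first and then applies the utility.

```python
from typing import Callable, Sequence, Tuple

Vector = Tuple[float, ...]


def expected_vector(outcomes: Sequence[Vector], probs: Sequence[float]) -> Vector:
    """Component-wise expectation of the vector-valued return."""
    n_objectives = len(outcomes[0])
    return tuple(
        sum(p * o[i] for o, p in zip(outcomes, probs)) for i in range(n_objectives)
    )


def scalarised_expected_return(
    utility: Callable[[Vector], float],
    outcomes: Sequence[Vector],
    probs: Sequence[float],
) -> float:
    """SER: apply the utility to the expected return vector."""
    return utility(expected_vector(outcomes, probs))


def expected_scalarised_return(
    utility: Callable[[Vector], float],
    outcomes: Sequence[Vector],
    probs: Sequence[float],
) -> float:
    """ESR: take the expectation of the utility of each outcome."""
    return sum(p * utility(o) for o, p in zip(outcomes, probs))


# Hypothetical policy: two equally likely vector returns.
outcomes = [(0.0, 10.0), (10.0, 0.0)]
probs = [0.5, 0.5]


def utility(v: Vector) -> float:
    """Hypothetical nonlinear utility: the worst-off objective."""
    return min(v)


ser_value = scalarised_expected_return(utility, outcomes, probs)  # min((5, 5)) = 5.0
esr_value = expected_scalarised_return(utility, outcomes, probs)  # 0.5*0 + 0.5*0 = 0.0
```

With a linear utility the two criteria coincide. The gap appears only when the utility is nonlinear and the return is stochastic, which is why the choice of criterion matters in the stochastic environments the abstract discusses.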

Deakin Research Online

Federation ResearchOnline

A practical guide to multi-objective reinforcement learning and planning

Author: Bargiacchi Eugenio
Dazeley Richard
Hayes Conor
Heintz Fredrik
Howley Enda
Irissappane Athirai
Källström Johan
Macfarlane Matthew
Mannion Patrick
Nowé Ann
Ramos Gabriel
Restelli Marcello
Reymond Mathieu
Roijers Diederik
Rădulescu Roxana
Vamplew Peter
Verstraeten Timothy
Zintgraf Luisa
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2022

Real-world sequential decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems, and is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods who wish to adopt a multi-objective perspective on their research, as well as practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems. © 2022, The Author(s)

Federation ResearchOnline

A Practical Guide to Multi-Objective Reinforcement Learning and Planning

Author: Bargiacchi Eugenio
Dazeley Richard
Hayes Conor F.
Heintz Fredrik
Howley Enda
Irissappane Athirai A.
Källström Johan
Macfarlane Matthew
Mannion Patrick
Nowé Ann
Ramos Gabriel
Restelli Marcello
Reymond Mathieu
Roijers Diederik M.
Rădulescu Roxana
Vamplew Peter
Verstraeten Timothy
Zintgraf Luisa M.
Publication date: 17/03/2021

Real-world decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems, and is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods who wish to adopt a multi-objective perspective on their research, as well as practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems

arXiv.org e-Print Archive

Publikationer från Linköpings universitet

Deakin Research Online

Federation ResearchOnline

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Dynamic multi-objective optimisation using deep reinforcement learning: benchmark, algorithm and an application to identify vulnerable zones based on water quality

Author: Antesar Shabut
Khin Lwin
Luiz Fernando Bittencourt
M.A. Hossain
Maryam Imani
Md Mahmudul Hasan
Publication venue: 'Elsevier BV'
Publication date: 06/09/2019

The dynamic multi-objective optimisation problem (DMOP) poses a great challenge to reinforcement learning (RL) research because its objective functions, constraints and problem parameters may change over time. This study aims to identify the gaps in existing multi-objective optimisation benchmarks for dynamic environments in RL settings. To that end, a dynamic multi-objective testbed has been created as a modified version of the conventional deep-sea treasure (DST) hunt testbed. The modified testbed captures the changing aspects of a dynamic environment, with changes occurring as a function of time. To the authors’ knowledge, this is the first dynamic multi-objective testbed for RL research, especially for deep reinforcement learning. In addition, a generic algorithm is proposed to solve the multi-objective optimisation problem in a dynamic constrained environment; it maintains equilibrium by mapping different objectives simultaneously to provide the best compromise solution that is close to the true Pareto front (PF). As a proof of concept, the developed algorithm has been used to build an expert system for a real-world scenario, using a Markov decision process to identify vulnerable zones based on water quality resilience in São Paulo, Brazil. The outcome of the implementation indicates that the proposed parity-Q deep Q network (PQDQN) algorithm is an efficient way to optimise decisions in a dynamic environment. Moreover, the results show that the PQDQN algorithm performs better than other state-of-the-art solutions in both the simulated and the real-world scenarios.

Crossref

Teesside University's Research Repository

Anglia Ruskin Research

Research @Leeds Trinity University

Repositorio da Producao Cientifica e Intelectual da Unicamp

Multi-objective Reinforcement Learning through Continuous Pareto Manifold Approximation

Author: Parisi Simone
Pirotta Matteo
Restelli Marcello
Publication venue: 'AI Access Foundation'
Publication date: 01/01/2016

Archivio istituzionale della ricerca - Politecnico di Milano

Automating Staged Rollout with Reinforcement Learning

Author: Fiondella Lance
Nagaraju Vidhyashree
Pritchard Shadow
Publication date: 01/04/2022

Staged rollout is a strategy of incrementally releasing software updates to portions of the user population in order to accelerate defect discovery without incurring catastrophic outcomes such as system wide outages. Some past studies have examined how to quantify and automate staged rollout, but stop short of simultaneously considering multiple product or process metrics explicitly. This paper demonstrates the potential to automate staged rollout with multi-objective reinforcement learning in order to dynamically balance stakeholder needs such as time to deliver new features and downtime incurred by failures due to latent defects

arXiv.org e-Print Archive

A novel adaptive weight selection algorithm for multi-objective multi-agent reinforcement learning

Author: Brys Tim
Chandra Arjun
Esterle Lukas
Lewis Peter R.
Nowé Ann
van Moffaert Kristof
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014

To solve multi-objective problems, multiple reward signals are often scalarized into a single value and further processed using established single-objective problem solving techniques. While the field of multi-objective optimization has made many advances in applying scalarization techniques to obtain good solution trade-offs, the utility of applying these techniques in the multi-objective multi-agent learning domain has not yet been thoroughly investigated. Agents learn the value of their decisions by linearly scalarizing their reward signals at the local level, while acceptable system wide behaviour results. However, the non-linear relationship between weighting parameters of the scalarization function and the learned policy makes the discovery of system wide trade-offs time consuming. Our first contribution is a thorough analysis of well known scalarization schemes within the multi-objective multi-agent reinforcement learning setup. The analysed approaches intelligently explore the weight-space in order to find a wider range of system trade-offs. In our second contribution, we propose a novel adaptive weight algorithm which interacts with the underlying local multi-objective solvers and allows for a better coverage of the Pareto front. Our third contribution is the experimental validation of our approach by learning bi-objective policies in self-organising smart camera networks. We note that our algorithm (i) explores the objective space faster on many problem instances, (ii) obtained solutions that exhibit a larger hypervolume, while (iii) acquiring a greater spread in the objective space
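
The hypervolume indicator mentioned in the abstract above can be sketched for the bi-objective case as follows. The sketch assumes that both objectives are maximised and that a reference point dominated by every solution of interest is supplied. The function names and the sample points are illustrative and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if `a` Pareto-dominates `b` when both objectives are maximised."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Return the non-dominated points, with duplicates removed."""
    unique = set(points)
    return [p for p in unique if not any(dominates(q, p) for q in unique)]


def hypervolume_2d(points: Sequence[Point], reference: Point) -> float:
    """Area dominated by `points` and bounded below by `reference`."""
    # Sort the front by the first objective, descending; on a Pareto front
    # the second objective is then strictly ascending.
    front = sorted(pareto_front(points), key=lambda p: p[0], reverse=True)
    area = 0.0
    previous_y = reference[1]
    for x, y in front:
        if x > reference[0] and y > previous_y:
            # Add the horizontal strip not covered by earlier points.
            area += (x - reference[0]) * (y - previous_y)
            previous_y = y
    return area


# Hypothetical bi-objective solutions; (1.0, 1.0) is dominated and ignored.
solutions = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (1.0, 1.0)]
area = hypervolume_2d(solutions, reference=(0.0, 0.0))  # 3*1 + 2*1 + 1*1 = 6.0
```

A larger hypervolume indicates that the solution set covers more of the objective space relative to the reference point, which is the sense in which the abstract uses it to compare algorithms.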

Aston Publications Explorer

Multiobjective Monte Carlo Tree Search for Real-Time Games

Author: Lucas Simon M
Mostaghim Sanaz
Perez Diego
Samothrakis Spyridon
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 06/08/2014

Multiobjective optimization has been traditionally a matter of study in domains like engineering or finance, with little impact on games research. However, action-decision based on multiobjective evaluation may be beneficial in order to obtain a high quality level of play. This paper presents a multiobjective Monte Carlo tree search algorithm for planning and control in real-time game domains, those where the time budget to decide the next move to make is close to 40 ms. A comparison is made between the proposed algorithm, a single-objective version of Monte Carlo tree search and a rolling horizon implementation of nondominated sorting evolutionary algorithm II (NSGA-II). Two different benchmarks are employed, deep sea treasure (DST) and the multiobjective physical traveling salesman problem (MO-PTSP). Using the same heuristics on each game, the analysis is focused on how well the algorithms explore the search space. Results show that the algorithm proposed outperforms NSGA-II. Additionally, it is also shown that the algorithm is able to converge to different optimal solutions or the optimal Pareto front (if achieved during search)

University of Essex Research Repository

Crossref