6 research outputs found

    A temporal difference method for multi-objective reinforcement learning

    This work describes MPQ-learning, a temporal-difference method that approximates the set of all non-dominated policies in multi-objective Markov decision problems, where rewards are vectors and each component stands for an objective to maximize. Unlike other approaches to multi-objective reinforcement learning, MPQ-learning does not require additional parameters or preference information, and can be applied to non-convex Pareto frontiers. We also present the results of applying MPQ-learning to some benchmark problems and compare it to a linearization procedure. This work is partially funded by grants TIN2009-14179 (Spanish Government, Plan Nacional de I+D+i) and TIN2016-80774-R (AEI/FEDER, UE) (Spanish Government, Agencia Estatal de Investigación; and European Union, Fondo Europeo de Desarrollo Regional). Manuela Ruiz-Montiel is funded by the Spanish Ministry of Education through the National F.P.U. Program.
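    The core operation behind methods like MPQ-learning is keeping only Pareto non-dominated vector-valued estimates. A minimal sketch of that filter (illustrative helper names, not the paper's implementation):

    ```python
    def dominates(u, v):
        """True if vector u Pareto-dominates v (>= in every component, > in some)."""
        return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

    def non_dominated(vectors):
        """Keep only the vectors not Pareto-dominated by any other vector."""
        return [u for u in vectors
                if not any(dominates(v, u) for v in vectors if v != u)]
    ```

    For example, among the vector returns (1, 2), (2, 1) and (0, 0), the first two are incomparable and both survive, while (0, 0) is dominated and removed.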

    Reinforcement learning strategies using Monte-Carlo to solve the blackjack problem

    Blackjack is a classic casino game in which the player attempts to outsmart the dealer by drawing a combination of cards whose face values add up to at most 21 while exceeding the value of the dealer's hand. This study considers a simplified variation of blackjack in which the dealer plays no active role after the first two draws. A different game regime is modeled for each of one to ten multiples of the conventional 52-card deck. Irrespective of the number of standard decks used, the game is played as a randomized discrete-time process. To determine the optimal course of action in terms of policy, we train an agent, a decision maker, to optimize over the decision space of the game, treating the procedure as a finite Markov decision process. To choose the most effective course of action, we mainly study Monte Carlo-based reinforcement learning approaches and compare them with Q-learning, dynamic programming, and temporal-difference methods. The performance of the distinct model-free policy iteration techniques is presented in this study, framing the game as a reinforcement learning problem.
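    The Monte Carlo idea the abstract describes — estimating state values by averaging sampled returns, with no model of the deck — can be sketched for a heavily simplified blackjack (infinite deck, no aces or splits, a fixed "hit below 18" policy; all of these simplifications are assumptions for illustration, not the study's exact setup):

    ```python
    import random
    from collections import defaultdict

    def draw():
        # Face cards count as 10; aces count as 1 in this simplification.
        return min(random.randint(1, 13), 10)

    def episode():
        """Play one hand; return a list of (player_total, reward) pairs."""
        player = draw() + draw()
        states = []
        while player < 18:                  # fixed policy: hit below 18
            states.append(player)
            player += draw()
        if player > 21:                     # player busts
            return [(s, -1) for s in states] + [(player, -1)]
        dealer = draw() + draw()
        while dealer < 17:                  # dealer hits below 17
            dealer += draw()
        reward = 1 if dealer > 21 or player > dealer else (-1 if dealer > player else 0)
        return [(s, reward) for s in states] + [(player, reward)]

    def mc_evaluate(n_episodes=20000):
        """Every-visit Monte Carlo: average sampled returns per state."""
        totals, counts = defaultdict(float), defaultdict(int)
        for _ in range(n_episodes):
            for s, r in episode():
                totals[s] += r
                counts[s] += 1
        return {s: totals[s] / counts[s] for s in totals}
    ```

    Running `mc_evaluate()` yields a value estimate per player total, converging toward the policy's true expected rewards as the episode count grows.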

    Dynamic multi-objective optimisation using deep reinforcement learning: benchmark, algorithm and an application to identify vulnerable zones based on water quality

    Dynamic multi-objective optimisation problems (DMOPs) pose a great challenge to the reinforcement learning (RL) research area due to their dynamic nature: objective functions, constraints and problem parameters may change over time. This study aims to identify what is lacking in existing benchmarks for multi-objective optimisation in dynamic environments in RL settings. Hence, a dynamic multi-objective testbed has been created as a modified version of the conventional deep-sea treasure (DST) hunt testbed. This modified testbed captures the characteristics of a dynamic environment, with changes occurring over time. To the authors’ knowledge, this is the first dynamic multi-objective testbed for RL research, especially for deep reinforcement learning. In addition, a generic algorithm is proposed to solve the multi-objective optimisation problem in a dynamic constrained environment that maintains equilibrium by mapping different objectives simultaneously, providing the most compromised solution close to the true Pareto front (PF). As a proof of concept, the developed algorithm has been implemented to build an expert system for a real-world scenario, using a Markov decision process to identify the vulnerable zones based on water quality resilience in São Paulo, Brazil. The outcome of the implementation reveals that the proposed parity-Q deep Q network (PQDQN) algorithm is an efficient way to optimise decisions in a dynamic environment. Moreover, the results show that the PQDQN algorithm performs better than other state-of-the-art solutions in both the simulated and the real-world scenarios.
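    The deep-sea treasure setting the abstract builds on can be sketched as a tiny gridworld with a vector reward of (treasure value, time penalty). Treasure positions and values below are purely illustrative assumptions; the paper's dynamic testbed additionally changes such quantities over time:

    ```python
    # (row, col) -> treasure value; illustrative placement, not the DST layout.
    TREASURES = {(1, 2): 5.0, (3, 4): 24.0}

    def step(state, action):
        """Move in a 5x5 grid; reward is the vector (treasure, -1 time cost)."""
        moves = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
        dr, dc = moves[action]
        r = min(max(state[0] + dr, 0), 4)   # clamp to grid bounds
        c = min(max(state[1] + dc, 0), 4)
        treasure = TREASURES.get((r, c), 0.0)
        done = (r, c) in TREASURES          # episode ends on any treasure
        return (r, c), (treasure, -1.0), done
    ```

    Every step costs -1 on the time objective, so deeper, richer treasures trade off against shorter episodes, which is what produces the Pareto front a multi-objective learner must cover.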

    A practical guide to multi-objective reinforcement learning and planning

    Real-world sequential decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems. It is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods and who wish to adopt a multi-objective perspective on their research, as well as at practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems. © 2022, The Author(s)
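    The abstract's warning about simple linear combinations can be made concrete: on a non-convex Pareto front, no weighting ever selects the concave middle point. A minimal sketch with illustrative points:

    ```python
    # Three points on a non-convex Pareto front (the middle one is concave).
    front = [(0.0, 1.0), (0.45, 0.45), (1.0, 0.0)]

    def best_for_weight(w, points):
        """Pick the point maximising the linear scalarisation w*x + (1-w)*y."""
        return max(points, key=lambda p: w * p[0] + (1 - w) * p[1])

    # Sweep weights from 0 to 1: the concave middle point is never selected,
    # because max(w, 1 - w) >= 0.5 > 0.45 for every weight w.
    picked = {best_for_weight(w / 100, front) for w in range(101)}
    assert (0.45, 0.45) not in picked
    ```

    This is the failure mode that motivates methods covering the full Pareto front rather than scalarising up front.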