    Reinforcement Learning with Potential Functions Trained to Discriminate Good and Bad States

    Reward shaping is an efficient way to incorporate domain knowledge into a reinforcement learning agent. Nevertheless, it is impractical and inconvenient to require prior knowledge for designing shaping rewards. Therefore, having the agent learn the shaping reward function during training could be more effective. In this paper, based on the potential-based reward shaping framework, which guarantees policy invariance, we propose to learn a potential function concurrently with training an agent using a reinforcement learning algorithm. In the proposed method, the potential function is trained by examining states that occur in good and in bad episodes. We apply the proposed adaptive potential function while training an agent with Q-learning and develop two novel algorithms. One is APF-QMLP, which combines the good/bad-state potential function with Q-learning and multi-layer perceptrons (MLPs) to estimate the Q-function. The other is APF-Dueling-DQN, which combines the novel potential function with Dueling DQN. In particular, an autoencoder is adopted in APF-Dueling-DQN to map image states from Atari games to hash codes. We evaluated the created algorithms empirically in four environments: a six-room maze, CartPole, Acrobot, and Ms-Pacman, involving low-dimensional or high-dimensional state spaces. The experimental results showed that the proposed adaptive potential function improved the performance of the selected reinforcement learning algorithms.
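    Potential-based shaping augments the environment reward r with the term gamma * Phi(s') - Phi(s), which preserves the optimal policy. The following is a minimal sketch of the general idea described in this abstract, assuming a small PyTorch MLP potential trained as a binary classifier on states collected from good and bad episodes; all class and function names are illustrative and are not the authors' APF-QMLP or APF-Dueling-DQN implementation.

```python
# Hypothetical sketch of an adaptive potential function for reward shaping.
# Assumes PyTorch and a low-dimensional vector state; names are illustrative.
import torch
import torch.nn as nn

class PotentialNet(nn.Module):
    """MLP that scores a state; trained to discriminate good vs. bad episodes."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s):
        return self.net(s).squeeze(-1)   # raw logit used as the potential Phi(s)

def train_potential(phi, good_states, bad_states, epochs=10, lr=1e-3):
    """Binary cross-entropy: states from good episodes -> 1, from bad episodes -> 0."""
    x = torch.cat([good_states, bad_states])
    y = torch.cat([torch.ones(len(good_states)), torch.zeros(len(bad_states))])
    opt = torch.optim.Adam(phi.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(phi(x), y).backward()
        opt.step()

def shaped_reward(phi, r, s, s_next, gamma=0.99):
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s) is policy-invariant."""
    with torch.no_grad():
        return r + gamma * phi(s_next) - phi(s)
```

    The shaped reward would replace the raw reward in whatever Q-learning update the agent uses, while the potential network is periodically retrained on newly collected good/bad episodes.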

    Temporal-Logic-Based Reward Shaping for Continuing Learning Tasks

    In continuing tasks, average-reward reinforcement learning may be a more appropriate problem formulation than the more common discounted-reward formulation. As usual, learning an optimal policy in this setting typically requires a large amount of training experience. Reward shaping is a common approach for incorporating domain knowledge into reinforcement learning in order to speed up convergence to an optimal policy. However, to the best of our knowledge, the theoretical properties of reward shaping have thus far only been established in the discounted setting. This paper presents the first reward shaping framework for average-reward learning and proves that, under standard assumptions, the optimal policy under the original reward function can be recovered. To avoid the need for manual construction of the shaping function, we introduce a method for utilizing domain knowledge expressed as a temporal logic formula. The formula is automatically translated to a shaping function that provides additional reward throughout the learning process. We evaluate the proposed method on three continuing tasks. In all cases, shaping speeds up average-reward learning without any reduction in the performance of the learned policy compared to relevant baselines.
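    In the average-reward setting there is no discount factor, so the shaping term reduces to Phi(s') - Phi(s) added on top of the environment reward. Below is a minimal sketch, not the paper's framework: a tabular differential Q-learning loop with an additive shaping bonus, where the potential phi is assumed to be supplied externally (for example, derived from progress through an automaton compiled from a temporal logic formula). The env interface (reset, actions, step) is a hypothetical placeholder.

```python
# Hypothetical sketch of reward shaping in an average-reward (differential)
# Q-learning loop for a continuing task; env and phi are assumed interfaces.
import random
from collections import defaultdict

def differential_q_learning(env, phi, steps=100_000, alpha=0.1, eta=0.01, eps=0.1):
    Q = defaultdict(float)          # Q[(state, action)]
    avg_r = 0.0                     # running estimate of the average reward
    s = env.reset()
    for _ in range(steps):
        actions = env.actions(s)
        # epsilon-greedy action selection
        a = random.choice(actions) if random.random() < eps \
            else max(actions, key=lambda a_: Q[(s, a_)])
        s_next, r = env.step(a)     # continuing task: no terminal states
        # Undiscounted potential-based shaping term F(s, s') = Phi(s') - Phi(s)
        r_shaped = r + phi(s_next) - phi(s)
        best_next = max(Q[(s_next, a_)] for a_ in env.actions(s_next))
        td = r_shaped - avg_r + best_next - Q[(s, a)]
        Q[(s, a)] += alpha * td
        avg_r += eta * td           # update the average-reward estimate
        s = s_next
    return Q, avg_r
```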

    COLREGs-constrained safe reinforcement learning for realising MASS's risk-informed collision avoidance decision making

    The maritime autonomous surface ship (MASS) represents a significant advancement in maritime technology, offering the potential for increased efficiency, reduced operational costs, and enhanced maritime traffic safety. However, MASS navigation in complex maritime traffic and congested waters presents challenges, especially for Collision Avoidance Decision Making (CADM) in multi-ship encounter scenarios. Through a robust risk assessment design for time-sequential and joint-target-ship (TS) encounter scenarios, a novel risk- and reliability-critic-enhanced safe hierarchical reinforcement learning method (RA-SHRL), constrained by the International Regulations for Preventing Collisions at Sea (COLREGs), is proposed to realize the autonomous navigation and CADM of MASS. Finally, experimental simulations are conducted on a time-sequenced obstacle avoidance scenario and a swarm obstacle avoidance scenario. The experimental results demonstrate that RA-SHRL generates safe, efficient, and reliable collision avoidance strategies in both time-sequential dynamic-obstacle and mixed joint-TS environments. Additionally, RA-SHRL is capable of assessing risk and avoiding multiple joint TSs. Compared with Deep Q-Network (DQN) and Constrained Policy Optimization (CPO), the search efficiency of the proposed algorithm is improved by 40% and 12%, respectively. Moreover, it achieved a 91.3% collision avoidance success rate during training. The methodology could also benefit other autonomous systems in dynamic environments.
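    One common way to impose navigation rules and risk limits on a learned policy is to constrain action selection with a rule check and a safety-cost critic. The sketch below illustrates that general pattern only; the rule check, state fields, and cost budget are invented placeholders and do not reflect the RA-SHRL implementation described above.

```python
# Hypothetical illustration of rule-constrained, risk-aware action selection,
# loosely in the spirit of COLREGs-constrained decision making.

def violates_colregs(state, action):
    """Placeholder rule check, e.g. forbid a port turn in a close head-on encounter.
    Real COLREGs logic would inspect full encounter geometry (bearing, heading, range)."""
    head_on = abs(state["relative_bearing"]) < 10.0 and state["range_nm"] < 2.0
    return head_on and action == "turn_port"

def safe_greedy_action(q_values, state, actions, cost_critic, cost_limit=0.1):
    """Pick the highest-Q action among those that pass the rule check and whose
    predicted safety cost (e.g. collision risk) stays under a budget."""
    allowed = [a for a in actions
               if not violates_colregs(state, a)
               and cost_critic(state, a) <= cost_limit]
    if not allowed:                      # fall back to the least risky action
        return min(actions, key=lambda a: cost_critic(state, a))
    return max(allowed, key=lambda a: q_values[a])
```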