11 research outputs found
Fuzzy Ensembles of Reinforcement Learning Policies for Robotic Systems with Varied Parameters
Reinforcement Learning (RL) is an emerging approach to control many dynamical
systems for which classical control approaches are not applicable or
insufficient. However, the resultant policies may not generalize to variations
in the parameters that the system may exhibit. This paper presents a powerful
yet simple algorithm in which collaboration is facilitated between RL agents
that are trained independently to perform the same task but with different
system parameters. The independence of the agents allows multi-core
processing to be exploited for parallel training. Two examples are provided
to demonstrate the effectiveness of the proposed technique. The main
demonstration is performed on a quadrotor slung-load tracking problem in a
real-time experimental setup. The ensemble is shown to outperform the
individual policies, reducing the RMSE tracking error. The robustness of the
ensemble is also verified against wind disturbance.
Comment: arXiv admin note: text overlap with arXiv:2311.0501
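The abstract does not spell out how the collaboration between the independently trained agents works. A minimal sketch of one plausible reading, in which each per-parameter policy's action is weighted by a fuzzy membership of the current (estimated) system parameter; the membership shape, centers, and stand-in policies below are illustrative assumptions, not the paper's implementation:

```python
# Hypothetical sketch: blend the actions of policies that were trained
# independently at different system-parameter values, weighting each by a
# fuzzy membership of the parameter the system currently exhibits.

def triangular_membership(x, center, width):
    """Triangular fuzzy membership function centered at `center`."""
    return max(0.0, 1.0 - abs(x - center) / width)

def ensemble_action(state, policies, centers, param, width=0.5):
    """Fuzzy-weighted average of the per-parameter policies' actions."""
    weights = [triangular_membership(param, c, width) for c in centers]
    total = sum(weights)
    if total == 0.0:
        # Parameter outside every membership support: fall back to the
        # policy trained at the nearest parameter value.
        nearest = min(range(len(centers)), key=lambda i: abs(param - centers[i]))
        return policies[nearest](state)
    return sum(w * p(state) for w, p in zip(weights, policies)) / total

# Toy stand-ins for policies trained (hypothetically) at slung-load
# masses of 0.5 kg and 1.0 kg; real policies would be neural networks.
policy_light = lambda s: 1.0 * s
policy_heavy = lambda s: 2.0 * s
act = ensemble_action(0.5, [policy_light, policy_heavy],
                      centers=[0.5, 1.0], param=0.75)
```

A parameter midway between the training values (0.75 here) yields an equal blend of both policies, while a parameter at a training value recovers that agent's policy exactly.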
Artificial Development by Reinforcement Learning Can Benefit From Multiple Motivations
Research on artificial development, reinforcement learning, and intrinsic motivations such as curiosity could profit from the recently developed framework of multi-objective reinforcement learning. The combination of these ideas may lead to more realistic artificial models of life-long learning and goal-directed behavior in animals and humans.
Deep Reinforcement Learning from Hierarchical Weak Preference Feedback
Reward design is a fundamental, yet challenging aspect of practical
reinforcement learning (RL). For simple tasks, researchers typically handcraft
the reward function, e.g., using a linear combination of several reward
factors. However, such reward engineering is subject to approximation bias,
incurs large tuning cost, and often cannot provide the granularity required for
complex tasks. To avoid these difficulties, researchers have turned to
reinforcement learning from human feedback (RLHF), which learns a reward
function from human preferences between pairs of trajectory sequences. By
leveraging preference-based reward modeling, RLHF learns complex rewards that
are well aligned with human preferences, allowing RL to tackle increasingly
difficult problems. Unfortunately, the applicability of RLHF is limited due to
the high cost and difficulty of obtaining human preference data. In light of
this cost, we investigate learning reward functions for complex tasks with less
human effort; simply by ranking the importance of the reward factors. More
specifically, we propose a new RL framework -- HERON, which compares
trajectories using a hierarchical decision tree induced by the given ranking.
These comparisons are used to train a preference-based reward model, which is
then used for policy learning. We find that our framework can not only train
high-performing agents on a variety of difficult tasks, but also provide
additional benefits such as improved sample efficiency and robustness. Our code
is available at https://github.com/abukharin3/HERON.
Comment: 28 pages, 15 figures
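One plausible reading of the hierarchical decision tree induced by the factor ranking (a sketch under assumptions, not the official HERON code; the margin parameter and the tie-breaking behavior are guesses): compare two trajectories on the most important reward factor first, and fall through to the next-ranked factor only when the difference at the current level is not decisive.

```python
def heron_preference(traj_a, traj_b, ranking, margin=0.1):
    """Compare two trajectories' reward-factor vectors (dicts mapping
    factor name -> value) following the given importance ranking.
    Returns 'a', 'b', or 'tie'. Illustrative sketch only."""
    for factor in ranking:  # most important factor first
        diff = traj_a[factor] - traj_b[factor]
        if abs(diff) > margin:          # decisive at this level of the tree
            return 'a' if diff > 0 else 'b'
        # Difference within the margin: descend to the next-ranked factor.
    return 'tie'

# Example: speed outranks energy; the trajectories effectively tie on
# speed, so the lower-ranked energy factor decides the preference label.
a = {'speed': 1.00, 'energy': 0.2}
b = {'speed': 1.05, 'energy': 0.8}
label = heron_preference(a, b, ranking=['speed', 'energy'])  # 'b'
```

Preference labels produced this way could then train a reward model exactly as in standard preference-based RLHF, which is how the framework avoids pairwise human annotation.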
Multi-objective Optimization of Space-Air-Ground Integrated Network Slicing Relying on a Pair of Central and Distributed Learning Algorithms
As an attractive enabling technology for next-generation wireless
communications, network slicing supports diverse customized services in the
global space-air-ground integrated network (SAGIN) with diverse resource
constraints. In this paper, we dynamically consider three typical classes of
radio access network (RAN) slices, namely high-throughput slices, low-delay
slices and wide-coverage slices, under the same underlying physical SAGIN. The
throughput, the service delay and the coverage area of these three classes of
RAN slices are jointly optimized in a non-scalar form by considering the
distinct channel features and service advantages of the terrestrial, aerial and
satellite components of SAGINs. A joint central and distributed multi-agent
deep deterministic policy gradient (CDMADDPG) algorithm is proposed for solving
the above problem to obtain the Pareto optimal solutions. The algorithm first
determines the optimal virtual unmanned aerial vehicle (vUAV) positions and the
inter-slice sub-channel and power sharing by relying on a centralized unit.
Then it optimizes the intra-slice sub-channel and power allocation, and the
virtual base station (vBS)/vUAV/virtual low earth orbit (vLEO) satellite
deployment in support of three classes of slices by three separate distributed
units. Simulation results verify that the proposed method approaches the
Pareto-optimal exploitation of multiple RAN slices, and outperforms the
benchmarkers.
Comment: 19 pages, 14 figures, journal
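The "Pareto optimal solutions" sought by the non-scalar joint optimization of throughput, delay, and coverage can be made concrete with a standard dominance filter (a generic illustration of Pareto optimality, not the CDMADDPG algorithm itself; the objective tuples are toy values, with delay negated so that larger is better for every objective):

```python
def dominates(u, v):
    """True if objective vector u Pareto-dominates v (all maximized):
    u is at least as good in every objective and strictly better in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_front(points):
    """Keep only the non-dominated objective vectors."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Toy slice configurations as (throughput, -delay, coverage) tuples.
candidates = [(10, -5, 3), (8, -2, 3), (6, -1, 4), (5, -6, 2)]
front = pareto_front(candidates)  # (5, -6, 2) is dominated and dropped
```

No single point on the front is best in all three objectives at once, which is why the slicing problem admits a set of Pareto-optimal trade-offs rather than one scalar optimum.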