A new Potential-Based Reward Shaping for Reinforcement Learning Agent
Potential-based reward shaping (PBRS) is a family of machine learning
methods that aims to improve the learning speed of a reinforcement learning
agent by extracting and utilizing extra knowledge while it performs a task.
The underlying transfer-learning process has two steps: extracting knowledge
from previously learned tasks and transferring that knowledge for use in a
target task. The latter step is well covered in the literature, with various
methods proposed for it, while the former has been explored far less. The type
of knowledge that is transferred therefore matters greatly and can yield
considerable improvement. In the literature on both transfer learning and
potential-based reward shaping, one source of knowledge has never been
addressed: the knowledge gathered during the learning process itself.
In this paper, we present a novel potential-based reward shaping method that
extracts knowledge from the learning process itself, specifically from
episodes' cumulative rewards. The proposed method has been evaluated in the
Arcade Learning Environment, and the results indicate an improved learning
process for both single-task and multi-task reinforcement learning agents.
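The shaping term in PBRS follows a standard recipe: the environment reward is augmented with a difference of potentials, which preserves the optimal policy. The sketch below shows that general form with a hypothetical distance-based potential; the paper's own potential, derived from episodes' cumulative rewards, is not specified in the abstract.

```python
# General PBRS recipe, shown with a hypothetical potential function;
# the paper instead derives its potential from episodes' cumulative
# rewards (details not given in the abstract).
def shaped_reward(r, s, s_next, phi, gamma=0.99, done=False):
    """Return r + F(s, s'), where F(s, s') = gamma * phi(s') - phi(s).

    Because the bonus is a difference of potentials (not an arbitrary
    extra reward), the optimal policy of the original task is preserved.
    """
    next_potential = 0.0 if done else gamma * phi(s_next)
    return r + next_potential - phi(s)

# Toy potential on a 1-D chain with the goal at state 10:
phi = lambda s: -abs(10 - s)
bonus = shaped_reward(0.0, 5, 6, phi)   # a step toward the goal
# the shaping bonus is positive, nudging the agent in the right direction
```

Terminal states are assigned zero next-potential (`done=True`) so the shaping telescopes to zero over an episode's return.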
Explainable Action Advising for Multi-Agent Reinforcement Learning
Action advising is a knowledge transfer technique for reinforcement learning
based on the teacher-student paradigm. An expert teacher provides advice to a
student during training in order to improve the student's sample efficiency and
policy performance. Such advice is commonly given in the form of state-action
pairs. However, this form makes it difficult for the student to reason about
the advice and to apply it in novel states. We introduce Explainable Action
Advising, in which the teacher
provides action advice as well as associated explanations indicating why the
action was chosen. This allows the student to self-reflect on what it has
learned, enabling advice generalization and leading to improved sample
efficiency and learning performance - even in environments where the teacher is
sub-optimal. We empirically show that our framework is effective in both
single-agent and multi-agent scenarios, yielding improved policy returns and
convergence rates when compared to state-of-the-art methods.
Comment: This work has been accepted to ICRA 202
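The teacher-student loop with explanations can be sketched as follows. The explanation format here, a predicate over state features in a hypothetical pursuit domain, is an illustrative assumption, not the paper's actual representation; the point is that a remembered explanation lets the student reuse advice in states it was never advised on.

```python
# Hedged sketch of Explainable Action Advising: the teacher pairs each
# advised action with an explanation, modeled here as a predicate over
# state features (an assumption for illustration only).
def teacher_advice(state):
    # Hypothetical teacher rule: evade whenever an enemy is close.
    if state["enemy_dist"] < 3:
        return "evade", lambda s: s["enemy_dist"] < 3     # action + explanation
    return "advance", lambda s: s["enemy_dist"] >= 3

class Student:
    def __init__(self):
        self.rules = []                 # remembered (explanation, action) pairs

    def receive_advice(self, state):
        action, explanation = teacher_advice(state)
        self.rules.append((explanation, action))
        return action

    def act(self, state, default="advance"):
        # Generalization: reuse a remembered rule whenever its explanation
        # holds, even if this exact state was never advised.
        for explanation, action in self.rules:
            if explanation(state):
                return action
        return default

student = Student()
student.receive_advice({"enemy_dist": 2})   # advised once in one state
action = student.act({"enemy_dist": 1})     # novel state, but the rule carries over
```

Without the explanation, the student could only replay the exact advised state-action pair; with it, a single piece of advice covers the whole region the predicate describes.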
Hiding in Plain Sight: Differential Privacy Noise Exploitation for Evasion-resilient Localized Poisoning Attacks in Multiagent Reinforcement Learning
Recently, differential privacy (DP) has been introduced in cooperative
multiagent reinforcement learning (CMARL) to safeguard the agents' privacy
against adversarial inference during knowledge sharing. Nevertheless, we argue
that the noise introduced by DP mechanisms may inadvertently give rise to a
novel poisoning threat, specifically in the context of private knowledge
sharing during CMARL, which remains unexplored in the literature. To address
this gap, we present an adaptive, privacy-exploiting, and
evasion-resilient localized poisoning attack (PeLPA) that capitalizes on the
inherent DP-noise to circumvent anomaly detection systems and hinder the
optimal convergence of the CMARL model. We rigorously evaluate our proposed
PeLPA attack in diverse environments, encompassing both non-adversarial and
multiple-adversarial contexts. Our findings reveal that, in a medium-scale
environment, the PeLPA attack with attacker ratios of 20% and 40% can lead to
an increase in average steps to goal by 50.69% and 64.41%, respectively.
Furthermore, under similar conditions, PeLPA can result in a 1.4x and 1.6x
computational time increase in optimal reward attainment and a 1.18x and 1.38x
slower convergence for attacker ratios of 20% and 40%, respectively.
Comment: 6 pages, 4 figures, published in the proceedings of the ICMLC 2023,
9-11 July 2023, The University of Adelaide, Adelaide, Australi
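The mechanics of the threat can be illustrated with a minimal sketch, assuming a Laplace DP mechanism and a threshold-style anomaly detector; both are assumptions for illustration, and this is the threat model only, not the PeLPA algorithm.

```python
import math
import random

# Illustration of the threat model only: if shared Q-values legitimately
# carry Laplace DP noise of scale b, an attacker can inject a bias of
# comparable magnitude that a detector calibrated to the noise band
# cannot distinguish from honest perturbation.
def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) via the inverse-CDF method."""
    u = rng.uniform(-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def honest_share(q, epsilon, rng, sensitivity=1.0):
    """Honest agent: DP-protected knowledge sharing (Laplace mechanism)."""
    return q + laplace_noise(sensitivity / epsilon, rng)

def poisoned_share(q, epsilon, rng, sensitivity=1.0):
    """Attacker: hide a bias inside the expected noise band, so a detector
    that flags deviations beyond a few noise scales sees nothing unusual."""
    b = sensitivity / epsilon
    return q + 2 * b + laplace_noise(b, rng)   # bias of 2b stays plausible
```

Averaged over many sharing rounds, the honest values center on the true Q-value while the poisoned ones are systematically shifted, which is exactly the slow, localized degradation the attack exploits.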
BRNES: Enabling Security and Privacy-aware Experience Sharing in Multiagent Robotic and Autonomous Systems
Although experience sharing (ES) accelerates multiagent reinforcement
learning (MARL) in an advisor-advisee framework, attempts to apply ES to
decentralized multiagent systems have so far relied on trusted environments and
overlooked the possibility of adversarial manipulation and inference.
Nevertheless, in a real-world setting, some Byzantine attackers, disguised as
advisors, may provide false advice to the advisee and catastrophically degrade
the overall learning performance. Also, an inference attacker, disguised as an
advisee, may conduct several queries to infer the advisors' private information
and make the entire ES process questionable in terms of privacy leakage. To
address these issues, we propose a novel MARL framework (BRNES) that
heuristically selects a dynamic neighbor zone for each advisee at each learning
step and adopts a weighted experience aggregation technique to reduce Byzantine
attack impact. Furthermore, to keep the agent's private information safe from
adversarial inference attacks, we leverage the local differential privacy
(LDP)-induced noise during the ES process. Our experiments show that our
framework outperforms the state-of-the-art in terms of the steps-to-goal,
obtained reward, and time-to-goal metrics. In particular, our evaluation shows
that the proposed framework is 8.32x faster than the current non-private
frameworks and 1.41x faster than the private frameworks in an adversarial
setting.
Comment: 8 pages, 6 figures, 3 tables, accepted for publication in the
proceedings of the 2023 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS 2023), Oct 01-05, 2023, Detroit, Michigan, US
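The two advisee-side defenses the abstract names can be sketched together. The distance metric, zone radius, and trust weights below are illustrative assumptions, not the BRNES heuristics themselves; in BRNES the shared values additionally carry advisor-side LDP noise before aggregation.

```python
import math

# Hedged sketch of the advisee-side ideas named in the abstract
# (hypothetical details; not the BRNES heuristics themselves).
def neighbor_zone(advisee_pos, advisors, radius):
    """Dynamic neighbor zone: only advisors within `radius` may advise."""
    return [a for a in advisors if math.dist(advisee_pos, a["pos"]) <= radius]

def weighted_aggregate(advice):
    """Weighted experience aggregation: down-weighting low-trust advisors
    bounds the damage any single Byzantine (false) advice can cause."""
    total = sum(a["trust"] for a in advice)
    return sum(a["trust"] * a["q"] for a in advice) / total

advisors = [
    {"pos": (0, 0), "q": 1.0, "trust": 1.0},
    {"pos": (1, 1), "q": 1.2, "trust": 1.0},
    {"pos": (9, 9), "q": 50.0, "trust": 0.1},   # distant, distrusted, Byzantine
]
zone = neighbor_zone((0, 0), advisors, radius=2.0)   # drops the distant agent
estimate = weighted_aggregate(advisors)              # Byzantine value is damped
```

Either mechanism alone already limits the attacker's influence; combined, a Byzantine advisor must be both nearby and trusted before its false advice can move the aggregate appreciably.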