69 research outputs found
RACCER: Towards Reachable and Certain Counterfactual Explanations for Reinforcement Learning
While reinforcement learning (RL) algorithms have been successfully applied
to numerous tasks, their reliance on neural networks makes their behavior
difficult to understand and trust. Counterfactual explanations are
human-friendly explanations that offer users actionable advice on how to alter
the model inputs to achieve the desired output from a black-box system.
However, current approaches to generating counterfactuals in RL ignore the
stochastic and sequential nature of RL tasks and can produce counterfactuals
that are difficult to obtain or do not deliver the desired outcome. In this
work, we propose RACCER, the first RL-specific approach to generating
counterfactual explanations for the behavior of RL agents. We first propose and
implement a set of RL-specific counterfactual properties that ensure easily
reachable counterfactuals with highly probable desired outcomes. We use a
heuristic tree search of the agent's execution trajectories to find the most
suitable counterfactuals based on the defined properties. We evaluate RACCER in
two tasks and conduct a user study, showing that RL-specific
counterfactuals help users better understand agents' behavior compared to
current state-of-the-art approaches.
Comment: 10 pages, 3 figures, 3 tables
Expert-Free Online Transfer Learning in Multi-Agent Reinforcement Learning
Transfer learning in Reinforcement Learning (RL) has been widely studied to
overcome training issues of Deep-RL, i.e., exploration cost, data availability
and convergence time, by introducing a way to enhance training phase with
external knowledge. Generally, knowledge is transferred from expert-agents to
novices. While this fixes the issue for a novice agent, the transfer is only
effective if the expert agent has a good understanding of the task. As an
alternative, in this paper we propose Expert-Free Online Transfer Learning
(EF-OnTL), an algorithm that enables expert-free real-time dynamic transfer
learning in multi-agent system. No dedicated expert exists, and transfer source
agent and knowledge to be transferred are dynamically selected at each transfer
step based on agents' performance and uncertainty. To improve uncertainty
estimation, we also propose State Action Reward Next-State Random Network
Distillation (sars-RND), an extension of RND that estimates uncertainty from RL
agent-environment interaction. We demonstrate EF-OnTL effectiveness against a
no-transfer scenario and advice-based baselines, with and without expert
agents, in three benchmark tasks: Cart-Pole, a grid-based Multi-Team
Predator-Prey (mt-pp) and Half Field Offense (HFO). Our results show that
EF-OnTL achieves performance comparable to the advice-based baselines while
requiring no external input or threshold tuning, and outperforms the
no-transfer baseline by a margin that grows with the complexity of the task
addressed.
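The sars-RND idea, scoring uncertainty as the prediction error of a trainable network against a frozen random target, computed over the full (state, action, reward, next-state) tuple rather than the state alone, can be sketched with linear "networks". This is a toy sketch: the class name `SarsRND`, the dimensions, and the linear maps are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

class SarsRND:
    """RND-style uncertainty over full (s, a, r, s') transitions.
    A frozen random 'target' map embeds each transition; a trainable
    'predictor' is regressed onto it. Prediction error stays high for
    transitions unlike those seen during training, yielding an
    uncertainty score."""

    def __init__(self, dim_in, dim_out=8, lr=0.05):
        self.target = rng.normal(size=(dim_in, dim_out))  # frozen
        self.pred = np.zeros((dim_in, dim_out))           # trainable
        self.lr = lr

    def _features(self, s, a, r, s_next):
        # Concatenate the whole transition, not just the state.
        return np.concatenate([s, [a, r], s_next])

    def uncertainty(self, s, a, r, s_next):
        x = self._features(s, a, r, s_next)
        err = x @ self.pred - x @ self.target
        return float(np.mean(err ** 2))

    def update(self, s, a, r, s_next):
        x = self._features(s, a, r, s_next)
        err = x @ self.pred - x @ self.target
        self.pred -= self.lr * np.outer(x, err)  # gradient step on MSE

m = SarsRND(dim_in=6)  # s and s' of dim 2, plus scalar a and r
s, s_next = np.array([1.0, 0.0]), np.array([0.0, 1.0])
before = m.uncertainty(s, 1.0, 0.5, s_next)
for _ in range(100):
    m.update(s, 1.0, 0.5, s_next)
after = m.uncertainty(s, 1.0, 0.5, s_next)  # lower: transition now familiar
```

After repeated updates on the same transition its uncertainty drops, while transitions dissimilar to the training data keep a high score — exactly the signal EF-OnTL needs to decide who transfers what.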
Deep W-Networks: Solving Multi-Objective Optimisation Problems With Deep Reinforcement Learning
In this paper, we build on advances introduced by the Deep Q-Networks (DQN)
approach to extend the multi-objective tabular Reinforcement Learning (RL)
algorithm W-learning to large state spaces. The W-learning algorithm naturally
resolves the competition between multiple single-objective policies in multi-objective
environments. However, the tabular version does not scale well to environments
with large state spaces. To address this issue, we replace the underlying
Q-tables with DQNs and add W-Networks as a replacement for the tabular
weight (W) representations. We evaluate the resulting Deep W-Networks (DWN)
approach in two widely-accepted multi-objective RL benchmarks: deep sea
treasure and multi-objective mountain car. We show that DWN solves the
competition between multiple policies while outperforming the baseline in the
form of a DQN solution. Additionally, we demonstrate that the proposed
algorithm can find the Pareto front in both tested environments.
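The arbitration scheme that W-learning implements can be illustrated in tabular form. This is a toy sketch and an illustrative reading of classic W-learning, not the DWN implementation: each policy i keeps Q_i(s, a), W_i(s) tracks how much policy i expects to lose when it does not get to act, and the highest W wins. DWN replaces both tables with neural networks.

```python
import numpy as np

class WLearner:
    """Tabular W-learning arbitration between competing policies."""

    def __init__(self, n_states, n_actions, n_policies, alpha=0.1):
        self.Q = np.zeros((n_policies, n_states, n_actions))
        self.W = np.zeros((n_policies, n_states))
        self.alpha = alpha

    def act(self, s):
        # The policy with the highest W-value in this state chooses
        # the action (ties resolved toward the lowest index).
        winner = int(np.argmax(self.W[:, s]))
        return winner, int(np.argmax(self.Q[winner, s]))

    def update_w(self, s, winner, action):
        # Each losing policy moves its W toward its predicted loss:
        # (value of its preferred action) - (value of the chosen one).
        for i in range(self.Q.shape[0]):
            if i == winner:
                continue
            loss = self.Q[i, s].max() - self.Q[i, s, action]
            self.W[i, s] += self.alpha * (loss - self.W[i, s])

arb = WLearner(n_states=1, n_actions=2, n_policies=2)
arb.Q[0, 0] = [1.0, 0.0]   # policy 0 prefers action 0
arb.Q[1, 0] = [0.0, 2.0]   # policy 1 prefers action 1, and cares more
winner, action = arb.act(0)      # W ties at zero -> policy 0 wins
arb.update_w(0, winner, action)  # policy 1 registers its loss
winner2, action2 = arb.act(0)    # policy 1 now outbids policy 0
```

Over repeated interactions, policies that suffer larger losses accumulate larger W-values and win control in the states they care about most, which is how the competition between single-objective policies is resolved.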
Causal Counterfactuals for Improving the Robustness of Reinforcement Learning
Reinforcement learning (RL) is applied in a wide variety of fields. RL
enables agents to learn tasks autonomously by interacting with the environment.
The more critical the tasks are, the higher the demand for the robustness of
the RL systems. Causal RL combines RL and causal inference to make RL more
robust. Causal RL agents use a causal representation to capture the invariant
causal mechanisms that can be transferred from one task to another. Currently,
there is limited research in Causal RL, and existing solutions are usually not
complete or feasible for real-world applications. In this work, we propose
CausalCF, the first complete Causal RL solution incorporating ideas from Causal
Curiosity and CoPhy. Causal Curiosity provides an approach for using
interventions, and CoPhy is modified to enable the RL agent to perform
counterfactuals. We apply CausalCF to complex robotic tasks and show that it
improves the RL agent's robustness using a realistic simulation environment
called CausalWorld.
Comment: Submission to ARMS-2023 (AAMAS 2023 Workshop on Autonomous Robots
and Multirobot Systems)
Prevalence of Code Smells in Reinforcement Learning Projects
Reinforcement Learning (RL) is being increasingly used to learn and adapt
application behavior in many domains, including large-scale and
safety-critical systems such as autonomous driving. With the advent of
plug-and-play RL libraries, its applicability has further increased,
enabling users to integrate RL algorithms themselves. We note, however,
that the majority of such code is not developed by RL engineers, which may
lead to poor program quality and, in turn, to bugs, suboptimal performance,
and maintainability and evolution
problems for RL-based projects. In this paper we begin the exploration of this
hypothesis, specific to code utilizing RL, analyzing different projects found
in the wild, to assess their quality from a software engineering perspective.
Our study includes 24 popular RL-based Python projects, analyzed with standard
software engineering metrics. Our results, aligned with similar analyses for ML
code in general, show that popular and widely reused RL repositories contain
many code smells (3.95% of the code base on average), significantly affecting
the projects' maintainability. The most common code smells detected are long
method and long method chain, highlighting problems in the definition and
interaction of agents. Detected code smells suggest problems in responsibility
separation, and the appropriateness of current abstractions for the definition
of RL algorithms.
Comment: Paper preprint for the 2nd International Conference on AI
Engineering - Software Engineering for AI (CAIN 2023)
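The "long method" smell that the study reports most often can be detected with a few lines of standard tooling. The sketch below uses Python's `ast` module; the 38-line threshold is a common default in smell detectors, not necessarily the one used in the paper, and the generated sources are illustrative.

```python
import ast

def long_methods(source, max_lines=38):
    """Return (name, length) for every function longer than max_lines.
    Length is counted from the `def` line to the last body line."""
    smells = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                smells.append((node.name, length))
    return smells

# A generated 51-line function trips the detector; a short one does not.
long_src = "def train_agent():\n" + "\n".join(
    f"    step_{i} = {i}" for i in range(50))
short_src = "def reset():\n    return 0\n"
```

Running such a detector over a repository and normalizing by its size gives the kind of smell-density figure (smells per lines of code) that the study aggregates across projects.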