Deep reinforcement learning (DRL) has proven extremely useful in a wide
variety of application domains. However, even successful DRL-based software can
exhibit highly undesirable behavior. This is due to DRL training being based on
maximizing a reward function, which typically captures general trends but
cannot precisely capture, or rule out, certain behaviors of the system. In this
paper, we propose a novel framework aimed at drastically reducing the
undesirable behavior of DRL-based software, while maintaining its excellent
performance. In addition, our framework can provide engineers with a
comprehensible characterization of such undesirable behavior. Under the hood,
our approach is based on extracting decision tree classifiers from erroneous
state-action pairs, and then integrating these trees into the DRL training
loop, penalizing the system whenever it commits an error. We provide a
proof-of-concept implementation of our approach, and use it to evaluate the
technique on three significant case studies. We find that our approach can
extend existing DRL frameworks in a straightforward manner, and incurs only a
slight overhead in training time. Furthermore, it causes only a very slight
reduction in performance, and in some cases even improves it, while
significantly reducing the frequency of undesirable behavior.
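To make the mechanism concrete, the following is a minimal sketch of the scheme described above: a decision tree is fitted to labeled erroneous state-action pairs, and a reward-shaping function then penalizes the agent whenever the tree classifies the current state-action pair as an error. The environment features, the training data, and the PENALTY constant are hypothetical placeholders, not the paper's actual implementation.

```python
# Illustrative sketch: extract a decision tree from erroneous state-action
# pairs, then penalize the agent during training when the tree flags an error.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

PENALTY = 10.0  # assumed magnitude of the reward penalty (hypothetical)

# Labeled (state, action) samples: 1 = erroneous, 0 = acceptable (placeholder data).
X = np.array([[0.1, 0.9, 2],
              [0.8, 0.2, 0],
              [0.5, 0.5, 1],
              [0.9, 0.1, 2]])
y = np.array([1, 0, 0, 1])

# A shallow tree doubles as a comprehensible characterization of the errors.
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

def shaped_reward(state, action, reward):
    """Subtract a penalty whenever the tree classifies (state, action) as an error."""
    sample = np.append(state, action).reshape(1, -1)
    if tree.predict(sample)[0] == 1:
        return reward - PENALTY
    return reward
```

In this sketch, shaped_reward would replace the raw environment reward inside the DRL training loop, leaving the rest of the training pipeline unchanged.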