9,169 research outputs found

    Towards Continual Reinforcement Learning: A Review and Perspectives

    In this article, we aim to provide a literature review of different formulations and approaches to continual reinforcement learning (RL), also known as lifelong or non-stationary RL. We begin by discussing our perspective on why RL is a natural fit for studying continual learning. We then provide a taxonomy of different continual RL formulations and mathematically characterize the non-stationary dynamics of each setting. We go on to discuss evaluation of continual RL agents, providing an overview of benchmarks used in the literature and important metrics for understanding agent performance. Finally, we highlight open problems and challenges in bridging the gap between the current state of continual RL and findings in neuroscience. While still in its early days, the study of continual RL has the promise to develop better incremental reinforcement learners that can function in increasingly realistic applications where non-stationarity plays a vital role. These include applications such as those in the fields of healthcare, education, logistics, and robotics. Comment: Preprint, 52 pages, 8 figures.

    Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning

    Stackelberg equilibria arise naturally in a range of popular learning problems, such as in security games or indirect mechanism design, and have received increasing attention in the reinforcement learning literature. We present a general framework for implementing Stackelberg equilibria search as a multi-agent RL problem, allowing a wide range of algorithmic design choices. We discuss how previous approaches can be seen as specific instantiations of this framework. As a key insight, we note that the design space allows for approaches not previously seen in the literature, for instance by leveraging multitask and meta-RL techniques for follower convergence. We propose one such approach using contextual policies, and evaluate it experimentally on both standard and novel benchmark domains, showing greatly improved sample efficiency compared to previous approaches. Finally, we explore the effect of adopting algorithm designs outside the borders of our framework.

    Intrinsic fluctuations of reinforcement learning promote cooperation

    In this work, we ask, and answer, what makes classical reinforcement learning cooperative. Cooperating in social dilemma situations is vital for animals, humans, and machines. While evolutionary theory has revealed a range of mechanisms promoting cooperation, the conditions under which agents learn to cooperate are contested. Here, we demonstrate which individual elements of the multi-agent learning setting lead to cooperation, and how. Specifically, we consider the widely used temporal-difference reinforcement learning algorithm with epsilon-greedy exploration in the classic environment of an iterated Prisoner's dilemma with one-period memory. Each of the two learning agents learns a strategy that conditions its next action on both agents' actions in the previous round. We find that, alongside a high weight on future rewards, a low exploration rate, and a small learning rate, it is primarily the intrinsic stochastic fluctuations of the reinforcement learning process that double the final rate of cooperation, to up to 80%. Thus, inherent noise is not a necessary evil of the iterative learning process; it is a critical asset for the learning of cooperation. However, we also point out the trade-off between a high likelihood of cooperative behavior and achieving it in a reasonable amount of time. Our findings are relevant for purposefully designing cooperative algorithms and regulating undesired collusive effects. Comment: 9 pages, 4 figures.
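The setting described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes tabular Q-learning as the temporal-difference variant, payoff values of T=5, R=3, P=1, S=0, and hypothetical names (`run_ipd`, `epsilon_greedy`) and default parameters chosen purely for illustration.

```python
import random

# Assumed payoff values (T > R > P > S); the abstract does not give numbers.
R, S, T, P = 3.0, 0.0, 5.0, 1.0
C, D = 0, 1  # cooperate, defect
PAYOFF = {(C, C): (R, R), (C, D): (S, T), (D, C): (T, S), (D, D): (P, P)}


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.choice((C, D))
    best = max(q_row)
    # Break ties between equally valued actions at random.
    return rng.choice([a for a in (C, D) if q_row[a] == best])


def run_ipd(rounds=10000, alpha=0.05, gamma=0.95, epsilon=0.05, seed=0):
    """Run two independent Q-learners in an iterated Prisoner's dilemma.

    Returns (fraction of cooperative actions, list of the two Q-tables).
    """
    rng = random.Random(seed)
    # State = both agents' actions in the previous round (one-period memory).
    states = [(a, b) for a in (C, D) for b in (C, D)]
    q = [{s: [0.0, 0.0] for s in states} for _ in range(2)]
    state = (rng.choice((C, D)), rng.choice((C, D)))
    coop = 0
    for _ in range(rounds):
        actions = tuple(epsilon_greedy(q[i][state], epsilon, rng) for i in range(2))
        rewards = PAYOFF[actions]
        next_state = actions
        for i in range(2):
            # Q-learning form of the temporal-difference update (an assumption).
            td_target = rewards[i] + gamma * max(q[i][next_state])
            q[i][state][actions[i]] += alpha * (td_target - q[i][state][actions[i]])
        coop += actions.count(C)
        state = next_state
    return coop / (2 * rounds), q
```

Calling `run_ipd(seed=k)` for several seeds `k` and comparing the returned cooperation fractions gives a rough feel for how much outcomes vary between runs; the sketch makes no claim to reproduce the reported figures.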

    Cooperation and Social Dilemmas with Reinforcement Learning

    Cooperation between humans has been foundational for the development of civilisation, and yet there are many questions about how it emerges from social interactions. As artificial agents begin to play a more significant role in our lives and are introduced into our societies, it is apparent that understanding the mechanisms of cooperation is also important for the design of next-generation multi-agent AI systems. This is particularly important in the case of supporting cooperation between self-interested AI agents. In this thesis, we focus on analysing how mechanisms that underlie human cooperation can be applied to the training of reinforcement learning agents. Human behaviour is a product of cultural norms, emotions and intuition, amongst other things: we argue it is possible to use similar mechanisms to deal with the complexities of multi-agent cooperation. We outline the problem of cooperation in mixed-motive games, also known as social dilemmas, and we focus on reputation dynamics and partner selection, two mechanisms that have been strongly linked to indirect reciprocity in Evolutionary Game Theory. A key point that we want to emphasise is that we assume no prior knowledge or explicit definition of strategies, which are instead fully learnt by the agents during the games. In our experimental evaluation, we demonstrate the benefits of applying these mechanisms to the training process of the agents, and we compare our findings with results presented in a variety of other disciplines, including Economics and Evolutionary Biology.