Towards Continual Reinforcement Learning: A Review and Perspectives
In this article, we aim to provide a literature review of different
formulations and approaches to continual reinforcement learning (RL), also
known as lifelong or non-stationary RL. We begin by discussing our perspective
on why RL is a natural fit for studying continual learning. We then provide a
taxonomy of different continual RL formulations and mathematically characterize
the non-stationary dynamics of each setting. We go on to discuss evaluation of
continual RL agents, providing an overview of benchmarks used in the literature
and important metrics for understanding agent performance. Finally, we
highlight open problems and challenges in bridging the gap between the current
state of continual RL and findings in neuroscience. While still in its early
days, the study of continual RL has the promise to develop better incremental
reinforcement learners that can function in increasingly realistic applications
where non-stationarity plays a vital role. These include applications such as
those in the fields of healthcare, education, logistics, and robotics. Comment: Preprint, 52 pages, 8 figures
Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning
Stackelberg equilibria arise naturally in a range of popular learning
problems, such as in security games or indirect mechanism design, and have
received increasing attention in the reinforcement learning literature. We
present a general framework for implementing Stackelberg equilibria search as a
multi-agent RL problem, allowing a wide range of algorithmic design choices. We
discuss how previous approaches can be seen as specific instantiations of this
framework. As a key insight, we note that the design space allows for
approaches not previously seen in the literature, for instance by leveraging
multitask and meta-RL techniques for follower convergence. We propose one such
approach using contextual policies, and evaluate it experimentally on both
standard and novel benchmark domains, showing greatly improved sample
efficiency compared to previous approaches. Finally, we explore the effect of
adopting algorithm designs outside the borders of our framework.
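The leader–follower structure described above can be made concrete with a minimal sketch: in a small bimatrix game, a pure-strategy Stackelberg equilibrium can be found by enumerating the leader's commitments and letting the follower best-respond to each. This is an illustration of the equilibrium concept only, not the paper's multi-agent RL framework; the payoff matrices below are invented for the example.

```python
# Illustrative payoff matrices (assumptions, not from the paper):
# rows = leader actions, columns = follower actions.
LEADER = [[2, 4],
          [1, 3]]
FOLLOWER = [[1, 0],
            [0, 2]]

def best_response(leader_action):
    # The follower observes the leader's commitment and picks the
    # column maximizing its own payoff.
    row = FOLLOWER[leader_action]
    return max(range(len(row)), key=lambda j: row[j])

def stackelberg():
    # The leader anticipates the follower's best response to each
    # possible commitment and commits to the most favorable one.
    leader_action = max(range(len(LEADER)),
                        key=lambda i: LEADER[i][best_response(i)])
    return leader_action, best_response(leader_action)

print(stackelberg())  # → (1, 1)
```

In RL-based approaches such as the one in the paper, the follower's exact best response is replaced by a learned (e.g. contextual) policy, which is what makes sample efficiency a central concern.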
Intrinsic fluctuations of reinforcement learning promote cooperation
In this work, we ask, and answer, the question of what makes classical reinforcement
learning cooperative. Cooperating in social dilemma situations is vital for
animals, humans, and machines. While evolutionary theory revealed a range of
mechanisms promoting cooperation, the conditions under which agents learn to
cooperate are contested. Here, we demonstrate which and how individual elements
of the multi-agent learning setting lead to cooperation. Specifically, we
consider the widely used temporal-difference reinforcement learning algorithm
with epsilon-greedy exploration in the classic environment of an iterated
Prisoner's dilemma with one-period memory. Each of the two learning agents
learns a strategy that conditions the following action choices on both agents'
action choices of the last round. We find that, alongside a strong valuation of
future rewards (a high discount factor), a low exploration rate, and a small
learning rate, it is primarily the intrinsic stochastic fluctuations of the
reinforcement learning process that double the final rate of cooperation to up
to 80%. Thus, inherent noise is not
a necessary evil of the iterative learning process. It is a critical asset for
the learning of cooperation. However, we also point out the trade-off between a
high likelihood of cooperative behavior and achieving this in a reasonable
amount of time. Our findings are relevant for purposefully designing
cooperative algorithms and regulating undesired collusive effects. Comment: 9 pages, 4 figures
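The setting in this abstract is concrete enough to sketch: two epsilon-greedy temporal-difference (Q-learning) agents in an iterated Prisoner's Dilemma with one-period memory, where the state is the pair of both agents' last-round actions. This is a hedged illustration of the setup, not the paper's actual code; the payoff values and the specific learning rate, discount, and exploration parameters below are assumptions.

```python
import random

# Standard Prisoner's Dilemma payoffs (an assumed parameterization):
# (my payoff, opponent payoff) for each joint action. C = cooperate, D = defect.
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}
ACTIONS = ['C', 'D']
# Small learning rate, high discount, low exploration, per the abstract's findings.
ALPHA, GAMMA, EPSILON = 0.05, 0.95, 0.05

def make_q():
    # One Q-value per (last joint action, own next action) pair: one-period memory.
    return {(s, a): 0.0 for s in PAYOFF for a in ACTIONS}

def choose(q, state):
    # Epsilon-greedy action selection.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])

def play(rounds=5000, seed=0):
    random.seed(seed)
    q1, q2 = make_q(), make_q()
    state = ('C', 'C')  # arbitrary initial memory
    coop_moves = 0
    for _ in range(rounds):
        a1, a2 = choose(q1, state), choose(q2, state)
        r1, r2 = PAYOFF[(a1, a2)]
        nxt = (a1, a2)
        # Temporal-difference (Q-learning) update for each agent.
        q1[(state, a1)] += ALPHA * (r1 + GAMMA * max(q1[(nxt, a)] for a in ACTIONS)
                                    - q1[(state, a1)])
        q2[(state, a2)] += ALPHA * (r2 + GAMMA * max(q2[(nxt, a)] for a in ACTIONS)
                                    - q2[(state, a2)])
        state = nxt
        coop_moves += (a1 == 'C') + (a2 == 'C')
    return coop_moves / (2 * rounds)  # fraction of cooperative moves

print(f"cooperation rate: {play():.2f}")
```

Because epsilon-greedy exploration makes the trajectory stochastic, the realized cooperation rate varies across seeds, which is exactly the kind of intrinsic fluctuation the paper identifies as decisive for cooperation.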
Cooperation and Social Dilemmas with Reinforcement Learning
Cooperation between humans has been foundational for the development of civilisation and yet there are many questions about how it emerges from social interactions.
As artificial agents begin to play a more significant role in our lives and are introduced into our societies, it is apparent that understanding the mechanisms of cooperation is important also for the design of next-generation multi-agent AI systems. Indeed, this is particularly important in the case of supporting cooperation between self-interested AI agents.
In this thesis, we focus on analysing how mechanisms underlying human cooperation can be applied to the training of reinforcement learning agents. Human behaviour is a product of cultural norms, emotions, and intuition, amongst other things: we argue it is possible to use similar mechanisms to deal with the complexities of multi-agent cooperation. We outline the problem of cooperation in mixed-motive games, also known as social dilemmas, and we focus on the mechanisms of reputation dynamics and partner selection, two mechanisms that have been strongly linked to indirect reciprocity in Evolutionary Game Theory. A key point we want to emphasise is that we assume no prior knowledge or explicit definition of strategies; these are instead fully learnt by the agents during the games.
In our experimental evaluation, we demonstrate the benefits of applying these mechanisms to the training process of the agents, and we compare our findings with results presented in a variety of other disciplines, including Economics and Evolutionary Biology.
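The reputation-dynamics and partner-selection mechanisms mentioned above can be sketched in a toy form: agents accumulate reputation by cooperating, and interaction partners are drawn with probability proportional to reputation, so cooperators are preferentially selected, the basic logic of indirect reciprocity. This is an assumption-laden illustration, not the thesis's actual experimental setup; agent counts, update rules, and fixed cooperation propensities below are invented for the example.

```python
import random

def simulate(n_agents=10, rounds=2000, seed=0):
    random.seed(seed)
    # Each agent has a fixed (illustrative) propensity to cooperate
    # and a reputation score that interaction partners can observe.
    coop_prob = [random.random() for _ in range(n_agents)]
    reputation = [1.0] * n_agents
    for _ in range(rounds):
        i = random.randrange(n_agents)
        # Partner selection: weighted by current reputation, so
        # high-reputation agents are chosen more often.
        others = [j for j in range(n_agents) if j != i]
        j = random.choices(others, weights=[reputation[k] for k in others])[0]
        # Reputation dynamics: cooperation builds reputation,
        # defection erodes it (floored at 1.0).
        for k in (i, j):
            if random.random() < coop_prob[k]:
                reputation[k] += 1.0
            else:
                reputation[k] = max(1.0, reputation[k] - 1.0)
    return coop_prob, reputation

probs, reps = simulate()
print(len(reps))  # → 10
```

In the thesis's setting, the fixed cooperation propensities would instead be policies learnt by reinforcement learning agents, with reputation entering their observations.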