    An exploration strategy for non-stationary opponents

    The success or failure of any learning algorithm is partially due to the exploration strategy it employs. However, most exploration strategies assume that the environment is stationary and non-strategic. In this work we shed light on how to design exploration strategies in non-stationary and adversarial environments. Our proposed adversarial drift exploration (DE) is able to efficiently explore the state space while keeping track of regions of the environment that have changed. This exploration is general enough to be applied in single-agent non-stationary environments as well as in multiagent settings where the opponent changes its strategy over time. We use a two-agent strategic interaction setting to test this new type of exploration, where the opponent switches between different behavioral patterns to emulate a non-deterministic, stochastic and adversarial environment. The agent’s objective is to learn a model of the opponent’s strategy in order to act optimally. Our contribution is twofold. First, we present DE as a strategy for switch detection. Second, we propose a new algorithm called R-max# for learning and planning against non-stationary opponents. To handle such opponents, R-max# reasons and acts in terms of two objectives: (1) to maximize utilities in the short term while learning and (2) to eventually explore opponent behavioral changes. We provide theoretical results showing that R-max# is guaranteed to detect the opponent’s switch and learn a new model with finite sample complexity. R-max# makes efficient use of exploration experiences, which results in rapid adaptation and efficient DE for dealing with the non-stationary nature of the opponent. We show experimentally that using DE outperforms state-of-the-art algorithms explicitly designed for modeling opponents (in terms of average rewards) in two complementary domains.
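
    A minimal sketch of the bookkeeping behind this kind of drift exploration, in the spirit of R-max#: state-action pairs that have not been revisited recently are re-marked as "unknown" and given an optimistic reward, pulling the agent back to regions where the opponent may have switched. The constants M and TAU and the reset rule below are illustrative assumptions, not the authors' exact construction.

        from collections import defaultdict

        R_MAX = 1.0   # optimistic reward for "unknown" state-action pairs
        M = 10        # visits before a pair counts as "known" (as in R-max)
        TAU = 100     # staleness threshold triggering re-exploration (assumed)

        class DriftExplorer:
            """Visit bookkeeping for drift-style exploration."""
            def __init__(self):
                self.count = defaultdict(int)       # visits per (state, action)
                self.last_visit = defaultdict(int)  # timestep of the last visit
                self.t = 0

            def observe(self, state, action):
                self.t += 1
                self.count[(state, action)] += 1
                self.last_visit[(state, action)] = self.t

            def reward_for_planning(self, state, action, empirical_reward):
                sa = (state, action)
                if self.t - self.last_visit[sa] > TAU:
                    self.count[sa] = 0   # stale pair: forget it, forcing re-learning
                if self.count[sa] < M:
                    return R_MAX         # optimism pulls the planner back to this pair
                return empirical_reward

    Planning with reward_for_planning instead of the raw empirical reward makes the agent periodically re-verify "known" regions, which is the behaviour that lets an R-max-style learner notice an opponent switch in finite time.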

    Towards Continual Reinforcement Learning: A Review and Perspectives

    In this article, we aim to provide a literature review of different formulations and approaches to continual reinforcement learning (RL), also known as lifelong or non-stationary RL. We begin by discussing our perspective on why RL is a natural fit for studying continual learning. We then provide a taxonomy of different continual RL formulations and mathematically characterize the non-stationary dynamics of each setting. We go on to discuss evaluation of continual RL agents, providing an overview of benchmarks used in the literature and important metrics for understanding agent performance. Finally, we highlight open problems and challenges in bridging the gap between the current state of continual RL and findings in neuroscience. While still in its early days, the study of continual RL holds the promise of developing better incremental reinforcement learners that can function in increasingly realistic applications where non-stationarity plays a vital role, such as healthcare, education, logistics, and robotics.
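
    For background, the non-stationary settings such a review characterizes are commonly formalized as a time-indexed sequence of MDPs that share states and actions while the dynamics and rewards drift. The sketch below is a standard general formulation, not the paper's own taxonomy:

        % A non-stationary MDP as a sequence of per-timestep models
        \[
          M_t = (\mathcal{S}, \mathcal{A}, P_t, R_t, \gamma),
          \qquad t = 0, 1, 2, \dots
        \]
        % with the agent maximizing expected discounted return under drifting
        % dynamics P_t(s' | s, a) and rewards R_t(s, a):
        \[
          \max_{\pi}\; \mathbb{E}_{\pi}\Big[ \sum_{k \ge 0} \gamma^{k}\, R_{t+k}(s_{t+k}, a_{t+k}) \Big]
        \]
        % The stationary case is recovered when P_t \equiv P and R_t \equiv R.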

    A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity

    The key challenge in multiagent learning is learning a best response to the behaviour of other agents, which may be non-stationary: if the other agents adapt their strategies as well, the learning target moves. Disparate streams of research have approached non-stationarity from several angles, each making different implicit assumptions, which makes it hard to keep an overview of the state of the art and to validate the innovation and significance of new work. This survey presents a coherent overview of work that addresses opponent-induced non-stationarity with tools from game theory, reinforcement learning and multi-armed bandits. Further, we reflect on the principal approaches by which algorithms model and cope with this non-stationarity, arriving at a new framework with five categories (in increasing order of sophistication): ignore, forget, respond to target models, learn models, and theory of mind. A wide range of state-of-the-art algorithms is classified into a taxonomy using these categories and key characteristics of the environment (e.g., observability) and of the opponents’ adaptation behaviour (e.g., smooth, abrupt). To clarify further, we present illustrative variations of one domain, contrasting the strengths and limitations of each category. Finally, we discuss in which environments the different approaches yield the most merit, and point to promising avenues of future research.
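
    As a concrete illustration of the survey's "forget" category, below is a sketch of an empirical opponent model with exponential forgetting; the decay rule is one simple instantiation assumed here, not something prescribed by the survey.

        import numpy as np

        class ForgettingOpponentModel:
            """Empirical model of opponent actions with exponential forgetting."""
            def __init__(self, n_actions, decay=0.95):
                self.counts = np.ones(n_actions)  # uniform pseudo-count prior
                self.decay = decay                # decay < 1 discounts stale evidence

            def update(self, opponent_action):
                self.counts *= self.decay         # old observations fade away
                self.counts[opponent_action] += 1.0

            def predict(self):
                # Current estimate of the opponent's mixed strategy
                return self.counts / self.counts.sum()

    With decay = 1.0 this reduces to a fictitious-play-style model that implicitly assumes a stationary opponent; decay < 1 lets the prediction track smooth or abrupt strategy switches at the cost of noisier estimates.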


    Cooperation in Games

    University of Minnesota Ph.D. dissertation. 2019. Major: Computer Science. Advisor: Maria Gini. 1 computer file (PDF); 159 pages.

    This dissertation explores several problems related to social behavior, a complex and difficult subject. We describe ways to solve problems for agents interacting with opponents, specifically (1) identifying cooperative strategies, (2) acting on fallible predictions, and (3) determining how much to compromise with the opponent. In a multi-agent environment an agent’s interactions with its opponent can significantly affect its performance. However, it is not always possible for the agent to fully model the behavior of the opponent and compute a best response. We present three algorithms for agents to use when interacting with an opponent too complex to be modelled.

    An agent which wishes to cooperate with its opponent must first identify what strategy constitutes a cooperative action. We address the problem of identifying cooperative strategies in repeated randomly generated games by modelling an agent’s intentions with a real number, its attitude, which is used to produce a modified game; the Nash equilibria of the modified game implement the strategies described by the intentions used to generate it. We demonstrate how these values can be learned, and show how they can be used to achieve cooperation through reciprocation in repeated randomly generated normal-form games.

    Next, an agent which has formed a prediction of opponent behavior that may be incorrect needs to be able to take advantage of that prediction without adopting a strategy that is overly vulnerable to exploitation. We have developed Restricted Stackelberg Response with Safety (RSRS), an algorithm which can produce a strategy to respond to a prediction while balancing the priorities of performance against the prediction, worst-case performance, and performance against a best-responding opponent. By balancing those concerns appropriately, the agent can perform well against an opponent it cannot reliably predict.

    Finally, we look at how an agent can manipulate an opponent into choosing actions which benefit the agent. This problem is often complicated by the difficulty of analyzing the game the agent is playing. To address this issue, we begin by developing a new game, the Gift Exchange game, which is trivial to analyze; the only question is how the opponent will react. We develop a variety of strategies the agent can use when playing the game, and explore how the best strategy is affected by the agent’s discount factor and prior over opponents.
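
    A hedged sketch of the attitude idea: each player's payoffs are blended with the opponent's, weighted by a real-valued attitude, and the Nash equilibria of the resulting modified game encode the intended level of cooperation. The linear rule A' = A + alpha * B used below is an illustrative assumption; the dissertation's exact construction may differ.

        import numpy as np

        def modified_game(A, B, alpha_row, alpha_col):
            """Blend each player's payoffs with the opponent's, weighted by attitude."""
            return A + alpha_row * B, B + alpha_col * A

        # Prisoner's Dilemma, rows/columns = (cooperate, defect)
        A = np.array([[3.0, 0.0],
                      [5.0, 1.0]])   # row player's payoffs
        B = A.T                      # symmetric game

        A_mod, B_mod = modified_game(A, B, alpha_row=1.0, alpha_col=1.0)
        # With fully pro-social attitudes (alpha = 1), cooperating strictly
        # dominates defecting for both players in the modified game:
        print(A_mod)   # [[6. 5.], [5. 2.]]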