5 research outputs found

    Emergence of Adaptive Circadian Rhythms in Deep Reinforcement Learning

    Adapting to regularities of the environment is critical for biological organisms to anticipate events and plan. A prominent example is the circadian rhythm corresponding to the internalization by organisms of the 24-hour period of the Earth's rotation. In this work, we study the emergence of circadian-like rhythms in deep reinforcement learning agents. In particular, we deployed agents in an environment with a reliable periodic variation while solving a foraging task. We systematically characterize the agent's behavior during learning and demonstrate the emergence of a rhythm that is endogenous and entrainable. Interestingly, the internal rhythm adapts to shifts in the phase of the environmental signal without any re-training. Furthermore, we show via bifurcation and phase response curve analyses how artificial neurons develop dynamics to support the internalization of the environmental rhythm. From a dynamical systems view, we demonstrate that the adaptation proceeds by the emergence of a stable periodic orbit in the neuron dynamics with a phase response that allows an optimal phase synchronisation between the agent's dynamics and the environmental rhythm. Comment: ICML 202
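
    The setting described in this abstract is easy to picture with a toy sketch. The code below is an illustrative assumption, not the paper's actual environment: a foraging task whose reward availability follows a reliable 24-step cycle, observable only through a periodic "light" cue, with a reset argument that allows probing phase shifts without re-training. All class and parameter names are hypothetical.

        # Minimal sketch (illustrative, not the paper's environment): a foraging
        # task driven by a reliable 24-step environmental cycle.
        import numpy as np

        class PeriodicForagingEnv:
            def __init__(self, period=24):
                self.period = period   # length of the environmental "day"
                self.t = 0             # global time step

            def _light(self):
                # Observable periodic cue, e.g. a sinusoidal daylight signal.
                return np.sin(2 * np.pi * self.t / self.period)

            def reset(self, phase_shift=0):
                # A phase shift lets one probe entrainment without re-training.
                self.t = phase_shift
                return np.array([self._light()], dtype=np.float32)

            def step(self, action):
                # action 1 = forage, action 0 = rest. Food is only available
                # during the first half of the cycle, so anticipating the
                # cycle (rather than reacting to it) pays off.
                day = (self.t % self.period) < self.period / 2
                reward = 1.0 if (action == 1 and day) else -0.1 * action
                self.t += 1
                return np.array([self._light()], dtype=np.float32), reward, False, {}

        env = PeriodicForagingEnv()
        obs = env.reset()
        obs, reward, done, info = env.step(1)   # foraging at t=0 (daytime) is rewarded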

    On the Convergence of Bounded Agents

    When has an agent converged? Standard models of the reinforcement learning problem give rise to a straightforward definition of convergence: An agent converges when its behavior or performance in each environment state stops changing. However, as we shift the focus of our learning problem from the environment's state to the agent's state, the concept of an agent's convergence becomes significantly less clear. In this paper, we propose two complementary accounts of agent convergence in a framing of the reinforcement learning problem that centers around bounded agents. The first view says that a bounded agent has converged when the minimal number of states needed to describe the agent's future behavior cannot decrease. The second view says that a bounded agent has converged just when the agent's performance only changes if the agent's internal state changes. We establish basic properties of these two definitions, show that they accommodate typical views of convergence in standard settings, and prove several facts about their nature and relationship. We take these perspectives, definitions, and analysis to bring clarity to a central idea of the field.
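
    The second account lends itself to a toy check. The sketch below is one informal reading of that definition, not the paper's formal construction: over a window of experience, an agent is treated as converged if its measured performance only changed on steps where its internal state also changed. The trace format and tolerance are assumptions.

        # Toy check for the second notion of convergence (informal reading).
        def converged_on_window(trace, tol=1e-9):
            """trace: list of (internal_state, performance) pairs, in time order."""
            for (s_prev, p_prev), (s_next, p_next) in zip(trace, trace[1:]):
                performance_changed = abs(p_next - p_prev) > tol
                state_changed = s_next != s_prev
                if performance_changed and not state_changed:
                    return False   # performance moved without an internal-state change
            return True

        # Performance shifts only when the internal state shifts -> converged.
        print(converged_on_window([("a", 0.5), ("a", 0.5), ("b", 0.7), ("b", 0.7)]))  # True
        # Performance shifts while the internal state stays put -> not converged.
        print(converged_on_window([("a", 0.5), ("a", 0.6)]))                          # False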

    Towards Continual Reinforcement Learning: A Review and Perspectives

    In this article, we aim to provide a literature review of different formulations and approaches to continual reinforcement learning (RL), also known as lifelong or non-stationary RL. We begin by discussing our perspective on why RL is a natural fit for studying continual learning. We then provide a taxonomy of different continual RL formulations and mathematically characterize the non-stationary dynamics of each setting. We go on to discuss evaluation of continual RL agents, providing an overview of benchmarks used in the literature and important metrics for understanding agent performance. Finally, we highlight open problems and challenges in bridging the gap between the current state of continual RL and findings in neuroscience. While still in its early days, the study of continual RL has the promise to develop better incremental reinforcement learners that can function in increasingly realistic applications where non-stationarity plays a vital role. These include applications such as those in the fields of healthcare, education, logistics, and robotics. Comment: Preprint, 52 pages, 8 figures
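
    As a concrete, illustrative example of the non-stationarity such formulations characterize (not taken from the review itself), consider a bandit whose reward probabilities drift over time, so a policy that was once optimal slowly stops being optimal. Names and parameters are assumptions.

        # Illustrative non-stationary environment: a bandit with drifting rewards.
        import numpy as np

        rng = np.random.default_rng(0)

        def drifting_bandit(n_arms=3, horizon=1000, drift=0.01):
            probs = rng.uniform(0.2, 0.8, size=n_arms)   # initial reward probabilities
            for t in range(horizon):
                # Reward probabilities drift a little at every step.
                probs = np.clip(probs + rng.normal(0, drift, n_arms), 0.0, 1.0)
                yield probs                               # the environment at time t

        for t, probs in enumerate(drifting_bandit(horizon=3)):
            arm = int(np.argmax(probs))                   # oracle choice at time t
            reward = rng.random() < probs[arm]            # Bernoulli reward draw
            print(t, probs.round(2), arm, int(reward))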

    Abstractions of General Reinforcement Learning

    The field of artificial intelligence (AI) is devoted to the creation of artificial decision-makers that can perform (at least) on par with their human counterparts on a domain of interest. Unlike agents in traditional AI, agents in artificial general intelligence (AGI) are required to replicate human intelligence in almost every domain of interest. Moreover, an AGI agent should be able to achieve this with virtually no further changes, retraining, or fine-tuning of its parameters. The real world is non-stationary, non-ergodic, and non-Markovian: we humans can neither revisit our past, nor are the most recent observations sufficient statistics for acting optimally. Yet we excel at a variety of complex tasks, many of which require long-term planning. We can attribute this success to our natural faculty for abstracting away task-irrelevant information from our overwhelming sensory experience: we build task-specific mental models of the world with little effort. Thanks to this ability to abstract, we can plan on a significantly more compact representation of a task without much loss of performance. We also abstract our actions to produce high-level plans; the level of action abstraction can lie anywhere between small muscle movements and a mental notion of "doing an action". It is natural to assume that any AGI agent competing with humans in every plausible domain should also be able to abstract its experiences and actions. This thesis is an inquiry into the existence of such abstractions that aid efficient planning for a wide range of domains and, most importantly, come with optimality guarantees. We use a history-based reinforcement learning (RL) setup, appropriately called general reinforcement learning (GRL), to model such general-purpose decision-makers. We show that if GRL agents have access to appropriate abstractions, they can perform optimally in a huge set of domains; that is, we argue that GRL with abstractions, called abstraction reinforcement learning (ARL), is an appropriate framework to model and analyze AGI agents. This work builds on and extends a powerful class of (state-only) abstractions called extreme state abstractions (ESA). We analyze a variety of such extreme abstractions, both state-only and state-action, to formally establish representation and convergence guarantees. We also make several minor contributions to the ARL framework along the way. Last but not least, we collect a series of ideas that lay the foundations for designing (extreme) abstraction learning algorithms.
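
    The core idea of planning over an abstraction of history, rather than over raw histories, can be sketched in a few lines. The code below is a hedged illustration, not the thesis's formal ESA construction: a history-based agent sees ever-growing histories, and an abstraction map phi sends each history to a compact state so that value estimates live over finitely many abstract states. The particular phi here (last observation only) and all names are illustrative assumptions.

        # Sketch: value estimation over an abstraction of history (illustrative only).
        from collections import defaultdict

        def phi(history):
            """Map a full history of (action, observation, reward) tuples to an
            abstract state; here, simply the most recent observation."""
            return history[-1][1] if history else None

        # Tabular value estimates indexed by abstract state rather than raw history.
        values = defaultdict(float)

        history = []
        for a, o, r in [(0, "near_food", 1.0),
                        (1, "far_from_food", 0.0),
                        (0, "near_food", 1.0)]:
            history.append((a, o, r))
            s = phi(history)                       # abstract state for this history
            values[s] += 0.1 * (r - values[s])     # toy incremental value update

        print(dict(values))   # two abstract states summarize three distinct histories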