202 research outputs found

    Embodying a Computational Model of Hippocampal Replay for Robotic Reinforcement Learning

    Hippocampal reverse replay has been speculated to play an important role in biological reinforcement learning since its discovery over a decade ago. Whilst a number of computational models have recently emerged in an attempt to understand the dynamics of hippocampal replay, there has been little progress in testing and implementing these models in real-world robotics settings. Presented first in this body of work is a bio-inspired hippocampal CA3 network model. It runs in real time to produce reverse replays of recent spatio-temporal sequences, represented as place cell activities, in a robotic spatial navigation task. The model is based on two very recent computational models of hippocampal reverse replay. An analysis of these models shows that, in their original forms, they are each insufficient for effective performance when applied to a robot. As such, choosing particular elements from each allows for a computational model that is sufficient for application in a robotic task. Having a model of reverse replay applied successfully in a robot provides the groundwork necessary for testing the ways in which reverse replay contributes to reinforcement learning. The second portion of the work presented here builds on a previous reinforcement learning neural network model of a basic hippocampal-striatal circuit using a three-factor learning rule. By integrating reverse replays into this reinforcement learning model, results show that reverse replay, with its ability to replay the recent trajectory both in the hippocampal circuit and the striatal circuit, can speed up the learning process. In addition, for situations where the original reinforcement learning model performs poorly, such as when its time dynamics do not store enough of the robot's behavioural history for effective learning, the reverse replay model can compensate for this by replaying the recent history. These results are in line with experimental findings showing that disruption of awake hippocampal replay events severely diminishes, but does not entirely eliminate, reinforcement learning. This work provides possible insights into the important role that reverse replays could play in mnemonic function, and reinforcement learning in particular; insights that could benefit the robotic, AI, and neuroscience communities. However, there is still much to be done. How reverse replays are initiated is still an ongoing research problem, for instance. Furthermore, the model presented here generates place cells heuristically, but there are computational models tackling the problem of how hippocampal cells such as place cells, but also grid cells and head direction cells, emerge. This raises the pertinent question of how these models, which make assumptions about their network architectures and dynamics, could integrate with the computational models of hippocampal replay, which make their own assumptions about network architectures and dynamics

    Efficient Learning and Inference for High-dimensional Lagrangian Systems

    Learning the nature of a physical system is a problem that presents many challenges and opportunities owing to the unique structure associated with such systems. Many physical systems of practical interest in engineering are high-dimensional, which prohibits the application of standard learning methods to such problems. The first part of this work therefore proposes to solve learning problems associated with physical systems by identifying their low-dimensional Lagrangian structure. Algorithms are given to learn this structure in the case that it is obscured by a change of coordinates. The associated inference problem corresponds to solving a high-dimensional minimum-cost path problem, which can be solved by exploiting the symmetry of the problem. These techniques are demonstrated via an application to learning from high-dimensional human motion capture data. The second part of this work is concerned with the application of these methods to high-dimensional motion planning. Algorithms are given to learn and exploit the structure of holonomic motion planning problems effectively via spectral analysis and iterative dynamic programming, admitting solutions to problems of unprecedented dimension compared to known methods for optimal motion planning. The quality of solutions found is also demonstrated to be much superior in practice to those obtained via sampling-based planning and smoothing, in both simulated problems and experiments with a robot arm. This work therefore provides strong validation of the idea that learning low-dimensional structure is the key to future advances in this field

    Learning predictive cognitive maps with spiking neurons during behaviour and replays

    The hippocampus has been proposed to encode environments using a representation that contains predictive information about likely future states, called the successor representation. However, it is not clear how such a representation could be learned in the hippocampal circuit. Here, we propose a plasticity rule that can learn this predictive map of the environment using a spiking neural network. We connect this biologically plausible plasticity rule to reinforcement learning, mathematically and numerically showing that it implements the TD-lambda algorithm. By spanning these different levels, we show how our framework naturally encompasses behavioral activity and replays, smoothly moving from rate to temporal coding, and allows learning over behavioral timescales with a plasticity rule acting on a timescale of milliseconds. We discuss how biological parameters such as dwelling times at states, neuronal firing rates and neuromodulation relate to the delay discounting parameter of the TD algorithm, and how they influence the learned representation. We also find that, in agreement with psychological studies and contrary to reinforcement learning theory, the discount factor decreases hyperbolically with time. Finally, our framework suggests a role for replays, in both aiding learning in novel environments and finding shortcut trajectories that were not experienced during behavior, in agreement with experimental data
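    The link between the successor representation and TD-lambda mentioned in this abstract can be illustrated with a minimal tabular sketch. This is not the spiking plasticity rule the authors propose: it assumes discrete states, a fixed trajectory, and accumulating eligibility traces, and the function name and parameters (`learn_successor_representation`, `gamma`, `lam`, `alpha`) are hypothetical names chosen for illustration.

```python
def learn_successor_representation(trajectory, n_states,
                                   gamma=0.9, lam=0.5, alpha=0.1):
    """Tabular TD(lambda) estimate of the successor representation M,
    where M[s][j] approximates the expected discounted number of
    visits to state j when starting from state s."""
    M = [[0.0] * n_states for _ in range(n_states)]
    trace = [0.0] * n_states  # eligibility trace, one entry per state

    for s, s_next in zip(trajectory, trajectory[1:]):
        # Decay all traces, then mark the state just visited.
        trace = [gamma * lam * e for e in trace]
        trace[s] += 1.0

        # One TD error per target state j (one per column of M).
        for j in range(n_states):
            occupancy = 1.0 if s == j else 0.0
            delta = occupancy + gamma * M[s_next][j] - M[s][j]
            # Credit every recently visited state in proportion to its trace.
            for i in range(n_states):
                M[i][j] += alpha * trace[i] * delta
    return M


if __name__ == "__main__":
    # Hypothetical example: an agent running laps on a 4-state circular track.
    laps = [t % 4 for t in range(8001)]
    M = learn_successor_representation(laps, n_states=4)
    for row in M:
        print([round(x, 2) for x in row])
```

    On a circular track the learned rows should assign the largest value to the current state and progressively smaller values to states further ahead, which is the predictive structure the abstract describes.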

    EVALUATING ARTIFICIAL INTELLIGENCE METHODS FOR USE IN KILL CHAIN FUNCTIONS

    Current naval operations require sailors to make time-critical and high-stakes decisions based on uncertain situational knowledge in dynamic operational environments. Recent tragic events have resulted in unnecessary casualties, and they represent the decision complexity involved in naval operations and specifically highlight challenges within the OODA loop (Observe, Orient, Decide, and Act). Kill chain decisions involving the use of weapon systems are a particularly stressing category within the OODA loop, with unexpected threats that are difficult to identify with certainty, shortened decision reaction times, and lethal consequences. An effective kill chain requires the proper setup and employment of shipboard sensors; the identification and classification of unknown contacts; the analysis of contact intentions based on kinematics and intelligence; an awareness of the environment; and decision analysis and resource selection. This project explored the use of automation and artificial intelligence (AI) to improve naval kill chain decisions. The team studied naval kill chain functions and developed specific evaluation criteria for each function for determining the efficacy of specific AI methods. The team identified and studied AI methods and applied the evaluation criteria to map specific AI methods to specific kill chain functions. Approved for public release. Distribution is unlimited

    Learning and planning in videogames via task decomposition

    Artificial intelligence (AI) methods have come a long way in tabletop games, with computer programs having now surpassed human experts in the challenging games of chess, Go and heads-up no-limit Texas hold'em. However, a significant simplifying factor in these games is that individual decisions have a relatively large impact on the state of the game. The real world, however, is granular. Human beings are continually presented with new information and are faced with making a multitude of tiny decisions every second. Viewed in these terms, feedback is often sparse, meaning that it only arrives after one has made a great number of decisions. Moreover, in many real-world problems there is a continuous range of actions to choose from, and attaining meaningful feedback from the environment often requires a strong degree of action coordination. Videogames, in which players must likewise contend with granular time scales and continuous action spaces, are in this sense a better proxy for real-world problems, and have thus become regarded by many as the new frontier in games AI. Seemingly, the way in which human players approach granular decision-making in videogames is by decomposing complex tasks into high-level subproblems, thereby allowing them to focus on the "big picture". For example, in Super Mario World, human players seem to look ahead in extended steps, such as climbing a vine or jumping over a pit, rather than planning one frame at a time. Currently though, this type of reasoning does not come easily to machines, leaving many open research problems related to task decomposition. This thesis focuses on three such problems in particular: (1) The challenge of learning subgoals autonomously, so as to lessen the issue of sparse feedback. (2) The challenge of combining discrete planning techniques with extended actions whose durations and effects on the environment are uncertain. (3) The questions of when and why it is beneficial to reason over high-level continuous control variables, such as the velocity of a player-controlled ship, rather than over the most low-level actions available. We address these problems via new algorithms and novel experimental design, demonstrating empirically that our algorithms are more efficient than strong baselines that do not leverage task decomposition, and yielding insight into the types of environment where task decomposition is likely to be beneficial

    The Aha! Experience of Spatial Reorientation

    The experience of spatial re-orientation is investigated as an instance of the well-known phenomenon of the Aha! moment. The research question is: What are the visuospatial conditions that are most likely to trigger the spatial Aha! experience? The literature suggests that spatial re-orientation relies mainly on the geometry of the environment and a visibility graph analysis is used to quantify the visuospatial information. Theories from environmental psychology point towards two hypotheses. The Aha! experience may be triggered by a change in the amount of visual information, described by the isovist properties of area and revelation, or by a change in the complexity of the visual information associated with the isovist properties of clustering coefficient and visual control. Data from participants’ exploratory behaviour and EEG recordings are collected during wayfinding in virtual reality urban environments. Two types of events are of interest here: (a) sudden changes of the visuospatial information preceding subjects' response to investigate changes in EEG power; and (b) participants’ brain dynamics (Aha! effect) just before the response to examine differences in isovist values at this location. Research on insights, time-frequency analysis of the P3 component and findings from navigation and orientation studies suggest that the spatial Aha! experience may be reflected by: a parietal alpha power decrease associated with the switch of the representation and a frontocentral theta increase indexing spatial processing during decision-making. Single-trial time-frequency analysis is used to classify trials into two conditions based on the alpha/theta power differences between a 3s time-period before participants’ response and a time-period of equal duration before that. Behavioural results show that participants are more likely to respond at locations with low values of clustering coefficient and high values of visual control. The EEG analysis suggests that the alpha decrease/theta increase condition occurs at locations with significantly lower values of clustering coefficient and higher values of visual control. Small and large decreases in clustering coefficient, just before the response, are associated with significant differences in delta/theta power. The values of area and revelation do not show significant differences. Both behavioural and EEG results suggest that the Aha! experience of re-orientation is more likely to be triggered by a change in the complexity of the visual-spatial environment rather than a change in the amount, as measured by the relevant isovist properties

    Designing Tools for the Invisible Art of Game Feel


    Non-determinism in the narrative structure of video games

    PhD Thesis. At the present time, computer games represent a finite interactive system. Even in their more experimental forms, the number of possible interactions between player and NPCs (non-player characters) and among NPCs and the game world is finite and is governed by a deterministic system in which events can therefore be predicted. This implies that the story itself, seen as the series of events that will unfold during gameplay, is a closed system that can be predicted a priori. This study looks beyond this limitation, and identifies the elements needed for the emergence of a non-finite, emergent narrative structure. Two major contributions are offered through this research. The first contribution comes in the form of a clear categorization of the narrative structures embracing all video game production since the inception of the medium. In order to look for ways to generate a non-deterministic narrative in games, it is necessary to first gain a clear understanding of the narrative structures currently implemented and how they affect users’ experience of the story. While many studies have observed the storytelling aspect, no attempt has been made to systematically distinguish among the different ways designers decide how stories are told in games. The second contribution is guided by the following research question: Is it possible to incorporate non-determinism into the narrative structure of computer games? The hypothesis offered is that non-determinism can be incorporated by means of nonlinear dynamical systems in general and Cellular Automata in particular