
    Projective simulation for artificial intelligence

    We propose a model of a learning agent whose interaction with the environment is governed by a simulation-based projection, which allows the agent to project itself into future situations before it takes real action. Projective simulation is based on a random walk through a network of clips, which are elementary patches of episodic memory. The network of clips changes dynamically, both due to new perceptual input and due to certain compositional principles of the simulation process. During simulation, the clips are screened for specific features which trigger factual action of the agent. The scheme differs from other, computational notions of simulation, and it provides a new element in an embodied cognitive science approach to intelligent action and learning. Our model provides a natural route for generalization to quantum-mechanical operation and connects the fields of reinforcement learning and quantum computation.
    Comment: 22 pages, 18 figures. Close to published version, with footnotes retained.
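
    A minimal sketch may help make the deliberation step concrete. The two-layer agent below (percept clips wired directly to action clips) is an illustrative reduction, not the paper's full model: the class name PSAgent and the damping and reward constants are assumptions, chosen to mimic the common projective-simulation update h ← h − γ(h − 1) + reward.

        import random
        from collections import defaultdict

        class PSAgent:
            """Two-layer projective-simulation-style agent (illustrative sketch)."""

            def __init__(self, actions, damping=0.01, reward_scale=1.0):
                self.actions = list(actions)
                self.damping = damping              # gamma: forgetting, pulls h back to 1
                self.reward_scale = reward_scale
                self.h = defaultdict(lambda: 1.0)   # clip-network edge weights, default 1
                self.last_edge = None

            def deliberate(self, percept):
                """One-hop random walk from the excited percept clip to an action clip."""
                weights = [self.h[(percept, a)] for a in self.actions]
                action = random.choices(self.actions, weights=weights)[0]
                self.last_edge = (percept, action)
                return action

            def learn(self, reward):
                """Damp all weights toward 1, then reinforce the traversed edge."""
                for edge in list(self.h):
                    self.h[edge] -= self.damping * (self.h[edge] - 1.0)
                if self.last_edge is not None:
                    self.h[self.last_edge] += self.reward_scale * reward

        # toy usage: the agent learns to answer the percept "left" with "go-left"
        agent = PSAgent(actions=["go-left", "go-right"])
        for _ in range(500):
            reward = 1.0 if agent.deliberate("left") == "go-left" else 0.0
            agent.learn(reward)

    Deliberation here is a single hop; the paper's general scheme allows longer walks through intermediate clips and the dynamic creation of new clips.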

    Speeding-up the decision making of a learning agent using an ion trap quantum processor

    We report a proof-of-principle experimental demonstration of the quantum speed-up for learning agents, utilizing a small-scale quantum information processor based on radiofrequency-driven trapped ions. The decision-making process of a quantum learning agent within the projective simulation paradigm for machine learning is implemented in a system of two qubits. The latter are realized using hyperfine states of two frequency-addressed atomic ions exposed to a static magnetic field gradient. We show that the deliberation time of this quantum learning agent is quadratically improved with respect to comparable classical learning agents. The performance of this quantum-enhanced learning agent highlights the potential of scalable quantum processors for machine learning.
    Comment: 21 pages, 7 figures, 2 tables. Author names now spelled correctly; sections rearranged; changes in the wording of the manuscript.
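
    The quadratic improvement can be pictured with back-of-the-envelope arithmetic. The sketch below assumes the standard amplitude-amplification scaling, in which a classical agent needs on the order of 1/epsilon deliberation rounds to sample a rare rewarded action of probability epsilon, while a Grover-style quantum agent needs roughly (pi/4)/sqrt(epsilon); the constants are illustrative and are not taken from the experiment.

        import math

        # eps: probability that one deliberation round yields the rewarded action
        for eps in (1e-1, 1e-2, 1e-3, 1e-4):
            classical = 1.0 / eps                      # expected classical rounds
            quantum = math.pi / (4 * math.sqrt(eps))   # amplitude-amplification rounds
            print(f"eps={eps:g}: classical ~ {classical:.0f}, "
                  f"quantum ~ {quantum:.0f}, speed-up x{classical / quantum:.1f}")

    The gap widens as the rewarded action becomes rarer, which is exactly where a quadratically faster deliberation time matters most.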

    Scalable Recollections for Continual Lifelong Learning

    Given the recent success of Deep Learning applied to a variety of single tasks, it is natural to consider more human-realistic settings. Perhaps the most difficult of these settings is that of continual lifelong learning, where the model must learn online over a continuous stream of non-stationary data. A successful continual lifelong learning system must have three key capabilities: it must learn and adapt over time, it must not forget what it has learned, and it must be efficient in both training time and memory. Recent techniques have focused their efforts primarily on the first two capabilities while questions of efficiency remain largely unexplored. In this paper, we consider the problem of efficient and effective storage of experiences over very large time-frames. In particular, we consider the case where typical experiences are O(n) bits and memories are limited to O(k) bits for k << n. We present a novel scalable architecture and training algorithm in this challenging domain and provide an extensive evaluation of its performance. Our results show that we can achieve considerable gains on top of state-of-the-art methods such as GEM.
    Comment: AAAI 201
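
    To make the O(n)-to-O(k) bit budget concrete, here is one possible toy encoding, a random-projection sign code. It is not the paper's architecture; the projection matrix P, the sign quantization and the threshold decoder are all assumptions for illustration. It does, however, store an n-bit experience in k bits and reconstruct it approximately.

        import numpy as np

        rng = np.random.default_rng(0)
        n, k = 1024, 64                  # n-bit experiences, k-bit memory budget
        P = rng.standard_normal((k, n))  # fixed projection, shared across items

        def compress(x_bits):
            """Encode an n-bit experience as k sign bits (lossy, O(k) storage)."""
            x = 2.0 * x_bits - 1.0       # map {0, 1} -> {-1, +1}
            return (P @ x > 0).astype(np.uint8)

        def reconstruct(code):
            """Rough decode: project the k sign bits back and threshold."""
            s = 2.0 * code - 1.0
            return (P.T @ s > 0).astype(np.uint8)

        x = rng.integers(0, 2, size=n).astype(np.uint8)
        print("bit agreement:", (x == reconstruct(compress(x))).mean())

    Even this crude code lands above the 0.5 chance level; a trained encoder, as the paper's architecture implies, could retain far more task-relevant structure at the same bit budget.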

    Modeling the mobility of living organisms in heterogeneous landscapes: Does memory improve foraging success?

    Thanks to recent technological advances, it is now possible to track the movement patterns of many living organisms in their habitat with unprecedented precision and over long periods of time. The increasing amount of data available on single trajectories offers the possibility of understanding how animals move and of testing basic movement models. Random walks have long represented the main description for micro-organisms and have also been useful for understanding the foraging behaviour of large animals. Nevertheless, most vertebrates, in particular humans and other primates, rely on sophisticated cognitive tools such as spatial maps, episodic memory and travel cost discounting. These properties call for different approaches to modeling mobility patterns. We propose a foraging framework where a learning mobile agent uses a combination of memory-based and random steps. We investigate how advantageous it is to use memory for exploiting resources in heterogeneous and changing environments. An adequate balance of determinism and random exploration is found to maximize the foraging efficiency and to generate trajectories with an intricate spatio-temporal order. Based on this approach, we propose some tools for analysing the non-random nature of mobility patterns in general.
    Comment: 14 pages, 4 figures, improved discussion.
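
    A toy version of the memory-versus-randomness trade-off fits in a few lines. In the sketch below the parameter q (the fraction of memory-based steps), the lattice, the patch values and the harvest rule are all invented for illustration; the authors' framework is richer than this.

        import random

        def forage(steps=2000, size=50, n_patches=40, q=0.5, seed=1):
            """Walker mixing memory-based moves (prob. q) with random steps."""
            rng = random.Random(seed)
            patches = {(rng.randrange(size), rng.randrange(size)): rng.uniform(1, 10)
                       for _ in range(n_patches)}
            memory = {}                                # patch -> remembered yield
            x, y, gathered = size // 2, size // 2, 0.0
            for _ in range(steps):
                if memory and rng.random() < q:
                    # memory step: move one cell toward the richest remembered patch
                    tx, ty = max(memory, key=memory.get)
                    x += (tx > x) - (tx < x)
                    y += (ty > y) - (ty < y)
                else:
                    # random-walk step on the bounded lattice
                    dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
                    x = min(max(x + dx, 0), size - 1)
                    y = min(max(y + dy, 0), size - 1)
                if (x, y) in patches:
                    gain = 0.5 * patches[(x, y)]       # harvest half; patch depletes
                    patches[(x, y)] -= gain
                    gathered += gain
                    memory[(x, y)] = patches[(x, y)]   # update remembered yield
            return gathered

        for q in (0.0, 0.3, 0.6, 0.9):
            print(f"q={q}: gathered {forage(q=q):.1f}")

    Sweeping q makes the abstract's point testable in miniature: pure randomness wastes known resources, while near-pure determinism stops discovering new ones.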

    Exploring Restart Distributions

    We consider the generic approach of using an experience memory to aid exploration by adapting a restart distribution. That is, given the capacity to reset the environment to states corresponding to the agent's past observations, we help exploration by promoting faster state-space coverage, restarting the agent from a more diverse set of initial states as well as allowing it to restart in states associated with significant past experiences. This approach is compatible with both on-policy and off-policy methods. However, a caveat is that altering the distribution of initial states could change the optimal policies when searching within a restricted class of policies. To reduce this unwanted learning bias, we evaluate our approach in deep reinforcement learning, which benefits from the high representational capacity of deep neural networks. We instantiate three variants of our approach, each inspired by an idea in the context of experience replay. Using these variants, we show that performance gains can be achieved, especially in hard exploration problems.
    Comment: RLDM 201
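
    One way to picture the mechanism: keep a bounded pool of visited states and, at episode start, sometimes restore the simulator to one of them instead of the default initial state. The sketch below covers only a uniform-from-memory variant with reservoir sampling; the restore(state) hook, the class name and the capacity are assumptions, and the paper's three replay-inspired variants are not reproduced here.

        import random

        class RestartDistribution:
            """Restart episodes from states drawn from an experience memory."""

            def __init__(self, capacity=10_000, p_restart=0.5, seed=0):
                self.memory, self.capacity = [], capacity
                self.p_restart = p_restart
                self.seen = 0
                self.rng = random.Random(seed)

            def observe(self, state):
                """Reservoir-sample visited states so the pool stays diverse."""
                self.seen += 1
                if len(self.memory) < self.capacity:
                    self.memory.append(state)
                else:
                    j = self.rng.randrange(self.seen)
                    if j < self.capacity:
                        self.memory[j] = state

            def reset(self, env):
                """With prob. p_restart, restore a remembered state; else reset."""
                if self.memory and self.rng.random() < self.p_restart:
                    return env.restore(self.rng.choice(self.memory))  # hypothetical hook
                return env.reset()

    Prioritizing the pool by, say, TD error instead of sampling uniformly would give the "significant past experiences" flavour mentioned above.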

    Capturing Regular Human Activity through a Learning Context Memory

    A learning context memory consisting of two main parts is presented. The first part performs lossy data compression, keeping the amount of stored data at a minimum by combining similar context attributes; the compression rate for the presented GPS data is 150:1 on average. The resulting data is stored in a data structure that highlights its level of compression. Elements with a high level of compression are used in the second part to form the start and end points of episodes capturing common activity consisting of consecutive events. The context memory is used to investigate how little context data can be stored while still retaining enough information to capture regular human activity.
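
    A rough sketch of the first (compression) part, assuming the merge rule is "fold consecutive GPS fixes that fall within a fixed radius of a place's running centroid into a single place with a sample count"; the radius, function names and centroid rule are illustrative, not the paper's actual attribute-combination scheme.

        import math

        def haversine_m(lat1, lon1, lat2, lon2):
            """Great-circle distance in metres."""
            r = 6_371_000.0
            p1, p2 = math.radians(lat1), math.radians(lat2)
            dp, dl = p2 - p1, math.radians(lon2 - lon1)
            a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
            return 2 * r * math.asin(math.sqrt(a))

        def compress_track(fixes, radius_m=50.0):
            """Lossy compression: merge consecutive nearby fixes into places."""
            places = []                      # (lat, lon, count) per place
            for lat, lon in fixes:
                if places:
                    clat, clon, cnt = places[-1]
                    if haversine_m(lat, lon, clat, clon) <= radius_m:
                        # one stored element now stands for cnt + 1 raw fixes
                        places[-1] = ((clat * cnt + lat) / (cnt + 1),
                                      (clon * cnt + lon) / (cnt + 1), cnt + 1)
                        continue
                places.append((lat, lon, 1))
            return places

        track = [(52.5200, 13.4050)] * 20 + [(52.5300, 13.4100)] * 5
        print(compress_track(track))         # 25 fixes -> 2 places, counts 20 and 5

    Places with a high sample count (a high level of compression) would then serve as the start and end anchors for the episodes in the second part.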

    Reward prediction error and declarative memory

    Learning based on reward prediction error (RPE) was originally proposed in the context of nondeclarative memory. We postulate that RPE may support declarative memory as well. Indeed, recent years have witnessed a number of independent empirical studies reporting effects of RPE on declarative memory. We provide a brief overview of these studies, identify emerging patterns, and discuss open issues such as the role of signed versus unsigned RPEs in declarative learning.
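
    For readers outside the RPE literature, the signed/unsigned distinction is easy to state numerically: under a delta-rule value update the signed RPE is delta = r - V and the unsigned RPE is |delta|. The loop below is a textbook illustration only; how either signal would gate declarative encoding is precisely the open issue discussed above.

        def rpe_update(value, reward, alpha=0.1):
            """One delta-rule step: the signed RPE drives the value update."""
            delta = reward - value                   # signed RPE
            return value + alpha * delta, delta

        value = 0.0
        for reward in (1.0, 1.0, 0.0, 1.0):
            value, delta = rpe_update(value, reward)
            print(f"r={reward}: V={value:.3f}  signed={delta:+.3f}  unsigned={abs(delta):.3f}")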