
    Functions that Emerge through End-to-End Reinforcement Learning - The Direction for Artificial General Intelligence -

    Recently, triggered by Google DeepMind's impressive results in TV games and the game of Go, end-to-end reinforcement learning (RL) has been attracting attention. Although it is little known, the author's group has advocated this framework for around 20 years and has already shown various functions that emerge in a neural network (NN) through RL. In this paper, they are introduced again at this opportune time. The "function modularization" approach is deeply, often subconsciously, ingrained. The inputs and outputs for a learning system can be raw sensor signals and motor commands. The "state space" and "action space" generally used in RL presuppose the existence of functional modules, which has limited reinforcement learning to learning only within the action-planning module. In order to extend reinforcement learning to the entire function of a massively parallel learning system with a huge number of degrees of freedom, and to explain or develop human-like intelligence, the author believes that end-to-end RL from sensors to motors using a recurrent NN (RNN) is an essential key. Especially for the higher functions, this approach is very effective because it removes the need to decide their inputs and outputs. The functions whose emergence we have confirmed through RL using an NN cover a broad range, from real-robot learning with raw camera-pixel inputs to the acquisition of dynamic functions in an RNN. They are (1) image recognition, (2) color constancy (optical illusion), (3) sensor motion (active recognition), (4) hand-eye coordination and hand-reaching movement, (5) explanation of brain activities, (6) communication, (7) knowledge transfer, (8) memory, (9) selective attention, (10) prediction, and (11) exploration. End-to-end RL enables the emergence of very flexible, comprehensive functions that consider many things in parallel, although it is difficult to delineate the boundary of each function clearly. Comment: The Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM) 2017, 5 pages, 4 figures
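    The core claim is architectural: a single recurrent network maps raw sensor signals directly to motor commands, with no hand-designed state or action modules in between. A minimal sketch of such an end-to-end recurrent policy follows; the shapes, the tanh recurrence, and the class name are illustrative assumptions, not the authors' actual architecture.

```python
import numpy as np

# Minimal sketch of an end-to-end recurrent policy: raw pixels in, motor
# commands out, with an internal recurrent state that can carry memory.
# All sizes and the tanh nonlinearity are assumptions for illustration.
class RecurrentPolicy:
    def __init__(self, n_pixels, n_hidden, n_motors, seed=0):
        rng = np.random.default_rng(seed)
        self.w_in = rng.normal(scale=0.01, size=(n_hidden, n_pixels))
        self.w_rec = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        self.w_out = rng.normal(scale=0.01, size=(n_motors, n_hidden))
        self.h = np.zeros(n_hidden)

    def step(self, pixels):
        # One control step: no separate "state space" or "action space" module,
        # just sensors -> recurrent hidden state -> motor commands.
        self.h = np.tanh(self.w_in @ pixels + self.w_rec @ self.h)
        return np.tanh(self.w_out @ self.h)
```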

    New Reinforcement Learning Using a Chaotic Neural Network for Emergence of "Thinking" - "Exploration" Grows into "Thinking" through Learning -

    Expectations for the emergence of higher functions are growing in the framework of end-to-end reinforcement learning using a recurrent neural network. However, the emergence of "thinking", a typical higher function, is difficult to realize because "thinking" needs non-fixed-point, flow-type attractors with both convergence and transition dynamics. Furthermore, in order to introduce "inspiration" or "discovery" into "thinking", transitions that are not completely random but still unexpected are also required. By analogy to "chaotic itinerancy", we have hypothesized that "exploration" grows into "thinking" through learning by forming flow-type attractors on chaotic, random-like dynamics. It is expected that if rational dynamics are learned in a chaotic neural network (ChNN), rational state transitions, inspiration-like state transitions, and random-like exploration for unknown situations can coexist. Based on this idea, we have proposed new reinforcement learning using a ChNN as an actor. The positioning of exploration is completely different from the conventional one: the chaotic dynamics inside the ChNN produces exploration factors by itself. Since external random numbers for stochastic action selection are not used, exploration factors cannot be isolated from the output, and so the learning method is also completely different from the conventional one. At each non-feedback connection, a variable named the causality trace takes in and maintains the input through the connection according to the change in its output. Using the trace and the TD error, the weight is updated. In this paper, as the result of a recent simple task to check whether the new learning works, it is shown that a robot with two wheels and two visual sensors learns to reach a target while avoiding an obstacle, though there is still much room for improvement. Comment: The Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM) 2017, 5 pages, 6 figures
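    The abstract describes the learning rule only verbally, so the following is a hedged sketch of how a causality trace and a TD error might combine in a weight update; the exact trace dynamics are not specified in the abstract, and `update_causality_traces` is an assumed form.

```python
import numpy as np

# Hedged sketch: each non-feedback connection (i, j) keeps a causality trace
# c[i, j] that "takes in" the input x[j] in proportion to how much the unit's
# output y[i] changed, and otherwise maintains its previous value. The precise
# rule in the paper may differ; this is an illustrative assumption.
def update_causality_traces(c, x, y, y_prev):
    dy = np.abs(y - y_prev)                    # change in each unit's output
    return c + dy[:, None] * (x[None, :] - c)  # take in the input where the output changed

def update_weights(w, c, td_error, lr=0.01):
    # Weights are adjusted using the maintained traces and the TD error.
    return w + lr * td_error * c
```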

    GalaxyNet: Connecting galaxies and dark matter haloes with deep neural networks and reinforcement learning in large volumes

    We present the novel wide & deep neural network GalaxyNet, which connects the properties of galaxies and dark matter haloes, and is directly trained on observed galaxy statistics using reinforcement learning. The most important halo properties for predicting stellar mass and star formation rate (SFR) are halo mass, growth rate, and the scale factor at the time the mass peaks, as determined by a feature importance analysis with random forests. We train different models with supervised learning to find the optimal network architecture. GalaxyNet is then trained with a reinforcement learning approach: for a fixed set of weights and biases, we compute the galaxy properties for all haloes and then derive mock statistics (stellar mass functions, cosmic and specific SFRs, quenched fractions, and clustering). Comparing these statistics to observations gives the model loss, which is minimised with particle swarm optimisation. GalaxyNet reproduces the observed data very accurately ($\chi_\mathrm{red}=1.05$), and predicts a stellar-to-halo mass relation with a lower normalisation and shallower low-mass slope at high redshift than empirical models. We find that at low mass, the galaxies with the highest SFRs are satellites, although most satellites are quenched. The normalisation of the instantaneous conversion efficiency increases with redshift, but stays constant above $z\gtrsim0.7$. Finally, we use GalaxyNet to populate a cosmic volume of $(5.9~\mathrm{Gpc})^3$ with galaxies and predict the BAO signal, the bias, and the clustering of active and passive galaxies up to $z=4$, which can be tested with next-generation surveys such as LSST and Euclid. Comment: 21 pages, 21 figures, 6 tables, submitted
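    The training loop described above (fixed weights → mock statistics → loss → particle swarm update) can be sketched as follows. The PSO hyperparameters and the `model_loss` callable, which would wrap the full pipeline of predicting galaxy properties and comparing mock statistics to observations, are assumptions for illustration.

```python
import numpy as np

# Hedged sketch of minimising a model loss with particle swarm optimisation.
# `model_loss(params)` is a hypothetical stand-in for: set the network weights,
# predict galaxy properties for all haloes, build mock statistics, compare to data.
def particle_swarm(model_loss, dim, n_particles=64, n_iters=200,
                   w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.normal(size=(n_particles, dim))      # candidate weight vectors
    vel = np.zeros_like(pos)
    best_pos = pos.copy()
    best_val = np.array([model_loss(p) for p in pos])
    g = best_val.argmin()
    g_pos, g_val = best_pos[g].copy(), best_val[g]
    for _ in range(n_iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (g_pos - pos)
        pos = pos + vel
        val = np.array([model_loss(p) for p in pos])
        improved = val < best_val                  # update personal bests
        best_pos[improved], best_val[improved] = pos[improved], val[improved]
        if best_val.min() < g_val:                 # update global best
            g = best_val.argmin()
            g_pos, g_val = best_pos[g].copy(), best_val[g]
    return g_pos, g_val
```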

    Hierarchical Reinforcement Learning for Quadruped Locomotion

    Legged locomotion is a challenging task for learning algorithms, especially when the task requires a diverse set of primitive behaviors. To address this, we introduce a hierarchical framework to automatically decompose complex locomotion tasks. A high-level policy issues commands in a latent space and also selects for how long the low-level policy will execute the latent command. Concurrently, the low-level policy uses the latent command and only the robot's on-board sensors to control the robot's actuators. Our approach allows the high-level policy to run at a lower frequency than the low-level one. We test our framework on a path-following task for a dynamic quadruped robot and show that steering behaviors automatically emerge in the latent command space as low-level skills are needed for this task. We then show efficient adaptation of the trained policy to a different task by transfer of the trained low-level policy. Finally, we validate the policies on a real quadruped robot. To the best of our knowledge, this is the first application of end-to-end hierarchical learning to a real robotic locomotion task.
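    The division of labour described above can be sketched as a two-timescale control loop; the callables, the observation layout, and the latent dimensionality are hypothetical placeholders rather than the paper's actual networks.

```python
# Hedged sketch of the hierarchical control loop: the high-level policy picks a
# latent command and how long to hold it; the low-level policy runs at every
# control step using only on-board sensor readings plus that latent command.
# `env`, `high_level`, and `low_level` are assumed interfaces for illustration.
def run_episode(env, high_level, low_level, max_steps=1000):
    obs = env.reset()
    t = 0
    while t < max_steps:
        latent, duration = high_level(obs)                     # low-frequency decision
        for _ in range(duration):
            action = low_level(obs["proprioception"], latent)  # high-frequency control
            obs, reward, done, info = env.step(action)
            t += 1
            if done or t >= max_steps:
                return
```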

    Emergence of Locomotion Behaviours in Rich Environments

    The reinforcement learning paradigm allows, in principle, for complex behaviours to be learned directly from simple reward signals. In practice, however, it is common to carefully hand-design the reward function to encourage a particular solution, or to derive it from demonstration data. In this paper we explore how a rich environment can help to promote the learning of complex behavior. Specifically, we train agents in diverse environmental contexts, and find that this encourages the emergence of robust behaviours that perform well across a suite of tasks. We demonstrate this principle for locomotion -- behaviours that are known for their sensitivity to the choice of reward. We train several simulated bodies on a diverse set of challenging terrains and obstacles, using a simple reward function based on forward progress. Using a novel scalable variant of policy gradient reinforcement learning, our agents learn to run, jump, crouch and turn as required by the environment without explicit reward-based guidance. A visual depiction of highlights of the learned behavior can be viewed at https://youtu.be/hx_bgoTF7bs
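    The key ingredient is how simple the reward can be; a hedged sketch of a forward-progress reward of the kind the abstract describes is shown below. The small control cost is an illustrative assumption, not necessarily part of the paper's reward.

```python
# Hedged sketch of a simple reward based on forward progress. The optional
# control penalty is an assumption; the abstract only specifies that the
# reward is based on forward progress.
def forward_progress_reward(x_before, x_after, dt, action, ctrl_cost=1e-3):
    velocity = (x_after - x_before) / dt                      # progress along the course
    control_penalty = ctrl_cost * sum(a * a for a in action)  # discourage wasteful actuation
    return velocity - control_penalty
```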

    Improving Coordination in Small-Scale Multi-Agent Deep Reinforcement Learning through Memory-driven Communication

    Deep reinforcement learning algorithms have recently been used to train multiple interacting agents in a centralised manner whilst keeping their execution decentralised. When the agents can only acquire partial observations and are faced with tasks requiring coordination and synchronisation skills, inter-agent communication plays an essential role. In this work, we propose a framework for multi-agent training using deep deterministic policy gradients that enables concurrent, end-to-end learning of an explicit communication protocol through a memory device. During training, the agents learn to perform read and write operations enabling them to infer a shared representation of the world. We empirically demonstrate that concurrent learning of the communication device and individual policies can improve inter-agent coordination and performance in small-scale systems. Our experimental results show that the proposed method achieves superior performance in scenarios with up to six agents. We illustrate how different communication patterns can emerge on six different tasks of increasing complexity. Furthermore, we study the effects of corrupting the communication channel, provide a visualisation of the time-varying memory content as the underlying task is being solved, and validate the building blocks of the proposed memory device through ablation studies.
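    A hedged sketch of the kind of shared read/write memory channel the abstract describes is given below; in the paper the read and write operations are learned end-to-end together with the policies, whereas here they are reduced to a simple gated blend for illustration.

```python
import numpy as np

# Hedged sketch of a shared memory device that agents write to and read from.
# The gating and the vector message format are illustrative assumptions; the
# paper learns these operations jointly with deep deterministic policy gradients.
class SharedMemory:
    def __init__(self, size):
        self.m = np.zeros(size)

    def write(self, message, gate):
        # Gated write: an agent blends its message into the shared content.
        self.m = (1.0 - gate) * self.m + gate * message

    def read(self):
        # Every agent observes the same memory content at the next step.
        return self.m.copy()
```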

    Latent Space Policies for Hierarchical Reinforcement Learning

    We address the problem of learning hierarchical deep neural network policies for reinforcement learning. In contrast to methods that explicitly restrict or cripple lower layers of a hierarchy to force them to use higher-level modulating signals, each layer in our framework is trained to directly solve the task, but acquires a range of diverse strategies via a maximum entropy reinforcement learning objective. Each layer is also augmented with latent random variables, which are sampled from a prior distribution during the training of that layer. The maximum entropy objective causes these latent variables to be incorporated into the layer's policy, and the higher level layer can directly control the behavior of the lower layer through this latent space. Furthermore, by constraining the mapping from latent variables to actions to be invertible, higher layers retain full expressivity: neither the higher layers nor the lower layers are constrained in their behavior. Our experimental evaluation demonstrates that we can improve on the performance of single-layer policies on standard benchmark tasks simply by adding additional layers, and that our method can solve more complex sparse-reward tasks by learning higher-level policies on top of high-entropy skills optimized for simple low-level objectives. Comment: ICML 2018; Videos: https://sites.google.com/view/latent-space-deep-rl Code: https://github.com/haarnoja/sa
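    The composition of layers can be sketched very compactly; the callables below are hypothetical placeholders, and the paper's actual invertible lower layer, trained with a maximum entropy objective, is not reproduced here.

```python
# Hedged sketch of acting with a two-layer latent-space hierarchy: the higher
# layer outputs a latent variable, which the invertible lower layer maps to an
# action conditioned on the observation. `higher_policy` and `lower_bijector`
# are assumed interfaces for illustration.
def act(obs, higher_policy, lower_bijector):
    z = higher_policy(obs)            # the latent space is the higher layer's action space
    action = lower_bijector(obs, z)   # invertible in z, so no expressivity is lost
    return action
```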

    Social Learning Methods in Board Games

    This paper discusses the effects of social learning in the training of game-playing agents. The training of agents in a social context instead of a self-play environment is investigated. Agents that use reinforcement learning algorithms are trained in social settings, which mimics the way in which players of board games such as Scrabble and chess mentor each other in their clubs. A Round Robin tournament and a modified Swiss tournament setting are used for the training. The agents trained in social settings are compared to self-play agents, and the results indicate that more robust agents emerge from the social training setting. Games with larger state spaces can benefit from such settings, as a diverse set of agents will have multiple strategies, which increases the chances of obtaining more experienced players at the end of training. The socially trained agents exhibit better play than the self-play agents. The modified Swiss playing style spawns a larger number of better-playing agents as the population size increases. Comment: 6 pages
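    A hedged sketch of the Round Robin variant of the social training loop is given below; `play_game` and the agents' `update` method are hypothetical placeholders, and the zero-sum handling of the result is an assumption.

```python
import itertools

# Hedged sketch of social training via a Round Robin tournament: every agent
# plays every other agent and both learn from the game, mimicking club-style
# mentoring. `play_game(a, b)` returning (trajectory, result) is assumed.
def round_robin_training(agents, play_game, rounds=100):
    for _ in range(rounds):
        for a, b in itertools.permutations(agents, 2):
            trajectory, result = play_game(a, b)
            a.update(trajectory, result)    # result from a's perspective
            b.update(trajectory, -result)   # assumes a zero-sum game score
```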

    Learning through Probing: a decentralized reinforcement learning architecture for social dilemmas

    Multi-agent reinforcement learning has received significant interest in recent years, notably due to advances in deep reinforcement learning, which have allowed for the development of new architectures and learning algorithms. Using social dilemmas as the training ground, we present a novel learning architecture, Learning through Probing (LTP), in which agents utilize a probing mechanism to incorporate how their opponent's behavior changes when an agent takes an action. We use distinct training phases and adjust rewards according to the overall outcome of the experiences, accounting for changes to the opponent's behavior. We introduce a parameter eta to determine the significance of these future changes to opponent behavior. When applied to the Iterated Prisoner's Dilemma (IPD), LTP agents demonstrate that they can learn to cooperate with each other, achieving higher average cumulative rewards than other reinforcement learning methods while also maintaining good performance against the static agents present in Axelrod tournaments. We compare this method with traditional reinforcement learning algorithms and agent-tracking techniques to highlight key differences and potential applications. We also draw attention to the differences between solving games and societal-like interactions, and analyze the training of Q-learning agents in makeshift societies to emphasize how cooperation may emerge in societies; we demonstrate this using environments where interactions with opponents are determined through a random-encounter format of the IPD. Comment: 9 pages, 4 figures
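    The role of eta can be sketched as a simple reward adjustment; the abstract does not give the exact form, so the combination below is an assumption.

```python
# Hedged sketch of the eta-weighted reward adjustment suggested by the abstract:
# the raw reward is combined with an estimate of how the probed change in the
# opponent's behaviour affects future outcomes. `opponent_shift_value` is a
# hypothetical quantity produced by the probing phase.
def adjusted_reward(raw_reward, opponent_shift_value, eta):
    return raw_reward + eta * opponent_shift_value
```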

    Universal Planning Networks

    A key challenge in complex visuomotor control is learning abstract representations that are effective for specifying goals, planning, and generalization. To this end, we introduce universal planning networks (UPN). UPNs embed differentiable planning within a goal-directed policy. This planning computation unrolls a forward model in a latent space and infers an optimal action plan through gradient descent trajectory optimization. The plan-by-gradient-descent process and its underlying representations are learned end-to-end to directly optimize a supervised imitation learning objective. We find that the representations learned are not only effective for goal-directed visual imitation via gradient-based trajectory optimization, but can also provide a metric for specifying goals using images. The learned representations can be leveraged to specify distance-based rewards to reach new target states for model-free reinforcement learning, resulting in substantially more effective learning when solving new tasks described via image-based goals. We were able to achieve successful transfer of visuomotor planning strategies across robots with significantly different morphologies and actuation capabilities. Comment: Videos available at https://sites.google.com/view/upn-public/hom
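    Plan-by-gradient-descent in a latent space can be sketched as follows. The paper differentiates through a learned forward model end-to-end; here a finite-difference gradient stands in for autodiff, and `encode` and `forward_model` are hypothetical placeholders.

```python
import numpy as np

# Hedged sketch of gradient-descent trajectory optimisation in a latent space:
# unroll a forward model from the encoded current observation and adjust the
# action sequence to land near the encoded goal. Finite differences replace
# autodiff purely to keep the sketch self-contained.
def plan(encode, forward_model, obs, goal_obs, horizon=10, action_dim=2,
         steps=50, lr=0.1, eps=1e-3):
    z0, z_goal = encode(obs), encode(goal_obs)
    actions = np.zeros((horizon, action_dim))

    def cost(a):
        z = z0
        for t in range(horizon):
            z = forward_model(z, a[t])
        return np.sum((z - z_goal) ** 2)        # distance to the goal in latent space

    for _ in range(steps):
        grad = np.zeros_like(actions)
        for idx in np.ndindex(*actions.shape):  # numerical gradient, entry by entry
            a_plus = actions.copy()
            a_plus[idx] += eps
            grad[idx] = (cost(a_plus) - cost(actions)) / eps
        actions -= lr * grad
    return actions
```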