    Benchmarking Deep Reinforcement Learning for Continuous Control

    Recently, researchers have made significant progress combining the advances in deep learning for learning feature representations with reinforcement learning. Some notable examples include training agents to play Atari games based on raw pixel data and to acquire advanced manipulation skills using raw sensory inputs. However, it has been difficult to quantify progress in the domain of continuous control due to the lack of a commonly adopted benchmark. In this work, we present a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure. We report novel findings based on the systematic evaluation of a range of implemented reinforcement learning algorithms. Both the benchmark and reference implementations are released at https://github.com/rllab/rllab in order to facilitate experimental reproducibility and to encourage adoption by other researchers.
    Comment: 14 pages, ICML 201

    Learning Robot Control using a Hierarchical SOM-based Encoding

    Hierarchical representation and modeling of sensorimotor observations is a fundamental approach to developing scalable robot control strategies. Previously, we introduced the Hierarchical Self-Organizing Map-based Encoding algorithm (HSOME), which is based on a computational model of infant cognition. Each layer is a temporally augmented SOM, and every node maintains and updates a decaying activation value. The bottom level encodes sensorimotor instances, while their temporal associations are built hierarchically in the layers above. HSOME has previously been shown to support hierarchical encoding of sequential sensor-actuator observations, both in abstract domains and on real humanoid robots. Two new features are presented here. First, we demonstrate skill acquisition in the complex domain of learning a double-tap tactile gesture between two humanoid robots. During reproduction, the robot can either perform a double tap or prioritize receiving a higher reward by performing a single tap instead. Second, HSOME has been extended to recall past observations and reproduce rhythmic patterns in the absence of input relevant to the joints, by initially priming the reproduction of specific skills with an input. We also demonstrate in simulation how a complex behavior emerges from the automatic reuse of distinct oscillatory swimming demonstrations of a robotic salamander.
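
To make the idea of a SOM layer whose nodes keep a decaying activation more concrete, here is a minimal Python sketch. It is not HSOME itself: the 1-D node grid, the Gaussian neighbourhood, the learning rate, the decay factor of 0.8 and the class name `DecayingSOMLayer` are illustrative assumptions, and both the stack of layers and HSOME's temporal augmentation are omitted. The activation vector it maintains is the kind of recent-history trace that a higher layer could, in principle, encode.

```python
import math
import random


class DecayingSOMLayer:
    """Hypothetical 1-D SOM layer whose nodes carry a decaying activation trace."""

    def __init__(self, n_nodes, dim, decay=0.8, lr=0.3, sigma=1.0, seed=0):
        rng = random.Random(seed)
        # One prototype (weight) vector per node, randomly initialised in [0, 1).
        self.weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
        self.activation = [0.0] * n_nodes
        self.decay = decay  # assumed multiplicative decay per time step
        self.lr = lr
        self.sigma = sigma

    def best_matching_unit(self, x):
        # Winner = node whose prototype is closest to x (squared Euclidean distance).
        dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in self.weights]
        return min(range(len(dists)), key=dists.__getitem__)

    def step(self, x):
        bmu = self.best_matching_unit(x)
        for i, w in enumerate(self.weights):
            # Gaussian neighbourhood on the 1-D grid: nodes near the winner move more.
            h = math.exp(-((i - bmu) ** 2) / (2 * self.sigma ** 2))
            for d in range(len(w)):
                w[d] += self.lr * h * (x[d] - w[d])
        # All activations decay; the winner is re-excited to 1.0, so the vector
        # records which nodes fired recently (a short temporal context).
        self.activation = [a * self.decay for a in self.activation]
        self.activation[bmu] = 1.0
        return bmu
```

Feeding a sequence such as `[0.0, 0.0]`, `[1.0, 1.0]` through `layer.step(...)` leaves the most recent winner at activation 1.0 and earlier winners at 0.8, 0.64, and so on, which is what lets the layer distinguish "what happened just now" from "what happened a few steps ago".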

    Drama, a connectionist model for robot learning: experiments on grounding communication through imitation in autonomous robots

    The present dissertation addresses problems related to robot learning from demonstration. It presents the construction of a connectionist architecture that provides the robot with the cognitive and behavioural mechanisms necessary for learning a synthetic language taught by an external teacher agent. This thesis considers three main issues: 1) learning of spatio-temporal invariance in a dynamic, noisy environment; 2) symbol grounding of a robot's actions and perceptions; 3) development of a common symbolic representation of the world by heterogeneous agents. We build our approach on the assumption that grounding symbolic communication constrains not only the cognitive capabilities of the agent but also, and especially, its behavioural capacities. Behavioural skills such as imitation, which allow the agent to co-ordinate its actions with those of the teacher agent, are required alongside general cognitive abilities of associativity, in order to focus the agent's attention on relevant perceptions onto which it grounds the teacher agent's symbolic expressions. In addition, the agent should be provided with the cognitive capacity to extract spatial and temporal invariance from the continuous flow of its perceptions. Based on this requirement, we develop a connectionist architecture for learning time series. The model is a Dynamical Recurrent Associative Memory Architecture, called DRAMA. It is a fully connected recurrent neural network using Hebbian update rules. Learning is dynamic and unsupervised. The performance of the architecture is analysed theoretically, through numerical simulations, and through physical and simulated robotic experiments. Training the network is computationally fast and inexpensive, which allows its implementation for real-time computation and on-line learning on inexpensive hardware.
Robotic experiments are carried out with different learning tasks involving recognition of spatial and temporal invariance, namely landmark recognition and prediction of perception-action sequences in maze travelling. The architecture is applied to experiments on robot learning by imitation. A teacher agent, either a human instructor or another robot, teaches a learner robot a vocabulary to describe its perceptions and actions. The experiments are based on an imitative strategy, whereby the learner robot reproduces the teacher's actions. While imitating the teacher's movements, the learner robot makes proprioceptive and exteroceptive observations similar to those of the teacher. The learner robot grounds the teacher's words onto the set of common perceptions they share. We carry out experiments in simulated and physical environments, using different robotic set-ups and gradually increasing the complexity of the task. In a first set of experiments, we study the transmission of a vocabulary designating a robot's actions and perceptions. Further, we carry out simulation studies in which we investigate the transmission and use of the vocabulary among a group of robotic agents. In a third set of experiments, we investigate learning sequences of the robot's perceptions while it wanders in a physically constrained environment. Finally, we present the implementation of DRAMA in Robota, a doll-like robot that can imitate the arm and head movements of a human instructor. Through this imitative game, Robota is taught to perform and label dance patterns. In addition, Robota is taught a basic language, including a lexicon and syntactic rules for combining words of the lexicon, to describe its actions and its perception of touch on its body.
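
DRAMA is described above as a fully connected recurrent network trained with Hebbian rules, dynamically and without supervision. The Python sketch below illustrates only the generic principle of Hebbian temporal association, not DRAMA's actual update rules: units active at consecutive time steps have the link between them strengthened, and recall predicts the next state from the current one. The class name `HebbianSequenceMemory`, the one-hot encoding and the landmark sequence `A`, `B`, `C` are hypothetical.

```python
class HebbianSequenceMemory:
    """Hypothetical fully connected recurrent net trained with a plain Hebbian rule."""

    def __init__(self, n_units, lr=1.0):
        self.n = n_units
        self.lr = lr
        # w[i][j]: strength of the recurrent link from unit j (time t-1) to unit i (time t).
        self.w = [[0.0] * n_units for _ in range(n_units)]

    def learn_sequence(self, states):
        # Unsupervised Hebbian update: units active at consecutive steps get linked.
        for prev, cur in zip(states, states[1:]):
            for i in range(self.n):
                for j in range(self.n):
                    self.w[i][j] += self.lr * cur[i] * prev[j]

    def predict(self, state):
        # Recall: the unit(s) most strongly driven by the current state win.
        drive = [sum(self.w[i][j] * state[j] for j in range(self.n)) for i in range(self.n)]
        peak = max(drive)
        if peak <= 0:
            return [0] * self.n  # nothing was learned to follow this state
        return [1 if d == peak else 0 for d in drive]


A, B, C = [1, 0, 0], [0, 1, 0], [0, 0, 1]  # hypothetical one-hot landmarks
mem = HebbianSequenceMemory(3)
mem.learn_sequence([A, B, C])
print(mem.predict(A))  # [0, 1, 0]
print(mem.predict(B))  # [0, 0, 1]
```

The one-hot states stand in for, e.g., successive landmarks met while travelling a maze; a state that was never followed by anything during training yields an all-zero prediction.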

    Test moment determination design in active robot learning

    A thesis submitted to the University of Bedfordshire, in fulfilment of the requirements for the degree of Master of Science by research. In recent years, service robots have been increasingly used in people's daily lives. These robots are autonomous or semi-autonomous and are able to cooperate with their human users. Active robot learning (ARL) is an approach to developing a robot's beliefs about its users' intentions and preferences, which the robot needs in order to cooperate seamlessly with humans. This approach allows a robot to perform tests on its users and to build up high-order beliefs according to the users' responses. This study carried out primary research on designing the test moment determination component of the ARL framework. This component decides the right moment to take a test action. In this study, an action plan theory was proposed to synthesize actions into a sequence, that is, an action plan, for a given task. Each action is defined in a specific format consisting of a precondition, the action, a post-condition and a testing time. Forward-chaining reasoning was introduced to establish connections between the actions and to synthesize individual actions into an action plan corresponding to the given task. A simulation environment was set up in which a human user and a service robot were modelled using MATLAB. Fuzzy control was employed to control the robot while it carries out the cooperative action. To examine the effect of the test moment determination component, simulations were performed of a scenario in which a robot passes an object to a human user. The simulation results show that an action plan can be formed according to the provided conditions and executed properly by the simulated models. Test actions were taken at the moments determined by the test moment determination component to find out the human user's intention.
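
The thesis specifies each action by a precondition, the action, a post-condition and a testing time, and chains actions forward into a plan. The Python sketch below illustrates generic forward chaining under stated assumptions: conditions are sets of string facts, effects only add facts (no deletions), the first applicable action is taken greedily, and the `Action`/`forward_chain` names, the hand-over facts and the `test_time` values are hypothetical. It does not reproduce the thesis's MATLAB models or its fuzzy controller.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Action:
    name: str
    preconditions: frozenset
    postconditions: frozenset
    # Hypothetical field: when a test on the user's intention is due ("before"/"after").
    test_time: Optional[str] = None


def forward_chain(initial_facts, goal, actions) -> Optional[List[Action]]:
    """Chain applicable actions forward from the initial facts until the goal holds."""
    facts = set(initial_facts)
    plan = []
    remaining = list(actions)
    while not goal <= facts:
        for act in remaining:
            if act.preconditions <= facts:  # all preconditions currently hold
                facts |= act.postconditions  # add-only effects (a simplification)
                plan.append(act)
                remaining.remove(act)
                break
        else:
            return None  # no applicable action left: goal unreachable
    return plan


# Hypothetical hand-over scenario; actions are deliberately listed out of order.
actions = [
    Action("hand_over", frozenset({"near_user", "holding_object"}),
           frozenset({"object_handed_over"}), test_time="before"),
    Action("move_to_user", frozenset({"holding_object"}), frozenset({"near_user"})),
    Action("grasp_object", frozenset({"object_on_table", "hand_empty"}),
           frozenset({"holding_object"})),
]
plan = forward_chain({"object_on_table", "hand_empty"}, {"object_handed_over"}, actions)
print([a.name for a in plan])                 # ['grasp_object', 'move_to_user', 'hand_over']
print([a.name for a in plan if a.test_time])  # ['hand_over']
```

Reading `test_time` off the finished plan is one simple way a test moment determination component could locate where in the sequence a test action belongs; how the thesis actually decides that moment is not shown here.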