16 research outputs found

    Concurrent Skill Composition using Ensemble of Primitive Skills

    Get PDF
    One of the key characteristics of an open-ended, cumulative learning agent is that it should use the knowledge gained from prior learning to solve future tasks. That characteristic is especially essential in robotics, as learning every perception-action skill from scratch is not only time-consuming but may not always be feasible. In the case of reinforcement learning, this learned knowledge is called a policy. A lifelong learning agent should treat the policies of learned tasks as building blocks for solving future tasks. One way to categorize tasks is by their composition, ranging from primitive tasks to compound tasks that are either sequential or concurrent combinations of primitive tasks. The agent therefore needs to be able to combine the policies of primitive tasks to solve compound tasks, which are then added to its knowledge base. Inspired by modular neural networks, we propose an approach to composing policies for compound tasks that are concurrent combinations of disjoint tasks. Furthermore, we hypothesize that learning in a specialized environment leads to more efficient learning; hence, we create scaffolded environments in which the robot learns primitive skills for our mobile-robot experiments. We then show how the agent can combine those primitive skills to learn solutions for compound tasks. This reduces the overall training time across multiple skills and creates a versatile agent that can mix and match its skills.
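
    The composition idea above lends itself to a short illustration. Below is a minimal sketch, not the paper's implementation: it assumes each primitive policy controls a disjoint slice of the robot's action vector, so a compound behaviour can be formed by running the primitives in parallel and concatenating their outputs. The policy classes and the 'drive'/'scan' primitives are hypothetical stand-ins.

```python
# Minimal sketch of concurrent composition of primitive policies.
# Assumption (not from the paper): each primitive owns a disjoint
# slice of the compound action vector.
import numpy as np

class PrimitivePolicy:
    """Stands in for a trained primitive skill; maps observation -> action."""
    def __init__(self, act_fn):
        self.act_fn = act_fn

    def act(self, obs):
        return self.act_fn(obs)

class ConcurrentComposite:
    """Runs primitive policies in parallel and concatenates their
    actions, relying on the disjoint-action-slice assumption."""
    def __init__(self, policies):
        self.policies = policies

    def act(self, obs):
        # Each primitive sees the full observation here; in practice
        # each might receive only its task-relevant features.
        return np.concatenate([p.act(obs) for p in self.policies])

# Hypothetical primitives: 'drive' outputs base velocities (v, w),
# 'scan' outputs a sensor-pan rate.
drive = PrimitivePolicy(lambda obs: np.array([0.5, 0.1]))
scan = PrimitivePolicy(lambda obs: np.array([0.2]))
compound = ConcurrentComposite([drive, scan])
print(compound.act(np.zeros(4)))  # -> 3-dim concurrent action
```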

    Toward Computational Motivation for Multi-Agent Systems and Swarms

    Get PDF
    Motivation is a crucial part of animal and human mental development, fostering competence, autonomy, and open-ended development. Motivational constructs have proved to be an integral part of explaining human and animal behavior. Computer scientists have proposed various computational models of motivation for artificial agents, with the aim of building artificial agents capable of autonomous goal generation. Multi-agent systems and swarm intelligence are natural extensions of the individual-agent setting; however, only a few works focus on motivation theories in multi-agent or swarm settings. In this study, we review the settings, mechanisms, functions, and evaluation methods of current computational models of motivation, and discuss how we can produce systems with new kinds of functions not possible with individual agents. We describe this open area of research in detail, along with the major research challenges it holds.

    Evolving Robust Policy Coverage Sets in Multi-Objective Markov Decision Processes Through Intrinsically Motivated Self-Play

    Get PDF
    Many real-world decision-making problems involve multiple conflicting objectives that cannot be optimized simultaneously without a compromise. Such problems are known as multi-objective Markov decision processes, and they pose a significant challenge to conventional single-objective reinforcement learning methods, especially when an optimal compromise cannot be determined beforehand. Multi-objective reinforcement learning methods address this challenge by finding an optimal coverage set of non-dominated policies that can satisfy any user preference for the problem. However, this comes at the cost of computational complexity, time consumption, and a lack of adaptability to non-stationary environment dynamics. Addressing these limitations requires adaptive methods that can solve the problem in an online and robust manner. In this paper, we propose a novel developmental method that uses adversarial self-play between an intrinsically motivated preference-exploration component and a policy-coverage-set optimization component; the latter robustly evolves a convex coverage set of policies that solves the problem using the preferences proposed by the former. We show experimentally that the proposed method is effective in comparison to state-of-the-art multi-objective reinforcement learning methods in both stationary and non-stationary environments.
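
    The convex coverage set this abstract builds on has a compact formal reading: a policy belongs to the set if it maximises the linearly scalarised value w · V(π) for some preference vector w on the probability simplex. Below is a minimal sketch, not the paper's algorithm, that approximates set membership by sampling preference weights; the policy value vectors are made-up examples.

```python
# Minimal sketch: approximate the convex coverage set (CCS) of a
# multi-objective problem by sampling preference weights and keeping
# every policy that wins the scalarised comparison for some weight.
import numpy as np

def approx_ccs(value_vectors, n_weights=1000, seed=0):
    rng = np.random.default_rng(seed)
    V = np.asarray(value_vectors)   # shape: (n_policies, n_objectives)
    # Sample preference weights uniformly on the simplex.
    W = rng.dirichlet(np.ones(V.shape[1]), size=n_weights)
    # For each sampled preference, the policy with the best scalarised
    # value w . V wins; the union of winners approximates the CCS.
    winners = np.argmax(W @ V.T, axis=1)
    return sorted(set(winners.tolist()))

# Three hypothetical policies over two conflicting objectives; the
# third is dominated under every linear preference.
values = [[1.0, 0.0], [0.0, 1.0], [0.4, 0.4]]
print(approx_ccs(values))  # -> [0, 1]
```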

    Motivated Agents

    No full text
    Agents are systems capable of perceiving their environment through sensors, reasoning about their sensory input using some characteristic reasoning process, and acting in the world using their effectors. A wide range of specific agent models fits this general description. For example, learning agents are agents whose characteristic reasoning process is a learning algorithm; planning agents are agents whose primary component is a planner; cognitive agents are agents whose processes model those of the human mind. Agent models are generally context-free: they are designed to be independent of the motor and perceptual systems of the agent. This reduces the work required to place agents conforming to a particular model in a new domain. Unfortunately, introducing an agent to a particular problem domain usually requires extensive preparation of other information for the agent to function in that domain, such as the definition of domain-specific reward functions, goals, world models, or examples of correct behaviour. With such extensive domain-specific preparation, are the resulting agents truly autonomous? Is it possible to build agents that do not require such extensive domain-specific preparation? What general mechanisms and structures would such agents need to perform useful tasks in any domain? In this document we propose a course of research to develop and evaluate approaches to agent design that reduce or eliminate such domain-specific preparation.

    Motion Behaviour Recognition Dataset Collected from Human Perception of Collective Motion

    No full text
    Collective motion behaviour, such as the movement of swarming bees, flocking birds, or schooling fish, has inspired computer-based swarming systems. These are widely used in agent formation control, including aerial and ground vehicles, teams of rescue robots, and the exploration of dangerous environments by groups of robots. Collective motion behaviour is easy to describe but highly subjective to detect: humans can easily recognise these behaviours, yet they are hard for a computer system to recognise. Since humans recognise these behaviours easily, ground-truth data from human perception is one way to enable machine learning methods to mimic that perception. Hence, ground-truth data on human perception of collective motion behaviour has been collected by running an online survey in which participants give their opinion about the behaviour of ‘boid’ point masses. Each question of the survey contains a short video (around 10 seconds) captured from simulated boid movements, and participants were asked to drag a slider to label each video as either ‘flocking’ or ‘not flocking’; ‘aligned’ or ‘not aligned’; and ‘grouped’ or ‘not grouped’. By averaging these responses, three binary labels were created for each video. This data has been analysed to confirm that a machine can learn binary classification labels from this human-perception-of-collective-behaviour dataset with high accuracy.
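
    The label-construction step (averaging slider responses into binary labels) can be illustrated directly. The sketch below assumes slider values lie in [0, 1] and a 0.5 decision threshold; the dataset's actual response scale and threshold are not specified in the abstract.

```python
# Minimal sketch of turning per-video slider responses into one
# binary label per question, as the abstract describes.
# Assumptions: sliders are normalised to [0, 1]; threshold is 0.5.
import numpy as np

def binary_label(responses, threshold=0.5):
    """responses: slider values from all participants for one video
    and one question; returns 1 ('flocking'/'aligned'/'grouped')
    when the mean response reaches the threshold."""
    return int(np.mean(responses) >= threshold)

# Hypothetical responses for one video on the 'flocking' question.
flocking_sliders = [0.9, 0.7, 0.8, 0.4, 0.6]
print(binary_label(flocking_sliders))  # -> 1 (labelled 'flocking')
```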

    A Mechanism for Transferring Evolved Collective Motion Behaviour Libraries onto Real Collective Robots

    No full text
    Evolutionary computation algorithms are heuristic techniques that can find multiple good solutions to complex problems. A recent, emerging application of evolutionary computation is the evolution of behaviour libraries for collective robots. However, a limitation of these approaches is that behaviours are evolved under simple, simulated conditions that differ from the dynamics of real robots. This paper proposes a mechanism for transferring evolved behaviours onto real collective robots and demonstrates that the robots exhibit collective motion characteristics consistent with the evolved simulated behaviours. We show that the resulting library includes more numerous, more robust, and more diverse collective motion behaviours than was possible with existing techniques for collective motion tuning.