Multiagent Learning Through Indirect Encoding
Designing a system of multiple, heterogeneous agents that cooperate to achieve a common goal is a difficult task, but it is also a common real-world problem. Multiagent learning addresses this problem by training the team to cooperate through a learning algorithm. However, most traditional approaches treat multiagent learning as a combination of multiple single-agent learning problems. This perspective leads to many inefficiencies in learning, such as the problem of reinvention, whereby fundamental skills and policies that all agents should possess must be rediscovered independently for each team member. For example, in soccer, all the players know how to pass and kick the ball, but a traditional algorithm has no way to share such vital information because it has no way to relate the policies of agents to each other. In this dissertation, a new approach to multiagent learning that seeks to address these issues is presented. This approach, called multiagent HyperNEAT, represents teams as a pattern of policies rather than as individual agents. The main idea is that an agent’s location within a canonical team layout (such as a soccer team at the start of a game) tends to dictate its role within that team, called the policy geometry. For example, as soccer positions move from goal to center they become more offensive and less defensive, a concept that is compactly represented as a pattern. The first major contribution of this dissertation is a new method for evolving neural network controllers called HyperNEAT, which forms the foundation of the second contribution and primary focus of this work, multiagent HyperNEAT. Multiagent learning in this dissertation is investigated in predator-prey, room-clearing, and patrol domains, providing a real-world context for the approach. Interestingly, because the teams in multiagent HyperNEAT are represented as patterns, they can scale up to an infinite number of multiagent policies that can be sampled from the policy geometry as needed. Thus the third contribution is a method for teams trained with multiagent HyperNEAT to dynamically scale their size without further learning. Fourth, the capabilities to both learn and scale in multiagent HyperNEAT are compared to the traditional multiagent SARSA(λ) approach in a comprehensive study. The fifth contribution is a method for efficiently learning and encoding multiple policies for each agent on a team to facilitate learning in multi-task domains. Finally, because there is significant interest in practical applications of multiagent learning, multiagent HyperNEAT is tested in a real-world military patrolling application with actual Khepera III robots. The ultimate goal is to provide a new perspective on multiagent learning and to demonstrate the practical benefits of training heterogeneous, scalable multiagent teams through generative encoding.
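The policy-geometry idea can be illustrated concretely: a single evolved pattern generator (a CPPN in HyperNEAT) is queried once per agent position in the team layout, so one genome yields an entire team of related policies. The Python sketch below illustrates this under assumed names (sample_team_policies, toy_cppn); it is not the dissertation's implementation.

```python
import numpy as np

def sample_team_policies(cppn, agent_positions, substrate_coords):
    """Illustrative: derive one weight matrix per agent from a shared CPPN.

    cppn             -- callable mapping (x1, y1, x2, y2, agent_pos) -> weight
    agent_positions  -- scalar positions along the team layout (e.g. 0.0 near
                        the goal, 1.0 at center for a soccer line-up)
    substrate_coords -- list of (x, y) neuron coordinates on the substrate
    """
    team = []
    n = len(substrate_coords)
    for pos in agent_positions:
        weights = np.zeros((n, n))
        for i, (x1, y1) in enumerate(substrate_coords):
            for j, (x2, y2) in enumerate(substrate_coords):
                # The same pattern generator is queried for every agent;
                # varying `pos` lets roles change smoothly across the layout.
                weights[i, j] = cppn(x1, y1, x2, y2, pos)
        team.append(weights)
    return team

# Toy CPPN stand-in: any smooth function of the coordinates works for the demo.
toy_cppn = lambda x1, y1, x2, y2, pos: np.sin(3 * x1 * x2) * (1.0 - pos) + np.cos(y1 - y2) * pos

coords = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]
policies = sample_team_policies(toy_cppn, agent_positions=[0.0, 0.5, 1.0], substrate_coords=coords)
print(len(policies), policies[0].shape)  # 3 agents, each with a 3x3 weight matrix
```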
Worldwide Infrastructure for Neuroevolution: A Modular Library to Turn Any Evolutionary Domain into an Online Interactive Platform
Across many scientific disciplines, there has emerged an open opportunity to utilize the scale and reach of the Internet to collect scientific contributions from scientists and non-scientists alike. This process, called citizen science, has already shown great promise in the fields of biology and astronomy. Within the fields of artificial life (ALife) and evolutionary computation (EC), experiments in collaborative interactive evolution (CIE) have demonstrated the ability to collect thousands of experimental contributions from hundreds of users across the globe. However, such collaborative evolutionary systems can take nearly a year to build with a small team of researchers. This dissertation introduces a new developer framework enabling researchers to easily build fully persistent online collaborative experiments around almost any evolutionary domain, thereby reducing the time to create such systems to weeks for a single researcher. To add collaborative functionality to any potential domain, this framework, called Worldwide Infrastructure for Neuroevolution (WIN), exploits an important unifying principle among all evolutionary algorithms: regardless of the overall methods and parameters of the evolutionary experiment, every individual created has an explicit parent-child relationship, wherein one individual is considered the direct descendant of another. This principle alone is enough to capture and preserve the relationships and results for a wide variety of evolutionary experiments, while allowing multiple human users to meaningfully contribute. The WIN framework is first validated through two experimental domains, image evolution and a new two-dimensional virtual creature domain, Indirectly Encoded SodaRace (IESoR), which is shown to produce a visually diverse variety of ambulatory creatures. Finally, an Android application built with WIN, filters, allows users to interactively evolve custom image effects to apply to personalized photographs, thereby introducing the first CIE application available for any mobile device. Together, these collaborative experiments and new mobile application establish a comprehensive new platform for evolutionary computation that can change how researchers design and conduct citizen science online.
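The unifying principle WIN builds on, that every evolved individual records an explicit parent, can be captured with a very small data model. The sketch below is hypothetical (the class and field names are not WIN's schema) but shows how storing only parent-child links is enough to reconstruct lineages across users and experiments.

```python
from dataclasses import dataclass
from typing import Optional, Dict, List

@dataclass
class Individual:
    uid: str                   # globally unique id
    parent_uid: Optional[str]  # None for seed individuals
    genome: dict               # domain-specific payload (image genome, IESoR body, ...)
    contributor: str           # which user produced or selected this individual

class LineageStore:
    """Minimal persistent-style store keyed only by the parent-child relation."""
    def __init__(self) -> None:
        self._records: Dict[str, Individual] = {}

    def add(self, ind: Individual) -> None:
        self._records[ind.uid] = ind

    def ancestry(self, uid: str) -> List[Individual]:
        """Walk parent links back to the seed, reconstructing the lineage."""
        chain = []
        current = self._records.get(uid)
        while current is not None:
            chain.append(current)
            current = self._records.get(current.parent_uid) if current.parent_uid else None
        return chain

store = LineageStore()
store.add(Individual("a1", None, {"pattern": "seed"}, contributor="user_1"))
store.add(Individual("b7", "a1", {"pattern": "mutated"}, contributor="user_2"))
print([ind.uid for ind in store.ancestry("b7")])  # ['b7', 'a1']
```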
Algebraic Neural Architecture Representation, Evolutionary Neural Architecture Search, and Novelty Search in Deep Reinforcement Learning
Evolutionary algorithms have recently re-emerged as powerful tools for machine learning and artificial intelligence, especially when combined with advances in deep learning developed over the last decade. In contrast to the use of fixed architectures and rigid learning algorithms, we leveraged the open-endedness of evolutionary algorithms to make both theoretical and methodological contributions to deep reinforcement learning. This thesis explores and develops two major areas at the intersection of evolutionary algorithms and deep reinforcement learning: generative network architectures and behaviour-based optimization. Over three distinct contributions, both theoretical and experimental methods were applied to deliver a novel mathematical framework and experimental method for generative, modular neural network architecture search for reinforcement learning, and a generalized formulation of a behaviour-based optimization framework for reinforcement learning called novelty search. Experimental results indicate that behaviour-based optimization and neural architecture search can each be used to improve learning in the popular Atari 2600 benchmark compared to DQN, a popular gradient-based method. These results are in line with related work demonstrating that strictly gradient-free methods are competitive with gradient-based reinforcement learning. These contributions, together with other successful combinations of evolutionary algorithms and deep learning, demonstrate that alternative architectures and learning algorithms to those conventionally used in deep learning should be seriously investigated in an effort to drive progress in artificial intelligence.
Evolutionary Reinforcement Learning: A Survey
Reinforcement learning (RL) is a machine learning approach that trains agents to maximize cumulative rewards through interactions with environments. The integration of RL with deep learning has recently resulted in impressive achievements in a wide range of challenging tasks, including board games, arcade games, and robot control. Despite these successes, there remain several crucial challenges, including brittle convergence properties caused by sensitive hyperparameters, difficulties in temporal credit assignment with long time horizons and sparse rewards, a lack of diverse exploration, especially in continuous search space scenarios, difficulties in credit assignment in multi-agent reinforcement learning, and conflicting objectives for rewards. Evolutionary computation (EC), which maintains a population of learning agents, has demonstrated promising performance in addressing these limitations. This article presents a comprehensive survey of state-of-the-art methods for integrating EC into RL, referred to as evolutionary reinforcement learning (EvoRL). We categorize EvoRL methods according to key research fields in RL, including hyperparameter optimization, policy search, exploration, reward shaping, meta-RL, and multi-objective RL. We then discuss future research directions in terms of efficient methods, benchmarks, and scalable platforms. This survey serves as a resource for researchers and practitioners interested in the field of EvoRL, highlighting the important challenges and opportunities for future research. With the help of this survey, researchers and practitioners can develop more efficient methods and tailored benchmarks for EvoRL, further advancing this promising cross-disciplinary research field.
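Many of the policy-search methods surveyed share the same outer loop: maintain a population of policies, score each by episodic return, then select and vary. The sketch below is a generic illustration of that loop rather than any specific EvoRL method; it assumes a Gymnasium-style environment with reset/step and a user-supplied policy_fn.

```python
import numpy as np

def episode_return(env, params, policy_fn, max_steps=500):
    """Evaluate one parameter vector by rolling out a single episode."""
    obs, _ = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = policy_fn(params, obs)
        obs, reward, terminated, truncated, _ = env.step(action)
        total += reward
        if terminated or truncated:
            break
    return total

def evolve_policies(env, policy_fn, dim, pop_size=32, elites=8, sigma=0.1,
                    generations=50, rng=None):
    """Generic EvoRL outer loop: evaluate, keep elites, mutate to refill."""
    rng = rng or np.random.default_rng(0)
    population = [rng.normal(0.0, 1.0, dim) for _ in range(pop_size)]
    for _ in range(generations):
        fitness = [episode_return(env, p, policy_fn) for p in population]
        order = np.argsort(fitness)[::-1]                  # best first
        parents = [population[i] for i in order[:elites]]
        population = parents + [
            parents[rng.integers(elites)] + sigma * rng.normal(0.0, 1.0, dim)
            for _ in range(pop_size - elites)
        ]
    return parents[0]  # best policy from the final evaluated generation
```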
Novelty Search for Deep Reinforcement Learning Policy Network Weights by Action Sequence Edit Metric Distance
Reinforcement learning (RL) problems often feature deceptive local optima, and learning methods that optimize purely for reward signal often fail to learn strategies for overcoming them. Deep neuroevolution and novelty search have been proposed as effective alternatives to gradient-based methods for learning RL policies directly from pixels. In this paper, we introduce and evaluate the use of novelty search over agent action sequences by string edit metric distance as a means for promoting innovation. We also introduce a method for stagnation detection and population resampling inspired by recent developments in the RL community that uses the same mechanisms as novelty search to promote and develop innovative policies. Our methods extend a state-of-the-art method for deep neuroevolution using a simple-yet-effective genetic algorithm (GA) designed to efficiently learn deep RL policy network weights. Experiments using four games from the Atari 2600 benchmark were conducted. Results provide further evidence that GAs are competitive with gradient-based algorithms for deep RL. Results also demonstrate that novelty search over action sequences is an effective source of selection pressure that can be integrated into existing evolutionary algorithms for deep RL.
Neuroevolutional Methods for Decision Support Under Uncertainty
The article presents a comparative analysis of fundamental neuroevolutionary methods widely applied to add intelligence to decision-support systems under uncertainty. Based on this analysis, a new neuroevolutionary method is introduced that modifies both the topology and the parameters of the neural network without imposing additional constraints on the individual. The methods are evaluated experimentally on a series of benchmark tasks in adaptive control, classification, and restoration of damaged data, using the number of failures and the total number of evolution epochs as evaluation criteria.
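As a rough illustration of mutating both topology and parameters without constraining the individual, the sketch below applies weight perturbation plus occasional node and connection additions to a simple connection-list genome; the representation and probabilities are assumptions, not the method described in the article.

```python
import random

def mutate(genome, rng=random.Random(0), w_sigma=0.2, p_add_node=0.05, p_add_conn=0.10):
    """Mutate both the parameters and the topology of a connection-list genome.

    genome: {"nodes": [ids...], "conns": [[src, dst, weight], ...]}
    """
    g = {"nodes": list(genome["nodes"]), "conns": [list(c) for c in genome["conns"]]}
    # Parameter mutation: perturb every connection weight.
    for conn in g["conns"]:
        conn[2] += rng.gauss(0.0, w_sigma)
    # Topology mutation: occasionally split a connection with a new node...
    if g["conns"] and rng.random() < p_add_node:
        src, dst, w = g["conns"].pop(rng.randrange(len(g["conns"])))
        new = max(g["nodes"]) + 1
        g["nodes"].append(new)
        g["conns"] += [[src, new, 1.0], [new, dst, w]]
    # ...or add a fresh connection between two existing nodes.
    if rng.random() < p_add_conn:
        src, dst = rng.choice(g["nodes"]), rng.choice(g["nodes"])
        g["conns"].append([src, dst, rng.gauss(0.0, 1.0)])
    return g

genome = {"nodes": [0, 1, 2], "conns": [[0, 2, 0.5], [1, 2, -0.3]]}
print(mutate(genome))
```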
Real-time robotic tasks for cyber-physical avatars
Although modern robots can perform complex tasks using sophisticated algorithms that are specialized to a particular task and environment, creating robots capable of completing tasks in unstructured environments without human guidance (e.g., through teleoperation) remains a challenge. In this research, we present a framework to meet this challenge for a "cyber-physical avatar," which is defined to be a semi-autonomous robotic system that adjusts to an unstructured environment and performs physical tasks subject to critical timing constraints while under human supervision. This thesis first realizes a cyber-physical avatar that integrates three key technologies: (1) whole body-compliant control, (2) skill acquisition from machine learning (neuroevolution methods and deep learning), and (3) vision-based control through visual servoing. Body-compliant control is essential for operator safety because avatars perform cooperative tasks in close proximity to humans; machine learning enables "programming" avatars such that they can be used by non-experts for a large array of tasks, some unforeseen, in an unstructured environment; the visual servoing technique is indispensable for facilitating feedback control in human-avatar interaction. This thesis proposes and demonstrates a systematically incremental approach to automating robotic tasks by decomposing a non-trivial task into stages, each of which may be automated by integrating the aforementioned techniques. We design and implement the controllers for two semi-autonomous robots that integrate three key techniques for grasping and pick-and-place tasks. While a general theory is beyond reach, we present a study on the tradeoffs between three design metrics for robotic task systems: (1) the amount of training effort for the robots to perform the task, (2) the time available to complete the task when the command is given, and (3) the quality of the result of the performed task. The tradeoff study in this design space uses the imprecise computation model as a framework to evaluate specific types of tasks: (1) grasping an unknown object and (2) placing the object in a target position. We demonstrate the generality of our integration methodology by applying it to two different robots, Dreamer and Hoppy. Our approach is evaluated by the performance of the robots in trading off between task completion time, training time, and task completion success rate, in an environment similar to those in the recent Amazon Picking Challenge.
Bio-inspired Dynamic Control Systems with Time Delays
The world around us exhibits a rich and ever-changing environment of startling, bewildering and fascinating complexity. Almost nothing is as simple as it seems, but through the chaos we may catch fleeting glimpses of the mechanisms within. Throughout the history of human endeavour we have mimicked nature to harness it for our own ends. Our attempts to develop truly autonomous and intelligent machines have, however, struggled with the limitations of our human ability. This has encouraged some to shirk this responsibility and instead model biological processes and systems to do it for us.
This thesis explores the introduction of continuous time delays into biologically inspired dynamic control systems. We seek to exploit the rich temporal dynamics found in physical and biological systems to model complex or adaptive behaviour through the artificial evolution of networks to control robots. Throughout, arguments are presented for modelling delays not only to better represent key facets of physical and biological systems, but also to increase the computational potential of such systems for the synthesis of control.
A thorough investigation of the dynamics of small delayed networks across a wide range of time delays has been undertaken, with a detailed mathematical description of the system's fixed points and possible oscillatory modes developed to fully describe the behaviour of a single node. Exploring the behaviour of even small delayed networks illustrates the range of complex behaviour possible and guides the development of interesting solutions.
To further exploit the potential of the rich dynamics in such systems, a novel approach to the 3D simulation of locomotory robots has been developed, focussing on minimising the computational cost. To verify this simulation tool, a simple quadruped robot was developed and its motion under a manually designed gait evaluated. The results showed a high degree of agreement between the simulation and laser-tracker data, verifying the accuracy of the model developed.
A new model of a dynamic system which includes continuous time delays has been introduced, and its utility demonstrated in the evolution of networks for the solution of simple learning behaviours. A range of methods has been developed for determining the time delays, including the novel concept of representing the time delays as related to the distance between nodes in a spatial representation of the network. The application of these tools to a range of examples has been explored, from Gene Regulatory Networks (GRNs) to robot control and neural networks. The performance of these systems has been compared and contrasted with the efficacy of evolutionary runs for the same task over the whole range of network and delay types.
It has been shown that delayed dynamic neural systems are at least as capable as traditional Continuous Time Recurrent Neural Networks (CTRNNs) and show significant performance improvements in the control of robot gaits. Experiments in adaptive behaviour, where there is no such direct link between the enhanced system dynamics and performance, showed no discernible improvement. Whilst we hypothesise that the ability of such delayed networks to generate switched pattern-generating nodes may be useful in Evolutionary Robotics (ER), this was not borne out here.
The spatial representation of delays was shown to be more efficient for larger networks; however, these techniques restricted the search to lower-complexity solutions or led to a significant fall-off as the network structure became more complex. This would suggest that for anything other than a simple genotype, the direct method for encoding delays is likely most appropriate. With proven benefits for robot locomotion and open potential for adaptive behaviour, delayed dynamic systems for evolved control remain an interesting and promising field in complex systems research.
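As a concrete illustration of the two delay encodings discussed above, the sketch below simulates a small delayed recurrent network in which each connection's delay is either supplied directly or derived from the Euclidean distance between node positions; the discretisation, parameter names, and update rule are assumptions rather than the thesis's exact model.

```python
import math
import numpy as np

def spatial_delays(positions, conduction_speed=1.0, dt=0.01):
    """Delay (in integration steps) proportional to the distance between node positions."""
    n = len(positions)
    steps = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            dist = math.dist(positions[i], positions[j])
            steps[i, j] = max(1, int(round(dist / (conduction_speed * dt))))
    return steps

def simulate_delayed_network(weights, delay_steps, tau, steps=1000, dt=0.01):
    """Euler integration of a CTRNN-like network with per-connection time delays."""
    n = len(weights)
    history = np.zeros((steps, n))  # stored past activations feeding the delayed inputs
    y = np.zeros(n)
    for t in range(1, steps):
        for i in range(n):
            # Each input from node j arrives delayed by delay_steps[j][i] integration steps.
            drive = sum(weights[j][i] * math.tanh(history[max(0, t - delay_steps[j][i]), j])
                        for j in range(n))
            y[i] += dt / tau[i] * (-y[i] + drive)
        history[t] = y
    return history

positions = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
delays = spatial_delays(positions)               # delays derived from node geometry
W = [[0.0, 2.0, -1.5], [-2.0, 0.0, 1.5], [1.0, -1.0, 0.0]]
trace = simulate_delayed_network(W, delays, tau=[0.1, 0.1, 0.1])
print(trace[-1])  # final activations after 1000 steps
```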