
    Evolutionary Algorithms for Reinforcement Learning

    There are two distinct approaches to solving reinforcement learning problems, namely, searching in value function space and searching in policy space. Temporal difference methods and evolutionary algorithms are well-known examples of these approaches. Kaelbling, Littman, and Moore recently provided an informative survey of temporal difference methods. This article focuses on the application of evolutionary algorithms to the reinforcement learning problem, emphasizing alternative policy representations, credit assignment methods, and problem-specific genetic operators. Strengths and weaknesses of the evolutionary approach to reinforcement learning are presented, along with a survey of representative applications.
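    A minimal sketch of the policy-space search this article surveys may make the contrast concrete: instead of learning a value function with temporal difference updates, an evolutionary algorithm evaluates whole policies on episode return and improves them by selection and mutation. The toy corridor environment and every parameter below are illustrative assumptions, not material from the article.

        # Hypothetical sketch: evolutionary search directly in policy space.
        import numpy as np

        rng = np.random.default_rng(0)

        def episode_return(weights, steps=20):
            """Roll out a linear policy in a toy 1-D corridor; the goal is x = 5."""
            pos, total = 0.0, 0.0
            for _ in range(steps):
                action = np.tanh(weights[0] * pos + weights[1])  # policy: obs -> action
                pos += action
                total -= abs(5.0 - pos)        # return improves as we near the goal
            return total

        pop = rng.normal(size=(30, 2))         # population of policy parameters
        for gen in range(50):
            fitness = np.array([episode_return(w) for w in pop])
            elite = pop[np.argsort(fitness)[-10:]]             # truncation selection
            children = elite[rng.integers(0, 10, size=20)]
            pop = np.vstack([elite, children + 0.1 * rng.normal(size=children.shape)])
        print("best return:", max(episode_return(w) for w in pop))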

    A Coevolutionary View of Information Services Development: Lessons from the U.S. National Oceanic and Atmospheric Administration

    This study investigates the process of information services development based on a case study of the experience of the U.S. National Oceanic and Atmospheric Administration (NOAA). In this study, we develop theoretical constructs that can inform researchers and practitioners on (1) what the critical domains and interactions associated with the emerging process of information service development at these organizations were, and (2) how information services at NOAA evolved over time. Adopting a coevolutionary view, we identified distinct yet interdependent domains that affected, and were affected by, the information services development process; these were: (1) services choreography, through which service interactions and collaborations are managed; (2) services orchestration, through which service processes are selected and interact; and (3) services instrumentation, by which services are developed and architected. Using the coevolutionary view, we uncovered three adaptive principles that explain the interplay among domains and interactions over time: adaptive tensions, requisite variety, and modular design. We discuss our findings' implications for research and practice and offer propositions for future research.

    Multiagent Learning via Dynamic Skill Selection

    Multiagent coordination has many real-world applications such as self-driving cars, inventory management, search and rescue, package delivery, traffic management, warehouse management, and transportation. These tasks are generally characterized by a global team objective that is often temporally sparse, realized only upon completing an episode. The sparsity of the shared team objective often makes it an inadequate learning signal for learning effective strategies. Moreover, this reward signal does not capture the marginal contribution of each agent towards the global objective. This leads to the problem of structural credit assignment in multiagent systems. Furthermore, due to a lack of accurate understanding of desired task behaviors, it is often challenging to manually design agent-specific rewards to improve coordination. While learning these undefined local objectives is critical for successful coordination, it is extremely difficult due to two core challenges. First, due to interaction among agents in an environment, the complexity of the problem may rise exponentially with the number of agents and their behavioral sophistication. An agent perceives the environment as non-stationary because all agents are learning concurrently, which makes the coordination objective appear extremely noisy. Second, the goal information required to learn coordination behavior is distributed among agents, which makes it difficult for agents to learn undefined desired behaviors that optimize a team objective. The key contribution of this work is to address the credit assignment problem in multiagent coordination using several semantically meaningful local rewards. We argue that real-world multiagent coordination tasks can be decomposed into several meaningful skills. Further, we introduce MADyS, a framework that can optimize a global reward by learning to dynamically select the most suitable skill from a set of semantically meaningful skills, characterized by their local rewards, without requiring any form of reward shaping. Here, each local reward describes a basic skill and is designed based on domain knowledge. MADyS combines gradient-based optimization to maximize dense local rewards with gradient-free optimization to maximize the sparse team-based reward. Each local reward is used to train a local policy learner using policy gradient (PG), while an evolutionary algorithm (EA) searches a population of policies to maximize the global objective by picking the most suitable local reward at each time step of an episode. While these two processes occur concurrently, the experiences collected by the EA population are stored in a replay buffer and utilized by the PG-based local reward optimizers for better sample efficiency. Our experimental results show that MADyS outperforms several baselines. We also visualize the complex coordination behaviors by studying the temporal distribution shifts of the selected local rewards. By visualizing these shifts throughout an episode, we gain insight into how agents learn to (i) decompose a complex task into various sub-tasks, (ii) dynamically configure sub-teams, and (iii) assign the selected sub-tasks to the sub-teams to optimize the global objective as a team.
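    A minimal structural sketch of the loop this abstract describes may clarify how the pieces fit together: one gradient-based learner per local reward (skill), an EA population whose genomes pick a skill at each time step, selection on the sparse global return, and a shared replay buffer that feeds the local learners. The placeholder environment, the stand-in PG learner, and all names below are assumptions for illustration, not the MADyS implementation.

        # Hypothetical sketch of a hybrid EA + policy-gradient skill-selection loop.
        import random

        N_SKILLS, POP, HORIZON = 3, 10, 25

        class SkillPolicy:
            """Stand-in for a policy-gradient learner trained on one local reward."""
            def __init__(self, skill_id):
                self.skill_id = skill_id
            def act(self, obs):
                return random.choice([-1, 0, 1])   # placeholder action selection
            def update(self, batch):
                pass                               # PG update on the local reward

        skills = [SkillPolicy(k) for k in range(N_SKILLS)]
        population = [[random.randrange(N_SKILLS) for _ in range(HORIZON)]
                      for _ in range(POP)]         # genome: one skill index per step
        replay = []                                # experiences shared with PG learners

        def rollout(genome):
            obs, global_return = 0.0, 0.0
            for t in range(HORIZON):
                action = skills[genome[t]].act(obs)      # dynamic skill selection
                obs += action
                replay.append((obs, action, genome[t]))
                global_return += 1.0 if abs(obs - 5) < 1 else 0.0  # sparse team reward
            return global_return

        for gen in range(20):
            ranked = sorted(population, key=rollout, reverse=True)
            elite = ranked[:POP // 2]              # keep the best skill schedules
            population = elite + [
                [g if random.random() > 0.2 else random.randrange(N_SKILLS)
                 for g in random.choice(elite)]    # mutate a copied parent genome
                for _ in range(POP - len(elite))]
            for s in skills:                       # PG learners reuse EA experiences
                s.update([e for e in replay if e[2] == s.skill_id])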

    Aesthetic Gadgets: Rethinking Universalism in Evolutionary Aesthetics

    There is a growing appetite for the inclusion of outcomes of empirical research into philosophical aesthetics. At the same time, evolutionary aesthetics remains in the margins, with little mutual discussion with the various strands of philosophical aesthetics. This is surprising, because the evolutionary framework has the power to bring these two approaches together. This article demonstrates that the evolutionary approach builds a biocultural bridge between our philosophical and empirical understanding of humans as aesthetic agents who share the preconditions for aesthetic experience, but are not determined by them. Sometimes, philosophers are wary of the evolutionary framework. Does the research program of evolutionary aesthetics presuppose an intrinsic aesthetic instinct that would determine the way we form aesthetic judgments, regardless of the environment with which we interact? I argue that it does not. Imitation and mindreading are considered to be central features of the aesthetic module. Recently, and contrary to the prior view, it has been shown that imitation and mindreading are not likely to be innate instincts but socially learned, yet evolved, patterns of behavior. Hence, I offer grounds for the idea that the cognitive aesthetic module(s) is socially learned, too. This outcome questions the need for the traditional differentiation between empirical and philosophical aesthetics.

    Providing Informative Feedback for Learning in Tightly Coupled Multiagent Domains

    Autonomous agents that sense, decide, act, and coordinate effectively with each other are critical in many real-world domains such as autonomous driving, search and rescue missions, air traffic management, and underwater or deep space exploration. All such domains share a key difficulty: though high-level mission goals are clear to system designers, the agent behaviors that achieve those goals are not. Thus, system designers aim to use adaptive approaches such as reinforcement learning (RL) or evolutionary algorithms (EA) to discover the ideal behaviors for the agents, and these behaviors are often implemented in computational policies (for example, as artificial neural networks) that map sensory inputs to actions or values. But for such learning systems to be successful, they need to leverage system feedback (based on the agents' collective performance) to revise and update the agents' policies for how the agents should interact with the environment. Unfortunately, both RL and EA approaches struggle when the environmental feedback is sparse and/or uninformative, especially in multiagent domains where teasing out an agent's contribution to the system is difficult. Reward shaping methods address some of this difficulty, but they also suffer when faced with tightly coupled multiagent domains where feedback depends on multiple agents taking the correct joint action at the appropriate time. The contribution of this work is to introduce Reward-Shaped Curriculum Learning, Fitness Critics, and Bidirectional Fitness Critics to address the challenges of sparse feedback in tightly coupled multiagent domains. Reward-Shaped Curriculum Learning trains agents on successively more complex scenarios, which enables agents to use reward shaping to discover the correct actions first and then coordinate on the complex tasks. The impact of this approach is to "reduce the sparsity" of the reward. Fitness Critics directly address the sparse feedback problem by replacing the system reward with a step-by-step performance metric that maps the step-wise observations and actions to meaningful evaluations that are able to identify desirable behaviors. The impact of this approach is to turn a sparse, policy-based reward into a dense, state-action-based reward that trains agents for specific behaviors. Bidirectional Fitness Critics extend Fitness Critics to provide more informative feedback by leveraging temporal information about the reward and the relevance of that information to the task. The impact of this approach is to more accurately capture each agent's contribution to the desired behavior.
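    A minimal sketch (assumed, not the authors' code) of the Fitness Critic idea may help: a learned model maps each step's observation-action features to a dense evaluation and is trained so that its per-episode aggregate tracks the sparse episodic fitness. The linear model, the max-based aggregation, and all names below are illustrative assumptions.

        # Hypothetical sketch: turning a sparse episodic fitness into dense feedback.
        import numpy as np

        rng = np.random.default_rng(1)

        class FitnessCritic:
            def __init__(self, dim, lr=0.05):
                self.w = np.zeros(dim)
                self.lr = lr
            def score(self, sa):
                return float(self.w @ sa)          # dense step-wise evaluation
            def update(self, episode, fitness):
                # Regress the episode's best step score toward the episodic
                # fitness, concentrating credit on the most responsible step.
                scores = [self.score(sa) for sa in episode]
                t = int(np.argmax(scores))
                self.w += self.lr * (fitness - scores[t]) * episode[t]

        critic = FitnessCritic(dim=4)
        for _ in range(200):
            episode = [rng.normal(size=4) for _ in range(10)]   # fake (s, a) features
            fitness = float(episode[3] @ np.ones(4))            # sparse episodic signal
            critic.update(episode, fitness)
        # Agents would then train on critic.score(sa) at every step instead of
        # waiting for the end-of-episode fitness.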