    Hi-Val: Iterative Learning of Hierarchical Value Functions for Policy Generation

    Task decomposition is effective in many applications where the global complexity of a problem makes planning and decision-making too demanding. This is true, for example, in high-dimensional robotics domains, where (1) unpredictability and modeling limitations typically prevent the manual specification of robust behaviors, and (2) learning an action policy is challenging due to the curse of dimensionality. In this work, we borrow the concept of Hierarchical Task Networks (HTNs) to decompose the learning procedure, and we exploit Upper Confidence Tree (UCT) search to introduce HOP, a novel iterative algorithm for hierarchical optimistic planning with learned value functions. To obtain better generalization and generate policies, HOP simultaneously learns and uses action values. These are used to formalize constraints within the search space and to reduce the dimensionality of the problem. We evaluate our algorithm both on a fetching task using a simulated 7-DOF KUKA lightweight arm and on a pick-and-delivery task with a Pioneer robot.
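
    To make the tree-search idea concrete, the following is a minimal, illustrative sketch of optimistic tree search that evaluates leaves with a learned value function and uses learned action values to prune the search space. Everything here (the Node class, step_fn, value_fn, q_fn, the top_k pruning rule) is a hypothetical reconstruction for illustration, not the authors' HOP implementation.

    import math
    import random

    class Node:
        def __init__(self, state):
            self.state = state
            self.children = {}   # action -> child Node
            self.visits = 0
            self.value = 0.0     # running mean of returns through this node

    def uct_pick(node, c=1.4):
        # UCB1: optimistic in the face of uncertainty.
        def score(child):
            explore = c * math.sqrt(math.log(node.visits + 1) / (child.visits + 1e-9))
            return child.value + explore
        return max(node.children.values(), key=score)

    def hop_like_search(root, step_fn, value_fn, q_fn, actions, iters=500, top_k=5):
        for _ in range(iters):
            node, path = root, [root]
            # Selection: descend while all pruned actions have been expanded.
            while node.children and len(node.children) >= min(top_k, len(actions)):
                node = uct_pick(node)
                path.append(node)
            # Expansion: the learned action values q_fn constrain the search
            # space to the top_k actions, reducing the branching factor.
            pruned = sorted(actions, key=lambda a: q_fn(node.state, a))[-top_k:]
            untried = [a for a in pruned if a not in node.children]
            if untried:
                a = random.choice(untried)
                child = Node(step_fn(node.state, a))
                node.children[a] = child
                path.append(child)
                node = child
            # Evaluation: the learned value function replaces a random rollout.
            ret = value_fn(node.state)
            # Backup: incremental mean update along the visited path.
            for n in path:
                n.visits += 1
                n.value += (ret - n.value) / n.visits
        return max(root.children, key=lambda a: root.children[a].visits)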

    Modeling Information Exchange Opportunities for Effective Human-Computer Teamwork

    This paper studies information exchange in collaborative group activities involving mixed networks of people and computer agents. It introduces the concept of "nearly decomposable" decision-making problems to address the complexity of information exchange decisions in such multi-agent settings. This class of decision-making problems arises in settings with an action structure that requires agents to reason about only a subset of their partners' actions, but otherwise allows them to act independently. The paper presents a formal model of nearly decomposable decision-making problems, NED-MDPs, and defines an approximation algorithm, NED-DECOP, that computes efficient information exchange strategies. The paper shows that NED-DECOP is more efficient than prior collaborative planning algorithms for this class of problems. It presents an empirical study of the information exchange decisions made by the algorithm, investigating the extent to which people accept interruption requests from a computer agent. The context for the study is a game in which the agent can ask people for information that may benefit its individual performance and thus the group's collaboration. This study revealed the key factors affecting people's perception of the benefit of interruptions in this setting. The paper also describes the use of machine learning to predict the situations in which people deviate from the strategies generated by the algorithm, using a combination of domain features and features informed by the algorithm. The methodology followed in this work could form the basis for designing agents that effectively exchange information in collaborations with people.
    Engineering and Applied Science
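
    As a toy illustration of the "nearly decomposable" idea (hypothetical names; this is not the NED-DECOP algorithm): each agent scores its own actions with a local utility plus coupling terms for the few partners it actually depends on, and never enumerates the full joint action space.

    def best_local_action(own_actions, coupled_partners, predict, local_u, coupling_u):
        """Choose an action while reasoning only about coupled partners.

        predict(p)           -> the action the agent expects partner p to take
        local_u(a)           -> utility of acting independently with a
        coupling_u(a, p, b)  -> interaction utility when partner p plays b
        """
        def value(a):
            # Only the subset of partners whose actions interact with ours
            # enters the computation; all other agents are ignored.
            return local_u(a) + sum(coupling_u(a, p, predict(p))
                                    for p in coupled_partners)
        return max(own_actions, key=value)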

    Agent-Driven Representations, Algorithms, and Metrics for Automated Organizational Design.

    As cooperative multiagent systems (MASs) increase in interconnectivity, complexity, size, and longevity, coordinating the agents' reasoning and behaviors becomes increasingly difficult. One approach to address these issues is to use insights from human organizations to design structures within which the agents can more efficiently reason and interact. Generally speaking, an organization influences each agent such that, by following its respective influences, an agent can make globally useful local decisions without having to explicitly reason about the complete joint coordination problem. For example, an organizational influence might constrain and/or inform which actions an agent performs. If these influences are well constructed to be cohesive and correlated across the agents, then each agent is influenced into reasoning about and performing only the actions that are appropriate for its (organizationally designated) portion of the joint coordination problem. In this dissertation, I develop an agent-driven approach to organizations, wherein the foundation for representing and reasoning about an organization stems from the needs of the agents in the MAS. I create an organizational specification language to express the possible ways in which an organization could influence the agents' decision-making processes, and leverage details from those decision processes to establish quantitative, principled metrics for organizational performance based on the expected impact that an organization will have on the agents' reasoning and behaviors. Building upon my agent-driven organizational representations, I identify a strategy for automating the organizational design process (ODP), wherein my ODP computes a quantitative description of organizational patterns and then searches through those possible patterns to identify an (approximately) optimal set of organizational influences for the MAS. Evaluating my ODP reveals that it can create organizations that both influence the MAS into effective patterns of joint policies and also streamline the agents' decision making in a coordinated manner. Finally, I use my agent-driven approach to identify characteristics of effective abstractions over organizational influences and a heuristic strategy for converging on a good abstraction.
    PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113616/1/jsleight_1.pd
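
    A minimal sketch of the constrain-and-inform idea (invented interface, not the dissertation's specification language): an organizational influence restricts an agent's action set and biases its local evaluation, so the local decision stays cheap while remaining globally useful.

    from dataclasses import dataclass
    from typing import Callable, FrozenSet

    @dataclass(frozen=True)
    class Influence:
        allowed: FrozenSet[str]        # constrains: actions the agent may take
        bias: Callable[[str], float]   # informs: organizational preference

    def influenced_decision(actions, q_local, influence):
        # The agent optimizes its local value plus the organizational bias,
        # restricted to its organizationally designated action subset; it
        # never reasons about the complete joint coordination problem.
        candidates = [a for a in actions if a in influence.allowed]
        return max(candidates, key=lambda a: q_local(a) + influence.bias(a))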

    How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies

    Using deep neural nets as function approximators for reinforcement learning tasks has recently been shown to be very powerful for solving problems approaching real-world complexity. Using these results as a benchmark, we discuss the role that the discount factor may play in the quality of the learning process of a deep Q-network (DQN). We empirically show that progressively increasing the discount factor up to its final value makes it possible to significantly reduce the number of learning steps, and that, when used in conjunction with a varying learning rate, this strategy outperforms the original DQN on several experiments. We relate this phenomenon to the instabilities of neural networks when they are used in an approximate Dynamic Programming setting. We also describe the possibility of falling into a local optimum during the learning process, thus connecting our discussion with the exploration/exploitation dilemma.
    Comment: NIPS 2015 Deep Reinforcement Learning Workshop
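
    One simple way to realize such a schedule (the functional form and all constants are assumptions for illustration, not the paper's exact rule): shrink the distance to gamma = 1 by a fixed factor each training phase, while decaying the learning rate alongside it.

    import numpy as np

    def schedules(step, gamma0=0.95, gamma_final=0.99, shrink=0.98,
                  lr0=2.5e-4, lr_decay=0.98, phase_len=10_000):
        phase = step // phase_len
        # Distance to gamma = 1 shrinks geometrically:
        # 1 - gamma_k = shrink**k * (1 - gamma0), capped at gamma_final.
        gamma = min(1.0 - (1.0 - gamma0) * shrink ** phase, gamma_final)
        lr = lr0 * lr_decay ** phase   # learning rate decays as gamma grows
        return gamma, lr

    def td_target(reward, next_q_values, done, gamma):
        # Standard one-step Q-learning target using the scheduled discount.
        return reward + (0.0 if done else gamma * float(np.max(next_q_values)))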

    Towards Continual Reinforcement Learning: A Review and Perspectives

    In this article, we aim to provide a literature review of different formulations and approaches to continual reinforcement learning (RL), also known as lifelong or non-stationary RL. We begin by discussing our perspective on why RL is a natural fit for studying continual learning. We then provide a taxonomy of different continual RL formulations and mathematically characterize the non-stationary dynamics of each setting. We go on to discuss evaluation of continual RL agents, providing an overview of benchmarks used in the literature and important metrics for understanding agent performance. Finally, we highlight open problems and challenges in bridging the gap between the current state of continual RL and findings in neuroscience. While still in its early days, the study of continual RL holds the promise of developing better incremental reinforcement learners that can function in increasingly realistic applications where non-stationarity plays a vital role. These include applications in fields such as healthcare, education, logistics, and robotics.
    Comment: Preprint, 52 pages, 8 figures
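
    To make "non-stationary RL" concrete, here is a minimal sketch (a loosely gym-like, invented interface; not taken from the survey) in which the task the agent faces drifts over its lifetime.

    class ContinualEnv:
        """Cycles through a list of stationary MDPs, one every `period` episodes."""

        def __init__(self, tasks, period=100):
            self.tasks = tasks      # stationary environments sharing reset/step
            self.period = period
            self.episode = 0
            self.env = tasks[0]

        def reset(self):
            # The active MDP changes with time, so the transition and reward
            # functions are non-stationary from the agent's point of view; a
            # continual learner must adapt without forgetting earlier tasks.
            self.env = self.tasks[(self.episode // self.period) % len(self.tasks)]
            self.episode += 1
            return self.env.reset()

        def step(self, action):
            return self.env.step(action)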

    Self-Organized Polynomial-Time Coordination Graphs

    Coordination graphs are a promising approach to model agent collaboration in multi-agent reinforcement learning. A coordination graph conducts a graph-based value factorization and induces explicit coordination among agents to complete complicated tasks. However, one critical challenge in this paradigm is the complexity of greedy action selection with respect to the factorized values. This problem amounts to the decentralized constraint optimization problem (DCOP), which is NP-hard, as is its constant-ratio approximation. To bypass this systematic hardness, this paper proposes a novel method, named Self-Organized Polynomial-time Coordination Graphs (SOP-CG), which uses structured graph classes to guarantee the accuracy and computational efficiency of collaborative action selection. SOP-CG employs dynamic graph topologies to ensure sufficient value-function expressiveness, and graph selection is unified into an end-to-end learning paradigm. In experiments, we show that our approach learns succinct and well-adapted graph topologies, induces effective coordination, and improves performance across a variety of cooperative multi-agent tasks.
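
    A sketch of why tree-structured graph classes make greedy action selection tractable (hypothetical interface, not the SOP-CG implementation): on a tree, the maximizing joint action can be computed exactly with one leaf-to-root dynamic-programming pass and one root-to-leaf readout, in time polynomial in the number of agents and actions.

    def greedy_joint_action(children, root, actions, q_i, q_ij):
        """Exact argmax of sum(q_i) + sum(q_ij) on a tree-structured graph.

        children[i]       -> list of agent i's child agents in the tree
        actions[i]        -> agent i's action list
        q_i(i, a)         -> individual utility
        q_ij(i, a, j, b)  -> pairwise payoff on tree edge (i, j)
        """
        choice = {}  # (parent, parent_action, child) -> child's best response

        def up(i):
            # msg[a]: best value of i's subtree when agent i plays a.
            msg = {a: q_i(i, a) for a in actions[i]}
            for j in children.get(i, []):
                child_msg = up(j)
                for a in actions[i]:
                    best_b = max(actions[j],
                                 key=lambda b: q_ij(i, a, j, b) + child_msg[b])
                    choice[(i, a, j)] = best_b
                    msg[a] += q_ij(i, a, j, best_b) + child_msg[best_b]
            return msg

        root_msg = up(root)
        joint = {root: max(actions[root], key=root_msg.get)}
        stack = [root]
        while stack:  # top-down readout of the maximizing joint action
            i = stack.pop()
            for j in children.get(i, []):
                joint[j] = choice[(i, joint[i], j)]
                stack.append(j)
        return joint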