
    Learning to Coordinate Efficiently: A Model-based Approach

    In common-interest stochastic games all players receive an identical payoff. Players participating in such games must learn to coordinate with each other in order to receive the highest possible value. A number of reinforcement learning algorithms have been proposed for this problem, and some have been shown to converge to good solutions in the limit. In this paper we show that much better (i.e., polynomial) convergence rates can be attained using very simple model-based algorithms. Moreover, our model-based algorithms are guaranteed to converge to the optimal value, unlike many of the existing algorithms.
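
    As a minimal sketch of why a model-based approach coordinates so quickly, consider a repeated, deterministic common-interest matrix game (the setup, the optimistic R_MAX constant, and the shared tie-breaking rule below are illustrative assumptions, not the paper's algorithm): optimistic initialization forces every joint action to be tried once, after which both players' identical greedy rule selects the optimal joint action.

```python
import numpy as np

# Minimal sketch of the model-based idea in a repeated common-interest
# matrix game (illustrative setup, not the paper's exact algorithm).
# Both players receive the same payoff, so each can learn the joint payoff
# matrix; optimism makes every joint action get tried, and a shared
# deterministic tie-break lets the players pick compatible actions.

rng = np.random.default_rng(0)
n_actions = 3
true_payoff = rng.uniform(size=(n_actions, n_actions))   # shared payoff in [0, 1)

R_MAX = 1.0                                              # optimistic prior value
model = np.full((n_actions, n_actions), R_MAX)           # estimated payoffs

def joint_greedy(m):
    """Both players run the same argmax with the same (row-major)
    tie-break, so their two halves form one coherent joint action."""
    return np.unravel_index(np.argmax(m), m.shape)

for t in range(2 * n_actions ** 2):        # polynomially many steps suffice here
    a1, a2 = joint_greedy(model)           # unexplored cells look best (R_MAX)
    model[a1, a2] = true_payoff[a1, a2]    # deterministic payoff: one sample is exact

best = np.unravel_index(np.argmax(true_payoff), true_payoff.shape)
print("learned joint action:", joint_greedy(model), "optimal:", best)
```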

    Learning to coordinate in a complex and non-stationary world

    We study, analytically and by computer simulations, a complex system of adaptive agents with finite memory. Borrowing the framework of the Minority Game and using the replica formalism, we show the existence of an equilibrium phase transition as a function of the ratio between the memory λ and the learning rate Γ of the agents. We show that, starting from a random configuration, a dynamic phase transition also exists, which prevents the system from reaching any Nash equilibrium. Furthermore, in a non-stationary environment, we show by numerical simulations that agents with infinite memory play worse than others with less memory, and that the dynamic transition arises naturally, independently of the initial conditions.
    Comment: 4 pages, 3 figures
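
    For concreteness, a bare-bones Minority Game simulation in the standard setup (all sizes are arbitrary; the finite score memory λ and learning rate Γ that drive the paper's phase transitions are omitted from this sketch):

```python
import numpy as np

# Bare-bones Minority Game: N agents each hold S fixed random strategies
# mapping the last m winning sides to an action in {-1, +1}; agents that
# end up in the minority win. Sizes here are illustrative.

rng = np.random.default_rng(1)
N, m, S, T = 301, 3, 2, 2000          # odd agent count, memory m, S strategies each
P = 2 ** m                            # number of distinct m-bit histories
strategies = rng.choice([-1, 1], size=(N, S, P))   # fixed random lookup tables
scores = np.zeros((N, S))             # virtual points per strategy
history = int(rng.integers(P))        # current history, encoded as an index

attendance = []
for t in range(T):
    best = np.argmax(scores, axis=1)                    # each agent plays its best strategy
    actions = strategies[np.arange(N), best, history]   # +1 or -1 per agent
    A = actions.sum()                                   # aggregate attendance
    attendance.append(A)
    # Strategies that sided with the minority (sign opposite to A) gain a point.
    scores -= strategies[:, :, history] * np.sign(A)
    history = ((history << 1) | (1 if A > 0 else 0)) % P   # append the winning bit

print("volatility sigma^2 / N =", np.var(attendance) / N)
```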

    Learning to Coordinate with Anyone

    In open multi-agent environments, an agent may encounter unexpected teammates. Classical multi-agent learning approaches train agents that can coordinate only with teammates seen during training. Recent studies have attempted to generate diverse teammates to enhance generalizable coordination ability, but remain restricted to pre-defined teammate sets. In this work, our aim is to train agents with strong coordination ability by generating teammates that fully cover the teammate policy space, so that agents can coordinate with any teammate. Since the teammate policy space is far too large to enumerate, we search only for dissimilar teammates that are incompatible with the controllable agents, which greatly reduces the number of teammates that need to be trained with. However, it is hard to determine the number of such incompatible teammates beforehand. We therefore introduce a continual multi-agent learning process, in which the agent learns to coordinate with different teammates until no more incompatible teammates can be found. This idea is implemented in the proposed Macop (Multi-agent compatible policy learning) algorithm. We conduct experiments in 8 scenarios from 4 environments with distinct coordination patterns. Experiments show that Macop generates training teammates with much lower compatibility than previous methods. As a result, Macop achieves the best overall coordination ability in all scenarios while never performing significantly worse than the baselines, demonstrating strong generalization ability.
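
    A toy, runnable sketch of the continual loop the abstract describes, set in a one-shot common-payoff matrix game; the payoff matrix, the best-response stand-in for training, and the 0.5 stopping threshold are illustrative assumptions, not Macop's actual components:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
payoff = rng.uniform(size=(n, n))    # shared payoff G[agent_action, teammate_action]

def compatibility(agent_p, teammate_p):
    """Expected common payoff of the (agent, teammate) policy pair."""
    return agent_p @ payoff @ teammate_p

def most_incompatible_teammate(agent_p):
    """Deterministic teammate that minimizes the joint return with the agent."""
    t = np.zeros(n)
    t[np.argmin(agent_p @ payoff)] = 1.0
    return t

def train_against_pool(pool):
    """'Training' here is just a best response to the pool's uniform mixture."""
    a = np.zeros(n)
    a[np.argmax(payoff @ np.mean(pool, axis=0))] = 1.0
    return a

agent, pool = np.full(n, 1.0 / n), []
for round_ in range(10):
    teammate = most_incompatible_teammate(agent)
    if compatibility(agent, teammate) > 0.5:   # illustrative stopping threshold
        break                                  # no incompatible teammate remains
    pool.append(teammate)                      # grow the training pool
    agent = train_against_pool(pool)           # keep coordinating with all of them

print("teammates generated:", len(pool),
      "| worst-case compatibility:",
      round(compatibility(agent, most_incompatible_teammate(agent)), 3))
```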

    Stabilize to Act: Learning to Coordinate for Bimanual Manipulation

    Key to rich, dexterous manipulation in the real world is the ability to coordinate control across two hands. However, while the promise afforded by bimanual robotic systems is immense, constructing control policies for dual-arm autonomous systems brings inherent difficulties. One such difficulty is the high dimensionality of the bimanual action space, which adds complexity to both model-based and data-driven methods. We counteract this challenge by drawing inspiration from humans to propose a novel role-assignment framework: a stabilizing arm holds an object in place to simplify the environment while an acting arm executes the task. We instantiate this framework with BimanUal Dexterity from Stabilization (BUDS), which uses a learned restabilizing classifier to alternate between updating a learned stabilization position to keep the environment unchanged, and accomplishing the task with an acting policy learned from demonstrations. We evaluate BUDS on four bimanual tasks of varying complexity on real-world robots, such as zipping jackets and cutting vegetables. Given only 20 demonstrations, BUDS achieves 76.9% task success across our task suite and generalizes to out-of-distribution objects within a class with a 52.7% success rate. Owing to the precision these complex tasks demand, BUDS is 56.0% more successful than an unstructured baseline that instead learns a BC stabilizing policy. Supplementary material and videos can be found at https://sites.google.com/view/stabilizetoact.
    Comment: Conference on Robot Learning, 2023
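
    A toy sketch of the alternation BUDS describes, with a simple drift threshold standing in for the learned restabilizing classifier and a scalar counter standing in for the acting policy's progress; all components and numbers are illustrative placeholders, not the paper's learned models:

```python
import random

random.seed(0)
object_pos, stab_pose, progress = 0.0, 0.0, 0.0

def needs_restabilize(obj, pin, tol=0.2):
    """Stand-in for the learned restabilizing classifier:
    fire when the object has drifted too far from the held pose."""
    return abs(obj - pin) > tol

for step in range(500):
    object_pos += random.uniform(-0.1, 0.1)   # acting on the task perturbs the object
    if needs_restabilize(object_pos, stab_pose):
        stab_pose = object_pos                # stabilizing arm re-grasps the object
    else:
        progress += 0.01                      # acting arm makes progress on the task
        if progress >= 1.0:
            print("task completed at step", step)
            break
```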

    Communicative Bottlenecks Lead to Maximal Information Transfer

    This paper presents new analytic and numerical analysis of signalling games that give rise to informational bottlenecks, that is, signalling games with more state/act pairs than available signals to communicate information about the world. I show via simulation that agents learning to coordinate tend to favour partitions of nature which provide maximal information transfer. This is true despite the fact that nothing in an initial analysis of the stability properties of the underlying signalling game suggests that this should be the case. As a first pass at explaining this, I note that the underlying structure of our model favours maximal information transfer in virtue of the simple combinatorial properties of how the agents might partition nature into kinds. However, I suggest that this does not perfectly capture the empirical results; thus, several open questions remain.
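
    A minimal sketch of such a bottlenecked signalling game, with 4 states but only 2 signals and simple Roth-Erev-style reinforcement on both players; the parameters and payoff scheme are illustrative assumptions, not necessarily the paper's setup. The mutual information between state and signal measures how much information squeezes through the bottleneck (at most 1 bit here, achieved exactly by a partition of the four states into two pairs):

```python
import numpy as np

rng = np.random.default_rng(3)
n_states, n_signals = 4, 2
sender = np.ones((n_states, n_signals))     # urn weights: state -> signal
receiver = np.ones((n_signals, n_states))   # urn weights: signal -> act

def draw(weights):
    """Sample an option with probability proportional to its urn weight."""
    p = weights / weights.sum()
    return rng.choice(len(p), p=p)

for t in range(30_000):
    state = rng.integers(n_states)
    sig = draw(sender[state])
    act = draw(receiver[sig])
    if act == state:                        # success reinforces both urns
        sender[state, sig] += 1.0
        receiver[sig, act] += 1.0

# Mutual information (bits) between world state and signal under the
# learned sender: how much information crosses the bottleneck.
cond = sender / sender.sum(axis=1, keepdims=True)   # P(signal | state)
joint = cond / n_states                             # uniform prior over states
p_sig = joint.sum(axis=0)
mi = (joint * np.log2(joint * n_states / p_sig)).sum()
print(f"information transferred: {mi:.2f} bits (ceiling: 1 bit for 2 signals)")
```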

    Measuring collaborative emergent behavior in multi-agent reinforcement learning

    Multi-agent reinforcement learning (RL) has important implications for the future of human-agent teaming. We show that improved performance with multi-agent RL is not a guarantee of the collaborative behavior thought to be important for solving multi-agent tasks. To address this, we present a novel approach for quantitatively assessing collaboration in continuous spatial tasks with multi-agent RL. Such a metric is useful for measuring collaboration between computational agents and may serve as a training signal for collaboration in future RL paradigms involving humans.
    Comment: 1st International Conference on Human Systems Engineering and Design, 6 pages, 2 figures, 1 table
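
    The abstract does not spell out its metric, so purely as an illustration of one way to quantify coordination in a continuous spatial task, the sketch below scores the mutual information between two agents' positions; coupled motion scores well above independent motion. This is a generic proxy, not the paper's measure:

```python
import numpy as np

rng = np.random.default_rng(4)

def mutual_information(x, y, bins=8):
    """Histogram-based MI estimate (bits) between two position traces."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    joint /= joint.sum()
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / np.outer(px, py)[nz])).sum())

T = 5_000
a = rng.normal(size=T)                         # agent A's x-position over time
b_indep = rng.normal(size=T)                   # agent B moving independently
b_follow = a + rng.normal(scale=0.5, size=T)   # agent B shadowing agent A

print("independent:", round(mutual_information(a, b_indep), 2), "bits")
print("following:  ", round(mutual_information(a, b_follow), 2), "bits")
```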