3,052 research outputs found

    Arena: A General Evaluation Platform and Building Toolkit for Multi-Agent Intelligence

    Learning agents that are not only capable of taking tests but also of innovating is becoming a hot topic in AI. One of the most promising paths towards this vision is multi-agent learning, where agents act as the environment for each other, so that improving each agent amounts to proposing new problems for the others. However, existing evaluation platforms are either incompatible with multi-agent settings or limited to a specific game; there is not yet a general evaluation platform for research on multi-agent intelligence. To this end, we introduce Arena, a general evaluation platform for multi-agent intelligence with 35 games of diverse logic and representation. Furthermore, multi-agent intelligence is still at a stage where many problems remain unexplored, so we provide a building toolkit that lets researchers easily invent and build novel multi-agent problems from the provided game set, based on a GUI-configurable social tree and five basic multi-agent reward schemes. Finally, we provide Python implementations of five state-of-the-art deep multi-agent reinforcement learning baselines. Along with the baseline implementations, we release a set of 100 best agents/teams trained with different training schemes for each game, as a basis for evaluating agents by population performance. The research community can thus perform comparisons under a stable and uniform standard. All implementations and accompanying tutorials have been open-sourced for the community at https://sites.google.com/view/arena-unity/
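
    One way to turn a population of released agents into a scalar evaluation is a rating system. The following is a minimal sketch of that idea using Elo updates; the names play_match and baselines, and all constants, are illustrative assumptions, not Arena's actual evaluation code.

```python
# Hypothetical sketch: rate a candidate agent against a released population of
# baseline agents via Elo updates. Constants and names are illustrative.

def elo_update(r_candidate, r_opponent, score, k=32.0):
    # score: 1.0 win, 0.5 draw, 0.0 loss for the candidate
    expected = 1.0 / (1.0 + 10.0 ** ((r_opponent - r_candidate) / 400.0))
    return r_candidate + k * (score - expected)

def population_rating(play_match, baselines, games_per_opponent=10):
    """baselines: list of (agent, rating) pairs; play_match(agent) -> score."""
    rating = 1200.0  # conventional starting rating
    for agent, opponent_rating in baselines:
        for _ in range(games_per_opponent):
            rating = elo_update(rating, opponent_rating, play_match(agent))
    return rating

# Toy usage: a candidate that always beats two weak baselines
print(population_rating(lambda agent: 1.0, [("b0", 1000.0), ("b1", 1100.0)]))
```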

    On Partially Controlled Multi-Agent Systems

    Motivated by the control-theoretic distinction between controllable and uncontrollable events, we distinguish between two types of agents within a multi-agent system: controllable agents, which are directly controlled by the system's designer, and uncontrollable agents, which are not under the designer's direct control. We refer to such systems as partially controlled multi-agent systems, and we investigate how one might influence the behavior of the uncontrollable agents through appropriate design of the controlled agents. In particular, we wish to understand which problems are naturally described in these terms, what methods can be applied to influence the uncontrollable agents, how effective such methods are, and whether similar methods work across different domains. Using a game-theoretic framework, this paper studies the design of partially controlled multi-agent systems in two contexts: in one, the uncontrollable agents are expected-utility maximizers, while in the other they are reinforcement learners. We suggest different techniques for controlling agents' behavior in each domain, assess their success, and examine their relationship.
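
    To make the reinforcement-learner setting concrete, here is a minimal sketch of a designer-controlled agent shaping the behaviour of an uncontrollable Q-learning agent in a repeated 2x2 game. The payoff matrix, punishment strategy, and learning constants are illustrative assumptions, not the techniques studied in the paper.

```python
import random

# Hypothetical sketch: a controlled agent plays tit-for-tat-style punishment to
# influence an uncontrollable, stateless Q-learning agent in a repeated 2x2
# game (actions: 0 = cooperate, 1 = defect). Payoffs are illustrative.

PAYOFF = {  # learner's payoff given (learner_action, controlled_action)
    (0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1,
}

def run(episodes=5000, alpha=0.1, epsilon=0.1):
    q = [0.0, 0.0]         # learner's action values (stateless Q-learning)
    controlled_action = 0  # controlled agent starts by cooperating
    for _ in range(episodes):
        if random.random() < epsilon:
            learner_action = random.randrange(2)      # exploration
        else:
            learner_action = 0 if q[0] >= q[1] else 1  # greedy choice
        reward = PAYOFF[(learner_action, controlled_action)]
        q[learner_action] += alpha * (reward - q[learner_action])
        controlled_action = learner_action  # mirror the learner: punish defection
    return q  # compare q[0] vs q[1] to see which action the learner was steered to

print(run())
```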

    Multi-robot team formation control in the GUARDIANS project

    Purpose: The GUARDIANS multi-robot team is to be deployed in a large warehouse filled with smoke, to assist firefighters in searching the warehouse in the event or danger of a fire. The large dimensions of the environment, together with the development of smoke, which drastically reduces visibility, represent major challenges for search and rescue operations. The GUARDIANS robots guide and accompany the firefighters on site while indicating possible obstacles and locations of danger, and maintaining communication links.

    Design/methodology/approach: To fulfil these tasks the robots need to exhibit certain behaviours. Among the basic behaviours are the capabilities to stay together as a group, that is, to generate a formation, and to navigate while keeping this formation. The control model used to generate these behaviours is based on the so-called social potential field framework, which we adapt to the specific tasks required in the GUARDIANS scenario. All tasks can be achieved without central control, and some of the behaviours can be performed without explicit communication between the robots.

    Findings: The GUARDIANS environment requires flexible formations of the robot team: the formation has to adapt itself to the circumstances. The application has thus forced us to redefine the concept of a formation. In graph-theoretic terminology, a formation may be stretched out as a path or be compact as a star or wheel. We have implemented the developed behaviours in simulation environments as well as on real ERA-MOBI robots, commonly referred to as Erratics. We discuss the advantages and shortcomings of our model, based on the simulations as well as on the implementation with a team of Erratics.
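
    A minimal sketch of the social potential field idea follows: each robot sums a repulsive term from neighbours that are too close and an attractive term towards neighbours that are too far, so the group holds together with no central controller. The gains, ranges, and function shapes below are illustrative assumptions, not the GUARDIANS parameters.

```python
import math

# Hypothetical social potential field controller: attraction beyond a desired
# spacing, repulsion inside it. Constants are illustrative placeholders.

def social_force(own_pos, neighbour_positions, desired=2.0, k_att=0.5, k_rep=1.0):
    fx, fy = 0.0, 0.0
    for nx, ny in neighbour_positions:
        dx, dy = nx - own_pos[0], ny - own_pos[1]
        dist = math.hypot(dx, dy)
        if dist < 1e-9:
            continue  # coincident robots: no defined direction
        # positive magnitude pulls towards the neighbour, negative pushes away
        if dist > desired:
            magnitude = k_att * (dist - desired)          # attraction when stretched
        else:
            magnitude = -k_rep * (desired - dist) / dist  # repulsion when crowded
        fx += magnitude * dx / dist
        fy += magnitude * dy / dist
    return fx, fy

# Example: a robot with one neighbour too far away and one too close
print(social_force((0.0, 0.0), [(5.0, 0.0), (0.5, 0.0)]))
```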

    Efficient Model Learning for Human-Robot Collaborative Tasks

    We present a framework for learning human user models from joint-action demonstrations that enables a robot to compute a robust policy for a collaborative task with a human. The learning takes place completely automatically, without any human intervention. First, we describe the clustering of demonstrated action sequences into different human types using an unsupervised learning algorithm. These demonstrated sequences are also used by the robot to learn a reward function that is representative of each type, through an inverse reinforcement learning algorithm. The learned model is then used as part of a Mixed Observability Markov Decision Process (MOMDP) formulation, wherein the human type is a partially observable variable. With this framework, we can infer, either offline or online, the human type of a new user who was not included in the training set, and can compute a policy for the robot that is aligned with the preferences of this new user and robust to deviations of the human's actions from prior demonstrations. Finally, we validate the approach using data collected in human-subject experiments, and conduct proof-of-concept demonstrations in which a person performs a collaborative task with a small industrial robot.
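
    The core of online type inference is a Bayesian belief update over the partially observable human type. Here is a minimal sketch under the simplifying assumption that each learned type is summarised as a likelihood over the human's next action; the type names, actions, and probabilities are placeholders, not the models learned in the paper.

```python
# Hypothetical sketch: Bayesian filtering over a latent human type.

def update_belief(belief, observed_action, type_models):
    """belief: {type: probability}; type_models: {type: {action: P(action | type)}}."""
    posterior = {
        t: p * type_models[t].get(observed_action, 1e-6)
        for t, p in belief.items()
    }
    total = sum(posterior.values())
    return {t: p / total for t, p in posterior.items()}

# Two illustrative human types with different action preferences
type_models = {
    "fast":    {"reach_left": 0.8, "reach_right": 0.2},
    "careful": {"reach_left": 0.3, "reach_right": 0.7},
}
belief = {"fast": 0.5, "careful": 0.5}
for action in ["reach_left", "reach_left", "reach_right"]:
    belief = update_belief(belief, action, type_models)
print(belief)  # belief shifts towards the type that best explains the actions
```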

    Stackelberg Meta-Learning for Strategic Guidance in Multi-Robot Trajectory Planning

    Guided cooperation is a common task in many multi-agent teaming applications. Planning the cooperation is difficult when the leader robot has incomplete information about the follower, and there is a need to learn, customize, and adapt the cooperation plan online. To this end, we develop a learning-based Stackelberg game-theoretic framework for optimal trajectory planning among heterogeneous robots. We first formulate the guided trajectory planning problem as a dynamic Stackelberg game and design the cooperation plans using open-loop Stackelberg equilibria. We leverage meta-learning to deal with the unknown follower in the game and propose a Stackelberg meta-learning framework to create online-adaptive trajectory guidance plans, where the leader robot learns a meta-best-response model from a prescribed set of followers offline and then quickly adapts to a specific online trajectory guidance task using limited learning data. We use simulations in three different scenarios to demonstrate the effectiveness of our framework. Comparisons with other learning approaches and with no-guidance cases show that our framework provides a more time- and data-efficient planning method for trajectory guidance tasks.
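
    The leader-commits/follower-best-responds structure is easiest to see in a one-shot finite game. Below is a minimal sketch of a pure-strategy Stackelberg equilibrium computed by enumeration; the payoff matrices are illustrative, and the paper itself works with dynamic trajectory games rather than a matrix game.

```python
# Hypothetical sketch: pure-strategy Stackelberg equilibrium in a bimatrix game.
# The leader commits to a row; the follower best-responds within that row; the
# leader picks the commitment that maximises its own payoff.

LEADER_PAYOFF = [[2, 4],
                 [1, 3]]
FOLLOWER_PAYOFF = [[1, 0],
                   [0, 2]]

def stackelberg(leader_payoff, follower_payoff):
    best = None
    for l_action, follower_row in enumerate(follower_payoff):
        f_action = max(range(len(follower_row)), key=follower_row.__getitem__)
        value = leader_payoff[l_action][f_action]
        if best is None or value > best[0]:
            best = (value, l_action, f_action)
    return best  # (leader value, leader action, follower best response)

print(stackelberg(LEADER_PAYOFF, FOLLOWER_PAYOFF))  # -> (3, 1, 1)
```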

    Fictitious play for cooperative action selection in robot teams

    A game-theoretic distributed decision-making approach is presented for the problem of control effort allocation in a robotic team, based on a novel variant of fictitious play. The proposed learning process allows the robots to accomplish their objectives by coordinating their actions so as to complete their tasks efficiently. In particular, each robot of the team predicts the other robots' planned actions while making decisions that maximise its own expected reward, which depends on the reward for joint successful completion of the task. Action selection is interpreted as an n-player cooperative game. The approach can be seen as part of the Belief-Desire-Intention (BDI) framework and can also address the problem of cooperative, legal, safe, considerate and empathic decisions by robots, if their individual and group rewards are suitably defined. After a theoretical analysis, the performance of the proposed algorithm is tested in four simulation scenarios: the first is a coordination game between two material-handling robots, the second a warehouse patrolling task performed by a team of robots, the third a coordination mechanism between two robots carrying a heavy object along a corridor, and the fourth an example of coordination in a sensor network.
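
    For readers unfamiliar with fictitious play, here is a minimal sketch of the classic version in a two-robot common-payoff coordination game: each robot keeps empirical counts of the other's past actions and best-responds to that empirical distribution. The payoff matrix and counts are illustrative, not the paper's variant.

```python
# Hypothetical sketch of classical fictitious play between two robots that
# share the common payoff PAYOFF[a1][a2]. Coordinating on action 1 pays most.

PAYOFF = [[1.0, 0.0],
          [0.0, 2.0]]

def best_response(opponent_counts):
    total = sum(opponent_counts)
    probs = [c / total for c in opponent_counts]
    expected = [sum(PAYOFF[a][b] * probs[b] for b in range(2)) for a in range(2)]
    return max(range(2), key=expected.__getitem__)

def fictitious_play(rounds=100):
    counts = [[1, 1], [1, 1]]  # counts[i]: robot i's counts of the other's actions
    a1 = a2 = 0
    for _ in range(rounds):
        a1 = best_response(counts[0])
        a2 = best_response(counts[1])
        counts[0][a2] += 1  # robot 1 observes robot 2's action
        counts[1][a1] += 1  # robot 2 observes robot 1's action
    return a1, a2  # the empirical play typically settles on a coordinated pair

print(fictitious_play())  # -> (1, 1) for this payoff matrix
```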