26 research outputs found

    Stackelberg Game-Theoretic Trajectory Guidance for Multi-Robot Systems with Koopman Operator

    Guided trajectory planning involves a leader robotic agent strategically directing a follower robotic agent so that they collaboratively reach a designated destination. However, this task becomes notably challenging when the leader lacks complete knowledge of the follower's decision-making model, which creates a need for learning-based methods to design the cooperative plan effectively. To this end, we develop a Stackelberg game-theoretic approach based on the Koopman operator to address the challenge. We first formulate the guided trajectory planning problem through the lens of a dynamic Stackelberg game. We then leverage Koopman operator theory to learn a linear system model that approximates the follower's feedback dynamics. Based on this learned model, the leader devises a collision-free trajectory to guide the follower using receding horizon planning. Simulations demonstrate that our approach produces learned models that predict the follower's multi-step behavior more accurately than alternative learning techniques. Moreover, our approach successfully accomplishes the guidance task and reduces the leader's planning time to nearly half of that of the model-based baseline method.
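    As a rough illustration of the learning step described in this abstract, the sketch below fits an EDMD-style Koopman model with inputs from recorded trajectory snapshots and rolls it forward to predict the follower's multi-step response to a candidate leader plan. The observable dictionary, the function names, and the least-squares formulation are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def lift(x):
    """Hypothetical observable dictionary: raw state, quadratic terms, and a constant."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    quad = np.outer(x, x)[np.triu_indices(x.size)]
    return np.concatenate([x, quad, [1.0]])

def fit_koopman_with_inputs(X, X_next, U):
    """Least-squares (EDMD-style) fit of lift(x_{t+1}) ~ A lift(x_t) + B u_t
    from follower states X, successor states X_next, and leader inputs U."""
    Psi = np.stack([lift(x) for x in X])              # (T, d)
    Psi_next = np.stack([lift(x) for x in X_next])    # (T, d)
    Z = np.hstack([Psi, np.asarray(U, dtype=float)])  # (T, d + m)
    W, *_ = np.linalg.lstsq(Z, Psi_next, rcond=None)
    d = Psi.shape[1]
    return W[:d].T, W[d:].T                           # A: (d, d), B: (d, m)

def predict_follower(A, B, x0, leader_inputs, n_state):
    """Roll the learned linear model forward; the first n_state entries of the
    lifted vector are the raw state, so they serve as the multi-step prediction."""
    z = lift(x0)
    trajectory = []
    for u in leader_inputs:
        z = A @ z + B @ np.asarray(u, dtype=float)
        trajectory.append(z[:n_state])
    return np.array(trajectory)
```

    A receding-horizon planner in this spirit would score candidate leader input sequences with `predict_follower`, keep the collision-free sequence with the best cost, apply its first input, and re-solve at the next step.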

    Multiagent reinforcement learning in Markov games : asymmetric and symmetric approaches

    Modern computing systems are distributed, large, and heterogeneous. Computers, other information processing devices, and humans are very tightly connected with each other, and it is therefore preferable to treat these entities as agents rather than stand-alone systems. One of the goals of artificial intelligence is to understand interactions between entities, whether artificial or natural, and to suggest how to make good decisions while taking other decision makers into account. In this thesis, these interactions between intelligent and rational agents are modeled with Markov games, and the emphasis is on adaptation and learning in multiagent systems. Markov games are a general mathematical tool for modeling interactions between multiple agents. The model is very general (common board games, for example, are special instances of Markov games) and is particularly interesting because it forms an intersection of two distinct research disciplines: machine learning and game theory. Markov games extend Markov decision processes, a well-known tool for modeling single-agent problems, to multiagent domains. On the other hand, Markov games can be seen as a dynamic extension of strategic form games, which are standard models in traditional game theory. From the computer science perspective, Markov games provide a flexible and efficient way to describe different social interactions between intelligent agents. This thesis studies different aspects of learning in Markov games. From the machine learning perspective, the focus is on a very general learning model, reinforcement learning, in which the goal is to maximize the long-term performance of the learning agent. The thesis introduces an asymmetric learning model that is computationally efficient in multiagent systems and enables the construction of different agent hierarchies. In multiagent reinforcement learning systems based on Markov games, the space and computational requirements grow very quickly with the number of learning agents and the size of the problem instance. Therefore, it is necessary to use function approximators, such as neural networks, to model agents in many real-world applications. In this thesis, various numeric learning methods are proposed for multiagent learning problems. The proposed methods are tested on small but non-trivial example problems from different research areas, including robot navigation, a simplified soccer game, and automated pricing models for intelligent agents. The thesis also contains an extensive literature survey on multiagent reinforcement learning and various methods based on Markov games. Additionally, game-theoretic methods and methods originating from computer science for multiagent learning and decision making are compared.
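    The asymmetric learning model can be illustrated with a small tabular sketch: each agent keeps a joint-action value table, the stage game at every state is solved with the leader committing first and the follower best-responding, and both tables bootstrap on that Stackelberg value instead of an independent max. This is a toy reconstruction under assumed table shapes and update rules, not the thesis' exact algorithms.

```python
import numpy as np

n_states, nA_l, nA_f = 10, 3, 3
Q_l = np.zeros((n_states, nA_l, nA_f))  # leader's joint-action value table
Q_f = np.zeros((n_states, nA_l, nA_f))  # follower's joint-action value table

def stackelberg_value(s):
    """Asymmetric solution of the stage game at state s: the follower best-responds
    to each leader action; the leader commits to the action whose induced response
    maximizes the leader's own value."""
    br = Q_f[s].argmax(axis=1)                      # follower best response per leader action
    a_l = int(Q_l[s, np.arange(nA_l), br].argmax())
    a_f = int(br[a_l])
    return a_l, a_f, Q_l[s, a_l, a_f], Q_f[s, a_l, a_f]

def asymmetric_update(s, a_l, a_f, r_l, r_f, s_next, alpha=0.1, gamma=0.95):
    """Q-learning style update that bootstraps on the Stackelberg value of the
    next stage game rather than each agent's independent maximum."""
    _, _, v_l, v_f = stackelberg_value(s_next)
    Q_l[s, a_l, a_f] += alpha * (r_l + gamma * v_l - Q_l[s, a_l, a_f])
    Q_f[s, a_l, a_f] += alpha * (r_f + gamma * v_f - Q_f[s, a_l, a_f])
```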

    A committee neural network for prediction of normalized oil content from well log data: An example from South Pars Gas Field, Persian Gulf

    Normalized oil content (NOC) is an important geochemical factor for identifying potential pay zones in hydrocarbon source rocks. The present study proposes an optimal and improved model that makes a quantitative and qualitative correlation between NOC and well log responses by integrating neural network training algorithms with the committee machine concept. This committee machine with training algorithms (CMTA) combines the Levenberg-Marquardt (LM), Bayesian regularization (BR), gradient descent (GD), one step secant (OSS), and resilient back-propagation (RP) algorithms. Each of these algorithms has a weight factor showing its contribution to the overall prediction. The optimal combination of the weights is derived by a genetic algorithm. The method is illustrated with a case study: 231 data points comprising well log data and measured NOC from three wells of the South Pars Gas Field were split into a modeling dataset of 194 samples and a testing set of 37 samples for evaluating the reliability of the models. The results of this study show that the CMTA provides more reliable and acceptable results than each of the individual neural networks differing in training algorithms. The CMTA can also accurately identify production pay zones (PPZs) from well logs.
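    The committee idea reduces to a weighted sum of the member networks' predictions, with the weight vector searched by a genetic algorithm against the measured NOC. The sketch below illustrates that combination with a deliberately tiny GA; the population size, mutation scale, and fitness (negative mean squared error) are assumptions for illustration. The rows of `member_preds` would come from the LM, BR, GD, OSS, and RP trained networks.

```python
import numpy as np

def committee_predict(member_preds, weights):
    """Weighted combination of the member networks' NOC predictions.
    member_preds: (n_members, n_samples); weights are normalized to sum to 1."""
    w = np.asarray(weights, dtype=float)
    return (w / w.sum()) @ member_preds

def ga_optimize_weights(member_preds, target, pop=50, gens=200, seed=0):
    """Tiny genetic algorithm over the member weights: the fitness of a weight
    vector is the negative MSE of the committee prediction against measured NOC."""
    rng = np.random.default_rng(seed)
    population = rng.random((pop, member_preds.shape[0]))

    def fitness(w):
        err = committee_predict(member_preds, w) - target
        return -np.mean(err ** 2)

    for _ in range(gens):
        scores = np.array([fitness(w) for w in population])
        parents = population[np.argsort(scores)[-pop // 2:]]        # keep the fitter half
        children = parents[rng.integers(0, len(parents), pop // 2)].copy()
        children += rng.normal(0.0, 0.05, children.shape)           # Gaussian mutation
        children = np.clip(children, 1e-6, None)                    # keep weights positive
        population = np.vstack([parents, children])

    best = population[np.argmax([fitness(w) for w in population])]
    return best / best.sum()
```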

    Bi-level Actor-Critic for Multi-agent Coordination

    Coordination is one of the essential problems in multi-agent systems. Typically, multi-agent reinforcement learning (MARL) methods treat agents equally and aim to solve the Markov game to an arbitrary Nash equilibrium (NE) when multiple equilibria exist, thus lacking a mechanism for NE selection. In this paper, we treat agents unequally and consider the Stackelberg equilibrium as a potentially better convergence point than the Nash equilibrium in terms of Pareto superiority, especially in cooperative environments. Under Markov games, we formally define the bi-level reinforcement learning problem of finding a Stackelberg equilibrium. We propose a novel bi-level actor-critic learning method that allows agents to have different knowledge bases (and thus different levels of intelligence) while their actions can still be executed simultaneously and in a distributed manner. A convergence proof is given, and the resulting learning algorithm is tested against the state of the art. We find that the proposed bi-level actor-critic algorithm successfully converges to Stackelberg equilibria in matrix games and finds an asymmetric solution in a highway merge environment.
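    The equilibrium-selection argument can be seen in a small cooperative matrix game with two pure Nash equilibria, one of which Pareto-dominates the other. With Stackelberg reasoning the leader anticipates the follower's best response and commits to the action that steers play toward the better equilibrium. The payoff matrices below are an illustrative stag-hunt style example, not taken from the paper.

```python
import numpy as np

# Rows: leader action, columns: follower action; entry (i, j) is that agent's payoff.
R_leader = np.array([[4, 0],
                     [3, 3]])
R_follower = np.array([[4, 3],
                       [0, 3]])

def stackelberg_outcome(R_l, R_f):
    """Leader commits first; the follower best-responds to the observed leader
    action; the leader anticipates this when choosing what to commit to."""
    br = R_f.argmax(axis=1)                               # follower's best response to each row
    a_l = int(R_l[np.arange(R_l.shape[0]), br].argmax())  # leader's anticipatory choice
    a_f = int(br[a_l])
    return a_l, a_f, R_l[a_l, a_f], R_f[a_l, a_f]

print(stackelberg_outcome(R_leader, R_follower))
# -> (0, 0, 4, 4): commitment selects the Pareto-superior of the two pure Nash equilibria
```

    In a bi-level actor-critic scheme, this commit-then-respond computation is what the leader effectively performs at every state, with learned Q-values standing in for the payoff matrices.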

    Stackelberg Multi-Agent Reinforcement Learning for Hierarchical Environments

    This thesis explores the application of multi-agent reinforcement learning in domains containing asymmetries between agents, caused by differences in information and position, that result in a hierarchy of leaders and followers. Leaders are agents that have access to the follower agents' policies and the ability to commit to an action before the followers. The followers can observe the actions taken by leaders and respond to maximize their own payoffs. Since leaders know the follower policies, they can manipulate the followers to elicit a better payoff for themselves. In this work, we focus on the problem of training agents with continuous actions at different levels of a hierarchy so that each obtains the best payoff for its position. To address this problem we propose a new algorithm, Stackelberg Multi-Agent Reinforcement Learning (SMARL), that incorporates the Stackelberg equilibrium concept into the multi-agent deep deterministic policy gradient (MADDPG) algorithm, enabling us to efficiently train agents at all levels of the hierarchy. Since maximization over a continuous action space is intractable, we propose a method to solve our Stackelberg formulation for continuous actions using conditional actions and gradient descent. We evaluate our algorithm on multiple mixed cooperative and competitive multi-agent domains, consisting of our custom-built highway driving environment and a subset of the multi-agent particle environments. Agents trained with our proposed algorithm outperform those trained with existing methods in most hierarchical domains and are comparable in the rest.
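    A minimal sketch of the conditional-action idea: the follower's policy takes the leader's action as an extra input, and the leader's objective is evaluated through that differentiable response, so gradient descent on the leader accounts for the follower's reaction. The network sizes, the single leader critic, and the deterministic Tanh policies are assumptions in the spirit of MADDPG, not the SMARL implementation.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Small deterministic policy; the follower's input includes the leader's action."""
    def __init__(self, in_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                 nn.Linear(64, act_dim), nn.Tanh())

    def forward(self, x):
        return self.net(x)

state_dim, act_dim = 8, 2
leader = Actor(state_dim, act_dim)
follower = Actor(state_dim + act_dim, act_dim)  # conditional action: follower sees a_l
leader_critic = nn.Sequential(nn.Linear(state_dim + 2 * act_dim, 64), nn.ReLU(),
                              nn.Linear(64, 1))

def leader_policy_loss(states):
    """Bi-level objective: the leader's action is scored through the follower's
    conditional response, so the leader's gradient accounts for that reaction."""
    a_l = leader(states)
    a_f = follower(torch.cat([states, a_l], dim=-1))
    return -leader_critic(torch.cat([states, a_l, a_f], dim=-1)).mean()

states = torch.randn(32, state_dim)
loss = leader_policy_loss(states)
loss.backward()  # gradients reach the leader through the follower's response
```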