265 research outputs found

    Distributed accelerated Nash equilibrium learning for two-subnetwork zero-sum game with bilinear coupling

    This paper proposes a distributed accelerated first-order continuous-time algorithm with O(1/t^2) convergence to Nash equilibria in a class of two-subnetwork zero-sum games with bilinear couplings. First-order methods, which use only subgradients of functions, are frequently used in distributed/parallel algorithms for solving large-scale and big-data problems due to their simple structure. However, in the worst case, first-order methods for two-subnetwork zero-sum games often achieve only asymptotic or O(1/t) convergence. In contrast to existing time-invariant first-order methods, this paper designs a distributed accelerated algorithm by combining saddle-point dynamics with time-varying derivative feedback techniques. With suitably chosen parameters, the algorithm achieves O(1/t^2) convergence in terms of the duality gap function without any uniform or strong convexity requirement. Numerical simulations show the efficacy of the algorithm.
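The flavour of the dynamics described above can be illustrated with a minimal, centralized sketch: a discretized second-order saddle-point flow on a bilinear game min_x max_y x^T A y, with the vanishing 3/t damping familiar from Nesterov-style accelerated ODEs. The matrix, step size, and damping schedule are illustrative assumptions, not the paper's actual distributed algorithm or parameter choices.

```python
import numpy as np

def accelerated_saddle_point(A, steps=2000, h=1e-2):
    """Euler discretization of accelerated saddle-point dynamics
    for min_x max_y x^T A y (centralized, illustrative only)."""
    n, m = A.shape
    x, y = np.ones(n), np.ones(m)
    vx, vy = np.zeros(n), np.zeros(m)
    for k in range(1, steps + 1):
        t = k * h
        gx = A @ y          # subgradient in x (descent direction)
        gy = A.T @ x        # subgradient in y (ascent direction)
        # second-order dynamics with time-varying 3/t damping
        vx += h * (-(3.0 / t) * vx - gx)
        vy += h * (-(3.0 / t) * vy + gy)
        x += h * vx
        y += h * vy
    return x, y

A = np.array([[2.0, 1.0], [1.0, 3.0]])
x, y = accelerated_saddle_point(A)
```

The time-varying derivative feedback (the 3/t term) is what distinguishes this from plain saddle-point flow, which would merely oscillate on a bilinear objective.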

    Many-agent Reinforcement Learning

    Multi-agent reinforcement learning (RL) solves the problem of how each agent should behave optimally in a stochastic environment in which multiple agents are learning simultaneously. It is an interdisciplinary domain with a long history, lying at the intersection of psychology, control theory, game theory, reinforcement learning, and deep learning. Following the remarkable success of the AlphaGo series in single-agent RL, 2019 was a booming year that witnessed significant advances in multi-agent RL techniques; impressive breakthroughs have been made in developing AIs that outperform humans on many challenging tasks, especially multi-player video games. Nonetheless, one of the key challenges of multi-agent RL techniques is scalability; it is still non-trivial to design efficient learning algorithms that can solve tasks involving far more than two agents (N ≫ 2), which I call many-agent reinforcement learning (MARL; I use "MARL" to denote multi-agent reinforcement learning with a particular focus on the case of many agents, and "Multi-Agent RL" otherwise). In this thesis, I contribute to tackling MARL problems from four aspects. Firstly, I offer a self-contained overview of multi-agent RL techniques from a game-theoretical perspective. This overview fills the research gap that most existing work either fails to cover the recent advances since 2010 or does not pay adequate attention to game theory, which I believe is the cornerstone of solving many-agent learning problems. Secondly, I develop a tractable policy-evaluation algorithm, α^α-Rank, for many-agent systems. The critical advantage of α^α-Rank is that it can compute the solution concept of α-Rank tractably in multi-player general-sum games with no need to store the entire pay-off matrix. This is in contrast to classic solution concepts such as Nash equilibrium, which is known to be PPAD-hard to compute even in two-player cases. α^α-Rank allows us, for the first time, to conduct large-scale multi-agent evaluations in practice. Thirdly, I introduce a scalable policy-learning algorithm, mean-field MARL, for many-agent systems. The mean-field MARL method takes advantage of the mean-field approximation from physics, and it is the first provably convergent algorithm that tries to break the curse of dimensionality for MARL tasks. With the proposed algorithm, I report the first result of solving the Ising model and multi-agent battle games through a MARL approach. Fourthly, I investigate the many-agent learning problem in open-ended meta-games (i.e., the game of a game in the policy space). Specifically, I focus on modelling behavioural diversity in meta-games and on developing algorithms that are guaranteed to enlarge diversity during training. The proposed metric, based on determinantal point processes, serves as the first mathematically rigorous definition of diversity. Importantly, the diversity-aware learning algorithms beat the existing state-of-the-art game solvers in terms of exploitability by a large margin. On top of the algorithmic developments, I also contribute two real-world applications of MARL techniques. Specifically, I demonstrate the great potential of applying MARL to study the emergent population dynamics in nature, and to model diverse and realistic interactions in autonomous driving. Both applications embody the prospect that MARL techniques could achieve huge impacts in the real physical world, beyond purely video games.
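The core idea behind the mean-field approximation mentioned above can be sketched in a few lines: instead of conditioning an agent's Q-function on the joint action of all N-1 other agents (exponential in N), condition it on the mean action of the agent's neighbours (constant size). The Q-table shape and Boltzmann policy below are illustrative assumptions, not the thesis's exact implementation.

```python
import numpy as np

def mean_action(neighbour_actions, n_actions):
    """One-hot average of neighbours' discrete actions."""
    onehot = np.eye(n_actions)[neighbour_actions]
    return onehot.mean(axis=0)

def boltzmann_policy(q_row, temperature=1.0):
    """Softmax over Q-values, numerically stabilized."""
    logits = q_row / temperature
    logits -= logits.max()
    p = np.exp(logits)
    return p / p.sum()

n_actions = 3
rng = np.random.default_rng(0)
# Illustrative Q: own action scored linearly against the mean action
Q = rng.normal(size=(n_actions, n_actions))
neighbours = np.array([0, 2, 2, 1])       # neighbours' last actions
mu = mean_action(neighbours, n_actions)   # -> [0.25, 0.25, 0.5]
q_values = Q @ mu                         # Q(s, a, mu) for each own action a
pi = boltzmann_policy(q_values)
```

Whatever the number of neighbours, `mu` stays a length-`n_actions` distribution, which is what breaks the exponential growth of the joint-action input.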

    Searching for joint gains in automated negotiations based on multi-criteria decision making theory

    It is well established by conflict theorists and others that successful negotiation should incorporate "creating value" as well as "claiming value." Joint improvements that bring benefits to all parties can be realised by (i) identifying attributes that are not in direct conflict between the parties, (ii) making trade-offs on attributes that are valued differently by different parties, and (iii) searching for values within attributes that could bring more gains to one party while not incurring too much loss on the other party. In this paper we propose an approach for maximising joint gains in automated negotiations by formulating the negotiation problem as a multi-criteria decision making problem and taking advantage of several optimisation techniques introduced by operations researchers and conflict theorists. We use a mediator to protect the negotiating parties from unnecessary disclosure of information to their opponent, while also allowing an objective calculation of maximum joint gains. We separate attributes that take a finite set of values (simple attributes) from those with continuous values, and we show that for simple attributes, the mediator can determine the Pareto-optimal values. In addition, we show that if none of the simple attributes strongly dominates the other simple attributes, then truth telling is an equilibrium strategy for negotiators during the optimisation of simple attributes. We also describe an approach for improving joint gains on non-simple attributes, by moving the parties, in a series of steps, towards the Pareto-optimal frontier.
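For simple attributes, a mediator's task reduces to enumerating the finite value combinations and discarding those that are Pareto-dominated under both parties' utilities. The following is a minimal sketch of that filtering step; the attributes and utility numbers are invented for illustration, and the paper's actual mechanism and equilibrium analysis are richer.

```python
from itertools import product

def pareto_optimal(options, u_a, u_b):
    """Keep options not strictly dominated in both parties' utilities."""
    scored = [(opt, u_a(opt), u_b(opt)) for opt in options]
    frontier = []
    for opt, a, b in scored:
        dominated = any(
            (a2 >= a and b2 >= b) and (a2 > a or b2 > b)
            for _, a2, b2 in scored
        )
        if not dominated:
            frontier.append(opt)
    return frontier

# Two hypothetical simple attributes: delivery speed and payment schedule
options = list(product(["fast", "slow"], ["upfront", "installments"]))
u_buyer = lambda o: {"fast": 2, "slow": 0}[o[0]] + {"upfront": 0, "installments": 1}[o[1]]
u_seller = lambda o: {"fast": 0, "slow": 1}[o[0]] + {"upfront": 2, "installments": 0}[o[1]]
frontier = pareto_optimal(options, u_buyer, u_seller)
```

Here ("slow", "installments") is dominated by ("fast", "upfront") for both parties and is filtered out, while the remaining three options trade off the buyer's and seller's interests and stay on the frontier.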

    Spatial competition of learning agents in agricultural procurement markets

    Spatially dispersed farmers supply raw milk as the primary input to a small number of large dairy-processing firms. The spatial competition of processing firms has short- to long-term repercussions on farm and processor structure, as it determines the regional demand for raw milk and the resulting raw milk price. A number of recent analytical and empirical contributions in the literature analyse the spatial price competition of processing firms in milk markets. Agent-based models (ABMs) by now serve as computational laboratories in many social science and interdisciplinary fields, and have recently also been introduced as bottom-up approaches to help understand market outcomes emerging from autonomously deciding and interacting agents. Despite ABMs' strengths, the inclusion of interactive learning by intelligent agents has not sufficiently matured. Although multi-agent systems (MAS) and multi-agent economic simulation are related fields of research, they have progressed along separate paths. This thesis takes us through some basic steps involved in developing a theoretical basis for designing multi-agent learning in spatial economic ABMs. Each of the three main chapters of the thesis investigates a core issue for designing interactive learning systems, with the overarching aim of better understanding the emergence of pricing behaviour in real, spatial agricultural markets. An important problem in the competitive spatial economics literature is the lack of a rigorous theoretical explanation for observed collusive behaviour in oligopsonistic markets. The first main chapter theoretically derives how the incorporation of foresight in agents' pricing policy in spatial markets might move the system towards cooperative Nash equilibria. It is shown that a basic level of foresight invites competing firms to cease limitless price wars. Introducing the concept of an outside option into the agents' decisions within a dynamic pricing game reveals how decreasing returns for increasing strategic thinking correlate with the relevance of transportation costs. In the second main chapter, we introduce a new learning algorithm for rational agents using H-PHC (hierarchical policy hill climbing) in spatial markets. While MAS algorithms are typically applicable only to small problems, we show experimentally how a community of multiple rational agents is able to overcome the coordination problem in a variety of spatial (and non-spatial) market games with rich decision spaces at modest computational effort. The theoretical explanation of emerging price equilibria in spatial markets is much disputed in the literature. The majority of papers attribute the pricing behaviour of processing firms (mill price and freight absorption) merely to the spatial structure of markets. Based on a computational approach with interactive learning agents in two-dimensional space, the third main chapter suggests that associating the extent of freight absorption with the factor space alone can be ambiguous. In addition, the pricing behaviour of agricultural processors, namely the ability to coordinate and achieve mutually beneficial outcomes, also depends on their ability to learn from each other.
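The basic policy hill-climbing (PHC) update that H-PHC builds on can be sketched compactly: each agent keeps a Q-table and a mixed policy, and after every Q-update shifts a small amount of probability mass toward the currently greedy action. The single-state game, reward values, and step parameters below are illustrative assumptions; the hierarchical decomposition over rich spatial pricing spaces is not shown.

```python
import numpy as np

class PHCAgent:
    """Single-state policy hill climbing (illustrative sketch)."""
    def __init__(self, n_actions, alpha=0.1, delta=0.05):
        self.q = np.zeros(n_actions)
        self.pi = np.full(n_actions, 1.0 / n_actions)
        self.alpha, self.delta = alpha, delta

    def update(self, action, reward):
        # Stateless Q-learning update
        self.q[action] += self.alpha * (reward - self.q[action])
        greedy = int(np.argmax(self.q))
        # Move delta probability mass toward the greedy action
        n = len(self.pi)
        for a in range(n):
            if a == greedy:
                self.pi[a] = min(1.0, self.pi[a] + self.delta)
            else:
                self.pi[a] = max(0.0, self.pi[a] - self.delta / (n - 1))
        self.pi /= self.pi.sum()

agent = PHCAgent(n_actions=3)
rewards = [0.0, 1.0, 0.2]      # hypothetical payoffs; action 1 is best
for _ in range(100):
    for a in range(3):          # exhaustive sweep keeps the sketch deterministic
        agent.update(a, rewards[a])
```

After training, the policy concentrates on the highest-payoff action; against other learners, the hill-climbing step is what lets mutually beneficial price patterns stabilize rather than collapse into limitless undercutting.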