464 research outputs found

    Incentive Contracts in Multi-agent Systems: Theory and Applications

    This thesis studies incentive contracts in multi-agent systems with applications to transportation policy. The early adoption of emerging transportation systems such as electric vehicles (EVs), peer-to-peer ridesharing, and automated vehicles (AVs) relies on governmental incentives. Those incentives help achieve a specific market share target, prevent irregular behaviors, and enhance social benefit. Yet two challenges may impede the implementation of such incentive policies. First, the government and subsidized organizations must confront the uncertainty in a market. Second, the government has no access to the organizations' private information, and thus their strategies are unknown to it. In the face of these challenges, a command-and-control incentive policy fails. In Chapter 2, we revisit the primary setting in which a government agency incentivizes the OEM to accelerate the widespread adoption of AVs. This work aspires to offset the negative externalities of AVs in the "dark age" of AV deployment. More specifically, this chapter designs AV subsidies to shorten the early AV market penetration period and maximize the total expected efficiency benefits of AVs. It seeks a generic optimal AV subsidy structure, a so-called "two-threshold" subsidy policy, which is proven to be more efficient than the social-welfare maximization approach. In Chapter 3, we develop a multi-agent incentive contracts model to address the issue of stimulating a group of non-cooperating agents to act in the principal's interest over a planning horizon. We extend the single-agent incentive contract to a multi-agent setting with history-dependent terminal conditions. 
Our contributions include: (a) finding sufficient conditions for the existence of optimal multi-agent incentive contracts and conditions under which they form a unique Nash equilibrium; (b) showing that the optimal multi-agent incentive contracts can be solved by a Hamilton-Jacobi-Bellman equation with equilibrium constraints; (c) proposing a backward iterative algorithm to solve the problem. In Chapter 4, we obtain the optimal EV and charging infrastructure subsidies through the multi-agent incentive contracts model. Widespread adoption of EVs mostly depends on governmental subsidies during the early stage of deployment. The governmental incentives must strike a balance between an EV manufacturer and a charging infrastructure installer. Yet the current supply of charging infrastructure is not nearly enough to support EV growth over the next decades. We model the joint subsidy problem as a two-agent incentive contract. The government observes two correlated processes: the EV market penetration and the charging infrastructure expansion. It looks for an optimal policy that maximizes the cumulative social benefit in the face of uncertainty. In our case study, we find that the optimal dynamic subsidies can achieve 70% of the target EV market share in China by 2025 and also maintain the ratio of charging stations per EV. Chapter 5 ends the thesis with conclusions and promising future research directions. In summary, this thesis provides a new approach to appraise transportation and energy policies against exogenous and endogenous risks.
    PhD, Industrial & Operations Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/155199/1/luoqi_1.pd

    A Framework for the Game-theoretic Analysis of Censorship Resistance

    We present a game-theoretic analysis of optimal solutions for interactions between censors and censorship resistance systems (CRSs) by focusing on the data channel used by the CRS to smuggle clients’ data past the censors. This analysis leverages the inherent errors (false positives and negatives) made by the censor when trying to classify traffic as either non-circumvention traffic or as CRS traffic, as well as the underlying rate of CRS traffic. We identify Nash equilibrium solutions for several simple censorship scenarios and then extend those findings to more complex scenarios where we find that the deployment of a censorship apparatus does not qualitatively change the equilibrium solutions, but rather only affects the amount of traffic a CRS can support before being blocked. By leveraging these findings, we describe a general framework for exploring and identifying optimal strategies for the censorship circumventor, in order to maximize the amount of CRS traffic not blocked by the censor. We use this framework to analyze several scenarios with multiple data-channel protocols used as cover for the CRS. We show that it is possible to gain insights through this framework even without perfect knowledge of the censor’s (secret) values for the parameters in their utility function
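As a rough illustration of the kind of Nash equilibrium reasoning this abstract refers to, the sketch below solves a generic 2x2 game by the standard indifference conditions. It is not the paper's model: the action names (send/idle, block/allow), the payoff numbers, and the function `mixed_equilibrium_2x2` are all hypothetical, chosen only so that the game has no pure-strategy equilibrium.

```python
def mixed_equilibrium_2x2(A, B):
    """Mixed Nash equilibrium of a 2x2 bimatrix game via indifference conditions.

    A[i][j] is the row player's payoff and B[i][j] the column player's payoff
    when row i and column j are played. Returns (p, q), where p is the
    probability of row 0 and q is the probability of column 0.
    """
    denom_p = B[0][0] - B[0][1] - B[1][0] + B[1][1]
    denom_q = A[0][0] - A[0][1] - A[1][0] + A[1][1]
    if denom_p == 0 or denom_q == 0:
        raise ValueError("indifference conditions are degenerate for this game")
    # p makes the column player indifferent between its two columns.
    p = (B[1][1] - B[1][0]) / denom_p
    # q makes the row player indifferent between its two rows.
    q = (A[1][1] - A[0][1]) / denom_q
    if not (0 <= p <= 1 and 0 <= q <= 1):
        raise ValueError("no mixed equilibrium with probabilities in [0, 1]")
    return p, q


# Hypothetical payoffs (assumptions, not taken from the paper).
# Rows: circumventor sends CRS traffic (0) or stays idle (1).
# Columns: censor blocks (0) or allows (1) the flagged traffic.
circumventor = [[-1.0, 2.0],
                [0.0, 0.0]]
censor = [[1.0, -2.0],
          [-1.0, 0.0]]  # blocking idle traffic stands in for a false-positive cost

p_send, q_block = mixed_equilibrium_2x2(circumventor, censor)
print(p_send, q_block)
```

In this toy game each side randomizes so that the other gains nothing by switching actions; the paper's framework works with traffic rates and classifier error rates rather than a fixed payoff table.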

    Game theoretical characterization of the multi-agent network expansion game

    In supply chains, manufacturers often use transportation companies to deliver their goods. This can lead to competition among carriers seeking to maximize their individual revenues by serving a manufacturer. In this work, we consider such situations where no single carrier can guarantee delivery from source to destination due to its operation in a restricted region (e.g., a province) or the available transportation fleet (e.g., only air transportation), to name a few examples. Therefore, competition is linked to the expansion of transportation capacity by carriers. The problem described above motivates the study of the multi-agent network expansion game played over a network owned by multiple transporters who choose their arcs' capacity. Simultaneously, a customer seeks to maximize the flow that goes through the network by deciding the sharing policy rewarding each of the transporters. The goal is to determine a Nash equilibrium for the game, in other words, the most rational capacity expansion and sharing policy for the transporters and the customer, respectively. We recap the arc-based formulation proposed in the literature, whose solution is the Nash equilibrium with the largest flow, and we identify its limitations. Then, we formalize the concept of profitable increasing path and we show its use to establish necessary and sufficient conditions for a vector of strategies to be a Nash equilibrium. This leads us to the first path-based formulation. Finally, we propose a strengthening for the arc-based model and a hybrid arc-path formulation. 
Our experimental results support the value of the new valid inequalities obtained from our characterization of Nash equilibria with profitable increasing paths. We conclude this work with the future research directions paved by the contributions of this thesis

    Optimal and Approximate Q-value Functions for Decentralized POMDPs

    Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem
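The single-agent recipe mentioned in this abstract (compute Q* by dynamic programming, then extract a policy from it) can be sketched for a fully observable MDP as follows. This is a minimal illustration under stated assumptions, not the paper's Dec-POMDP method: it uses discounted Q-value iteration, and the two-state MDP, the discount factor, and the function names are hypothetical.

```python
import numpy as np


def q_value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Compute Q* for a finite MDP by repeated Bellman backups.

    P[s, a, s2] is the probability of moving from state s to s2 under action a.
    R[s, a] is the expected immediate reward. Returns an array Q[s, a].
    """
    n_states, n_actions, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(max_iter):
        V = Q.max(axis=1)            # value of acting greedily from each state
        Q_new = R + gamma * (P @ V)  # Bellman optimality backup for all (s, a)
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    return Q


def greedy_policy(Q):
    """Extract a deterministic policy: the maximizing action in each state."""
    return Q.argmax(axis=1)


# Hypothetical two-state, two-action MDP (an assumption for illustration).
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0  # state 0, action 0: stay in state 0
P[0, 1, 1] = 1.0  # state 0, action 1: move to state 1
P[1, 0, 1] = 1.0  # state 1, action 0: stay in state 1
P[1, 1, 0] = 1.0  # state 1, action 1: move to state 0
R = np.array([[0.0, 1.0],
              [2.0, 0.0]])

Q_star = q_value_iteration(P, R)
policy = greedy_policy(Q_star)
print(Q_star)
print(policy)
```

In a Dec-POMDP the agents act on local observation histories rather than on a shared state, which is why the abstract distinguishes several forms of Q-value function instead of reusing this backup directly.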

    Many-agent Reinforcement Learning

    Multi-agent reinforcement learning (RL) addresses the problem of how each agent should behave optimally in a stochastic environment in which multiple agents are learning simultaneously. It is an interdisciplinary domain with a long history that lies at the intersection of psychology, control theory, game theory, reinforcement learning, and deep learning. Following the remarkable success of the AlphaGo series in single-agent RL, 2019 was a booming year that witnessed significant advances in multi-agent RL techniques; impressive breakthroughs have been made in developing AIs that outperform humans on many challenging tasks, especially multi-player video games. Nonetheless, one of the key challenges of multi-agent RL techniques is scalability; it is still non-trivial to design efficient learning algorithms that can solve tasks involving far more than two agents ($N \gg 2$), which I call many-agent reinforcement learning (MARL) problems. (I use the word "MARL" to denote multi-agent reinforcement learning with a particular focus on the cases of many agents; otherwise, it is denoted as "multi-agent RL" by default.) In this thesis, I contribute to tackling MARL problems from four aspects. Firstly, I offer a self-contained overview of multi-agent RL techniques from a game-theoretical perspective. This overview fills the research gap that most of the existing work either fails to cover the recent advances since 2010 or does not pay adequate attention to game theory, which I believe is the cornerstone to solving many-agent learning problems. Secondly, I develop a tractable policy evaluation algorithm, $\alpha^\alpha$-Rank, for many-agent systems. The critical advantage of $\alpha^\alpha$-Rank is that it can compute the solution concept of $\alpha$-Rank tractably in multi-player general-sum games with no need to store the entire pay-off matrix. 
This is in contrast to classic solution concepts such as Nash equilibrium, whose computation is known to be PPAD-hard even in two-player cases. $\alpha^\alpha$-Rank allows us, for the first time, to practically conduct large-scale multi-agent evaluations. Thirdly, I introduce a scalable policy learning algorithm, mean-field MARL, for many-agent systems. The mean-field MARL method takes advantage of the mean-field approximation from physics, and it is the first provably convergent algorithm that tries to break the curse of dimensionality for MARL tasks. With the proposed algorithm, I report the first result of solving the Ising model and multi-agent battle games through a MARL approach. Fourthly, I investigate the many-agent learning problem in open-ended meta-games (i.e., the game of a game in the policy space). Specifically, I focus on modelling the behavioural diversity in meta-games, and developing algorithms that are guaranteed to enlarge diversity during training. The proposed metric based on determinantal point processes serves as the first mathematically rigorous definition for diversity. Importantly, the diversity-aware learning algorithms beat the existing state-of-the-art game solvers in terms of exploitability by a large margin. On top of the algorithmic developments, I also contribute two real-world applications of MARL techniques. Specifically, I demonstrate the great potential of applying MARL to study the emergent population dynamics in nature, and to model diverse and realistic interactions in autonomous driving. Both applications embody the prospect that MARL techniques could achieve huge impacts in the real physical world, beyond video games.

    Auctions and Electronic Markets


    Coordinating decentralized learning and conflict resolution across agent boundaries

    It is crucial for embedded systems to adapt to the dynamics of open environments. This adaptation process becomes especially challenging in the context of multiagent systems because of scalability, partial information accessibility and complex interaction of agents. It is a challenge for agents to learn good policies, when they need to plan and coordinate in uncertain, dynamic environments, especially when they have large state spaces. It is also critical for agents operating in a multiagent system (MAS) to resolve conflicts among the learned policies of different agents, since such conflicts may have detrimental influence on the overall performance. The focus of this research is to use a reinforcement learning based local optimization algorithm within each agent to learn multiagent policies in a decentralized fashion. These policies will allow each agent to adapt to changes in environmental conditions while reorganizing the underlying multiagent network when needed. The research takes an adaptive approach to resolving conflicts that can arise between locally optimal agent policies. First an algorithm that uses heuristic rules to locally resolve simple conflicts is presented. When the environment is more dynamic and uncertain, a mediator-based mechanism to resolve more complicated conflicts and selectively expand the agents' state space during the learning process is harnessed. For scenarios where mediator-based mechanisms with partially global views are ineffective, a more rigorous approach for global conflict resolution that synthesizes multiagent reinforcement learning (MARL) and distributed constraint optimization (DCOP) is developed. These mechanisms are evaluated in the context of a multiagent tornado tracking application called NetRads. Empirical results show that these mechanisms significantly improve the performance of the tornado tracking network for a variety of weather scenarios. 
The major contributions of this work are: a state-of-the-art decentralized learning approach that supports agent interactions and reorganizes the underlying network when needed; the use of abstract classes of scenarios/states/actions that efficiently manage the exploration of the search space; novel conflict resolution algorithms of increasing complexity that use heuristic rules, sophisticated automated negotiation mechanisms, and distributed constraint optimization methods, respectively; and finally, a rigorous study of the interplay between two popular theories used to solve multiagent problems, namely decentralized Markov decision processes and distributed constraint optimization.

    Non-Cooperative Games for Self-Interested Planning Agents

    Multi-Agent Planning (MAP) is a topic of growing interest that deals with the problem of automated planning in domains where multiple agents plan and act together in a shared environment. In most cases, agents in MAP are cooperative (altruistic) and work together towards a collaborative solution. However, when rational self-interested agents are involved in a MAP task, the ultimate objective is to find a joint plan that accomplishes the agents' local tasks while satisfying their private interests. Among the MAP scenarios that involve self-interested agents, non-cooperative MAP refers to problems where non-strictly competitive agents feature common and conflicting interests. In this setting, conflicts arise when self-interested agents put their plans together and the resulting combination renders some of the plans non-executable, which implies a utility loss for the affected agents. Each participant wishes to execute its plan as it was conceived, but congestion issues and conflicts among the actions of the different plans compel agents to find a coordinated stable solution. Non-cooperative MAP tasks are tackled through non-cooperative games, which aim at finding a stable (equilibrium) joint plan that ensures the agents' plans are executable (by addressing planning conflicts) while accounting for their private interests as much as possible. Although this paradigm reflects many real-life problems, there is a lack of computational approaches to non-cooperative MAP in the literature. This PhD thesis pursues the application of non-cooperative games to solve non-cooperative MAP tasks that feature rational self-interested agents. Each agent calculates a plan that attains its individual planning task, and subsequently, the participants try to execute their plans in a shared environment. We tackle non-cooperative MAP from a twofold perspective. 
On the one hand, we focus on agents' satisfaction by studying desirable properties of stable solutions, such as optimality and fairness. On the other hand, we look for a combination of MAP and game-theoretic techniques capable of efficiently computing stable joint plans while minimizing the computational complexity of this combined task. Additionally, we consider planning conflicts and congestion issues in the agents' utility functions, which results in a more realistic approach. To the best of our knowledge, this PhD thesis opens up a new research line in non-cooperative MAP and establishes the basic principles to solve the problem of synthesizing stable joint plans for self-interested planning agents through the combination of game theory and automated planning.
    Jordán Prunera, JM. (2017). Non-Cooperative Games for Self-Interested Planning Agents [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90417