8 research outputs found

    A learning-based approach to multi-agent decision-making

    We propose a learning-based methodology to reconstruct private information held by a population of interacting agents in order to predict an exact outcome of the underlying multi-agent interaction process, here identified as a stationary action profile. We envision a scenario where an external observer, endowed with a learning procedure, is allowed to make queries and observe the agents' reactions through private action-reaction mappings, whose collective fixed point corresponds to a stationary profile. By adopting a smart query process to iteratively collect informative data and update parametric estimates, we establish sufficient conditions guaranteeing that, if the proposed learning-based methodology converges, it can only converge to a stationary action profile. This fact yields two main consequences: i) learning locally exact surrogates of the action-reaction mappings allows the external observer to succeed in its prediction task, and ii) since we work under assumptions so general that a stationary profile is not even guaranteed to exist, the established sufficient conditions also act as certificates for the existence of such a desirable profile. Extensive numerical simulations involving typical competitive multi-agent control and decision-making problems illustrate the practical effectiveness of the proposed learning-based approach.
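    The query-and-predict loop described above can be sketched in a few lines. The reaction mappings, query points, and linear surrogate class below are hypothetical stand-ins, not the paper's actual setup; the sketch only illustrates the idea of fitting surrogates from queried reactions and iterating them to a collective fixed point.

```python
import numpy as np

# Hypothetical private action-reaction mappings (unknown to the observer).
def react_1(a2):  # agent 1's reaction to agent 2's action
    return 0.5 * a2 + 1.0

def react_2(a1):  # agent 2's reaction to agent 1's action
    return 0.25 * a1 + 2.0

# Query phase: the observer probes each agent at chosen query points
# and fits a parametric (here linear) surrogate to the observed reactions.
queries = np.array([0.0, 1.0, 2.0, 3.0])
coef1 = np.polyfit(queries, react_1(queries), 1)  # surrogate for agent 1
coef2 = np.polyfit(queries, react_2(queries), 1)  # surrogate for agent 2

# Prediction phase: iterate the learned surrogates to their collective
# fixed point, the predicted stationary action profile.
a1, a2 = 0.0, 0.0
for _ in range(100):
    a1, a2 = np.polyval(coef1, a2), np.polyval(coef2, a1)

print(round(a1, 4), round(a2, 4))  # predicted stationary profile
```

    Because the toy mappings are contractions, the iteration converges; in general, convergence of such a scheme is exactly what the paper's sufficient conditions address.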

    A Generalized Training Approach for Multiagent Learning

    This paper investigates a population-based training regime grounded in game-theoretic principles called Policy-Space Response Oracles (PSRO). PSRO is general in the sense that it (1) encompasses well-known algorithms such as fictitious play and double oracle as special cases, and (2) in principle applies to general-sum, many-player games. Despite this, prior studies of PSRO have focused on two-player zero-sum games, a regime wherein Nash equilibria are tractably computable. In moving from two-player zero-sum games to more general settings, computation of Nash equilibria quickly becomes infeasible. Here, we extend the theoretical underpinnings of PSRO by considering an alternative solution concept, α-Rank, which is unique (and thus faces no equilibrium selection issues, unlike Nash) and applies readily to general-sum, many-player settings. We establish convergence guarantees in several classes of games and identify links between Nash equilibria and α-Rank. We demonstrate the competitive performance of α-Rank-based PSRO against an exact Nash solver-based PSRO in 2-player Kuhn and Leduc Poker. We then go beyond the reach of prior PSRO applications by considering 3- to 5-player poker games, yielding instances where α-Rank achieves faster convergence than approximate Nash solvers, thus establishing it as a favorable solver for general games. We also carry out an initial empirical validation in MuJoCo soccer, illustrating the feasibility of the proposed approach in another complex domain.
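    A minimal sketch of the PSRO loop on rock-paper-scissors may help fix ideas. The meta-solver below places all mass on the newest policy (the self-play special case that PSRO encompasses) rather than the α-Rank or Nash meta-solvers studied in the paper; the payoff matrix and population bookkeeping are illustrative assumptions.

```python
import numpy as np

# Row player's payoffs in rock-paper-scissors (0=rock, 1=paper, 2=scissors).
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]])

def best_response(payoffs, opp_mix):
    """Oracle step: a pure strategy maximizing payoff against the opponent mix."""
    return int(np.argmax(payoffs @ opp_mix))

def meta_strategy(pop):
    """Meta-solver: all mass on the newest policy (the self-play special case;
    the paper's alpha-Rank or Nash meta-solvers would slot in here instead)."""
    mix = np.zeros(3)
    mix[pop[-1]] = 1.0
    return mix

pop_row, pop_col = [0], [0]  # both populations start with "rock"
for _ in range(10):
    br_row = best_response(A, meta_strategy(pop_col))
    br_col = best_response(-A.T, meta_strategy(pop_row))
    if br_row in pop_row and br_col in pop_col:
        break  # the oracle proposes nothing new: the population is closed
    if br_row not in pop_row:
        pop_row.append(br_row)
    if br_col not in pop_col:
        pop_col.append(br_col)

print(pop_row, pop_col)
```

    On this toy game the loop grows each population until it covers all three pure strategies; swapping the meta-solver changes which mixtures the oracle responds to, which is precisely the design axis the paper explores.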

    Approximate Analysis of Large Simulation-Based Games

    Game theory offers powerful tools for reasoning about agent behavior and incentives in multi-agent systems. Traditional approaches to game-theoretic analysis require enumeration of all possible strategies and outcomes, which often constrains game models to small numbers of agents and strategies or to simple closed-form payoff descriptions. Simulation-based game theory extends the reach of game-theoretic analysis through the use of agent-based modeling: the analyst describes an environment procedurally and then computes payoffs by simulating agent interactions in that environment. I use simulation-based game theory to study a model of credit network formation. Credit networks represent trust relationships in a directed graph and have been proposed as a mechanism for distributed transactions without a central currency. I explore what information is important when agents make initial decisions of whom to trust, and what sorts of networks can result from their decisions. This setting demonstrates both the value of simulation-based game theory (extending game-theoretic analysis beyond analytically tractable models) and its limitations: simulations produce prodigious amounts of data, and the number of simulations required grows exponentially in the number of agents and strategies. I propose several techniques for approximate analysis of simulation-based games with large numbers of agents and large amounts of simulation data. First, I show how bootstrap-based statistics can be used to estimate confidence bounds on the results of simulation-based game analysis, and that bootstrap confidence intervals for the regret of approximate equilibria are well-calibrated. Next, I describe deviation-preserving reduction, which approximates an environment with a large number of agents using a game model with a small number of players, and demonstrate that it outperforms previous player reductions on several measures.
    Finally, I employ machine learning to construct game models from sparse data sets, and provide evidence that learned game models can produce even better approximate equilibria in large games than deviation-preserving reduction.
    PhD dissertation, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113587/1/btwied_1.pd
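    The bootstrap idea for regret bounds can be sketched as follows. The payoff samples, deviation labels, and confidence level are hypothetical, and the regret estimator is a simple stand-in; the sketch only shows resampling simulation data with replacement and taking percentile bounds on the recomputed regret estimate, not the dissertation's actual procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulation data: noisy payoff samples for a candidate profile
# and for each unilateral deviation from it (means are illustrative).
profile_payoffs = rng.normal(loc=1.00, scale=0.3, size=200)
deviation_payoffs = {
    "dev_a": rng.normal(loc=0.95, scale=0.3, size=200),
    "dev_b": rng.normal(loc=1.02, scale=0.3, size=200),
}

def regret(prof, devs):
    """Estimated regret: best mean deviation gain over the profile, floored at 0."""
    return max(0.0, max(d.mean() for d in devs.values()) - prof.mean())

# Bootstrap: resample the simulation data with replacement, recompute the
# regret estimate, and take percentile confidence bounds over the replicates.
boot = []
for _ in range(2000):
    p = rng.choice(profile_payoffs, size=profile_payoffs.size, replace=True)
    d = {k: rng.choice(v, size=v.size, replace=True)
         for k, v in deviation_payoffs.items()}
    boot.append(regret(p, d))

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"point estimate {regret(profile_payoffs, deviation_payoffs):.3f}, "
      f"95% CI [{lo:.3f}, {hi:.3f}]")
```

    The interval quantifies how much of the estimated regret is attributable to simulation noise rather than to a genuine profitable deviation.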

    On the analysis of stochastic optimization and variational inequality problems

    Uncertainty has a tremendous impact on decision making. The more connected we become, it seems, the more sources of uncertainty we uncover. For example, uncertainty in the parameters of price and cost functions in power, transportation, communication and financial systems stems from the way these networked systems operate and from how they interact with one another. Uncertainty influences the design, regulation and decisions of participants in several engineered systems, such as financial markets, electricity markets, commodity markets, and wired and wireless networks, all of which are ubiquitous. This poses many interesting questions in understanding uncertainty (modeling) and in dealing with uncertainty (decision making). This dissertation focuses on answering a set of fundamental questions that pertain to dealing with uncertainty arising in three major problem classes: (1) convex Nash games; (2) variational inequality problems and complementarity problems; (3) hierarchical risk management problems in financial networks. Accordingly, this dissertation considers the analysis of a broad class of stochastic optimization and variational inequality problems complicated by uncertainty and by nonsmoothness of objective functions. Nash games and variational inequalities have assumed practical relevance in industry and business settings because they are natural models for many real-world applications. Nash games arise naturally in modeling a range of equilibrium problems in power markets, communication networks, and market-based allocation of resources, whereas variational inequality problems allow for modeling frictional contact problems, traffic equilibrium problems, and the like. Incorporating uncertainty into convex Nash games leads us to stochastic Nash games.
    Despite the relevance of stochastic generalizations of Nash games and variational inequalities, answering fundamental questions regarding the existence of equilibria in stochastic regimes has proved to be a challenge. The main difficulty arises from the nonlinearity introduced by the presence of the expectation operator: despite the rich literature in deterministic settings, direct application of deterministic results to stochastic regimes is not straightforward. The first part of this dissertation explores such fundamental questions in stochastic Nash games and variational inequality problems. Instead of directly using the deterministic results, by leveraging Lebesgue convergence theorems we develop a tractable framework for analyzing problems in stochastic regimes over a continuous probability space. The benefit of this approach is that the framework does not rely on evaluation of the expectation operator to provide existence guarantees, making it amenable to practical use. We extend the framework to incorporate nonsmoothness of payoff functions as well as stochastic constraints, both of which are important in practical settings. The second part of this dissertation extends the framework to generalizations of variational inequality problems and complementarity problems. In particular, we develop a set of almost-sure sufficiency conditions for stochastic variational inequality problems with single-valued and multi-valued mappings, and we extend these statements to quasi-variational regimes as well as to stochastic complementarity problems.
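    For concreteness, the central object here, a stochastic variational inequality with an expectation-valued mapping F(x) = E[G(x, ξ)], can be illustrated on a toy one-dimensional instance solved by projection-based stochastic approximation. The mapping, feasible set, and step sizes below are all hypothetical; the dissertation's contribution is the existence theory, not this particular algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stochastic variational inequality VI(K, F): find x* in K with
# F(x*) * (x - x*) >= 0 for all x in K, where F(x) = E[G(x, xi)].
# Here (hypothetically) G(x, xi) = 2x - 1 + xi and K = [0, 1],
# so F(x) = 2x - 1 and the solution is x* = 0.5.
def G(x, xi):
    return 2.0 * x - 1.0 + xi

def project(x):
    """Euclidean projection onto K = [0, 1]."""
    return min(1.0, max(0.0, x))

# Projection-based stochastic approximation: each step uses only a sample
# of G, never an evaluation of the expectation operator itself.
x = 0.9
for k in range(1, 20001):
    xi = rng.normal(0.0, 0.5)
    x = project(x - (1.0 / k) * G(x, xi))

print(round(x, 3))
```

    The point of the framework described above is precisely that existence of such an x* can be certified without ever evaluating the expectation that this iteration only samples.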
    The applicability of these results is demonstrated in the analysis of risk-averse stochastic Nash games arising in Nash-Cournot production-distribution models in power markets, by recasting the problem as a stochastic quasi-variational inequality problem, and in Nash-Cournot games with piecewise smooth price functions, by modeling the problem as a stochastic complementarity problem. The third part of this dissertation pertains to hierarchical problems in financial risk management. In the financial industry, risk has traditionally been managed by imposing value-at-risk (VaR) constraints on portfolio risk exposure. Motivated by recent events in the financial industry, we examine the role that risk-seeking traders play in the accumulation of large and possibly infinite risk. We show that when traders employ a conditional value-at-risk (CVaR) metric, much can be learned by studying the interaction between VaR (a non-coherent risk measure) and CVaR (a coherent risk measure based on VaR). Resolving this question requires characterizing the optimal value of the associated stochastic, and possibly nonconvex, optimization problem, which is often challenging. Our study makes two sets of contributions. First, under general asset distributions with compact support, traders accumulate finite risk, of a magnitude on the order of the upper bound of that support. Second, when the support is unbounded, under relatively mild assumptions such traders can take on an unbounded amount of risk despite abiding by the VaR threshold. In short, VaR thresholds may be inadequate in guarding against financial ruin.
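    The gap between VaR and CVaR that drives this third part can be illustrated empirically. The loss distributions and confidence level below are hypothetical; the sketch merely shows that a light-tailed and a heavy-tailed position can both be summarized by a VaR number while their tail behavior, captured by CVaR, diverges sharply when tails are heavy.

```python
import numpy as np

rng = np.random.default_rng(2)

def var_cvar(losses, alpha=0.99):
    """Empirical value-at-risk and conditional value-at-risk at level alpha."""
    var = np.quantile(losses, alpha)          # loss threshold exceeded 1% of the time
    cvar = losses[losses >= var].mean()       # mean loss in the tail beyond VaR
    return var, cvar

# Hypothetical loss samples: light tails (normal) vs heavy tails
# (Student-t with 2.1 degrees of freedom: the mean exists, the tail is fat).
light = rng.normal(0.0, 1.0, size=200_000)
heavy = rng.standard_t(2.1, size=200_000)

for name, losses in [("light-tailed", light), ("heavy-tailed", heavy)]:
    var, cvar = var_cvar(losses)
    print(f"{name}: VaR_99 = {var:.2f}, CVaR_99 = {cvar:.2f}")
```

    CVaR always sits above VaR, but for the heavy-tailed position the ratio is markedly larger: a trader can abide by a VaR threshold while the expected loss beyond that threshold grows far past it, which is the inadequacy the abstract points to.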