404 research outputs found

    Efficiently Characterizing Games Consistent with Perturbed Equilibrium Observations

    In this thesis, we study the problem of characterizing the set of games that are consistent with observed equilibrium play, a fundamental problem in econometrics. Our contribution is to develop and analyze a new methodology based on convex optimization to address this problem, for many classes of games and observation models of interest. Our approach provides a sharp, computationally efficient characterization of the extent to which a particular set of observations constrains the space of games that could have generated them. This allows us to solve a number of variants of this problem as well as to quantify the power of games from particular classes (e.g., zero-sum, potential, linearly parameterized) to explain player behavior. We illustrate our approach with numerical simulations.

    Inverse Game Theory for Stackelberg Games: the Blessing of Bounded Rationality

    Optimizing strategic decisions (a.k.a. computing equilibrium) is key to the success of many non-cooperative multi-agent applications. However, in many real-world situations, we may face the exact opposite of this game-theoretic problem: instead of prescribing the equilibrium of a given game, we may directly observe the agents' equilibrium behaviors but want to infer the underlying parameters of an unknown game. This research question, also known as inverse game theory, has been studied in multiple recent works in the context of Stackelberg games. Unfortunately, existing works report largely negative results, showing both statistical and computational hardness under the assumption that the follower is perfectly rational. Our work relaxes the perfect-rationality assumption to the classic quantal response model, a more realistic model of boundedly rational behavior. Interestingly, we show that the smoothness brought by this bounded rationality model actually leads to provably more efficient learning of the follower's utility parameters in general Stackelberg games. Systematic empirical experiments on synthesized games confirm our theoretical results and further suggest their robustness beyond the strict quantal response model.
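    As a rough illustration of the quantal response model mentioned in this abstract, the sketch below uses the standard logit form, in which a follower picks each action with probability proportional to the exponential of its scaled utility. The function name, the `rationality` parameter, and the utility values are assumptions made for illustration and are not taken from the paper.

```python
import math


def quantal_response(utilities, rationality=1.0):
    """Logit quantal response: P(action i) is proportional to exp(rationality * u_i).

    `rationality` is a hypothetical name for the usual precision parameter;
    large values approach a perfectly rational best response.
    """
    scaled = [rationality * u for u in utilities]
    shift = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical follower utilities for three actions.
follower_utilities = [1.0, 2.0, 0.5]

# Moderately rational follower: probabilities vary smoothly with utilities.
probs_soft = quantal_response(follower_utilities, rationality=1.0)

# Nearly rational follower: almost all mass on the best action.
probs_sharp = quantal_response(follower_utilities, rationality=50.0)
```

    The choice probabilities are a smooth function of the utilities, unlike an exact best response, which jumps whenever the best action changes. This is the kind of smoothness the abstract refers to.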

    Discovering How Agents Learn Using Few Data

    Decentralized learning algorithms are an essential tool for designing multi-agent systems, as they enable agents to autonomously learn from their experience and past interactions. In this work, we propose a theoretical and algorithmic framework for real-time identification of the learning dynamics that govern agent behavior using a short burst of a single system trajectory. Our method identifies agent dynamics through polynomial regression, where we compensate for limited data by incorporating side-information constraints that capture fundamental assumptions or expectations about agent behavior. These constraints are enforced computationally using sum-of-squares optimization, leading to a hierarchy of increasingly better approximations of the true agent dynamics. Extensive experiments demonstrate that our approach, using only 5 samples from a short run of a single trajectory, accurately recovers the true dynamics across various benchmarks, including equilibrium selection and prediction of chaotic systems up to 10 Lyapunov times. These findings suggest that our approach has significant potential to support effective policy and decision-making in strategic multi-agent systems.

    Continuous Limits for Constrained Ensemble Kalman Filter

    The Ensemble Kalman Filter method can be used as an iterative particle numerical scheme for state dynamics estimation and control-to-observable identification problems. In applications, the solution may be required to satisfy equality constraints on the control space. In this work we deal with this problem from a constrained optimization point of view, deriving the corresponding optimality conditions. Continuous limits, in time and in the number of particles, allow us to study properties of the method. We illustrate the performance of the method using test inverse problems from the literature.

    Mapping the landscape of metabolic goals of a cell

    Genome-scale flux balance models of metabolism provide testable predictions of all metabolic rates in an organism, by assuming that the cell is optimizing a metabolic goal known as the objective function. We introduce an efficient inverse flux balance analysis (invFBA) approach, based on linear programming duality, to characterize the space of possible objective functions compatible with measured fluxes. After testing our algorithm on simulated E. coli data and time-dependent S. oneidensis fluxes inferred from gene expression, we apply our inverse approach to flux measurements in long-term evolved E. coli strains, revealing objective functions that provide insight into metabolic adaptation trajectories.
    Funding: MURI W911NF-12-1-0390 - Army Research Office (US); 5R01GM089978-02 - National Institutes of Health (US); IIS-1237022 - National Science Foundation (US); DE-SC0012627 - U.S. Department of Energy; HR0011-15-C-0091 - Defense Sciences Office, DARPA; National Institutes of Health; R01GM103502; 5R01DE024468; 1457695 - National Science Foundation

    Recovering the Sunk Costs of R&D: the Moulds Industry Case

    Sunk costs for R&D are an important determinant of the level of innovation in the economy. In this paper I recover them using a Markov equilibrium framework. The contribution is twofold. First, a model of industry dynamics which accounts for selection into R&D, capital accumulation and entry/exit is proposed. The industry state is summarized by an aggregate state with the advantage that it avoids the "curse of dimensionality". Second, the estimated sunk costs of R&D for the Portuguese moulds industry are shown to be important (3.4 million Euros). They become particularly relevant since the industry is mostly populated by small firms. Institutional changes in the early 1990s generated an increase in demand from European car makers and created the incentives for firms to pay the costs of investment. Trade-induced innovation reinforced the selection effect by which international trade leads to productivity growth. Finally, using the estimated parameters, simulations evaluate the effects of changes in market size, sunk costs and entry costs.
    Keywords: aggregate state, industry dynamics, Markov equilibrium, moulds industry, R&D, structural estimation, sunk costs

    Scalable Online Learning of Approximate Stackelberg Solutions in Energy Trading Games with Demand Response Aggregators

    In this work, a Stackelberg game theoretic framework is proposed for trading energy bidirectionally between the demand-response (DR) aggregator and the prosumers. This formulation allows for flexible energy arbitrage and additional monetary rewards while ensuring that the prosumers' desired daily energy demand is met. Then, a scalable (with the number of prosumers) approach is proposed to find approximate equilibria based on online sampling and learning of the prosumers' cumulative best response. Moreover, bounds are provided on the quality of the approximate equilibrium solution. Last, real-world data from the California day-ahead energy market and the University of California at Davis building energy demands are utilized to demonstrate the efficacy of the proposed framework and the online scalable solution.
    Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

    Towards a Theoretical Foundation of Policy Optimization for Learning Control Policies

    Gradient-based methods have been widely used for system design and optimization in diverse application domains. Recently, there has been a renewed interest in studying theoretical properties of these methods in the context of control and reinforcement learning. This article surveys some of the recent developments on policy optimization, a gradient-based iterative approach for feedback control synthesis, popularized by successes of reinforcement learning. We take an interdisciplinary perspective in our exposition that connects control theory, reinforcement learning, and large-scale optimization. We review a number of recently developed theoretical results on the optimization landscape, global convergence, and sample complexity of gradient-based methods for various continuous control problems such as the linear quadratic regulator (LQR), $\mathcal{H}_\infty$ control, risk-sensitive control, linear quadratic Gaussian (LQG) control, and output feedback synthesis. In conjunction with these optimization results, we also discuss how direct policy optimization handles stability and robustness concerns in learning-based control, two main desiderata in control engineering. We conclude the survey by pointing out several challenges and opportunities at the intersection of learning and control.
    Comment: To appear in Annual Review of Control, Robotics, and Autonomous Systems
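    To make the idea of policy optimization for LQR concrete, here is a minimal sketch on a scalar system, assuming dynamics x_{t+1} = a x_t + b u_t, a linear policy u_t = -k x_t, an initial state x_0 = 1, and an exact (not sampled) gradient. For a stabilizing gain the infinite-horizon cost has the closed form J(k) = (q + r k^2) / (1 - (a - b k)^2). All names, parameter values, and the step size are illustrative assumptions, not results from the survey.

```python
def lqr_cost(k, a=1.0, b=1.0, q=1.0, r=1.0):
    """Infinite-horizon cost of the policy u = -k * x from x_0 = 1 (scalar case)."""
    c = a - b * k  # closed-loop multiplier
    if abs(c) >= 1.0:
        return float("inf")  # the gain does not stabilize the system
    return (q + r * k * k) / (1.0 - c * c)


def lqr_cost_gradient(k, a=1.0, b=1.0, q=1.0, r=1.0):
    """Exact derivative dJ/dk of the closed-form cost (quotient rule)."""
    c = a - b * k
    numerator = q + r * k * k
    denominator = 1.0 - c * c
    d_numerator = 2.0 * r * k
    d_denominator = 2.0 * b * c
    return (d_numerator * denominator - numerator * d_denominator) / denominator ** 2


def policy_gradient_descent(k0, step=0.05, iterations=500):
    """Plain gradient descent on the feedback gain, started from a stabilizing k0."""
    k = k0
    for _ in range(iterations):
        k -= step * lqr_cost_gradient(k)
    return k


# Start from k = 1.0, which stabilizes the system (closed-loop multiplier 0).
k_learned = policy_gradient_descent(k0=1.0)
```

    With a = b = q = r = 1, the scalar Riccati equation gives the optimal gain k = (sqrt(5) - 1) / 2, so the iterate can be checked against a known answer. The cost is non-convex in k yet gradient descent still reaches the optimum from a stabilizing start, which is the kind of landscape property the survey discusses.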

    Dynamics in near-potential games

    We consider discrete-time learning dynamics in finite strategic form games, and show that games that are close to a potential game inherit many of the dynamical properties of potential games. We first study the evolution of the sequence of pure strategy profiles under better/best response dynamics. We show that this sequence converges to a (pure) approximate equilibrium set whose size is a function of the “distance” to a given nearby potential game. We then focus on logit response dynamics, and provide a characterization of the limiting outcome in terms of the distance of the game to a given potential game and the corresponding potential function. Finally, we turn attention to fictitious play, and establish that in near-potential games the sequence of empirical frequencies of player actions converges to a neighborhood of (mixed) equilibria, where the size of the neighborhood increases according to the distance to the set of potential games.
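    The best response dynamics mentioned in this abstract can be sketched for the simpler baseline of an exact potential game. The example assumes a two-player identical-interest game, where the shared payoff matrix serves as the potential function. The function names and payoff values are hypothetical, and the near-potential case analyzed in the paper is not implemented here.

```python
def best_response_dynamics(u1, u2, start, max_rounds=100):
    """Players alternate best responses until no one changes action.

    u1[i][j] and u2[i][j] are the payoffs to players 1 and 2 when
    player 1 plays row i and player 2 plays column j.
    """
    a, b = start
    for _ in range(max_rounds):
        # Player 1 best-responds to b, then player 2 best-responds to the new row.
        new_a = max(range(len(u1)), key=lambda i: u1[i][b])
        new_b = max(range(len(u2[0])), key=lambda j: u2[new_a][j])
        if (new_a, new_b) == (a, b):
            return a, b  # fixed point of the dynamics
        a, b = new_a, new_b
    return a, b


def is_pure_nash(u1, u2, profile):
    """Check that neither player gains from a unilateral deviation."""
    a, b = profile
    best_for_1 = max(u1[i][b] for i in range(len(u1)))
    best_for_2 = max(u2[a][j] for j in range(len(u2[0])))
    return u1[a][b] == best_for_1 and u2[a][b] == best_for_2


# Hypothetical coordination game: both players receive the same payoff.
common_payoff = [[2, 0],
                 [0, 1]]

final_profile = best_response_dynamics(common_payoff, common_payoff, start=(0, 1))
```

    In an exact potential game every improving move raises the potential, so the dynamics must stop at a pure equilibrium. Here the run from (0, 1) stops at the profile (1, 1), which is an equilibrium but not the payoff-maximizing one.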