386 research outputs found

    Max-plus fundamental solution semigroups for a class of difference Riccati equations

    Recently, a max-plus dual-space fundamental solution semigroup for a class of difference Riccati equations (DREs) has been developed. This fundamental solution semigroup is represented in terms of the kernel of a specific max-plus linear operator that plays the role of the dynamic programming evolution operator in a max-plus dual space. In order to fully understand the connections between this dual-space fundamental solution semigroup and the evolution of the value function of the underlying optimal control problem, a new max-plus primal-space fundamental solution semigroup for the same class of difference Riccati equations is presented. Connections and commutation results between this new primal-space fundamental solution semigroup and the recently developed dual-space fundamental solution semigroup are established. Comment: 17 pages, 3 figures
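    The two ingredients the abstract refers to are easy to illustrate: the difference Riccati recursion that propagates the value function of a discrete-time LQ problem, and a max-plus linear operator of the kind whose kernel represents the fundamental solution semigroup. The Python sketch below shows a standard DRE step and a max-plus matrix-vector product; the system matrices are illustrative placeholders, and the paper's actual semigroup construction is not reproduced here.

```python
import numpy as np

# Standard discrete-time difference Riccati recursion for an LQ problem
# x_{k+1} = A x_k + B u_k with stage cost x'Qx + u'Ru. Illustrative only;
# the paper's fundamental solution semigroup is not reproduced.
def dre_step(P, A, B, Q, R):
    """One step P_{k+1} = Q + A'PA - A'PB (R + B'PB)^{-1} B'PA."""
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return Q + A.T @ P @ A - A.T @ P @ B @ K

# Max-plus matrix-vector "product": (M (x) v)_i = max_j (M_ij + v_j),
# the kind of max-plus linear operator the abstract refers to.
def maxplus_apply(M, v):
    return np.max(M + v[None, :], axis=1)

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = np.zeros((2, 2))
for _ in range(50):
    P = dre_step(P, A, B, Q, R)
print(P)  # approaches the stabilizing solution for this toy LQ example

M = np.array([[0.0, -1.0], [2.0, 0.5]])
v = np.array([1.0, 3.0])
print(maxplus_apply(M, v))  # max-plus action of M on v
```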

    Analysis of Delays in Networked Flight Simulators

    Electrical Engineering

    Many-agent Reinforcement Learning

    Multi-agent reinforcement learning (RL) addresses the problem of how each agent should behave optimally in a stochastic environment in which multiple agents are learning simultaneously. It is an interdisciplinary domain with a long history, lying at the intersection of psychology, control theory, game theory, reinforcement learning, and deep learning. Following the remarkable success of the AlphaGo series in single-agent RL, 2019 was a booming year that witnessed significant advances in multi-agent RL techniques; impressive breakthroughs have been made in developing AIs that outperform humans on many challenging tasks, especially multi-player video games. Nonetheless, one of the key challenges for multi-agent RL techniques is scalability: it is still non-trivial to design efficient learning algorithms that can solve tasks involving far more than two agents (N ≫ 2), which I term many-agent reinforcement learning (MARL) problems (here "MARL" denotes multi-agent reinforcement learning with a particular focus on the many-agent case; otherwise it is written "Multi-Agent RL"). In this thesis, I contribute to tackling MARL problems from four aspects. Firstly, I offer a self-contained overview of multi-agent RL techniques from a game-theoretic perspective. This overview fills the research gap that most existing work either fails to cover the recent advances since 2010 or does not pay adequate attention to game theory, which I believe is the cornerstone of solving many-agent learning problems. Secondly, I develop a tractable policy evaluation algorithm, α^α-Rank, for many-agent systems. The critical advantage of α^α-Rank is that it can compute the α-Rank solution concept tractably in multi-player general-sum games without storing the entire pay-off matrix. This is in contrast to classic solution concepts such as the Nash equilibrium, which is known to be PPAD-hard to compute even in two-player cases. α^α-Rank allows us, for the first time, to conduct large-scale multi-agent evaluations in practice. Thirdly, I introduce a scalable policy learning algorithm, mean-field MARL, for many-agent systems. The mean-field MARL method takes advantage of the mean-field approximation from physics, and it is the first provably convergent algorithm that tries to break the curse of dimensionality for MARL tasks. With the proposed algorithm, I report the first result of solving the Ising model and multi-agent battle games through a MARL approach. Fourthly, I investigate the many-agent learning problem in open-ended meta-games (i.e., the game of a game in the policy space). Specifically, I focus on modelling behavioural diversity in meta-games and on developing algorithms that are guaranteed to enlarge diversity during training. The proposed metric, based on determinantal point processes, serves as the first mathematically rigorous definition of diversity. Importantly, the diversity-aware learning algorithms beat the existing state-of-the-art game solvers in terms of exploitability by a large margin. On top of the algorithmic developments, I also contribute two real-world applications of MARL techniques. Specifically, I demonstrate the great potential of applying MARL to studying emergent population dynamics in nature and to modelling diverse, realistic interactions in autonomous driving. Both applications illustrate the prospect that MARL techniques can achieve significant impact in the real physical world, beyond video games.
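    To make the mean-field idea mentioned in the abstract concrete, the sketch below shows a tabular Q-learning update in which each agent conditions on the empirical mean of the other agents' actions rather than on the full joint action; this is the approximation that collapses the exponentially large joint-action space. The toy environment, reward, and bin discretisation are hypothetical placeholders, not the thesis implementation of mean-field MARL.

```python
import numpy as np

# Minimal tabular sketch of the mean-field idea: each agent's Q-function is
# indexed by (state, own action, mean action of the others) instead of the
# full joint action. Hypothetical toy set-up for illustration only.
n_agents, n_states, n_actions = 8, 5, 3
alpha, gamma = 0.1, 0.95
n_mean_bins = n_actions  # discretise the mean action for a tabular Q

# Q[agent, state, own_action, mean_action_bin]
Q = np.zeros((n_agents, n_states, n_actions, n_mean_bins))

def mean_action_bin(actions, j):
    """Empirical mean of the other agents' actions, mapped to a bin index."""
    others = np.delete(actions, j)
    return int(round(others.mean()))

def mean_field_q_update(j, s, actions, r, s_next, actions_next):
    """One Q-learning step for agent j, conditioning on the neighbours' mean action."""
    a_bar = mean_action_bin(actions, j)
    a_bar_next = mean_action_bin(actions_next, j)
    a = actions[j]
    # Bootstrap with the best own action against the next mean action.
    target = r + gamma * Q[j, s_next, :, a_bar_next].max()
    Q[j, s, a, a_bar] += alpha * (target - Q[j, s, a, a_bar])

# One toy transition: random actions, reward favouring coordination on action 1.
rng = np.random.default_rng(0)
s, s_next = 0, 1
acts = rng.integers(0, n_actions, size=n_agents)
acts_next = rng.integers(0, n_actions, size=n_agents)
for j in range(n_agents):
    r = 1.0 if acts[j] == 1 else 0.0
    mean_field_q_update(j, s, acts, r, s_next, acts_next)
```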

    Stochastic control in limit order markets

    In this thesis we study a class of stochastic control problems and analyse optimal trading strategies in limit order markets. The first chapter addresses the problem of curve following. We consider an investor who wants to keep his stock holdings close to a stochastic target function, and we construct the optimal strategy (comprising market and passive orders) which balances the penalty for deviating from the target and the cost of trading. We first prove existence and uniqueness of an optimal control. The optimal trading strategy is then characterised, via a stochastic maximum principle, in terms of the solution to a coupled FBSDE involving jumps. We give a second characterisation in terms of buy and sell regions. The application to portfolio liquidation is studied in detail. In the second chapter, we extend our results to singular market orders using techniques of singular stochastic control. We first show existence and uniqueness of an optimal control. We then derive a version of the stochastic maximum principle which yields a characterisation of the optimal trading strategy in terms of a nonstandard coupled FBSDE. We show that the optimal control can be characterised via buy, sell and no-trade regions, and we describe precisely when it is optimal to cross the bid-ask spread. We also show that the controlled system can be described in terms of a reflected BSDE. As an application, we solve the portfolio liquidation problem with passive orders.
When markets are illiquid, option holders may have an incentive to increase their portfolio value by using their impact on the dynamics of the underlying. In Chapter 3, we consider a model with competing players that hold European options and whose trading affects the price of the underlying. Restricting attention to risk-neutral and CARA investors, we establish existence and uniqueness of an equilibrium and show that the equilibrium dynamics can be characterised in terms of a coupled system of non-linear PDEs. Finally, we show how this kind of market manipulation can be reduced.
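    As a concrete (and heavily simplified) illustration of the liquidation trade-off the abstract describes, the sketch below solves a discrete-time problem in which an investor unwinds an inventory subject to a quadratic temporary impact cost and a running penalty on the remaining position, via backward dynamic programming with a quadratic value function. The horizon, parameters, and cost structure are assumptions made for illustration; the thesis works in continuous time with market and passive orders and characterises optimal strategies through coupled FBSDEs, none of which is reproduced here.

```python
import numpy as np

# Toy discrete-time liquidation: sell an inventory of X0 shares over N steps,
# paying eta*v^2 per trade (temporary impact) and lam*x^2 per step on the
# remaining position. The value function is quadratic, V_k(x) = c_k * x^2,
# with c_k = eta*(lam + c_{k+1}) / (eta + lam + c_{k+1}).
N = 20              # trading periods
eta, lam = 0.5, 0.1 # impact and inventory-penalty coefficients (illustrative)
X0 = 100.0          # initial inventory

c = np.empty(N + 1)
c[N] = 1e6          # stiff terminal penalty: the position must end (nearly) flat
for k in range(N - 1, -1, -1):
    a = lam + c[k + 1]
    c[k] = eta * a / (eta + a)

# Forward pass: the optimal trade is linear in the current inventory.
x = X0
schedule = []
for k in range(N):
    a = lam + c[k + 1]
    v = a * x / (eta + a)   # optimal sell volume this period
    schedule.append(v)
    x -= v

print([round(v, 2) for v in schedule], round(x, 4))
```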
