
    Ordinal Potential-based Player Rating

    A two-player symmetric zero-sum game is transitive if for any pure strategies $x$, $y$, $z$: if $x$ is better than $y$, and $y$ is better than $z$, then $x$ is better than $z$. It was recently observed that the Elo rating fails to preserve transitive relations among strategies and therefore cannot correctly extract the transitive component of a game. Our first contribution is to show that the Elo rating actually does preserve transitivity when computed in the right space. Precisely, using a suitable invertible mapping $\varphi$, we first apply $\varphi$ to the game, then compute Elo ratings, then return to the original space by applying $\varphi^{-1}$. We provide a characterization of transitive games as a weak variant of ordinal potential games with additively separable potential functions. Leveraging this insight, we introduce the concept of transitivity order: the minimum number of invertible mappings required to transform the payoff of a transitive game into (differences of) its potential function. The transitivity order is a tool for classifying transitive games, with Elo games being an example of transitive games of order one. Most real-world games have both transitive and non-transitive (cyclic) components, and we use our analysis of transitivity to extract the transitive (potential) component of an arbitrary game. We link transitivity to the known concept of sign-rank: transitive games have sign-rank two; arbitrary games may have higher sign-rank. Using a neural network-based architecture, we learn a decomposition of an arbitrary game into transitive and cyclic components that prioritises capturing the sign pattern of the game. In particular, a transitive game always has just one component in its decomposition, the potential component. We provide a comprehensive evaluation of our methodology using both toy examples and empirical data from real-world games.
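
    As a concrete illustration of the order-one (Elo) case described above, here is a minimal sketch, assuming the game is given as a matrix of win probabilities and taking $\varphi$ to be the logit map: after applying $\varphi$, an Elo game becomes additively separable, so ratings can be read off directly. All names are illustrative.

```python
import numpy as np

def elo_ratings_via_logit(P):
    """Recover Elo-style ratings r from a win-probability matrix P with
    P[i, j] = sigmoid(r_i - r_j).

    Applying phi = logit elementwise gives L[i, j] = r_i - r_j, a
    difference of an additively separable potential; averaging over j
    then recovers r up to an additive constant.
    """
    L = np.log(P / (1.0 - P))   # apply phi = logit elementwise
    return L.mean(axis=1)       # equals r_i - mean(r): centred ratings

# Toy check: build an exact Elo game from known ratings, then recover them.
rng = np.random.default_rng(0)
r_true = rng.normal(size=5)
P = 1.0 / (1.0 + np.exp(-(r_true[:, None] - r_true[None, :])))
r_hat = elo_ratings_via_logit(P)
assert np.allclose(r_hat, r_true - r_true.mean())
```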

    Consensus Multiplicative Weights Update: Learning to Learn using Projector-based Game Signatures

    Cheung and Piliouras (2020) recently showed that two variants of the Multiplicative Weights Update method - OMWU and MWU - display opposite convergence properties depending on whether the game is zero-sum or cooperative. Inspired by this work and the recent literature on learning to optimize for single functions, we introduce a new framework for learning last-iterate convergence to Nash equilibria in games, where the update rule's coefficients (learning rates) along a trajectory are learnt by a reinforcement learning policy conditioned on the nature of the game: \textit{the game signature}. We construct the latter using a new decomposition of two-player games into eight components corresponding to commutative projection operators, generalizing and unifying recent game concepts studied in the literature. We compare the performance of various update rules when their coefficients are learnt, and show that the RL policy is able to exploit the game signature across a wide range of game types. In doing so, we introduce CMWU, a new algorithm that extends consensus optimization to the constrained case and has local convergence guarantees for zero-sum bimatrix games, and we show that it enjoys competitive performance both on zero-sum games with constant coefficients and across a spectrum of games when its coefficients are learnt.
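
    For context, a minimal sketch of the vanilla MWU baseline on a zero-sum bimatrix game, with a fixed learning rate standing in for the coefficients that the RL policy above would learn along the trajectory. The game-signature conditioning and CMWU itself are not reproduced here; all names are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def mwu(A, steps=2000, eta=0.05):
    """Multiplicative Weights Update on the zero-sum bimatrix game x^T A y.

    The row player minimises x^T A y, the column player maximises it.
    'eta' is the fixed coefficient that a learnt update rule would
    instead adapt at each step, conditioned on the game signature.
    """
    gx = np.zeros(A.shape[0])   # cumulative losses of the row player
    gy = np.zeros(A.shape[1])   # cumulative gains of the column player
    for _ in range(steps):
        x, y = softmax(-eta * gx), softmax(eta * gy)
        gx += A @ y
        gy += A.T @ x
    return softmax(-eta * gx), softmax(eta * gy)

# Matching pennies: the time average of MWU approaches the (0.5, 0.5)
# equilibrium, but the last iterate cycles, which motivates consensus terms.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x, y = mwu(A)
```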

    Calibration of Derivative Pricing Models: a Multi-Agent Reinforcement Learning Perspective

    One of the most fundamental questions in quantitative finance is the existence of continuous-time diffusion models that fit market prices of a given set of options. Traditionally, one employs a mix of intuition and theoretical and empirical analysis to find models that achieve exact or approximate fits. Our contribution is to show how a suitable game-theoretic formulation of this problem can help answer this question by leveraging existing developments in modern deep multi-agent reinforcement learning to search in the space of stochastic processes. More importantly, we hope that our techniques can be leveraged and extended by the community to solve important problems in that field, such as the joint SPX-VIX calibration problem. Our experiments show that we are able to learn local volatility, as well as the path-dependence required in the volatility process to minimize the price of a Bermudan option. In one sentence, our algorithm can be seen as a particle method à la Guyon and Henry-Labordère where particles, instead of being designed to ensure $\sigma_{loc}(t,S_t)^2 = \mathbb{E}[\sigma_t^2 \mid S_t]$, are learning RL-driven agents cooperating towards more general calibration targets. This is the first work bridging reinforcement learning with the derivative calibration problem.
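
    To make the particle-method analogy concrete, here is a minimal sketch of the conditioning step $\mathbb{E}[\sigma_t^2 \mid S_t]$ referred to above, estimated by quantile binning over a cloud of particles. This is a standard estimator sketch under stated assumptions, not the paper's RL agents; all names are illustrative.

```python
import numpy as np

def cond_var_given_spot(S, sigma2, n_bins=20):
    """Estimate E[sigma_t^2 | S_t] across a cloud of particles.

    Particles are binned by quantiles of the spot S and sigma^2 is
    averaged within each bin; in a Guyon/Henry-Labordere particle
    method, this estimate rescales the stochastic variance so that
    sigma_loc(t, S_t)^2 = E[sigma_t^2 | S_t] holds at each time step.
    """
    edges = np.quantile(S, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, S, side="right") - 1, 0, n_bins - 1)
    cond = np.array([
        sigma2[idx == b].mean() if np.any(idx == b) else sigma2.mean()
        for b in range(n_bins)
    ])
    return cond[idx]   # estimate evaluated at each particle's spot

# Toy usage on synthetic particles with a spot-dependent variance.
rng = np.random.default_rng(0)
S = rng.lognormal(size=10_000)
sigma2 = 0.04 * (1.0 + 0.5 * np.sin(S))
est = cond_var_given_spot(S, sigma2)
```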

    Semi-Markov Driven Models: Limit Theorems and Financial Applications

    This thesis deals with models driven by so-called semi-Markov processes, and studies limit theorems and financial applications in this context. Given a system whose dynamics are governed by various regimes, a semi-Markov process is simply a process that "keeps track" of the system's regime at each time. It becomes fully Markovian if we "add" to it the process keeping track of how long the system has been in its current regime. Chapter 1 consists of a global introduction to the thesis. We introduce the concepts of semi-Markov and Markov renewal processes, and give a brief overview of each chapter, together with the main results obtained. Chapter 2 introduces a semi-Markovian model of high-frequency price dynamics: as suggested by empirical observations, it extends recent results to arbitrary distributions for the inter-arrival times of limit order book events, where both the nature of a new limit order book event and its corresponding inter-arrival time depend on the nature of the previous event. Chapter 3 establishes strong law of large numbers and central limit theorem results for time-inhomogeneous semi-Markov processes, for which the kernel is time-dependent. Chapter 4 develops a rigorous treatment of so-called inhomogeneous semi-Markov driven random evolutions, and extends existing results for the time-homogeneous case. Random evolutions allow one to model a situation in which the dynamics of a system are governed by various regimes, and the system switches from one regime to another at random times; this switching is modeled with semi-Markov processes. The notion of "time-inhomogeneity" appears twice in our framework: random evolutions are driven by inhomogeneous semi-Markov processes (using results from Chapter 3), and constructed with propagators, which are the time-inhomogeneous counterparts of semigroups. Chapter 5 presents a drift-adjusted version of the well-known Heston model - the delayed Heston model - which allows us to improve the fit of the implied volatility surface. Pricing and hedging of variance and volatility swaps is also considered. Finally, Chapter 6 concludes the thesis and presents some possible future research directions.
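
    To illustrate the central object above, a minimal sketch of a finite-state semi-Markov process: an embedded Markov chain combined with arbitrary (here Weibull) sojourn-time distributions. Tracking the time spent in the current regime alongside the state would make the pair Markovian, as the abstract notes. All names and distributions are illustrative.

```python
import numpy as np

def simulate_semi_markov(P, sample_sojourn, x0, horizon, rng):
    """Simulate a semi-Markov process on a finite regime space.

    P is the transition matrix of the embedded jump chain, and
    sample_sojourn(state, rng) draws a holding time from an arbitrary
    (non-exponential) distribution, which is what distinguishes a
    semi-Markov process from a Markov chain. Returning the pair
    (state, time since last jump) would make the dynamics Markovian.
    """
    t, x = 0.0, x0
    path = [(t, x)]
    while t < horizon:
        t += sample_sojourn(x, rng)          # regime-dependent sojourn
        x = rng.choice(len(P), p=P[x])       # jump of the embedded chain
        path.append((t, x))
    return path

# Two alternating regimes with regime-dependent Weibull holding times.
rng = np.random.default_rng(1)
P = np.array([[0.0, 1.0], [1.0, 0.0]])
sojourn = lambda x, rng: rng.weibull(1.5 + x)
path = simulate_semi_markov(P, sojourn, 0, 10.0, rng)
```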