14 research outputs found
Ordinal Potential-based Player Rating
A two-player symmetric zero-sum game is transitive if, for any pure strategies
x, y, z, whenever x is better than y and y is better than z, then x
is better than z. It was recently observed that the Elo rating fails at
preserving transitive relations among strategies and therefore cannot correctly
extract the transitive component of a game. Our first contribution is to show
that the Elo rating actually does preserve transitivity when computed in the
right space. Precisely, using a suitable invertible mapping m, we first
apply m to the game, then compute Elo ratings, then go back to the
original space by applying the inverse mapping m^{-1}. We provide a characterization of
transitive games as a weak variant of ordinal potential games with additively
separable potential functions. Leveraging this insight, we introduce the
concept of transitivity order, the minimum number of invertible mappings
required to transform the payoff of a transitive game into (differences of) its
potential function. The transitivity order is a tool to classify transitive
games, with Elo games being an example of transitive games of order one. Most
real-world games have both transitive and non-transitive (cyclic) components,
and we use our analysis of transitivity to extract the transitive (potential)
component of an arbitrary game. We link transitivity to the known concept of
sign-rank: transitive games have sign-rank two; arbitrary games may have higher
sign-rank. Using a neural network-based architecture, we learn a decomposition
of an arbitrary game into transitive and cyclic components that prioritises
capturing the sign pattern of the game. In particular, a transitive game always
has just one component in its decomposition, the potential component. We
provide a comprehensive evaluation of our methodology using both toy examples
and empirical data from real-world games.
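As a toy illustration of an order-one transitive (Elo) game, the sketch below builds pairwise win probabilities from hypothetical latent ratings (the rating values are made up for illustration) and checks that the better-than relation is transitive, with the logit as the invertible mapping recovering rating differences:

```python
import math

def sigmoid(d):
    return 1.0 / (1.0 + math.exp(-d))

# Hypothetical latent ratings for three strategies (illustrative values only).
ratings = {"x": 2.0, "y": 0.5, "z": -1.0}

def win_prob(a, b):
    """Elo-style win probability: sigmoid of the rating difference."""
    return sigmoid(ratings[a] - ratings[b])

# Transitivity: x beats y and y beats z, so x beats z.
assert win_prob("x", "y") > 0.5
assert win_prob("y", "z") > 0.5
assert win_prob("x", "z") > 0.5

# The logit (inverse sigmoid) maps the payoff back to an additively
# separable potential difference r_a - r_b: a transitive game of order one.
logit = lambda p: math.log(p / (1.0 - p))
assert abs(logit(win_prob("x", "z")) - (ratings["x"] - ratings["z"])) < 1e-9
```

A cyclic game such as rock-paper-scissors admits no such rating, which is why a general game must first be decomposed into transitive and cyclic components.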
Consensus Multiplicative Weights Update: Learning to Learn using Projector-based Game Signatures
Cheung and Piliouras (2020) recently showed that two variants of the
Multiplicative Weights Update method - OMWU and MWU - display opposite
convergence properties depending on whether the game is zero-sum or
cooperative. Inspired by this work and the recent literature on learning to
optimize for single functions, we introduce a new framework for learning
last-iterate convergence to Nash Equilibria in games, where the update rule's
coefficients (learning rates) along a trajectory are learnt by a reinforcement
learning policy that is conditioned on the nature of the game: \textit{the game
signature}. We construct the latter using a new decomposition of two-player
games into eight components corresponding to commutative projection operators,
generalizing and unifying recent game concepts studied in the literature. We
compare the performance of various update rules when their coefficients are
learnt, and show that the RL policy is able to exploit the game signature
across a wide range of game types. In doing so, we introduce CMWU, a new
algorithm that extends consensus optimization to the constrained case and has
local convergence guarantees for zero-sum bimatrix games, and we show that it
enjoys competitive performance on both zero-sum games with constant
coefficients and across a spectrum of games when its coefficients are learnt.
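The plain MWU update that these methods build on can be sketched on matching pennies; the fixed step size eta below stands in for the per-step coefficients the paper proposes to learn with an RL policy (a minimal illustration, not the CMWU algorithm itself):

```python
import numpy as np

# Matching pennies: zero-sum bimatrix game, row payoff A, column payoff -A.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])

def mwu_step(x, y, eta=0.1):
    """One Multiplicative Weights Update step for both players.

    eta plays the role of the coefficient that the paper's RL policy
    would condition on the game signature (fixed here, for illustration).
    """
    x_new = x * np.exp(eta * (A @ y))        # row player maximizes x^T A y
    y_new = y * np.exp(-eta * (A.T @ x))     # column player minimizes
    return x_new / x_new.sum(), y_new / y_new.sum()

x = np.array([0.6, 0.4])   # start away from the Nash equilibrium (0.5, 0.5)
y = np.array([0.6, 0.4])
for _ in range(200):
    x, y = mwu_step(x, y)
```

As Cheung and Piliouras showed, the last iterate of this rule spirals away from the interior Nash equilibrium in zero-sum games (while the optimistic variant OMWU converges), which is precisely the behavioral split the game signature is designed to detect.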
Calibration of Derivative Pricing Models: a Multi-Agent Reinforcement Learning Perspective
One of the most fundamental questions in quantitative finance is the
existence of continuous-time diffusion models that fit market prices of a given
set of options. Traditionally, one employs a mix of intuition, theoretical and
empirical analysis to find models that achieve exact or approximate fits. Our
contribution is to show how a suitable game theoretical formulation of this
problem can help solve this question by leveraging existing developments in
modern deep multi-agent reinforcement learning to search in the space of
stochastic processes. More importantly, we hope that our techniques can be
leveraged and extended by the community to solve important problems in that
field, such as the joint SPX-VIX calibration problem. Our experiments show that
we are able to learn local volatility, as well as path-dependence required in
the volatility process to minimize the price of a Bermudan option. In one
sentence, our algorithm can be seen as a particle method à la Guyon and
Henry-Labordère where particles, instead of being designed to ensure
$\sigma_{loc}(t,S_t)^2 = \mathbb{E}[\sigma_t^2 | S_t]$, are learning RL-driven
agents cooperating towards more general calibration targets. This is the first
work bridging reinforcement learning with the derivative calibration problem.
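The fixed-point condition behind the classical particle method — choosing the leverage so that the local variance equals the conditional expectation of the stochastic variance given the spot — can be illustrated on a toy particle cloud by estimating E[v | S] with quantile binning (the distributions and parameters below are made up for illustration, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy particle system: each particle carries a price S and a stochastic
# variance v.  The classical particle method picks the leverage so that
# sigma_loc(t, S)^2 = E[v | S]; here we estimate that conditional
# expectation directly from the particle cloud.
n = 100_000
v = rng.lognormal(mean=-3.0, sigma=0.5, size=n)          # stochastic variance
S = 100.0 * np.exp(np.sqrt(v) * rng.standard_normal(n))  # toy spot prices

bins = np.quantile(S, np.linspace(0.0, 1.0, 21))          # 20 quantile bins
idx = np.clip(np.digitize(S, bins) - 1, 0, 19)
cond_var = np.array([v[idx == k].mean() for k in range(20)])

# cond_var[k] approximates E[v | S in bin k]: the quantity particles are
# classically "designed to ensure", which the paper's RL-driven agents
# replace with more general calibration targets.
```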
Semi-Markov Driven Models: Limit Theorems and Financial Applications
This thesis deals with models driven by so-called semi-Markov processes, and studies some limit theorems and financial applications in this context. Given a system whose dynamics are governed by various regimes, a semi-Markov process is simply a process that "keeps track" of the system regime at each time. It becomes fully Markovian if we "add" to it the process keeping track of how long the system has been in its current regime.
Chapter 1 consists of a global introduction to the thesis. We introduce the concepts of semi-Markov and Markov renewal processes, and give a brief overview of each chapter, together with the main results obtained.
Chapter 2 introduces a semi-Markovian model of high frequency price dynamics: as suggested by empirical observations, it extends recent results to arbitrary distributions for limit order book events inter-arrival times, and both the nature of a new limit order book event and its corresponding inter-arrival time depend on the nature of the previous limit order book event.
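The dependence structure of Chapter 2 can be sketched as a Markov renewal simulation in which both the next event type and the inter-arrival time depend on the previous event type; the event types, transition probabilities, and non-exponential (Weibull) timing distributions below are made-up illustrations:

```python
import random

# Toy Markov renewal simulation: the next limit order book event type and
# its inter-arrival time both depend on the previous event type.
TYPES = ["limit", "market", "cancel"]
P = {  # transition probabilities over the next event type (illustrative)
    "limit":  [0.6, 0.2, 0.2],
    "market": [0.5, 0.3, 0.2],
    "cancel": [0.7, 0.1, 0.2],
}
SCALE = {"limit": 0.1, "market": 0.5, "cancel": 0.2}  # per-type time scales

def simulate(n_events, seed=0):
    rng = random.Random(seed)
    events, t, prev = [], 0.0, "limit"
    for _ in range(n_events):
        nxt = rng.choices(TYPES, weights=P[prev])[0]
        # Weibull inter-arrival time (shape 1.5, so non-exponential:
        # the process is genuinely semi-Markov, not Markov).
        t += rng.weibullvariate(SCALE[prev], 1.5)
        events.append((t, nxt))
        prev = nxt
    return events

path = simulate(1000)
```

With exponential inter-arrival times this would collapse to a continuous-time Markov chain; allowing arbitrary distributions is exactly the generalization the chapter makes.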
Chapter 3 establishes strong law of large numbers and central limit theorem results for time-inhomogeneous semi-Markov processes, for which the kernel is time-dependent.
Chapter 4 develops a rigorous treatment of so-called inhomogeneous semi-Markov driven random evolutions, and extends existing results for the time-homogeneous case. Random evolutions allow us to model a situation in which the dynamics of a system are governed by various regimes, and the system switches from one regime to another at random times; this switching is modeled with semi-Markov processes. The notion of "time-inhomogeneity" appears twice in our framework: random evolutions will be driven by inhomogeneous semi-Markov processes (using results from Chapter 3), and constructed with propagators, which are time-inhomogeneous counterparts of semigroups.
Chapter 5 presents a drift-adjusted version of the well-known Heston model - the delayed Heston model - which allows us to improve the implied volatility surface fitting. Pricing and hedging of variance and volatility swaps is also considered.
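For context, the variance process of the classical Heston model that the delayed variant adjusts can be simulated with a full-truncation Euler scheme; the parameters below are illustrative, and the delay adjustment to the drift itself is not implemented here:

```python
import numpy as np

# Full-truncation Euler scheme for the classical Heston variance process,
# dv = kappa*(theta - v) dt + xi*sqrt(v) dW.  The delayed Heston model of
# Chapter 5 adds a drift adjustment involving a delay; parameters here are
# illustrative, not calibrated values.
def simulate_variance(v0=0.04, kappa=1.5, theta=0.04, xi=0.3,
                      T=1.0, n_steps=252, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    v = np.empty(n_steps + 1)
    v[0] = v0
    for i in range(n_steps):
        vp = max(v[i], 0.0)                       # full truncation at zero
        dW = np.sqrt(dt) * rng.standard_normal()
        v[i + 1] = v[i] + kappa * (theta - vp) * dt + xi * np.sqrt(vp) * dW
    return v

path = simulate_variance()
```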
Finally, Chapter 6 concludes the thesis and presents some possible future research directions.