188 research outputs found
Recommended from our members
Sequential Modelling and Inference of High-frequency Limit Order Book with State-space Models and Monte Carlo Algorithms
The high-frequency limit order book (LOB) market has recently attracted increasing research attention from both industry and academia as a result of expanding algorithmic trading. However, the massive data throughput and the inherent complexity of high-frequency market dynamics also present challenges to some classic statistical modelling approaches. By adopting powerful state-space models from the field of signal processing as well as a number of Bayesian inference algorithms such as particle filtering, Markov chain Monte Carlo and variational inference algorithms, this thesis presents my extensive research into the high-frequency limit order book covering a wide scope of topics.
Chapter 2 presents a novel construction of the non-homogeneous Poisson process to allow online intensity inference of limit order transactions arriving at a central exchange as point data. Chapter 3 extends a baseline jump diffusion model for market fair-price process to include three additional model features taken from real-world market intuitions. In Chapter 4, another price model is developed to account for both long-term and short-term diffusion behaviours of the price process. This is achieved by incorporating multiple jump-diffusion processes each exhibiting a unique characteristic. Chapter 5 observes the multi-regime nature of price diffusion processes as well as the non-Markovian switching behaviour between regimes. As such, a novel model is proposed which combines the continuous-time state-space model, the hidden semi-Markov switching model and the non-parametric Dirichlet process model. Additionally, building upon the general structure of the particle Markov chain Monte Carlo algorithm, I further propose an algorithm which achieves sequential state inference, regime identification and regime parameters learning requiring minimal prior assumptions. Chapter 6 focuses on the development of efficient parameter-learning algorithms for state-space models and presents three algorithms each demonstrating promising results in comparison to some well-established methods.
The models and algorithms proposed in this thesis are not only practical tools for analysing high-frequency LOB markets, but can also be applied in various areas and disciplines beyond finance.
Sequential Inference Methods for Non-Homogeneous Poisson Processes with State-Space Prior
The non-homogeneous Poisson process provides a generalised framework for the modelling of random point data by allowing the intensity of point generation to vary across its domain of interest (time or space). The use of non-homogeneous Poisson processes has arisen in many areas of signal processing and machine learning, but application is still largely limited by its intractable likelihood function and the lack of computationally efficient inference schemes, although some methods do exist for the batch data case. In this paper, we propose for the first time a sequential framework for intensity inference which combines the non-homogeneous model of Poisson data with continuous-time state-space models for their time-varying intensity. This approach enables us to design efficient online inference schemes, for which we propose a novel sequential Markov chain Monte Carlo (SMCMC) algorithm, as is demanded by many applications where point data arrive sequentially and decisions need to be made with low latency. The proposed approach is compared with competing methods on synthetic datasets and tested with high-frequency financial order book data, showing in general improved performance and better computational efficiency than the main batch-based competitor algorithm, and better performance than a simple baseline kernel estimation scheme.
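To make the notion of a time-varying intensity concrete, the following is a minimal sketch of simulating a non-homogeneous Poisson process by thinning. It is not the sequential inference scheme proposed in the paper; it only illustrates the data-generating model. The names `simulate_nhpp` and `example_intensity`, the sinusoidal intensity and the bound `lam_max` are all assumptions made for illustration.

```python
import math
import random


def simulate_nhpp(intensity, t_end, lam_max, rng):
    """Simulate event times on [0, t_end] by thinning.

    Assumes 0 <= intensity(t) <= lam_max for every t in [0, t_end].
    """
    events = []
    t = 0.0
    while True:
        # Next candidate point from a homogeneous process of rate lam_max.
        t += rng.expovariate(lam_max)
        if t > t_end:
            return events
        # Keep the candidate with probability intensity(t) / lam_max.
        if rng.random() * lam_max <= intensity(t):
            events.append(t)


def example_intensity(t):
    # Hypothetical intensity, oscillating between 0 and 10 events per unit time.
    return 5.0 * (1.0 + math.sin(t))


rng = random.Random(0)
events = simulate_nhpp(example_intensity, t_end=20.0, lam_max=10.0, rng=rng)
print(len(events), "events simulated")
```

Thinning needs only an upper bound on the intensity, which is why it is a convenient way to produce synthetic point data of the kind used to test an intensity estimator.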
Hidden states, hidden structures: Bayesian learning in time series models
This thesis presents methods for the inference of system state and the learning of model structure for a number of hidden-state time series models, within a Bayesian probabilistic framework. Motivating examples are taken from application areas including finance, physical object tracking and audio restoration. The work in this thesis can be broadly divided into three themes: system and parameter estimation in linear jump-diffusion systems, non-parametric model (system) estimation and batch audio restoration.
For linear jump-diffusion systems, efficient state estimation methods based on the variable rate particle filter are presented for the general linear case (chapter 3) and a new method of parameter estimation based on Particle MCMC methods is introduced and tested against an alternative method using reversible-jump MCMC (chapter 4).
Non-parametric model estimation is examined in two settings: the estimation of non-parametric environment models in a SLAM-style problem, and the estimation of the network structure and forms of linkage between multiple objects. In the former case, a non-parametric Gaussian process prior model is used to learn a potential field model of the environment in which a target moves. Efficient solution methods based on Rao-Blackwellized particle filters are given (chapter 5). In the latter case, a new way of learning non-linear inter-object relationships in multi-object systems is developed, allowing complicated inter-object dynamics to be learnt and causality between objects to be inferred. Again based on Gaussian process prior assumptions, the method allows the identification of a wide range of relationships between objects with minimal assumptions and admits efficient solution, albeit in batch form at present (chapter 6).
Finally, the thesis presents some new results in the restoration of audio signals, in particular the removal of impulse noise (pops and clicks) from audio recordings (chapter 7). This work was supported by the Engineering and Physical Sciences Research Council (EPSRC).
Leveraging large-deviation statistics to decipher the stochastic properties of measured trajectories
Extensive time-series encoding the position of particles such as viruses, vesicles, or individual proteins are routinely garnered in single-particle tracking experiments or supercomputing studies. They contain vital clues on how viruses spread or drugs may be delivered in biological cells. Similar time-series are being recorded of stock values in financial markets and of climate data. Such time-series are most typically evaluated in terms of time-averaged mean-squared displacements (TAMSDs), which remain random variables for finite measurement times. Their statistical properties are different for different physical stochastic processes, thus allowing us to extract valuable information on the stochastic process itself. To exploit the full potential of the statistical information encoded in measured time-series we here propose an easy-to-implement and computationally inexpensive new methodology, based on deviations of the TAMSD from its ensemble average counterpart. Specifically, we use the upper bound of these deviations for Brownian motion (BM) to check the applicability of this approach to simulated and real data sets. By comparing the probability of deviations for different data sets, we demonstrate how the theoretical bound for BM reveals additional information about observed stochastic processes. We apply the large-deviation method to data sets of tracer beads tracked in aqueous solution, tracer beads measured in mucin hydrogels, and of geographic surface temperature anomalies. Our analysis shows how the large-deviation properties can be efficiently used as a simple yet effective routine test to reject the BM hypothesis and unveil relevant information on statistical properties such as ergodicity breaking and short-time correlations.
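As an illustration of the quantity this abstract builds on, here is a minimal sketch of the time-averaged mean-squared displacement for a one-dimensional trajectory sampled at unit time steps. It does not implement the large-deviation test itself. The function names `tamsd` and `brownian_path`, the unit sampling interval and the simulated Brownian path are assumptions made for illustration.

```python
import random


def tamsd(x, lag):
    """Time-averaged mean-squared displacement of trajectory x at an integer lag.

    Averages (x[i + lag] - x[i])**2 over all n - lag overlapping windows.
    """
    n = len(x)
    if not 1 <= lag < n:
        raise ValueError("lag must satisfy 1 <= lag < len(x)")
    total = sum((x[i + lag] - x[i]) ** 2 for i in range(n - lag))
    return total / (n - lag)


def brownian_path(n_steps, sigma, rng):
    """Discrete Brownian path with independent Gaussian steps (hypothetical test data)."""
    x = [0.0]
    for _ in range(n_steps):
        x.append(x[-1] + rng.gauss(0.0, sigma))
    return x


path = brownian_path(10_000, sigma=1.0, rng=random.Random(0))
for lag in (1, 10):
    # For this path the ensemble-averaged MSD at lag k is k * sigma**2,
    # so the ratio below fluctuates around 1 for finite trajectories.
    print(lag, tamsd(path, lag) / lag)
```

The ratio printed in the loop is the kind of deviation of the TAMSD from its ensemble average that the abstract refers to; how its distribution is bounded for Brownian motion is the subject of the paper and is not reproduced here.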
Exploring Probability Measures with Markov Processes
In many domains where mathematical modelling is applied, a deterministic description of the system at hand is insufficient, and so it is useful to model systems as being in some way stochastic. This is often achieved by modelling the state of the system as being drawn from a probability measure, which is usually given algebraically, i.e. as a formula. While this representation can be useful for deriving certain characteristics of the system, it is by now well appreciated that many questions about stochastic systems are best answered by looking at samples from the associated probability measure. In this thesis, we seek to develop and analyse efficient techniques for generating samples from a given probability measure, with a focus on algorithms which simulate a Markov process with the desired invariant measure.
The first work presented in this thesis considers the use of Piecewise-Deterministic Markov Processes (PDMPs) for generating samples. In contrast to usual approaches, PDMPs i) are defined as continuous-time processes, and ii) are typically non-reversible with respect to their invariant measure. These distinctions pose computational and theoretical challenges for the design, analysis, and implementation of PDMP-based samplers. The key contribution of this work is to develop a transparent characterisation of how one can construct a PDMP (within the class of trajectorially-reversible processes) which admits the desired invariant measure, and to offer actionable recommendations on how these processes should be designed in practice.
The second work presented in this thesis considers the task of sampling from a probability measure on a discrete space. While work in recent years has made it possible to apply sampling algorithms to probability measures with differentiable densities on continuous spaces in a reasonably generic way, samplers on discrete spaces are still largely derived on a case-by-case basis. The contention of this work is that this is not necessary, and that one can in fact define quite generally-applicable algorithms which can sample efficiently from discrete probability measures. The contributions are then to propose a small collection of algorithms for this task, and verify their efficiency empirically. Building on the previous chapter’s work, our samplers are again defined in continuous time and non-reversible, each of which offers noticeable benefits in efficiency.
The third work presented in this thesis concerns a theoretical study of a particular class of Markov Chain-based sampling algorithms which make use of parallel computing resources. The Markov Chains which are produced by this algorithm are mathematically equivalent to a standard Metropolis-Hastings chain, but their real-time convergence properties are affected nontrivially by the application of parallelism. The contribution of this work is to analyse the convergence behaviour of these chains, and to use the ‘optimal scaling’ framework (as developed by Roberts, Rosenthal, and others) to make recommendations concerning the tuning of such algorithms in practice.
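For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of a standard, serial random-walk Metropolis-Hastings chain. It is not the parallel algorithm analysed in the thesis. The standard normal target `log_target`, the function name `random_walk_metropolis` and the step size of 2.4 are assumptions chosen for illustration.

```python
import math
import random


def log_target(x):
    # Hypothetical target: unnormalised log-density of a standard normal.
    return -0.5 * x * x


def random_walk_metropolis(log_density, x0, step, n_samples, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal.

    Returns the list of chain states and the empirical acceptance rate.
    """
    x = x0
    logp = log_density(x)
    samples = []
    accepted = 0
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        logp_proposal = log_density(proposal)
        # Symmetric proposal, so accept with probability min(1, pi(proposal) / pi(x)).
        # 1 - rng.random() lies in (0, 1], which keeps the logarithm finite.
        if math.log(1.0 - rng.random()) < logp_proposal - logp:
            x, logp = proposal, logp_proposal
            accepted += 1
        samples.append(x)  # a rejected proposal repeats the current state
    return samples, accepted / n_samples


samples, rate = random_walk_metropolis(
    log_target, x0=0.0, step=2.4, n_samples=20_000, rng=random.Random(1)
)
print("acceptance rate:", rate)
```

The `step` argument is the tuning parameter here: too small and the chain moves slowly, too large and most proposals are rejected. Choosing it well is the kind of question the optimal scaling framework addresses.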
The introductory chapters provide a general overview of the task of generating samples from a probability measure, with particular focus on methods involving Markov processes. There is also an interlude on the relative benefits of i) continuous-time and ii) non-reversible Markov processes for sampling, which is intended to provide additional context for the reading of the first two works. PhD Studentship paid for by Cantab Capital Institute for the Mathematics of Information.
Computational methods for various stochastic differential equation models in finance
This study develops efficient numerical methods for solving jump-diffusion stochastic delay differential equations and stochastic differential equations with fractional order. In addition, two novel algorithms are developed for the estimation of parameters in the stochastic models. One of the algorithms is based on Bayesian inference and the Markov chain Monte Carlo method, while the other is developed by using an implicit numerical scheme integrated with particle swarm optimization.