research

Simulation from endpoint-conditioned, continuous-time Markov chains on a finite state space, with applications to molecular evolution

Abstract

Analyses of serially-sampled data often begin with the assumption that the observations represent discrete samples from a latent continuous-time stochastic process. The continuous-time Markov chain (CTMC) is one such generative model whose popularity extends to a variety of disciplines ranging from computational finance to human genetics and genomics. A common theme among these diverse applications is the need to simulate sample paths of a CTMC conditional on realized data that is discretely observed. Here we present a general solution to this sampling problem when the CTMC is defined on a discrete and finite state space. Specifically, we consider the generation of sample paths, including intermediate states and times of transition, from a CTMC whose beginning and ending states are known across a time interval of length TT. We first unify the literature through a discussion of the three predominant approaches: (1) modified rejection sampling, (2) direct sampling, and (3) uniformization. We then give analytical results for the complexity and efficiency of each method in terms of the instantaneous transition rate matrix QQ of the CTMC, its beginning and ending states, and the length of sampling time TT. In doing so, we show that no method dominates the others across all model specifications, and we give explicit proof of which method prevails for any given Q,T,Q,T, and endpoints. Finally, we introduce and compare three applications of CTMCs to demonstrate the pitfalls of choosing an inefficient sampler.Comment: Published in at http://dx.doi.org/10.1214/09-AOAS247 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Similar works

    Full text

    thumbnail-image

    Available Versions

    Last time updated on 01/04/2019