11 research outputs found
Essays on Numerical Integration in Hamiltonian Monte Carlo
This thesis considers a variety of topics broadly unified under the theme of geometric integration for Riemannian manifold Hamiltonian Monte Carlo. In chapter 2, we review fundamental topics in numerical computing (section 2.1), classical mechanics (section 2.2), integration on manifolds (section 2.3), Riemannian geometry (section 2.5), stochastic differential equations (section 2.4), information geometry (section 2.6), and Markov chain Monte Carlo (section 2.7). The purpose of these sections is to present the topics discussed in the thesis within a broader context. The subsequent chapters are self-contained to an extent, but contain references back to this foundational material where appropriate. Chapter 3 gives a formal means of conceptualizing the Markov chains corresponding to Riemannian manifold Hamiltonian Monte Carlo and related methods; this formalism is useful for understanding the significance of reversibility and volume-preservation for maintaining detailed balance in Markov chain Monte Carlo. Throughout the remainder of the thesis, we investigate alternative methods of geometric numerical integration for use in Riemannian manifold Hamiltonian Monte Carlo, discuss numerical issues involving violations of reversibility and detailed balance, and propose new algorithms with superior theoretical foundations. In chapter 4, we evaluate the implicit midpoint integrator for Riemannian manifold Hamiltonian Monte Carlo, presenting the first time that this integrator has been deployed and assessed within this context. We discuss attributes of the implicit midpoint integrator that make it preferable, and inferior, to alternative methods of geometric integration such as the generalized leapfrog procedure. In chapter 5, we treat an empirical question as to what extent convergence thresholds play a role in geometric numerical integration in Riemannian manifold Hamiltonian Monte Carlo. If the convergence threshold is too large, then the Markov chain transition kernel will fail to maintain detailed balance, whereas a convergence threshold that is very small will incur computational penalties. We investigate these phenomena and suggest two mechanisms, based on stochastic approximation and higher-order solvers for non-linear equations, which can aid in identifying convergence thresholds or suppress its significance. In chapter 6, we consider a numerical integrator for Markov chain Monte Carlo based on the Lagrangian, rather than Hamiltonian, formalism in classical mechanics. Our contributions include clarifying the order of accuracy of this numerical integrator, which has been misunderstood in the literature, and evaluating a simple change that can accelerate the implementation of the method, but which comes at the cost of producing more serially auto-correlated samples. We also discuss robustness properties of the Lagrangian numerical method that do not materialize in the Hamiltonian setting. Chapter 7 examines theories of geometric ergodicity for Riemannian manifold Hamiltonian Monte Carlo and Lagrangian Monte Carlo, and proposes a simple modification to these Markov chain methods that enables geometric ergodicity to be inherited from the manifold Metropolis-adjusted Langevin algorithm. In chapter 8, we show how to revise an explicit integration using a theory of Lagrange multipliers so that the resulting numerical method satisfies the properties of reversibility and volume-preservation. Supplementary content in chapter E investigates topics in the theory of shadow Hamiltonians of the implicit midpoint method in the case of non-canonical Hamiltonian mechanics and chapter F, which treats the continual adaptation of a parameterized proposal distribution in the independent Metropolis-Hastings sampler
Summary of research in applied mathematics, numerical analysis, and computer sciences
The major categories of current ICASE research programs addressed include: numerical methods, with particular emphasis on the development and analysis of basic numerical algorithms; control and parameter identification problems, with emphasis on effective numerical methods; computational problems in engineering and physical sciences, particularly fluid dynamics, acoustics, and structural analysis; and computer systems and software, especially vector and parallel computers
Recommended from our members
Exploring Probability Measures with Markov Processes
In many domains where mathematical modelling is applied, a deterministic description of the system at hand is insufficient, and so it is useful to model systems as being in some way stochastic. This is often achieved by modeling the state of the system as being drawn from a probability measure, which is usually given algebraically, i.e. as a formula. While this representation can be useful for deriving certain characteristics of the system, it is by now well-appreciated that many questions about stochastic systems are best-answered by looking at samples from the associated probability measure. In this thesis, we seek to develop and analyse efficient techniques for generating samples from a given probability measure, with a focus on algorithms which simulate a Markov process with the desired invariant measure.
The first work presented in this thesis considers the use of Piecewise-Deterministic Markov Processes (PDMPs) for generating samples. In contrast to usual approaches, PDMPs are i) defined as continuous-time processes, and ii) are typically non-reversible with respect to their invariant measure. These distinctions pose computational and theoretical challenges for the design, analysis, and implementation of PDMP-based samplers. The key contribution of this work is to develop a transparent characterisation of how one can construct a PDMP (within the class of trajectorially-reversible processes) which admits the desired invariant measure, and to offer actionable recommendations on how these processes should be designed in practice.
The second work presented in this thesis considers the task of sampling from a probability measure on a discrete space. While work in recent years has made it possible to apply sampling algorithms to probability measures with differentiable densities on continuous spaces in a reasonably generic way, samplers on discrete spaces are still largely derived on a case-by-case basis. The contention of this work is that this is not necessary, and that one can in fact define quite generally-applicable algorithms which can sample efficiently from discrete probability measures. The contributions are then to propose a small collection of algorithms for this task, and verify their efficiency empirically. Building
on the previous chapter’s work, our samplers are again defined in continuous time and non-reversible, each of which offer noticeable benefits in efficiency.
The third work presented in this thesis concerns a theoretical study of a particular class of Markov Chain-based sampling algorithms which make use of parallel computing resources. The Markov Chains which are produced by this algorithm are mathematically equivalent to a standard Metropolis-Hastings chain, but their real-time convergence properties are affected nontrivially by the application of parallelism. The contribution of this work is to analyse the convergence behaviour of these chains, and to use the ‘optimal scaling’ framework (as developed by Roberts, Rosenthal, and others) to make recommendations concerning the tuning of such algorithms in practice.
The introductory chapters provide a general overview on the task of generating samples from a probability measure, with particular focus on methods involving Markov processes. There is also an interlude on the relative benefits of i) continuous-time and ii) non-reversible Markov processes for sampling, which are intended to provide additional context for the reading of the first two works.PhD Studentship paid for by Cantab Capital Institute for the Mathematics of Informatio
The singular vector approach to the analysis of perturbation growth in the atmosphere
This dissertation applies linear algebra to the study of perturbation growth in atmospheric flows. Maximally-growing perturbations are identified from singular vector analysis of the time-evolving flow. Given a system characterized by a linear propagator L(t0,t), which describes the time evolution of small perturbations x between time t0 and t around a time-evolving trajectory, an inner product (..;..)E on the tangent space of the perturbations defined by a matrix E and its associated norm ||..||E, the problem can be stated as: Find the phase space directions x for which ||x(t)||E/ ||x(t0)||E is maximum, where x(t)=L(t0,t)x(t0) Given the adjoint L*E of the forward propagator L, perturbation growth is gauged by computing the eigenvectors of an operator including L and L*E as factors. The eigenvectors with the largest eigenvalues define the directions with maximum growth. They are called the singular vectors of the tangent forward propagator L. First, the singular vector approach is described. Second, a barotropic model of the atmospheric flow is considered. The impact of underlying orography on singular vectors growing over different time intervals, and the role of singular vectors in explaining the maintenance of blocked flows during winter, are analyzed. Third, a 3-dimensional primitive equation model of the atmospheric flow is considered. Some aspects of the application of the singular vector technique to the study of perturbation growth in the whole atmosphere are analyzed. A physical interpretation of singular vector growth based on the application of the Eliassen-Palm theorem and on WKBJ theory is proposed. Finally, two examples of operational use of singular vectors are presented. Results show how the adjoint technique is a suitable methodology for the identification of atmospheric instabilities, and how it can be used to investigate predictability problems
The bracket geometry of statistics
In this thesis we build a geometric theory of Hamiltonian Monte Carlo, with an emphasis on symmetries and its bracket generalisations, construct the canonical geometry of smooth measures and Stein operators, and derive the complete recipe of measure-constraints preserving dynamics and diffusions on arbitrary manifolds.
Specifically, we will explain the central role played by mechanics with symmetries to obtain efficient numerical integrators, and provide a general method to construct explicit integrators for HMC on geodesic orbit manifolds via symplectic reduction.
Following ideas developed by Maxwell, Volterra, Poincaré, de Rham, Koszul, Dufour, Weinstein, and others,
we will then show that any smooth distribution generates
considerable geometric content, including ``musical"
isomorphisms between multi-vector fields and twisted differential forms, and
a boundary operator - the rotationnel,
which, in particular, engenders the canonical Stein operator.
We then introduce the ``bracket formalism" and its induced mechanics, a generalisation of Poisson mechanics and gradient flows that provides a general mechanism to associate unnormalised probability densities to flows depending on the score pointwise.
Most importantly, we will characterise all measure-constraints preserving flows on arbitrary manifolds, showing the intimate relation between measure-preserving Nambu mechanics and closed twisted forms.
Our results are canonical. As a special case we obtain the characterisation of measure-preserving bracket mechanical systems and measure-preserving diffusions, thus explaining and extending to manifolds
the complete recipe of SGMCMC.
We will discuss the geometry of Stein operators and extend the density approach by showing these are simply a reformulation of the exterior derivative on twisted forms satisfying Stokes' theorem.
Combining the canonical Stein operator with brackets allows us to naturally recover the Riemannian and diffusion Stein operators as special cases.
Finally, we shall introduce the minimum Stein discrepancy estimators, which provide a unifying perspective of parameter inference based on score matching, contrastive divergence, and minimum probability flow.Open Acces
Recommended from our members
Modernizing Markov Chains Monte Carlo for Scientific and Bayesian Modeling
The advent of probabilistic programming languages has galvanized scientists to write increasingly diverse models to analyze data. Probabilistic models use a joint distribution over observed and latent variables to describe at once elaborate scientific theories, non-trivial measurement procedures, information from previous studies, and more. To effectively deploy these models in a data analysis, we need inference procedures which are reliable, flexible, and fast. In a Bayesian analysis, inference boils down to estimating the expectation values and quantiles of the unnormalized posterior distribution. This estimation problem also arises in the study of non-Bayesian probabilistic models, a prominent example being the Ising model of Statistical Physics.
Markov chains Monte Carlo (MCMC) algorithms provide a general-purpose sampling method which can be used to construct sample estimators of moments and quantiles. Despite MCMC’s compelling theory and empirical success, many models continue to frustrate MCMC, as well as other inference strategies, effectively limiting our ability to use these models in a data analysis. These challenges motivate new developments in MCMC. The term “modernize” in the title refers to the deployment of methods which have revolutionized Computational Statistics and Machine Learning in the past decade, including: (i) hardware accelerators to support massive parallelization, (ii) approximate inference based on tractable densities, (iii) high-performance automatic differentiation and (iv) continuous relaxations of discrete systems.
The growing availability of hardware accelerators such as GPUs has in the past years motivated a general MCMC strategy, whereby we run many chains in parallel with a short sampling phase, rather than a few chains with a long sampling phase. Unfortunately existing convergence diagnostics are not designed for the “many short chains” regime. This is notably the case of the popular R statistics which claims convergence only if the effective sample size per chain is large. We present the nested R, denoted nR, a generalization of R which does not conflate short chains and poor mixing, and offers a useful diagnostic provided we run enough chains and meet certain initialization conditions. Combined with nR the short chain regime presents us with the opportunity to identify optimal lengths for the warmup and sampling phases, as well as the optimal number of chains; tuning parameters of MCMC which are otherwise chosen using heuristics or trial-and-error.
We next focus on semi-specialized algorithms for latent Gaussian models, arguably the most widely used of class of hierarchical models. It is well understood that MCMC often struggles with the geometry of the posterior distribution generated by these models. Using a Laplace approximation, we marginalize out the latent Gaussian variables and then integrate the remaining parameters with Hamiltonian Monte Carlo (HMC), a gradient-based MCMC. This approach combines MCMC and a distributional approximation, and offers a useful alternative to pure MCMC or pure approximation methods such as Variational Inference. We compare the three paradigms across a range of general linear models, which admit a sophisticated prior, i.e. a Gaussian process and a Horseshoe prior. To implement our scheme efficiently, we derive a novel automatic differentiation method called the adjoint-differentiated Laplace approximation. This differentiation algorithm propagates the minimal information needed to construct the gradient of the approximate marginal likelihood, and yields a scalable differentiation method that is orders of magnitude faster than state of the art differentiation for high-dimensional hyperparameters. We next discuss the application of our algorithm to models with an unconventional likelihood, going beyond the classical setting of general linear models. This necessitates a non-trivial generalization of the adjoint-differentiated Laplace approximation, which we implement using higher-order adjoint methods. The generalization works out to be both more general and more efficient. We apply the resulting method to an unconventional latent Gaussian model, identifying promising features and highlighting persistent challenges.
The final chapter of this dissertation focuses on a specific but rich problem: the Ising model of Statistical Physics, and its generalization as the Potts and Spin Glass models. These models are challenging because they are discrete, precluding the immediate use of gradient-based algorithms, and exhibit multiple modes, notably at cold temperatures. We propose a new class of MCMC algorithms to draw samples from Potts models by augmenting the target space with a carefully constructed auxiliary Gaussian variable. In contrast to existing methods of a similar flavor, our algorithm can take advantage of the low-rank structure of the coupling matrix and scales linearly with the number of states in a Potts model. The method is applied to a broad range of coupling and temperature regimes and compared to several sampling methods, allowing us to paint a nuanced algorithmic landscape
Ahlfors circle maps and total reality: from Riemann to Rohlin
This is a prejudiced survey on the Ahlfors (extremal) function and the weaker
{\it circle maps} (Garabedian-Schiffer's translation of "Kreisabbildung"), i.e.
those (branched) maps effecting the conformal representation upon the disc of a
{\it compact bordered Riemann surface}. The theory in question has some
well-known intersection with real algebraic geometry, especially Klein's
ortho-symmetric curves via the paradigm of {\it total reality}. This leads to a
gallery of pictures quite pleasant to visit of which we have attempted to trace
the simplest representatives. This drifted us toward some electrodynamic
motions along real circuits of dividing curves perhaps reminiscent of Kepler's
planetary motions along ellipses. The ultimate origin of circle maps is of
course to be traced back to Riemann's Thesis 1851 as well as his 1857 Nachlass.
Apart from an abrupt claim by Teichm\"uller 1941 that everything is to be found
in Klein (what we failed to assess on printed evidence), the pivotal
contribution belongs to Ahlfors 1950 supplying an existence-proof of circle
maps, as well as an analysis of an allied function-theoretic extremal problem.
Works by Yamada 1978--2001, Gouma 1998 and Coppens 2011 suggest sharper degree
controls than available in Ahlfors' era. Accordingly, our partisan belief is
that much remains to be clarified regarding the foundation and optimal control
of Ahlfors circle maps. The game of sharp estimation may look narrow-minded
"Absch\"atzungsmathematik" alike, yet the philosophical outcome is as usual to
contemplate how conformal and algebraic geometry are fighting together for the
soul of Riemann surfaces. A second part explores the connection with Hilbert's
16th as envisioned by Rohlin 1978.Comment: 675 pages, 199 figures; extended version of the former text (v.1) by
including now Rohlin's theory (v.2