Parallel algorithms for lattice QCD
SIGLE. Available from British Library Document Supply Centre (BLDSC), DSC:D81971. United Kingdom
Visual scene recognition with biologically relevant generative models
This research focuses on developing visual object categorization methodologies based on machine learning techniques and biologically inspired generative models of visual scene recognition. Modelling the statistical variability in visual patterns, in the space of features extracted from them by an appropriate low-level signal processing technique, is an important matter of investigation for both humans and machines. To study this problem, we have examined in detail two recent probabilistic models of vision: a simple multivariate Gaussian model suggested by Karklin & Lewicki (2009) and a restricted Boltzmann machine (RBM) proposed by Hinton (2002). Both models have been widely used for visual object classification and scene analysis tasks. This research highlights that these models are not sufficient on their own for the classification task, and suggests the Fisher kernel as a means of inducing discrimination into them. Our empirical results on standard benchmark data sets reveal that the classification performance of these generative models can be boosted significantly, to near state-of-the-art performance, by deriving a Fisher kernel from compact generative models that computes the data labels in a fraction of the total computation time. We compare the proposed technique with other distance-based and kernel-based classifiers to show how computationally efficient the Fisher kernels are. To the best of our knowledge, the Fisher kernel has not been derived from the RBM before, so the work presented in the thesis is novel in both its idea and its application to vision problems.
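For reference, the Fisher kernel mentioned in this abstract is usually defined as follows. This is the standard textbook construction, not a formula taken from the thesis; the symbols $p(x \mid \theta)$ (the generative model, here a Gaussian or an RBM), $U_x$ (the Fisher score) and $F$ (the Fisher information matrix) are notation assumed for illustration.

```latex
U_x = \nabla_\theta \log p(x \mid \theta), \qquad
F = \mathbb{E}_{x \sim p(\cdot \mid \theta)}\!\left[ U_x U_x^\top \right], \qquad
K(x, y) = U_x^\top F^{-1} U_y .
```

The kernel $K$ can then be passed to any kernel-based discriminative classifier, which is how a generative model acquires discriminative power.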
Solution strategies for nonlinear conservation laws
Nonlinear conservation laws form the basis for models for a wide range of physical phenomena. Finding an optimal strategy for solving these problems can be challenging, and a good strategy for one problem may fail spectacularly for others. As different problems have different challenging features, exploiting knowledge about the problem structure is a key factor in achieving an efficient solution strategy. Most strategies found in literature for solving nonlinear problems involve a linearization step, usually using Newton's method, which replaces the original nonlinear problem by an iteration process consisting of a series of linear problems. A large effort is then spent on finding a good strategy for solving these linear problems. This involves choosing suitable preconditioners and linear solvers. This approach is in many cases a good choice and a multitude of different methods have been developed. However, the linearization step to some degree involves a loss of information about the original problem. This is not necessarily critical, but in many cases the structure of the nonlinear problem can be exploited to a larger extent than what is possible when working solely on the linearized problem. This may involve knowledge about dominating physical processes and specifically on whether a process is near equilibrium. By using nonlinear preconditioning techniques developed in recent years, certain attractive features such as automatic localization of computations to parts of the problem domain with the highest degree of nonlinearities arise. In the present work, these methods are further refined to obtain a framework for nonlinear preconditioning that also takes into account equilibrium information. This framework is developed mainly in the context of porous media, but in a general manner, allowing for application to a wide range of problems. A scalability study shows that the method is scalable for challenging two-phase flow problems. 
It is also demonstrated for nonlinear elasticity problems. Some models arising from nonlinear conservation laws are best solved using completely different strategies than the approach outlined above. One such example can be found in the field of surface gravity waves. For special types of nonlinear waves, such as solitary waves and undular bores, the well-known Korteweg-de Vries (KdV) equation has been shown to be a suitable model. This equation has many interesting properties not typical of nonlinear equations, which may be exploited in the solver, and strategies usually reserved for linear problems may be applied. This work includes a comparative study of two discretization methods with highly different properties for this equation.
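The linearization step described in this abstract, where Newton's method replaces a nonlinear problem by a sequence of linear ones, can be sketched on a toy system. This is a minimal illustration only: the two-equation system, the function names and the starting guess are assumptions chosen for the example, and nothing here reproduces the nonlinear preconditioning framework of the thesis.

```python
def residual(u):
    """Toy nonlinear system F(u) = 0 with a root at (2, 1)."""
    x, y = u
    return [x * x + y * y - 5.0, x * y - 2.0]


def jacobian(u):
    """Jacobian of the toy residual; this defines each linearized problem."""
    x, y = u
    return [[2.0 * x, 2.0 * y], [y, x]]


def solve_2x2(a, b):
    """Solve the 2x2 linear system a @ d = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]


def newton(u0, tol=1e-12, max_iter=50):
    """Newton iteration: each pass solves one linear system J(u) d = -F(u)."""
    u = list(u0)
    for iteration in range(max_iter):
        f = residual(u)
        if max(abs(v) for v in f) < tol:
            return u, iteration
        d = solve_2x2(jacobian(u), [-f[0], -f[1]])
        u = [u[0] + d[0], u[1] + d[1]]
    return u, max_iter


solution, iterations = newton([2.5, 0.5])
print(solution, iterations)
```

In a large-scale solver the call to `solve_2x2` is where the preconditioned linear solver would sit, which is why so much effort goes into that step.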
Modernizing Markov Chains Monte Carlo for Scientific and Bayesian Modeling
The advent of probabilistic programming languages has galvanized scientists to write increasingly diverse models to analyze data. Probabilistic models use a joint distribution over observed and latent variables to describe at once elaborate scientific theories, non-trivial measurement procedures, information from previous studies, and more. To effectively deploy these models in a data analysis, we need inference procedures which are reliable, flexible, and fast. In a Bayesian analysis, inference boils down to estimating the expectation values and quantiles of the unnormalized posterior distribution. This estimation problem also arises in the study of non-Bayesian probabilistic models, a prominent example being the Ising model of Statistical Physics.
Markov chains Monte Carlo (MCMC) algorithms provide a general-purpose sampling method which can be used to construct sample estimators of moments and quantiles. Despite MCMC’s compelling theory and empirical success, many models continue to frustrate MCMC, as well as other inference strategies, effectively limiting our ability to use these models in a data analysis. These challenges motivate new developments in MCMC. The term “modernize” in the title refers to the deployment of methods which have revolutionized Computational Statistics and Machine Learning in the past decade, including: (i) hardware accelerators to support massive parallelization, (ii) approximate inference based on tractable densities, (iii) high-performance automatic differentiation and (iv) continuous relaxations of discrete systems.
The growing availability of hardware accelerators such as GPUs has in recent years motivated a general MCMC strategy, whereby we run many chains in parallel with a short sampling phase, rather than a few chains with a long sampling phase. Unfortunately, existing convergence diagnostics are not designed for the “many short chains” regime. This is notably the case for the popular R̂ statistic, which claims convergence only if the effective sample size per chain is large. We present the nested R̂, denoted nR̂, a generalization of R̂ which does not conflate short chains and poor mixing, and offers a useful diagnostic provided we run enough chains and meet certain initialization conditions. Combined with nR̂, the short-chain regime presents us with the opportunity to identify optimal lengths for the warmup and sampling phases, as well as the optimal number of chains: tuning parameters of MCMC which are otherwise chosen using heuristics or trial and error.
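As background for the diagnostic discussed above, the classic potential scale reduction statistic (commonly written R-hat) can be sketched as below. This is the standard, non-nested version only; the nested variant proposed in the dissertation is not reproduced here. The function name `rhat` and the synthetic Gaussian "chains" are assumptions made for illustration.

```python
import random
import statistics


def rhat(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average within-chain sample variance.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B / n: sample variance of the per-chain means.
    between_over_n = statistics.variance(chain_means)
    var_plus = (n - 1) / n * within + between_over_n
    return (var_plus / within) ** 0.5


rng = random.Random(0)
# Eight synthetic chains that all sample the same N(0, 1) target.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(200)] for _ in range(8)]
# Eight synthetic chains stuck around different locations.
stuck = [[rng.gauss(3.0 * k, 1.0) for _ in range(200)] for k in range(8)]

rhat_mixed = rhat(mixed)
rhat_stuck = rhat(stuck)
print(rhat_mixed, rhat_stuck)
```

Values close to 1 indicate agreement between chains, while values well above 1 indicate that the chains disagree.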
We next focus on semi-specialized algorithms for latent Gaussian models, arguably the most widely used class of hierarchical models. It is well understood that MCMC often struggles with the geometry of the posterior distribution generated by these models. Using a Laplace approximation, we marginalize out the latent Gaussian variables and then integrate the remaining parameters with Hamiltonian Monte Carlo (HMC), a gradient-based MCMC. This approach combines MCMC and a distributional approximation, and offers a useful alternative to pure MCMC or pure approximation methods such as Variational Inference. We compare the three paradigms across a range of general linear models which admit a sophisticated prior, i.e. a Gaussian process and a Horseshoe prior. To implement our scheme efficiently, we derive a novel automatic differentiation method called the adjoint-differentiated Laplace approximation. This differentiation algorithm propagates the minimal information needed to construct the gradient of the approximate marginal likelihood, and yields a scalable differentiation method that is orders of magnitude faster than state-of-the-art differentiation for high-dimensional hyperparameters. We next discuss the application of our algorithm to models with an unconventional likelihood, going beyond the classical setting of general linear models. This necessitates a non-trivial generalization of the adjoint-differentiated Laplace approximation, which we implement using higher-order adjoint methods. The generalization works out to be both more general and more efficient. We apply the resulting method to an unconventional latent Gaussian model, identifying promising features and highlighting persistent challenges.
The final chapter of this dissertation focuses on a specific but rich problem: the Ising model of Statistical Physics, and its generalizations, the Potts and Spin Glass models. These models are challenging because they are discrete, precluding the immediate use of gradient-based algorithms, and exhibit multiple modes, notably at cold temperatures. We propose a new class of MCMC algorithms to draw samples from Potts models by augmenting the target space with a carefully constructed auxiliary Gaussian variable. In contrast to existing methods of a similar flavor, our algorithm can take advantage of the low-rank structure of the coupling matrix and scales linearly with the number of states in a Potts model. The method is applied to a broad range of coupling and temperature regimes and compared to several sampling methods, allowing us to paint a nuanced algorithmic landscape.
Sparse inverse covariance estimation in Gaussian graphical models
One of the fundamental tasks in science is to find explainable relationships between
observed phenomena. Recent work has addressed this problem by attempting to learn
the structure of graphical models - especially Gaussian models - by the imposition of
sparsity constraints.
The graphical lasso is a popular method for learning the structure of a Gaussian
model. It uses regularisation to impose sparsity. In real-world problems, there may be
latent variables that confound the relationships between the observed variables. Ignoring
these latents, and imposing sparsity in the space of the visibles, may lead to the
pruning of important structural relationships. We address this problem by introducing
an expectation maximisation (EM) method for learning a Gaussian model that is
sparse in the joint space of visible and latent variables. By extending this to a conditional
mixture, we introduce multiple structures, and allow side information to be used
to predict which structure is most appropriate for each data point. Finally, we handle
non-Gaussian data by extending each sparse latent Gaussian to a Gaussian copula. We
train these models on a financial data set; we find the structures to be interpretable, and
the new models to perform better than their existing competitors.
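For readers unfamiliar with the graphical lasso referred to above, its usual objective is stated below. This is the standard formulation, not a formula from the thesis; $S$ (the empirical covariance matrix), $\Theta$ (the precision, i.e. inverse covariance, matrix) and $\lambda \ge 0$ (the regularisation strength) are notation assumed here, and conventions differ on whether the diagonal of $\Theta$ is penalised.

```latex
\hat{\Theta} = \arg\max_{\Theta \succ 0} \;
  \log\det\Theta - \operatorname{tr}(S\Theta) - \lambda \lVert \Theta \rVert_1
```

Zeros in $\hat{\Theta}$ correspond to missing edges in the Gaussian graphical model, which is why an $\ell_1$ penalty on $\Theta$ yields a sparse structure.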
A potential problem with the mixture model is that it does not require the structure
to persist in time, whereas this may be expected in practice. So we construct an input-output
HMM with sparse Gaussian emissions. But the main result is that, provided the
side information is rich enough, the temporal component of the model provides little
benefit, and reduces efficiency considerably.
The GWishart distribution may be used as the basis for a Bayesian approach to
learning a sparse Gaussian. However, sampling from this distribution often limits the
efficiency of inference in these models. We make a small change to the state-of-the-art
block Gibbs sampler to improve its efficiency. We then introduce a Hamiltonian
Monte Carlo sampler that is much more efficient than block Gibbs, especially in high
dimensions. We use these samplers to compare a Bayesian approach to learning a
sparse Gaussian with the (non-Bayesian) graphical lasso. We find that, even when
limited to the same time budget, the Bayesian method can perform better.
In summary, this thesis introduces practically useful advances in structure learning
for Gaussian graphical models and their extensions. The contributions include the addition
of latent variables, a non-Gaussian extension, (temporal) conditional mixtures,
and methods for efficient inference in a Bayesian formulation.
NEW COMPUTATIONAL METHODS FOR OPTIMAL CONTROL OF PARTIAL DIFFERENTIAL EQUATIONS
Partial differential equations are the chief means of providing mathematical models in science, engineering and other fields. Optimal control of partial differential equations (PDEs) has a wide range of applications in engineering and science, such as shape optimization, image processing, fluid dynamics, and chemical processes. In this thesis, we develop and analyze several efficient numerical methods for optimal control problems governed by elliptic, parabolic, and wave PDEs, respectively. The thesis consists of six chapters. In Chapter 1, we briefly introduce a few motivating applications and summarize the theoretical and computational foundations of the approaches developed in later chapters. In Chapter 2, we establish a new multigrid algorithm to accelerate the semi-smooth Newton method that is applied to the first-order necessary optimality system arising from semi-linear control-constrained elliptic optimal control problems. Under suitable assumptions, the discretized Jacobian matrix is proved to have a uniformly bounded inverse with respect to mesh size. Unlike currently available approaches, a new strategy that leads to a robust multigrid solver is employed to define the coarse grid operator. Numerical simulations are provided to illustrate the efficiency of the proposed method, which is shown to be computationally more efficient than the popular full approximation storage (FAS) multigrid method. In particular, our proposed approach achieves mesh-independent convergence and its performance is highly robust with respect to the regularization parameter. In Chapter 3, we present a new second-order leapfrog finite difference scheme in time for solving the first-order necessary optimality system of linear parabolic optimal control problems. The new leapfrog scheme is shown to be unconditionally stable and second-order accurate, whereas the classical leapfrog scheme is well known to be unstable. 
A mathematical proof for the convergence of the proposed scheme is provided under a suitable norm. Moreover, the proposed leapfrog scheme gives a favorable structure that leads to an effective implementation of a fast solver under the multigrid framework. Numerical examples show that the proposed scheme significantly outperforms the widely used second-order backward time differentiation approach, and the resultant fast solver demonstrates a mesh-independent convergence as well as a linear time complexity. In Chapter 4, we develop a new semi-smooth Newton multigrid algorithm for solving the discretized first-order necessary optimality system that characterizes the optimal solution of semi-linear parabolic PDE optimal control problems with control constraints. A new leapfrog discretization scheme in time associated with the standard five-point stencil in space is established to achieve a second-order accuracy. The convergence (or unconditional stability) of the proposed scheme is proved when time-periodic solutions are considered. Moreover, the derived well-structured discretized Jacobian matrices greatly facilitate the development of an effective smoother in our multigrid algorithm. Numerical simulations are provided to illustrate the effectiveness of the proposed method, which validates the second-order accuracy in solution approximations as well as the optimal linear complexity of computing time. In Chapter 5, we offer a new implicit finite difference scheme in time for solving the first-order necessary optimality system arising in optimal control of wave equations. With a five-point central finite difference scheme in space, the full discretization is proved to be unconditionally convergent with a second-order accuracy, which is not restricted by the classical Courant-Friedrichs-Lewy (CFL) stability condition on the spatial and temporal step sizes. 
Moreover, based on its advantageous structure, an efficient preconditioned Krylov subspace method is provided and analyzed for solving the discretized sparse linear system. Numerical examples are presented to confirm our theoretical conclusions and demonstrate the promising performance of the proposed preconditioned iterative solver. Finally, brief summaries and future research perspectives are given in Chapter 6.
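The abstract's remark that the classical leapfrog scheme is unstable for parabolic problems can be illustrated on the scalar test equation u' = -lam * u, which each decaying mode of a discretized heat equation satisfies. This is a minimal sketch under stated assumptions: the function names, step size and step count are chosen for illustration, and backward Euler is used only as a simple unconditionally stable reference. It is neither the new leapfrog scheme nor the second-order backward differentiation scheme discussed in the thesis.

```python
import math


def leapfrog_decay(lam, dt, steps):
    """Classical leapfrog u[n+1] = u[n-1] - 2*dt*lam*u[n] for u' = -lam*u."""
    u_prev = 1.0
    u_curr = math.exp(-lam * dt)  # exact value at t = dt as the start-up step
    for _ in range(steps - 1):
        u_prev, u_curr = u_curr, u_prev - 2.0 * dt * lam * u_curr
    return u_curr


def backward_euler_decay(lam, dt, steps):
    """Backward Euler u[n+1] = u[n] / (1 + lam*dt), stable for any dt > 0."""
    u = 1.0
    for _ in range(steps):
        u = u / (1.0 + lam * dt)
    return u


lam, dt, steps = 1.0, 0.1, 400
leapfrog_value = leapfrog_decay(lam, dt, steps)
implicit_value = backward_euler_decay(lam, dt, steps)
exact_value = math.exp(-lam * dt * steps)
print(leapfrog_value, implicit_value, exact_value)
```

The leapfrog recurrence has a parasitic root of magnitude greater than 1 for every positive step size, so a tiny start-up error grows without bound even though the exact solution decays.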