63 research outputs found

    Approximate inference methods in probabilistic machine learning and Bayesian statistics

    This thesis develops new methods for efficient approximate inference in probabilistic models. Such models are routinely used in different fields, yet they remain computationally challenging as they involve high-dimensional integrals. We propose different approximate inference approaches addressing some challenges in probabilistic machine learning and Bayesian statistics. First, we present a Bayesian framework for genome-wide inference of DNA methylation levels and devise an efficient particle filtering and smoothing algorithm that can be used to identify differentially methylated regions between case and control groups. Second, we present a scalable inference approach for state space models by combining variational methods with sequential Monte Carlo sampling. The method is applied to self-exciting point process models that allow for flexible dynamics in the latent intensity function. Third, a new variational density motivated by copulas is developed. This new variational family can be beneficial compared with Gaussian approximations, as illustrated on examples with Bayesian neural networks. Lastly, we make some progress in a gradient-based adaptation of Hamiltonian Monte Carlo samplers by maximizing an approximation of the proposal entropy.

    Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain

    The present paper explores the technical efficiency of four hotels from the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking is established for these four hotel units located in Portugal using Stochastic Frontier Analysis. This methodology discriminates between measurement error and systematic inefficiencies in the estimation process, making it possible to investigate the main causes of inefficiency. Several suggestions for improving efficiency are offered for each hotel studied.

    Machine learning approach to reconstructing signalling pathways and interaction networks in biology

    In this doctoral thesis, I present my research into applying machine learning techniques for reconstructing species interaction networks in ecology, reconstructing molecular signalling pathways and gene regulatory networks in systems biology, and inferring parameters in ordinary differential equation (ODE) models of signalling pathways. Together, the methods I have developed for these applications demonstrate the usefulness of machine learning for reconstructing networks and inferring network parameters from data. The thesis consists of three parts. The first part is a detailed comparison of applying static Bayesian networks, relevance vector machines, and linear regression with L1 regularisation (LASSO) to the problem of reconstructing species interaction networks from species absence/presence data in ecology (Faisal et al., 2010). I describe how I generated data from a stochastic population model to test the different methods and how the simulation study led us to introduce spatial autocorrelation as an important covariate. I also show how we used the results of the simulation study to apply the methods to presence/absence data of bird species from the European Bird Atlas. The second part of the thesis describes a time-varying, non-homogeneous dynamic Bayesian network model for reconstructing signalling pathways and gene regulatory networks, based on Lèbre et al. (2010). I show how my work has extended this model to incorporate different types of hierarchical Bayesian information sharing priors and different coupling strategies among nodes in the network. The introduction of these priors reduces the inference uncertainty by putting a penalty on the number of structure changes among network segments separated by inferred changepoints (Dondelinger et al., 2010; Husmeier et al., 2010; Dondelinger et al., 2012b). Using both synthetic and real data, I demonstrate that using information sharing priors leads to a better reconstruction accuracy of the underlying gene regulatory networks, and I compare the different priors and coupling strategies. I show the results of applying the model to gene expression datasets from Drosophila melanogaster and Arabidopsis thaliana, as well as to a synthetic biology gene expression dataset from Saccharomyces cerevisiae. In each case, the underlying network is time-varying; for Drosophila melanogaster, as a consequence of measuring gene expression during different developmental stages; for Arabidopsis thaliana, as a consequence of measuring gene expression for circadian clock genes under different conditions; and for the synthetic biology dataset, as a consequence of changing the growth environment. I show that in addition to inferring sensible network structures, the model also successfully predicts the locations of changepoints. The third and final part of this thesis is concerned with parameter inference in ODE models of biological systems. This problem is of interest to systems biology researchers, as kinetic reaction parameters often cannot be measured, or can only be estimated imprecisely from experimental data. Due to the cost of numerically solving the ODE system after each parameter adaptation, this is a computationally challenging problem. Gradient matching techniques circumvent this problem by directly fitting the derivatives of the ODE to the slope of an interpolant. I present an inference procedure for a model using nonparametric Bayesian statistics with Gaussian processes, based on Calderhead et al. (2008). I show that the new inference procedure improves on the original formulation in Calderhead et al. (2008) and I present the result of applying it to ODE models of predator-prey interactions, a circadian clock gene, a signal transduction pathway, and the JAK/STAT pathway.

    Advances in approximate Bayesian computation and trans-dimensional sampling methodology

    Bayesian statistical models continue to grow in complexity, driven in part by a few key factors: the massive computational resources now available to statisticians; the substantial gains made in sampling methodology and algorithms such as Markov chain Monte Carlo (MCMC), trans-dimensional MCMC (TDMCMC), sequential Monte Carlo (SMC), adaptive algorithms, stochastic approximation methods, and approximate Bayesian computation (ABC); and the development of more realistic models for real world phenomena, as demonstrated in this thesis for financial models and telecommunications engineering. Sophisticated statistical models are increasingly proposed for practical solutions to real world problems in order to better capture salient features of increasingly complex data. With sophistication comes a parallel requirement for more advanced and automated statistical computational methodologies. The key focus of this thesis revolves around innovation related to the following three significant Bayesian research questions. 1. How can one develop practically useful Bayesian models and corresponding computationally efficient sampling methodology, when the likelihood model is intractable? 2. How can one develop methodology in order to automate Markov chain Monte Carlo sampling approaches to efficiently explore the support of a posterior distribution, defined across multiple Bayesian statistical models? 3. How can these sophisticated Bayesian modelling frameworks and sampling methodologies be utilized to solve practically relevant and important problems in the research fields of financial risk modeling and telecommunications engineering? This thesis is split into three bodies of work represented in three parts. Each part contains journal papers with novel statistical model and sampling methodological development. The coherent link between each part involves the novel sampling methodologies developed in Part I and utilized in Part II and Part III. Papers contained in each part make progress at addressing the core research questions posed. Part I of this thesis presents generally applicable key statistical sampling methodologies that will be utilized and extended in the subsequent two parts. In particular, it presents novel developments in statistical methodology pertaining to likelihood-free or ABC and TDMCMC methodology. The TDMCMC methodology focuses on several aspects of automation in the between-model proposal construction, including approximation of the optimal between-model proposal kernel via a conditional path sampling density estimator. Then this methodology is explored for several novel Bayesian model selection applications including cointegrated vector autoregressions (CVAR) models and mixture models in which there is an unknown number of mixture components. The second area relates to development of ABC methodology with particular focus on SMC Samplers methodology in an ABC context via Partial Rejection Control (PRC). In addition to novel algorithmic development, key theoretical properties are also studied for the classes of algorithms developed. Then this methodology is developed for a highly challenging, practically significant application relating to multivariate Bayesian α-stable models. Then Part II focuses on novel statistical model development in the areas of financial risk and non-life insurance claims reserving. In each of the papers in this part the focus is on two aspects: foremost the development of novel statistical models to improve the modeling of risk and insurance; and then the associated problem of how to fit and sample from such statistical models efficiently. In particular, novel statistical models are developed for Operational Risk (OpRisk) under a Loss Distributional Approach (LDA) and for claims reserving in Actuarial non-life insurance modelling. In each case the models developed include an additional level of complexity which adds flexibility to the model in order to better capture salient features observed in real data. The additional complexity comes at the cost that standard fitting and sampling methodologies are generally not applicable; as a result, one is required to develop and apply the methodology from Part I. Part III focuses on novel statistical model development in the area of statistical signal processing for wireless communications engineering. Statistical models will be developed or extended for two general classes of wireless communications problem: the first relates to detection of transmitted symbols and joint channel estimation in Multiple Input Multiple Output (MIMO) systems coupled with Orthogonal Frequency Division Multiplexing (OFDM); the second relates to co-operative wireless communications relay systems in which the key focus is on detection of transmitted symbols. Both these areas will require advanced sampling methodology developed in Part I to find solutions to these real world engineering problems.

    Mathematical modelling of the floral transition — with a Bayesian flourish —

    Flowering plants are abundant on Earth. In the model dicot plant species, Arabidopsis thaliana, multiple endogenous and exogenous signals converge to initiate a change from vegetative to reproductive growth in optimal environmental conditions. Much genetic and experimental research has gone into elucidating the biological mechanisms controlling the floral transition. However, there has been little mathematical modelling of this process. The aim of this thesis was to gain an understanding of the essential features and dynamic properties underlying this developmental phase change from a systems and computational biology perspective. Combining mathematical modelling with experimental results, a core regulatory network was defined with multiple feedback loops. Simplified models inevitably miss finer details of the biological system, yet they provide a route to understanding the overall system behaviour. This reductionist path allowed a tractable genetic regulatory network to be investigated without large numbers of parameters. Avoiding overfitting to data and inferring parameters are two current challenges in systems biology. Treating all unknowns as probabilities, with Bayes’ theorem as the statistical framework, allows for a solution to both of these issues. This thesis investigates the use of a contemporary Bayesian inference algorithm, nested sampling, for inference problems typically found in systems biology where the data are few and noisy. Nested sampling simultaneously calculates the key term for model comparison and also produces parameter inferences, allowing uncertainty in models and predictions to be robustly quantified. Network models are developed that can accurately reproduce experimental leaf number data, show important properties of the floral transition such as the ability to filter environmental noise, and provide a clue on spatial patterning of an Arabidopsis shoot apex. Incorporating network knowledge into a plant breeding program is an exciting goal for future developments addressing global food security.

    Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Fourth Conference

    Bayesian inversion and model selection of heterogeneities in geostatistical subsurface modeling
