Approximate inference methods in probabilistic machine learning and Bayesian statistics
This thesis develops new methods for efficient approximate inference in probabilistic models. Such models are routinely used across many fields, yet they remain computationally challenging because they involve high-dimensional integrals. We propose several approximate inference approaches that address key challenges in probabilistic machine learning and Bayesian statistics. First, we present a Bayesian framework for genome-wide inference of DNA methylation levels and devise an efficient particle filtering and smoothing algorithm that can be used to identify differentially methylated regions between case and control groups. Second, we present a scalable inference approach for state space models that combines variational methods with sequential Monte Carlo sampling. The method is applied to self-exciting point process models that allow for flexible dynamics in the latent intensity function. Third, a new variational density motivated by copulas is developed. This new variational family can be beneficial compared with Gaussian approximations, as illustrated on examples with Bayesian neural networks. Lastly, we make progress on gradient-based adaptation of Hamiltonian Monte Carlo samplers by maximizing an approximation of the proposal entropy.
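The particle filtering component can be illustrated in miniature. The sketch below is a generic bootstrap particle filter for a simple linear-Gaussian state space model, not the thesis's methylation model; the model, parameters, and function name are invented for the example.

```python
import numpy as np

def bootstrap_particle_filter(ys, n_particles=500, phi=0.9, q=0.5, r=1.0, seed=0):
    """Bootstrap particle filter for x_t = phi*x_{t-1} + N(0, q^2),
    y_t = x_t + N(0, r^2): propagate, weight, resample."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n_particles)              # initial particle cloud
    means = []
    for y in ys:
        x = phi * x + rng.normal(0.0, q, n_particles)  # propagate through dynamics
        logw = -0.5 * ((y - x) / r) ** 2               # Gaussian observation log-weights
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(np.sum(w * x))                    # filtering mean estimate
        x = rng.choice(x, size=n_particles, p=w)       # multinomial resampling
    return np.array(means)

# simulate observations from the same model and run the filter
rng = np.random.default_rng(1)
x_true, ys = 0.0, []
for _ in range(50):
    x_true = 0.9 * x_true + rng.normal(0.0, 0.5)
    ys.append(x_true + rng.normal(0.0, 1.0))
filtered = bootstrap_particle_filter(np.array(ys))
```

Smoothing variants run a backward pass over the stored particles; the forward filter above is the common core.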
Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain
The present paper explores the technical efficiency of four hotels from the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking of these four hotel units located in Portugal is established using Stochastic Frontier Analysis. This methodology allows us to discriminate between measurement error and systematic inefficiency in the estimation process, making it possible to investigate the main causes of inefficiency. Several suggestions for improving efficiency are offered for each hotel studied.
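The key feature of Stochastic Frontier Analysis, decomposing the error into symmetric noise and one-sided inefficiency, can be sketched with the classic half-normal frontier model of Aigner, Lovell and Schmidt, fitted by maximum likelihood on simulated data (an illustrative sketch in Python; the data and parameters are invented, not the hotel data from the paper):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_loglik(params, y, X):
    """Negative log-likelihood of the half-normal stochastic frontier model
    y = X @ beta + v - u, with v ~ N(0, sigma_v^2) and u ~ |N(0, sigma_u^2)|."""
    k = X.shape[1]
    beta, sv, su = params[:k], np.exp(params[k]), np.exp(params[k + 1])
    sigma = np.sqrt(sv ** 2 + su ** 2)
    lam = su / sv                                # inefficiency-to-noise ratio
    eps = y - X @ beta                           # composed error
    ll = (np.log(2.0 / sigma) + norm.logpdf(eps / sigma)
          + norm.logcdf(-eps * lam / sigma))
    return -ll.sum()

# simulate a small production frontier and recover its slope by ML
rng = np.random.default_rng(0)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
u = np.abs(rng.normal(0.0, 0.6, n))              # one-sided inefficiency
v = rng.normal(0.0, 0.3, n)                      # symmetric measurement noise
y = X @ np.array([1.0, 0.8]) + v - u
res = minimize(neg_loglik, x0=np.array([0.0, 0.0, -1.0, -1.0]),
               args=(y, X), method="Nelder-Mead", options={"maxiter": 4000})
beta_hat = res.x[:2]
```

The estimated ratio of the two variance components is what lets SFA attribute part of each unit's shortfall from the frontier to systematic inefficiency rather than noise.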
Machine learning approach to reconstructing signalling pathways and interaction networks in biology
In this doctoral thesis, I present my research into applying machine learning techniques
for reconstructing species interaction networks in ecology, reconstructing molecular
signalling pathways and gene regulatory networks in systems biology, and inferring
parameters in ordinary differential equation (ODE) models of signalling pathways.
Together, the methods I have developed for these applications demonstrate the usefulness
of machine learning for reconstructing networks and inferring network parameters
from data.
The thesis consists of three parts. The first part is a detailed comparison of applying
static Bayesian networks, relevance vector machines, and linear regression with L1
regularisation (LASSO) to the problem of reconstructing species interaction networks
from species absence/presence data in ecology (Faisal et al., 2010). I describe how I
generated data from a stochastic population model to test the different methods and
how the simulation study led us to introduce spatial autocorrelation as an important
covariate. I also show how we used the results of the simulation study to apply the
methods to presence/absence data of bird species from the European Bird Atlas.
The second part of the thesis describes a time-varying, non-homogeneous dynamic
Bayesian network model for reconstructing signalling pathways and gene regulatory
networks, based on Lèbre et al. (2010). I show how my work has extended this model
to incorporate different types of hierarchical Bayesian information sharing priors and
different coupling strategies among nodes in the network. The introduction of these
priors reduces the inference uncertainty by putting a penalty on the number of structure
changes among network segments separated by inferred changepoints (Dondelinger
et al., 2010; Husmeier et al., 2010; Dondelinger et al., 2012b). Using both synthetic
and real data, I demonstrate that using information sharing priors leads to a better reconstruction
accuracy of the underlying gene regulatory networks, and I compare the
different priors and coupling strategies. I show the results of applying the model to
gene expression datasets from Drosophila melanogaster and Arabidopsis thaliana, as
well as to a synthetic biology gene expression dataset from Saccharomyces cerevisiae.
In each case, the underlying network is time-varying; for Drosophila melanogaster, as
a consequence of measuring gene expression during different developmental stages;
for Arabidopsis thaliana, as a consequence of measuring gene expression for circadian
clock genes under different conditions; and for the synthetic biology dataset, as
a consequence of changing the growth environment. I show that in addition to inferring
sensible network structures, the model also successfully predicts the locations of changepoints.
The third and final part of this thesis is concerned with parameter inference in
ODE models of biological systems. This problem is of interest to systems biology
researchers, as kinetic reaction parameters can often not be measured, or can only be
estimated imprecisely from experimental data. Due to the cost of numerically solving
the ODE system after each parameter adaptation, this is a computationally challenging
problem. Gradient matching techniques circumvent this problem by directly fitting the
derivatives of the ODE to the slope of an interpolant. I present an inference procedure
for a model using nonparametric Bayesian statistics with Gaussian processes, based
on Calderhead et al. (2008). I show that the new inference procedure improves on
the original formulation in Calderhead et al. (2008) and I present the result of applying
it to ODE models of predator-prey interactions, a circadian clock gene, a signal
transduction pathway, and the JAK/STAT pathway.
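The gradient matching idea can be shown on a toy predator-prey model. The sketch below substitutes a cubic-spline interpolant for the Gaussian processes used in the thesis (a deliberate simplification): fit an interpolant to noisy observations, differentiate it, and choose ODE parameters so the right-hand side matches the interpolant's slopes, with no ODE solves inside the optimisation loop. All data and parameters are invented for the example.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.interpolate import CubicSpline
from scipy.optimize import least_squares

def lv(t, z, a, b, c, d):
    """Lotka-Volterra predator-prey right-hand side."""
    x, y = z
    return [a * x - b * x * y, c * x * y - d * y]

# simulate noisy observations from known parameters
true = (1.0, 0.5, 0.3, 0.8)
ts = np.linspace(0.0, 10.0, 60)
sol = solve_ivp(lv, (0.0, 10.0), [2.0, 1.0], t_eval=ts, args=true, rtol=1e-8)
rng = np.random.default_rng(0)
obs = sol.y + rng.normal(0.0, 0.01, sol.y.shape)

# interpolate each state and differentiate the interpolant
splines = [CubicSpline(ts, obs[i]) for i in range(2)]
states = np.array([s(ts) for s in splines])
dstates = np.array([s(ts, 1) for s in splines])      # interpolant slopes

def residual(theta):
    """Mismatch between interpolant slopes and the ODE right-hand side."""
    rhs = np.array([lv(t, states[:, i], *theta) for i, t in enumerate(ts)]).T
    return (dstates - rhs).ravel()

fit = least_squares(residual, x0=[0.5, 0.5, 0.5, 0.5])
```

Replacing the spline with a Gaussian process, as in Calderhead et al. (2008) and this thesis, additionally propagates the interpolation uncertainty into the parameter posterior.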
Scalable Tools for Information Extraction and Causal Modeling of Neural Data
Over the past 20 years, systems neuroscience has entered an era that one might call "large-scale systems neuroscience". From tuning curves and single-neuron recordings, there has been a conceptual shift towards a more holistic understanding of how neural circuits work and, as a result, how their representations produce neural tunings.
With the introduction of a plethora of datasets across scales, modalities, animals, and systems, we as a community have witnessed invaluable insights gained from the collective view of a neural circuit that were not possible with small-scale experimentation. The concurrence of advances in neural recording, such as wide-field imaging technologies and Neuropixels probes, with developments in statistical machine learning, and specifically deep learning, has brought systems neuroscience one step closer to data science. With this abundance of data, the need to develop computational models has become crucial. We need to make sense of the data, and thus we need to build models that are constrained by an appropriate amount of biological detail and probe those models in search of neural mechanisms.
This thesis consists of sections covering a wide range of ideas from computer vision, statistics, machine learning, and dynamical systems. All of these ideas share a common purpose: to help automate the neuroscientific experimentation process at different levels. In chapters 1, 2, and 3, I develop tools that automate the extraction of useful information from raw neuroscience data in the model organism C. elegans. The goal is to avoid manual labor and pave the way for high-throughput data collection, aiming at better quantification of variability across the population of worms. Due to its high level of structural and functional stereotypy, and its relative simplicity, the nematode C. elegans has been an attractive model organism for systems and developmental research. With 383 neurons in males and 302 neurons in hermaphrodites, the positions and functions of neurons are remarkably conserved across individuals. Furthermore, C. elegans remains the only organism for which a complete cellular, lineage, and anatomical map of the entire nervous system has been described for both sexes. Here, I describe the analysis pipeline that we developed for the recently proposed NeuroPAL technique in C. elegans. Our pipeline consists of atlas building (chapter 1), registration, segmentation, neural tracking (chapter 2), and signal extraction (chapter 3). I emphasize that organizing the analysis techniques as a pipeline consisting of the above steps is general and can be applied to virtually every animal model and emerging imaging modality. I use the language of probabilistic generative modeling and graphical models to communicate the ideas in a rigorous form, so some familiarity with those concepts will help the reader navigate the chapters of this thesis more easily.
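A toy version of the atlas-plus-labelling idea: model each neuron's canonical position as a Gaussian, then identify detected cells in a new animal by a globally optimal assignment under the atlas likelihood. Everything here (positions, covariance, sizes) is synthetic and only illustrates the probabilistic-atlas principle, not the NeuroPAL pipeline itself.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.stats import multivariate_normal

# hypothetical atlas: one Gaussian position distribution per neuron
rng = np.random.default_rng(0)
n_neurons = 30
atlas_means = rng.uniform(0.0, 100.0, size=(n_neurons, 3))  # canonical positions
atlas_cov = 4.0 * np.eye(3)                                 # shared positional variability

# a new "worm": the atlas positions, jittered and randomly permuted
perm = rng.permutation(n_neurons)
detected = atlas_means[perm] + rng.normal(0.0, 1.0, size=(n_neurons, 3))

# cost(i, j) = negative log-likelihood of detection i under atlas neuron j
cost = np.array([[-multivariate_normal.logpdf(d, mean=m, cov=atlas_cov)
                  for m in atlas_means] for d in detected])
rows, cols = linear_sum_assignment(cost)   # globally optimal joint labelling
accuracy = np.mean(cols == perm)
```

Solving the assignment jointly, rather than labelling each cell by its nearest atlas neuron independently, is what keeps the labelling consistent when neighbouring position distributions overlap.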
In chapters 4 and 5, I build models that aim to automate hypothesis testing and causal interrogation of neural circuits. The notion of functional connectivity (FC) has been instrumental in our understanding of how information propagates in a neural circuit. An important limitation, however, is that current techniques do not distinguish causal connections from purely functional connections with no mechanistic correspondence. I start chapter 4 by introducing causal inference as a unifying language for the following chapters. In chapter 4, I define the notion of interventional connectivity (IC) as a way to summarize the effect of stimulation on a neural circuit, providing a more mechanistic description of information flow. I then investigate which functional connectivity metrics are most predictive of IC in simulations and real data. Following this framework, I discuss how stimulations and interventions can be used to improve the fitting and generalization properties of time series models. Building on the literature on model identification and active causal discovery, I develop a switching time series model and a method for finding stimulation patterns that help the model generalize to the vicinity of the observed neural trajectories. Finally, in chapter 5, I develop a new FC metric that separates the information transferred from one variable to another into unique and synergistic sources.
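The FC-versus-IC distinction can be seen in a three-node toy network (a linear system invented for this illustration, unrelated to the thesis's circuit models): zero-lag correlation reports a strong 0-2 "connection" that exists only through the intermediate node, whereas a paired pulse intervention recovers the true direct weights.

```python
import numpy as np

rng = np.random.default_rng(0)
# ground-truth linear network: 0 -> 1 -> 2, no direct 0 -> 2 connection
A = np.array([[0.9, 0.0, 0.0],
              [0.8, 0.5, 0.0],
              [0.0, 0.8, 0.5]])

# observational recording: x_{t+1} = A x_t + noise
T, x = 20000, np.zeros(3)
xs = np.empty((T, 3))
for t in range(T):
    x = A @ x + rng.normal(0.0, 0.1, 3)
    xs[t] = x
fc = np.corrcoef(xs.T)          # correlation-based functional connectivity

def interventional_effect(j, amp=5.0, reps=3000):
    """Average response one step after a pulse on node j; the paired
    baseline run cancels the shared trajectory, leaving ~A[:, j]."""
    diffs = np.zeros(3)
    for _ in range(reps):
        x0 = rng.normal(0.0, 1.0, 3)
        u = np.zeros(3); u[j] = amp
        x1s = A @ x0 + u + rng.normal(0.0, 0.1, 3)   # stimulated step
        x1b = A @ x0 + rng.normal(0.0, 0.1, 3)       # baseline step
        x2s = A @ x1s + rng.normal(0.0, 0.1, 3)
        x2b = A @ x1b + rng.normal(0.0, 0.1, 3)
        diffs += (x2s - x2b) / amp
    return diffs / reps

ic = np.column_stack([interventional_effect(j) for j in range(3)])
```

Here `fc[0, 2]` is large even though `A[2, 0]` is zero, while the interventional estimate `ic` approximates the columns of `A`: the same dissociation that motivates defining IC alongside FC.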
In all projects, I have abstracted out concepts that are specific to the datasets at hand and developed the methods in the most general form. This makes the presented methods applicable to a broad range of datasets, potentially leading to new findings. In addition, all projects are accompanied with extensible and documented code packages, allowing theorists to repurpose the modules for novel applications and experimentalists to run analysis on their datasets efficiently and scalably.
In summary, my main contributions in this thesis are the following:
1) Building the first atlases of hermaphrodite and male C. elegans and developing a generic statistical framework for constructing atlases for a broad range of datasets.
2) Developing a semi-automated analysis pipeline for neural registration, segmentation, and tracking in C. elegans.
3) Extending the framework of non-negative matrix factorization to datasets with deformable motion and developing algorithms for joint tracking and signal demixing from videos of semi-immobilized C. elegans.
4) Defining the notion of interventional connectivity (IC) as a way to summarize the effect of stimulation in a neural circuit and investigating which functional connectivity metrics are best predictive of IC in simulations and real data.
5) Developing a switching time series model and a method for finding stimulation patterns that help the model to generalize to the vicinity of the observed neural trajectories.
6) Developing a new functional connectivity metric that separates the transferred information from one variable to the other into unique and synergistic sources.
7) Implementing extensible, well-documented, open-source code packages for each of the above contributions.
Advances in approximate Bayesian computation and trans-dimensional sampling methodology
Bayesian statistical models continue to grow in complexity, driven
in part by a few key factors: the massive computational resources
now available to statisticians; the substantial gains made in
sampling methodology and algorithms such as Markov chain
Monte Carlo (MCMC), trans-dimensional MCMC (TDMCMC), sequential
Monte Carlo (SMC), adaptive algorithms, stochastic
approximation methods, and approximate Bayesian computation (ABC);
and development of more realistic models for real world phenomena
as demonstrated in this thesis for financial models and
telecommunications engineering. Sophisticated statistical models
are increasingly proposed for practical solutions to real world problems in order to better capture salient features of
increasingly more complex data. With sophistication comes a
parallel requirement for more advanced and automated statistical
computational methodologies.
The key focus of this thesis revolves around innovation related to
the following three significant Bayesian research questions.
1. How can one develop practically useful Bayesian models and corresponding computationally efficient sampling methodology, when the likelihood model is intractable?
2. How can one develop methodology in order to automate Markov chain Monte Carlo sampling approaches to efficiently explore the support of a posterior distribution, defined across multiple Bayesian statistical models?
3. How can these sophisticated Bayesian modelling frameworks and sampling methodologies be utilized to solve practically relevant and important problems in the research fields of financial risk modeling and telecommunications engineering?
This thesis is split into three bodies of work represented in
three parts. Each part contains journal papers with novel
statistical model and sampling methodological development. The
coherent link between each part involves the novel
sampling methodologies developed in Part I and utilized in Part II and Part III. Papers contained in
each part make progress at addressing the core research
questions posed.
Part I of this thesis presents generally applicable key
statistical sampling methodologies that will be utilized and
extended in the subsequent two parts. In particular it presents
novel developments in statistical methodology pertaining to
likelihood-free or ABC and TDMCMC methodology.
The TDMCMC methodology focuses on several aspects of automation
in the between model proposal construction, including
approximation of the optimal between model proposal kernel via a
conditional path sampling density estimator. Then this methodology
is explored for several novel Bayesian model selection
applications, including cointegrated vector autoregression (CVAR)
models and mixture models in which there is an unknown number of
mixture components. The second area relates to development of
ABC methodology with particular focus
on SMC Samplers methodology in an ABC context via Partial
Rejection Control (PRC). In addition to novel algorithmic
development, key theoretical properties are also studied for the
classes of algorithms developed. Then this methodology is
developed for a highly challenging and practically significant
application relating to multivariate Bayesian α-stable
models.
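The likelihood-free setting these methods target can be illustrated with the simplest member of the ABC family, plain rejection ABC, on a toy problem where the "intractable" likelihood is replaced by a simulator (an illustrative Python sketch; the SMC-sampler and partial rejection control machinery developed in Part I builds on this basic accept/reject step):

```python
import numpy as np

def abc_rejection(y_obs, prior_draws, simulate, summary, eps):
    """Rejection ABC: keep the prior draws whose simulated summary lands
    within eps of the observed summary; no likelihood evaluations needed."""
    s_obs = summary(y_obs)
    kept = [th for th in prior_draws if abs(summary(simulate(th)) - s_obs) < eps]
    return np.array(kept)

rng = np.random.default_rng(0)
y_obs = rng.normal(2.0, 1.0, 100)                # data generated with mu = 2
prior = rng.uniform(-5.0, 5.0, 20000)            # flat prior on mu
simulate = lambda mu: rng.normal(mu, 1.0, 100)   # simulator replaces the likelihood
posterior = abc_rejection(y_obs, prior, simulate, np.mean, eps=0.2)
```

The accepted draws approximate the posterior up to the tolerance `eps`; SMC-based ABC samplers sharpen this by moving a particle population through a decreasing sequence of tolerances instead of rejecting from the prior directly.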
Then Part II focuses on novel statistical model development
in the areas of financial risk and non-life insurance claims
reserving. In each of the papers in this part the focus is on
two aspects: foremost the development of novel statistical models
to improve the modeling of risk and insurance; and then the
associated problem of how to fit and sample from such statistical
models efficiently. In particular novel statistical models are
developed for Operational Risk (OpRisk) under a Loss Distributional
Approach (LDA) and for claims reserving in Actuarial non-life
insurance modelling. In each case the models developed include an
additional level of complexity which adds flexibility to the model
in order to better capture salient features observed in real data.
The consequence of the additional complexity comes at the cost
that standard fitting and sampling methodologies are generally not
applicable, as a result one is required to develop and apply the
methodology from Part I.
Part III focuses on novel statistical model development
in the area of statistical signal processing for wireless
communications engineering. Statistical models will be developed
or extended for two general classes of wireless communications
problem: the first relates to detection of transmitted symbols and
joint channel estimation in Multiple Input Multiple Output (MIMO)
systems coupled with Orthogonal Frequency Division Multiplexing
(OFDM); the second relates to co-operative wireless communications
relay systems in which the key focus is on detection of
transmitted symbols. Both these areas will require advanced
sampling methodology developed in Part I to find solutions to
these real-world engineering problems.
Mathematical modelling of the floral transition — with a Bayesian flourish —
Flowering plants are abundant on Earth. In the model dicot plant species, Arabidopsis thaliana, multiple endogenous and exogenous signals converge to initiate a change from vegetative to reproductive growth in optimal environmental conditions. Much genetic and experimental research has gone into elucidating the biological mechanisms
controlling the floral transition. However, there has been little mathematical modelling of this process.
The aim of this thesis was to gain an understanding of the essential features and dynamic properties underlying this developmental phase change from a systems and computational biology perspective. Combining mathematical modelling with experimental results, a core regulatory network was defined with multiple feedback loops. Simplified models inevitably miss finer details of the biological system, yet they provide a route to understanding the overall system behaviour. This reductionist path allowed a tractable genetic regulatory network to be investigated without large numbers of parameters.
Avoiding overfitting to data and inferring parameters are two current challenges in systems biology. Treating all unknowns as probabilities within the statistical framework of Bayes' theorem offers a solution to both of these issues. This thesis investigates the use of a contemporary Bayesian inference algorithm, nested sampling, for inference problems typically found in systems biology, where the data are few and noisy. Nested sampling simultaneously calculates the key term for model comparison and produces parameter inferences, allowing uncertainty in models and predictions to be robustly quantified.
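Nested sampling itself can be sketched in a few lines. The toy below (a hypothetical one-parameter problem, not one of the thesis's network models) uses naive rejection from the prior for the likelihood-constrained draw, which practical implementations replace with more efficient constrained samplers; the evidence of a sharply peaked Gaussian likelihood under a U(-1, 1) prior is known analytically (Z ≈ 0.5), giving a check on the estimate.

```python
import numpy as np
from scipy.special import logsumexp

def log_like(theta):
    """Sharply peaked Gaussian likelihood on a U(-1, 1) prior."""
    return -0.5 * (theta / 0.1) ** 2 - 0.5 * np.log(2 * np.pi * 0.1 ** 2)

def nested_sampling(n_live=200, n_iter=1200, seed=0):
    """Minimal nested sampling: discard the worst live point, credit its
    likelihood with the prior volume shed (X shrinks by ~e^{-1/n_live}),
    and replace it with a prior draw above the current likelihood floor."""
    rng = np.random.default_rng(seed)
    live = rng.uniform(-1.0, 1.0, n_live)
    logl = log_like(live)
    log_z, log_x = -np.inf, 0.0
    for _ in range(n_iter):
        worst = np.argmin(logl)
        log_x_new = log_x - 1.0 / n_live
        log_w = np.log(np.exp(log_x) - np.exp(log_x_new)) + logl[worst]
        log_z = np.logaddexp(log_z, log_w)   # accumulate evidence
        log_x = log_x_new
        floor = logl[worst]
        while True:                          # naive rejection from the prior
            cand = rng.uniform(-1.0, 1.0)
            if log_like(cand) > floor:
                live[worst], logl[worst] = cand, log_like(cand)
                break
    # credit the remaining live points with the leftover volume
    return np.logaddexp(log_z, log_x + logsumexp(logl) - np.log(n_live))

log_z = nested_sampling()
# analytic check: Z = integral of N(theta; 0, 0.1) * (1/2) over [-1, 1], about 0.5
```

The by-product of the same run is the sequence of discarded points with their weights, which is exactly the posterior sample used for parameter inference, the dual role noted above.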
Network models are developed that accurately reproduce experimental leaf-number data, exhibit important properties of the floral transition such as the ability to filter environmental noise, and provide a clue to the spatial patterning of the Arabidopsis shoot apex. Incorporating network knowledge into plant breeding programmes is an exciting goal for future developments addressing global food security.