90 research outputs found

    Approximate inference methods in probabilistic machine learning and Bayesian statistics

    This thesis develops new methods for efficient approximate inference in probabilistic models. Such models are routinely used in different fields, yet they remain computationally challenging as they involve high-dimensional integrals. We propose different approximate inference approaches addressing some challenges in probabilistic machine learning and Bayesian statistics. First, we present a Bayesian framework for genome-wide inference of DNA methylation levels and devise an efficient particle filtering and smoothing algorithm that can be used to identify differentially methylated regions between case and control groups. Second, we present a scalable inference approach for state space models by combining variational methods with sequential Monte Carlo sampling. The method is applied to self-exciting point process models that allow for flexible dynamics in the latent intensity function. Third, a new variational density motivated by copulas is developed. This new variational family can be beneficial compared with Gaussian approximations, as illustrated on examples with Bayesian neural networks. Lastly, we make some progress in a gradient-based adaptation of Hamiltonian Monte Carlo samplers by maximizing an approximation of the proposal entropy

    Parameter inference for stochastic single-cell dynamics from lineage tree data

    Background: With the advance of experimental techniques such as time-lapse fluorescence microscopy, the availability of single-cell trajectory data has vastly increased, and so has the demand for computational methods suitable for parameter inference with this type of data. Most currently available methods treat single-cell trajectories independently, ignoring the mother-daughter relationships and the information provided by the population structure. However, this information is essential if a process of interest happens at cell division, or if it evolves slowly compared to the duration of the cell cycle. Results: In this work, we propose a Bayesian framework for parameter inference on single-cell time-lapse data from lineage trees. Our method relies on a combination of Sequential Monte Carlo for approximating the parameter likelihood function and Markov Chain Monte Carlo for parameter exploration. We demonstrate our inference framework on two simple examples in which the lineage tree information is crucial: one in which the cell phenotype can only switch at cell division and another where the cell state fluctuates slowly over timescales that extend well beyond the cell-cycle duration. Conclusion: There exist several examples of biological processes, such as stem cell fate decisions or epigenetically controlled phase variation in bacteria, where the cell ancestry is expected to contain important information about the underlying system dynamics. Parameter inference methods that discard this information are expected to perform poorly for such processes. Our method provides a simple and computationally efficient way to take into account single-cell lineage tree data for the purpose of parameter inference and serves as a starting point for the development of more sophisticated and powerful approaches in the future.
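
The combination described in this abstract, Sequential Monte Carlo to estimate the likelihood and Markov Chain Monte Carlo to explore parameters, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy model is a single AR(1) trajectory with Gaussian observation noise rather than a lineage tree, the prior on `theta` is uniform on (-1, 1), and the function names `simulate`, `smc_log_likelihood` and `pmmh` are hypothetical, not taken from the paper.

```python
import math
import random


def simulate(theta, n_steps, rng):
    """Simulate observations from a toy AR(1) latent state (assumed model)."""
    x, ys = 0.0, []
    for _ in range(n_steps):
        x = theta * x + rng.gauss(0.0, 1.0)   # latent dynamics
        ys.append(x + rng.gauss(0.0, 0.5))    # noisy observation
    return ys


def log_norm_pdf(y, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - 0.5 * ((y - mean) / sd) ** 2


def smc_log_likelihood(theta, ys, n_particles, rng):
    """Bootstrap particle filter estimate of log p(ys | theta)."""
    particles = [0.0] * n_particles
    log_lik = 0.0
    for y in ys:
        # Propagate every particle through the latent dynamics.
        particles = [theta * x + rng.gauss(0.0, 1.0) for x in particles]
        # Weight by the observation density, stabilised by subtracting the max.
        log_w = [log_norm_pdf(y, x, 0.5) for x in particles]
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        log_lik += m + math.log(sum(w) / n_particles)
        # Multinomial resampling.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return log_lik


def pmmh(ys, n_iters, n_particles, rng, step=0.1):
    """Random-walk Metropolis-Hastings driven by the SMC likelihood estimate."""
    theta = 0.0
    ll = smc_log_likelihood(theta, ys, n_particles, rng)
    chain = []
    for _ in range(n_iters):
        prop = theta + rng.gauss(0.0, step)
        if abs(prop) < 1.0:  # proposals outside the uniform prior are rejected
            ll_prop = smc_log_likelihood(prop, ys, n_particles, rng)
            if rng.random() < math.exp(min(0.0, ll_prop - ll)):
                theta, ll = prop, ll_prop
        chain.append(theta)
    return chain
```

Calling `pmmh(simulate(0.7, 30, random.Random(0)), 200, 100, random.Random(1))` returns a list of sampled `theta` values. Keeping the stored likelihood estimate `ll` for the current state, rather than re-estimating it each iteration, is the standard pseudo-marginal design choice.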

    Probabilistic methods for high dimensional signal processing

    This thesis investigates the use of probabilistic and Bayesian methods for analysing high-dimensional signals. The work proceeds in three main parts sharing similar objectives. Throughout, we focus on building data-efficient inference mechanisms geared toward high-dimensional signal processing. This is achieved by using probabilistic models on top of informative data representation operators. We also improve on the fitting objective to make it better suited to our requirements. Variational Inference: We introduce a variational approximation framework using direct optimisation of what is known as the scale-invariant Alpha-Beta divergence (sAB-divergence). This new objective encompasses most variational objectives that use the Kullback-Leibler, the Rényi or the gamma divergences. It also gives access to objective functions never exploited before in the context of variational inference. This is achieved via two easy-to-interpret control parameters, which allow for a smooth interpolation over the divergence space while trading off properties such as mass-covering of a target distribution and robustness to outliers in the data. Furthermore, the sAB variational objective can be optimised directly by re-purposing existing methods for Monte Carlo computation of complex variational objectives, leading to estimates of the divergence instead of variational lower bounds. We show the advantages of this objective on Bayesian models for regression problems. Roof-Edge Hidden Markov Random Field: We propose a method for semi-local Hurst estimation by incorporating a Markov random field model to constrain a wavelet-based pointwise Hurst estimator. This results in an estimator which is able to exploit the spatial regularities of a piecewise parametric varying Hurst parameter. The pointwise estimates are jointly inferred along with the parametric form of the underlying Hurst function, which characterises how the Hurst parameter varies deterministically over the spatial support of the data. Unlike recent Hurst regularisation methods, the proposed approach is flexible in that arbitrary parametric forms can be considered, and is extensible inasmuch as the associated gradient descent algorithm can accommodate a broad class of distributional assumptions without any significant modifications. The potential benefits of the approach are illustrated with simulations of various first-order polynomial forms. Scattering Hidden Markov Tree: We combine the rich, over-complete signal representation afforded by the scattering transform with a probabilistic graphical model which captures hierarchical dependencies between coefficients at different layers. The wavelet scattering network results in a high-dimensional representation which is translation invariant and stable to deformations whilst preserving informative content. Such properties are achieved by cascading wavelet transform convolutions with non-linear modulus and averaging operators. The network structure and its distributions are described using a Hidden Markov Tree. This yields a generative model for high-dimensional inference and offers a means to perform various inference tasks such as prediction. Our proposed scattering convolutional hidden Markov tree displays promising results on classification tasks of complex images in the challenging case where the number of training examples is extremely small. We also use variational methods on the aforementioned model and leverage the sAB variational objective defined earlier to improve the quality of the approximation.
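
As a rough illustration of the two-parameter divergence this abstract refers to, the sketch below evaluates a scale-invariant Alpha-Beta divergence between two discrete, strictly positive vectors. The formula is the form commonly given in the literature for this divergence and is an assumption here; the thesis may use a different parameterisation, and the function name `sab_divergence` is hypothetical. It requires `alpha`, `beta` and `alpha + beta` to be non-zero.

```python
import math


def sab_divergence(p, q, alpha, beta):
    """Scale-invariant Alpha-Beta divergence between positive vectors p and q.

    Assumed form (alpha, beta, alpha + beta all non-zero):
      log(sum p^(a+b)) / (b (a+b)) + log(sum q^(a+b)) / (a (a+b))
        - log(sum p^a q^b) / (a b)
    """
    if alpha == 0 or beta == 0 or alpha + beta == 0:
        raise ValueError("alpha, beta and alpha + beta must be non-zero")
    s = alpha + beta
    log_pp = math.log(sum(pi ** s for pi in p))
    log_qq = math.log(sum(qi ** s for qi in q))
    log_pq = math.log(sum(pi ** alpha * qi ** beta for pi, qi in zip(p, q)))
    return log_pp / (beta * s) + log_qq / (alpha * s) - log_pq / (alpha * beta)
```

With `alpha = 1` this expression reduces to the usual form of the gamma divergence, which matches the abstract's statement that the gamma divergence is covered as a special case. The value is unchanged when `p` or `q` is multiplied by a positive constant, which is what "scale invariant" means here.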

    STATISTICAL CHALLENGES IN NEXT GENERATION POPULATION GENOMICS STUDY

    Ph.D., Doctor of Philosophy

    The Neutrophil's Eye-View: Inference and Visualisation of the Chemoattractant Field Driving Cell Chemotaxis In Vivo

    As we begin to understand the signals that drive chemotaxis in vivo, it is becoming clear that there is a complex interplay of chemotactic factors, which changes over time as the inflammatory response evolves. New animal models such as transgenic lines of zebrafish, which are nearly transparent and in which the neutrophils express a green fluorescent protein, have the potential to greatly increase our understanding of the chemotactic process under conditions of wounding and infection from video microscopy data. Measurement of the chemoattractants over space (and their evolution over time) is a key objective for understanding the signals driving neutrophil chemotaxis. However, it is not possible to measure and visualise the most important contributors to in vivo chemotaxis, and in fact the understanding of the main contributors at any particular time is incomplete. The key insight of this investigation is that the neutrophils themselves sense the underlying field that drives their action, so we can use observations of neutrophil movement to infer the hidden net chemoattractant field by means of a novel computational framework. We apply the methodology to multiple in vivo neutrophil recruitment data sets to demonstrate this new technique and find that the method provides consistent estimates of the chemoattractant field across the majority of experiments. The framework that we derive, which we label the neutrophil's eye-view of the chemoattractant field, represents an important new methodology for cell biologists investigating the signalling processes driving cell chemotaxis.

    Probabilistic Models for Joint Segmentation, Detection and Tracking

    Migration of cells and subcellular particles plays a crucial role in many processes in living organisms. Despite its importance, systematic research into cell motility has only become possible in the last two decades, owing to the rapid development of non-invasive imaging techniques and digital cameras. Modern imaging systems make it possible to study large populations consisting of many thousands of cells. Manual analysis of the acquired data is infeasible, because gaining insight into the underlying biochemical processes sometimes requires determining the shape, velocity and other characteristics of individual cells. There is therefore a high demand in the scientific community for automatic methods.

    A statistical modeling framework for analyzing tree-indexed data: Application to plant development on microscopic and macroscopic scales

    We address statistical models for tree-indexed data. In the Virtual Plants team, the host team for this thesis, applications of interest focus on plant development and its modulation by environmental and genetic factors. We thus focus on plant developmental applications, both at a microscopic level, with the study of the cell lineage in the biological tissue responsible for plant growth, and at a macroscopic level, with the mechanism of branch production. Far fewer models are available for tree-indexed data than for path-indexed data. This thesis therefore aims to propose a statistical modeling framework for studying patterns in tree-indexed data. To this end, two classes of statistical models are investigated: Markov models, together with their extension to hidden Markov models, and multiple change-point models. In particular, we propose new methods for inferring the conditional independence structure between parent and child nodes in Markov models, based on graph selection algorithms for probabilistic graphical models. The models are applied to cell lineage trees at the microscopic scale and to branching systems at the macroscopic scale.

    Bayesian nonparametric models of genetic variation

    We will develop three new Bayesian nonparametric models for genetic variation. These models are all dynamic-clustering approximations of the ancestral recombination graph (ARG), a structure that fully describes the genetic history of a population. Due to its complexity, efficient inference for the ARG is not possible. However, different aspects of the ARG can be captured by the approximations discussed in our work. The ARG can be described by a tree-valued HMM where the trees vary along the genetic sequence. Many modern models of genetic variation proceed by approximating these trees with (often finite) clusterings. We will consider Bayesian nonparametric priors for the clustering, thereby providing nonparametric generalizations of these models and avoiding problems with model selection and label switching. Further, we will compare the performance of these models on a wide selection of inference problems in genetics, such as phasing, imputation, genome-wide association, and admixture or bottleneck discovery. These experiments should provide a common testing ground on which the different approximations inherent in modern genetic models can be compared. The results of these experiments should shed light on the nature of the approximations and guide future application of these models.