821 research outputs found

    Inference of Temporally Varying Bayesian Networks

    Get PDF
    When analysing gene expression time series data an often overlooked but crucial aspect of the model is that the regulatory network structure may change over time. Whilst some approaches have addressed this problem previously in the literature, many are not well suited to the sequential nature of the data. Here we present a method that allows us to infer regulatory network structures that may vary between time points, utilising a set of hidden states that describe the network structure at a given time point. To model the distribution of the hidden states we have applied the Hierarchical Dirichlet Process Hideen Markov Model, a nonparametric extension of the traditional Hidden Markov Model, that does not require us to fix the number of hidden states in advance. We apply our method to exisiting microarray expression data as well as demonstrating is efficacy on simulated test data

    Causal graphical models in systems genetics: A unified framework for joint inference of causal network and genetic architecture for correlated phenotypes

    Full text link
    Causal inference approaches in systems genetics exploit quantitative trait loci (QTL) genotypes to infer causal relationships among phenotypes. The genetic architecture of each phenotype may be complex, and poorly estimated genetic architectures may compromise the inference of causal relationships among phenotypes. Existing methods assume QTLs are known or inferred without regard to the phenotype network structure. In this paper we develop a QTL-driven phenotype network method (QTLnet) to jointly infer a causal phenotype network and associated genetic architecture for sets of correlated phenotypes. Randomization of alleles during meiosis and the unidirectional influence of genotype on phenotype allow the inference of QTLs causal to phenotypes. Causal relationships among phenotypes can be inferred using these QTL nodes, enabling us to distinguish among phenotype networks that would otherwise be distribution equivalent. We jointly model phenotypes and QTLs using homogeneous conditional Gaussian regression models, and we derive a graphical criterion for distribution equivalence. We validate the QTLnet approach in a simulation study. Finally, we illustrate with simulated data and a real example how QTLnet can be used to infer both direct and indirect effects of QTLs and phenotypes that co-map to a genomic region.Comment: Published in at http://dx.doi.org/10.1214/09-AOAS288 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    MCMC inference for Markov Jump Processes via the Linear Noise Approximation

    Full text link
    Bayesian analysis for Markov jump processes is a non-trivial and challenging problem. Although exact inference is theoretically possible, it is computationally demanding thus its applicability is limited to a small class of problems. In this paper we describe the application of Riemann manifold MCMC methods using an approximation to the likelihood of the Markov jump process which is valid when the system modelled is near its thermodynamic limit. The proposed approach is both statistically and computationally efficient while the convergence rate and mixing of the chains allows for fast MCMC inference. The methodology is evaluated using numerical simulations on two problems from chemical kinetics and one from systems biology

    Nonparametric Bayesian inference for perturbed and orthologous gene regulatory networks

    Get PDF
    Motivation: The generation of time series transcriptomic datasets collected under multiple experimental conditions has proven to be a powerful approach for disentangling complex biological processes, allowing for the reverse engineering of gene regulatory networks (GRNs). Most methods for reverse engineering GRNs from multiple datasets assume that each of the time series were generated from networks with identical topology. In this study, we outline a hierarchical, non-parametric Bayesian approach for reverse engineering GRNs using multiple time series that can be applied in a number of novel situations including: (i) where different, but overlapping sets of transcription factors are expected to bind in the different experimental conditions; that is, where switching events could potentially arise under the different treatments and (ii) for inference in evolutionary related species in which orthologous GRNs exist. More generally, the method can be used to identify context-specific regulation by leveraging time series gene expression data alongside methods that can identify putative lists of transcription factors or transcription factor targets. Results: The hierarchical inference outperforms related (but non-hierarchical) approaches when the networks used to generate the data were identical, and performs comparably even when the networks used to generate data were independent. The method was subsequently used alongside yeast one hybrid and microarray time series data to infer potential transcriptional switches in Arabidopsis thaliana response to stress. The results confirm previous biological studies and allow for additional insights into gene regulation under various abiotic stresses. Availability: The methods outlined in this article have been implemented in Matlab and are available on request

    Improvements in the reconstruction of time-varying gene regulatory networks: dynamic programming and regularization by information sharing among genes

    Get PDF
    <b>Method:</b> Dynamic Bayesian networks (DBNs) have been applied widely to reconstruct the structure of regulatory processes from time series data, and they have established themselves as a standard modelling tool in computational systems biology. The conventional approach is based on the assumption of a homogeneous Markov chain, and many recent research efforts have focused on relaxing this restriction. An approach that enjoys particular popularity is based on a combination of a DBN with a multiple changepoint process, and the application of a Bayesian inference scheme via reversible jump Markov chain Monte Carlo (RJMCMC). In the present article, we expand this approach in two ways. First, we show that a dynamic programming scheme allows the changepoints to be sampled from the correct conditional distribution, which results in improved convergence over RJMCMC. Second, we introduce a novel Bayesian clustering and information sharing scheme among nodes, which provides a mechanism for automatic model complexity tuning. <b>Results:</b> We evaluate the dynamic programming scheme on expression time series for Arabidopsis thaliana genes involved in circadian regulation. In a simulation study we demonstrate that the regularization scheme improves the network reconstruction accuracy over that obtained with recently proposed inhomogeneous DBNs. For gene expression profiles from a synthetically designed Saccharomyces cerevisiae strain under switching carbon metabolism we show that the combination of both: dynamic programming and regularization yields an inference procedure that outperforms two alternative established network reconstruction methods from the biology literature

    Bayesian Inference for Duplication-Mutation with Complementarity Network Models

    Get PDF
    We observe an undirected graph GG without multiple edges and self-loops, which is to represent a protein-protein interaction (PPI) network. We assume that GG evolved under the duplication-mutation with complementarity (DMC) model from a seed graph, G0G_0, and we also observe the binary forest Γ\Gamma that represents the duplication history of GG. A posterior density for the DMC model parameters is established, and we outline a sampling strategy by which one can perform Bayesian inference; that sampling strategy employs a particle marginal Metropolis-Hastings (PMMH) algorithm. We test our methodology on numerical examples to demonstrate a high accuracy and precision in the inference of the DMC model's mutation and homodimerization parameters

    A Scale-Free Structure Prior for Graphical Models with Applications in Functional Genomics

    Get PDF
    The problem of reconstructing large-scale, gene regulatory networks from gene expression data has garnered considerable attention in bioinformatics over the past decade with the graphical modeling paradigm having emerged as a popular framework for inference. Analysis in a full Bayesian setting is contingent upon the assignment of a so-called structure prior—a probability distribution on networks, encoding a priori biological knowledge either in the form of supplemental data or high-level topological features. A key topological consideration is that a wide range of cellular networks are approximately scale-free, meaning that the fraction, , of nodes in a network with degree is roughly described by a power-law with exponent between and . The standard practice, however, is to utilize a random structure prior, which favors networks with binomially distributed degree distributions. In this paper, we introduce a scale-free structure prior for graphical models based on the formula for the probability of a network under a simple scale-free network model. Unlike the random structure prior, its scale-free counterpart requires a node labeling as a parameter. In order to use this prior for large-scale network inference, we design a novel Metropolis-Hastings sampler for graphical models that includes a node labeling as a state space variable. In a simulation study, we demonstrate that the scale-free structure prior outperforms the random structure prior at recovering scale-free networks while at the same time retains the ability to recover random networks. We then estimate a gene association network from gene expression data taken from a breast cancer tumor study, showing that scale-free structure prior recovers hubs, including the previously unknown hub SLC39A6, which is a zinc transporter that has been implicated with the spread of breast cancer to the lymph nodes. Our analysis of the breast cancer expression data underscores the value of the scale-free structure prior as an instrument to aid in the identification of candidate hub genes with the potential to direct the hypotheses of molecular biologists, and thus drive future experiments

    Targeting Bayes factors with direct-path non-equilibrium thermodynamic integration

    Get PDF
    Thermodynamic integration (TI) for computing marginal likelihoods is based on an inverse annealing path from the prior to the posterior distribution. In many cases, the resulting estimator suffers from high variability, which particularly stems from the prior regime. When comparing complex models with differences in a comparatively small number of parameters, intrinsic errors from sampling fluctuations may outweigh the differences in the log marginal likelihood estimates. In the present article, we propose a thermodynamic integration scheme that directly targets the log Bayes factor. The method is based on a modified annealing path between the posterior distributions of the two models compared, which systematically avoids the high variance prior regime. We combine this scheme with the concept of non-equilibrium TI to minimise discretisation errors from numerical integration. Results obtained on Bayesian regression models applied to standard benchmark data, and a complex hierarchical model applied to biopathway inference, demonstrate a significant reduction in estimator variance over state-of-the-art TI methods

    Simulation and inference algorithms for stochastic biochemical reaction networks: from basic concepts to state-of-the-art

    Full text link
    Stochasticity is a key characteristic of intracellular processes such as gene regulation and chemical signalling. Therefore, characterising stochastic effects in biochemical systems is essential to understand the complex dynamics of living things. Mathematical idealisations of biochemically reacting systems must be able to capture stochastic phenomena. While robust theory exists to describe such stochastic models, the computational challenges in exploring these models can be a significant burden in practice since realistic models are analytically intractable. Determining the expected behaviour and variability of a stochastic biochemical reaction network requires many probabilistic simulations of its evolution. Using a biochemical reaction network model to assist in the interpretation of time course data from a biological experiment is an even greater challenge due to the intractability of the likelihood function for determining observation probabilities. These computational challenges have been subjects of active research for over four decades. In this review, we present an accessible discussion of the major historical developments and state-of-the-art computational techniques relevant to simulation and inference problems for stochastic biochemical reaction network models. Detailed algorithms for particularly important methods are described and complemented with MATLAB implementations. As a result, this review provides a practical and accessible introduction to computational methods for stochastic models within the life sciences community
    corecore