18 research outputs found

    Distributed Bayesian networks reconstruction on the whole genome scale

    Get PDF
    Background Bayesian networks are directed acyclic graphical models widely used to represent the probabilistic relationships between random variables. They have been applied in various biological contexts, including gene regulatory networks and protein–protein interactions inference. Generally, learning Bayesian networks from experimental data is NP-hard, leading to widespread use of heuristic search methods giving suboptimal results. However, in cases when the acyclicity of the graph can be externally ensured, it is possible to find the optimal network in polynomial time. While our previously developed tool BNFinder implements polynomial time algorithm, reconstructing networks with the large amount of experimental data still leads to computations on single CPU growing exceedingly. Results In the present paper we propose parallelized algorithm designed for multi-core and distributed systems and its implementation in the improved version of BNFinder—tool for learning optimal Bayesian networks. The new algorithm has been tested on different simulated and experimental datasets showing that it has much better efficiency of parallelization than the previous version. BNFinder gives comparable results in terms of accuracy with respect to current state-of-the-art inference methods, giving significant advantage in cases when external information such as regulators list or prior edge probability can be introduced, particularly for datasets with static gene expression observations. Conclusions We show that the new method can be used to reconstruct networks in the size range of thousands of genes making it practically applicable to whole genome datasets of prokaryotic systems and large components of eukaryotic genomes. Our benchmarking results on realistic datasets indicate that the tool should be useful to a wide audience of researchers interested in discovering dependencies in their large-scale transcriptomic datasets

    Applying dynamic Bayesian networks to perturbed gene expression data

    Get PDF
    BACKGROUND: A central goal of molecular biology is to understand the regulatory mechanisms of gene transcription and protein synthesis. Because of their solid basis in statistics, allowing to deal with the stochastic aspects of gene expressions and noisy measurements in a natural way, Bayesian networks appear attractive in the field of inferring gene interactions structure from microarray experiments data. However, the basic formalism has some disadvantages, e.g. it is sometimes hard to distinguish between the origin and the target of an interaction. Two kinds of microarray experiments yield data particularly rich in information regarding the direction of interactions: time series and perturbation experiments. In order to correctly handle them, the basic formalism must be modified. For example, dynamic Bayesian networks (DBN) apply to time series microarray data. To our knowledge the DBN technique has not been applied in the context of perturbation experiments. RESULTS: We extend the framework of dynamic Bayesian networks in order to incorporate perturbations. Moreover, an exact algorithm for inferring an optimal network is proposed and a discretization method specialized for time series data from perturbation experiments is introduced. We apply our procedure to realistic simulations data. The results are compared with those obtained by standard DBN learning techniques. Moreover, the advantages of using exact learning algorithm instead of heuristic methods are analyzed. CONCLUSION: We show that the quality of inferred networks dramatically improves when using data from perturbation experiments. We also conclude that the exact algorithm should be used when it is possible, i.e. when considered set of genes is small enough

    Dynamic CRM occupancy reflects a temporal map of developmental progression

    No full text
    Development is driven by tightly coordinated spatio-temporal patterns of gene expression, which are initiated through the action of transcription factors (TFs) binding to cis-regulatory modules (CRMs). Although many studies have investigated how spatial patterns arise, precise temporal control of gene expression is less well understood. Here, we show that dynamic changes in the timing of CRM occupancy is a prevalent feature common to all TFs examined in a developmental ChIP time course to date. CRMs exhibit complex binding patterns that cannot be explained by the sequence motifs or expression of the TFs themselves. The temporal changes in TF binding are highly correlated with dynamic patterns of target gene expression, which in turn reflect transitions in cellular function during different stages of development. Thus, it is not only the timing of a TF's expression, but also its temporal occupancy in refined time windows, which determines temporal gene expression. Systematic measurement of dynamic CRM occupancy may therefore serve as a powerful method to decode dynamic changes in gene expression driving developmental progression
    corecore