71 research outputs found

    BNFinder: exact and efficient method for learning Bayesian networks

    Motivation: Bayesian methods are widely used in many different areas of research. Recently, they have become a very popular tool for biological network reconstruction, due to their ability to handle noisy data. Even though there are many software packages allowing for Bayesian network reconstruction, only a few of them are freely available to researchers. Moreover, they usually require at least basic programming abilities, which restricts their potential user base. Our goal was to provide software which would be freely available, efficient and usable by non-programmers.

    Finding evolutionarily conserved cis-regulatory modules with a universal set of motifs

    Background: Finding functional regulatory elements in DNA sequences is a very important problem in computational biology, and providing a reliable algorithm for this task would be a major step towards understanding regulatory mechanisms on a genome-wide scale. Major obstacles in this respect are that the amount of non-coding DNA is vast, and that the methods for predicting functional transcription factor binding sites tend to produce results with a high percentage of false positives. This makes the problem of finding regions significantly enriched in binding sites difficult. Results: We develop a novel method for predicting regulatory regions in DNA sequences, which is designed to exploit the evolutionary conservation of regulatory elements between species without assuming that the order of motifs is preserved across species. We have implemented our method and tested its predictive abilities on various datasets from different organisms. Conclusion: We show that our approach enables us to find a majority of the known cis-regulatory modules (CRMs) using only sequence information from different species together with currently publicly available motif data. Also, our method is robust enough to perform well in predicting CRMs, despite differences in tissue specificity and even across species, provided that the evolutionary distances between compared species do not change substantially. The complexity of the proposed algorithm is polynomial, and the observed running times show that it may be readily applied.

    RECORD: Reference-Assisted Genome Assembly for Closely Related Genomes

    Background. Next-generation sequencing technologies are now producing multiple times the genome size in total reads from a single experiment. This is enough information to reconstruct at least some of the differences between the individual genome studied in the experiment and the reference genome of the species. However, in most typical protocols, this information is disregarded and the reference genome is used. Results. We provide a new approach that allows researchers to reconstruct genomes very closely related to the reference genome (e.g., mutants of the same species) directly from the reads used in the experiment. Our approach applies de novo assembly software to experimental reads and so-called pseudoreads and uses the resulting contigs to generate a modified reference sequence. In this way, it can very quickly, and at no additional sequencing cost, generate a new, modified reference sequence that is closer to the actual sequenced genome and has full coverage. In this paper, we describe our approach and test its implementation, called RECORD. We evaluate RECORD on both simulated and real data. We made our software publicly available on SourceForge. Conclusion. Our tests show that on closely related sequences RECORD outperforms more general assisted-assembly software.

    Two new zinc(II) acetates with 3- and 4-aminopyridine

    The synthesis and characterization of two new zinc(II) coordination compounds with 3- and 4-aminopyridine are reported. They were obtained after adding a water solution of Zn(CH3COO)2 · 2H2O or dissolving solid Zn(CH3COO)2 · 2H2O in methanol solutions of 3- and 4-aminopyridine. The products were characterized structurally by single-crystal X-ray diffraction analysis. Colourless crystals of the compound synthesized by the reaction of Zn(CH3COO)2 · 2H2O and 3-aminopyridine (3-apy) are built of trinuclear complex molecules with the formula [Zn3(O2CCH3)6(3-apy)2(H2O)2] (1). The molecules consist of two terminal Zn atoms, coordinated tetrahedrally, and one central Zn atom, coordinated octahedrally. Colourless crystals obtained by the reaction of Zn(CH3COO)2 · 2H2O with 4-aminopyridine (4-apy) consist of a mononuclear complex [Zn(O2CCH3)2(4-apy)2] (2). Hydrogen-bonding interactions in the crystal structures of both complexes are reported.

    Applying dynamic Bayesian networks to perturbed gene expression data

    BACKGROUND: A central goal of molecular biology is to understand the regulatory mechanisms of gene transcription and protein synthesis. Because of their solid basis in statistics, which allows them to deal with the stochastic aspects of gene expression and noisy measurements in a natural way, Bayesian networks appear attractive for inferring the structure of gene interactions from microarray experiment data. However, the basic formalism has some disadvantages, e.g. it is sometimes hard to distinguish between the origin and the target of an interaction. Two kinds of microarray experiments yield data particularly rich in information regarding the direction of interactions: time series and perturbation experiments. In order to handle them correctly, the basic formalism must be modified. For example, dynamic Bayesian networks (DBN) apply to time series microarray data. To our knowledge, the DBN technique has not been applied in the context of perturbation experiments. RESULTS: We extend the framework of dynamic Bayesian networks in order to incorporate perturbations. Moreover, an exact algorithm for inferring an optimal network is proposed and a discretization method specialized for time series data from perturbation experiments is introduced. We apply our procedure to realistic simulated data. The results are compared with those obtained by standard DBN learning techniques. Moreover, the advantages of using an exact learning algorithm instead of heuristic methods are analyzed. CONCLUSION: We show that the quality of inferred networks dramatically improves when using data from perturbation experiments. We also conclude that the exact algorithm should be used when it is possible, i.e. when the considered set of genes is small enough.
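One reason exact structure learning can be feasible for small gene sets: in a DBN whose edges only run from one time slice to the next, no acyclicity constraint couples the parent choices of different genes, so each gene's parent set can be searched independently. The sketch below enumerates all parent sets up to a size limit and scores them with a BIC-style score on binary data. The scoring function, the size limit, the function names, and the toy series are assumptions made for illustration; they are not the algorithm, score, or discretization proposed in the paper, and perturbations are not modelled.

```python
import math
from collections import Counter
from itertools import combinations

def bic_score(series, child, parents):
    """BIC-style score for `child` at t+1 given `parents` at t (binary data)."""
    joint = Counter()   # counts of (parent configuration, child value)
    margin = Counter()  # counts of each parent configuration
    for prev, curr in zip(series, series[1:]):
        config = tuple(prev[p] for p in parents)
        joint[(config, curr[child])] += 1
        margin[config] += 1
    # Maximum-likelihood log-likelihood of the observed transitions.
    loglik = sum(c * math.log(c / margin[cfg]) for (cfg, _), c in joint.items())
    n_transitions = len(series) - 1
    n_params = 2 ** len(parents)  # one free probability per parent configuration
    return loglik - 0.5 * n_params * math.log(n_transitions)

def best_parents(series, child, max_parents=2):
    """Exhaustively search all parent sets of size <= max_parents for one gene."""
    genes = range(len(series[0]))
    candidates = (ps for k in range(max_parents + 1)
                  for ps in combinations(genes, k))
    return max(candidates, key=lambda ps: bic_score(series, child, ps))

# Toy series (hypothetical): gene 1 copies gene 0 from the previous time step.
g0 = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
series = [(g0[t], g0[t - 1] if t else 0) for t in range(len(g0))]

# Each gene is optimized on its own; the union of the results is the network.
network = {child: best_parents(series, child) for child in range(2)}
print(network)
```

The number of candidate parent sets grows quickly with the number of genes, which is consistent with the abstract's remark that the exact algorithm is suitable only when the considered gene set is small.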

    Comparison between Suitable Priors for Additive Bayesian Networks

    Additive Bayesian networks (ABN) are graphical models that extend the usual Bayesian generalized linear model to multiple dependent variables through the factorisation of the joint probability distribution of the underlying variables. When fitting an ABN model, the choice of the prior for the parameters is of crucial importance. If an inadequate prior, such as one that is too weakly informative, is used, data separation and data sparsity lead to issues in the model selection process. In this work, a simulation study comparing two weakly informative priors and one strongly informative prior is presented. The first weakly informative prior is a zero-mean Gaussian with a large variance, currently implemented in the R package abn. The second is a Student's t-distribution specifically designed for logistic regressions. Finally, the strongly informative prior is again Gaussian, with mean equal to the true parameter value and a small variance. We compare the impact of these priors on the accuracy of the learned additive Bayesian network as a function of different parameters. We create a simulation study to illustrate Lindley's paradox based on the prior choice. We then conclude by highlighting the good performance of the informative Student's t-prior and the limited impact of Lindley's paradox. Finally, suggestions for further developments are provided. Comment: 8 pages, 4 figures.
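To make the difference between the two zero-centred priors more tangible, the sketch below measures how much log-prior density a coefficient loses when it moves away from zero, which is what restrains runaway estimates under data separation. The parameter values (Gaussian standard deviation 100; Student's t with 1 degree of freedom and scale 2.5) and the function names are illustrative assumptions, not the settings of the abn package or of the study.

```python
import math

def gaussian_logpdf(beta, sd):
    """Log-density of a zero-mean Gaussian prior with standard deviation sd."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - beta ** 2 / (2 * sd ** 2)

def student_t_logpdf(beta, df, scale):
    """Log-density of a zero-centred Student's t prior with the given df and scale."""
    z = beta / scale
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi) - math.log(scale))
    return log_norm - (df + 1) / 2 * math.log1p(z ** 2 / df)

def prior_penalty(logpdf, beta, **params):
    """Log-prior density lost when the coefficient moves from 0 to beta."""
    return logpdf(0.0, **params) - logpdf(beta, **params)

# Assumed, illustrative prior settings.
for beta in (1.0, 10.0, 1000.0):
    wide_gauss = prior_penalty(gaussian_logpdf, beta, sd=100.0)
    t_prior = prior_penalty(student_t_logpdf, beta, df=1.0, scale=2.5)
    print(f"beta={beta:7.1f}  gaussian penalty={wide_gauss:9.4f}  t penalty={t_prior:9.4f}")
```

Under these assumed settings the wide Gaussian barely penalizes moderately large coefficients, whereas the t prior penalizes them noticeably; for extreme coefficients the ordering reverses because of the t distribution's heavy tails.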

    Listen to genes: dealing with microarray data in the frequency domain

    Background: We present a novel and systematic approach to analyze temporal microarray data. The approach includes normalization, clustering and network analysis of genes. Methodology: Genes are normalized using an error-model-based uniform normalization method aimed at identifying and estimating the sources of variation. The model minimizes the correlation among error terms across replicates. The normalized gene expressions are then clustered in terms of their power spectrum density. The method of complex Granger causality is introduced to reveal interactions between sets of genes. Complex Granger causality, along with partial Granger causality, is applied in both the time and frequency domains to selected genes as well as to all genes, to reveal the interesting networks of interactions. The approach is successfully applied to Arabidopsis leaf microarray data generated from 31,000 genes observed at 22 time points over 22 days. Three circuits are analyzed in detail: a circadian gene circuit, an ethylene circuit, and a new global circuit showing a hierarchical structure to determine the initiators of leaf senescence. Conclusions: We use a totally data-driven approach to form biological hypotheses. Clustering using power-spectrum analysis helps us identify genes of potential interest. Their dynamics can be captured accurately in the time and frequency domains using the methods of complex and partial Granger causality. With the rise in availability of temporal microarray data, such methods can be useful tools in uncovering hidden biological interactions. We show our method in a step-by-step manner with the help of toy models as well as a real biological dataset. We also analyse three distinct gene circuits of potential interest to Arabidopsis researchers.
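The clustering step described above depends on each gene's power spectrum. As a rough illustration only, the sketch below computes a plain periodogram with a direct discrete Fourier transform and finds the dominant frequency bin of a toy profile. The abstract does not specify the estimator or its normalization, so the function name, the mean-centring step, the scaling by series length, and the synthetic profile are all assumptions.

```python
import cmath
import math

def periodogram(series):
    """Return the power at frequency bins 0..n//2 of a mean-centred series."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    power = []
    for k in range(n // 2 + 1):
        # Direct DFT coefficient for frequency bin k.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centred))
        power.append(abs(coeff) ** 2 / n)
    return power

# Toy expression profile (hypothetical): 22 time points, two full cycles.
profile = [math.sin(2 * math.pi * 2 * t / 22) for t in range(22)]
spectrum = periodogram(profile)

# Genes could then be grouped by the shape of their spectra; here we only
# report the bin that carries the most power.
dominant_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print(dominant_bin, spectrum[dominant_bin])
```

Comparing spectra rather than raw time courses groups genes with the same periodicity even when their profiles are shifted in phase.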

    Bayesian approaches to reverse engineer cellular systems: a simulation study on nonlinear Gaussian networks

    BACKGROUND. Reverse engineering cellular networks is currently one of the most challenging problems in systems biology. Dynamic Bayesian networks (DBNs) seem to be particularly suitable for inferring relationships between cellular variables from the analysis of time series measurements of mRNA or protein concentrations. As evaluating inference results on a real dataset is controversial, the use of simulated data has been proposed. However, DBN approaches that use continuous variables, thus avoiding the information loss associated with discretization, have not yet been extensively assessed, and most of the proposed approaches have dealt with linear Gaussian models. RESULTS. We propose a generalization of dynamic Gaussian networks to accommodate nonlinear dependencies between variables. As a benchmark dataset to test the new approach, we used data from a mathematical model of cell cycle control in budding yeast that realistically reproduces the complexity of a cellular system. We evaluated the ability of the networks to describe the dynamics of cellular systems and their precision in reconstructing the true underlying causal relationships between variables. We also tested the robustness of the results by analyzing the effect of noise on the data, and the impact of a different sampling time. CONCLUSION. The results confirmed that DBNs with Gaussian models can be effectively exploited for a first level analysis of data from complex cellular systems. The inferred models are parsimonious and have a satisfying goodness of fit. Furthermore, the networks not only offer a phenomenological description of the dynamics of cellular systems, but are also able to suggest hypotheses concerning the causal interactions between variables. 
The proposed nonlinear generalization of Gaussian models yielded models characterized by a slightly lower goodness of fit than the linear model, but a better ability to recover the true underlying connections between variables. Funding: Italian Ministry of University and Scientific Research; National Institutes of Health & National Human Genome Research Institute (HG003354-01A2); Collegio Ghislieri, Pavia, Italy fellowship.