71 research outputs found
BNFinder: exact and efficient method for learning Bayesian networks
Motivation: Bayesian methods are widely used in many different areas of research. Recently, it has become a very popular tool for biological network reconstruction, due to its ability to handle noisy data. Even though there are many software packages allowing for Bayesian network reconstruction, only few of them are freely available to researchers. Moreover, they usually require at least basic programming abilities, which restricts their potential user base. Our goal was to provide software which would be freely available, efficient and usable to non-programmers
Finding evolutionarily conserved cis-regulatory modules with a universal set of motifs
<p>Abstract</p> <p>Background</p> <p>Finding functional regulatory elements in DNA sequences is a very important problem in computational biology and providing a reliable algorithm for this task would be a major step towards understanding regulatory mechanisms on genome-wide scale. Major obstacles in this respect are that the fact that the amount of non-coding DNA is vast, and that the methods for predicting functional transcription factor binding sites tend to produce results with a high percentage of false positives. This makes the problem of finding regions significantly enriched in binding sites difficult.</p> <p>Results</p> <p>We develop a novel method for predicting regulatory regions in DNA sequences, which is designed to exploit the evolutionary conservation of regulatory elements between species without assuming that the order of motifs is preserved across species. We have implemented our method and tested its predictive abilities on various datasets from different organisms.</p> <p>Conclusion</p> <p>We show that our approach enables us to find a majority of the known CRMs using only sequence information from different species together with currently publicly available motif data. Also, our method is robust enough to perform well in predicting CRMs, despite differences in tissue specificity and even across species, provided that the evolutionary distances between compared species do not change substantially. The complexity of the proposed algorithm is polynomial, and the observed running times show that it may be readily applied.</p
RECORD: Reference-Assisted Genome Assembly for Closely Related Genomes
Background. Next-generation sequencing technologies are now producing multiple times the genome size in total reads from a single
experiment. This is enough information to reconstruct at least some of the differences between the individual genome studied in
the experiment and the reference genome of the species. However, in most typical protocols, this information is disregarded and
the reference genome is used. Results. We provide a new approach that allows researchers to reconstruct genomes very closely
related to the reference genome (e.g., mutants of the same species) directly from the reads used in the experiment. Our approach
applies de novo assembly software to experimental reads and so-called pseudoreads and uses the resulting contigs to generate a
modified reference sequence. In this way, it can very quickly, and at no additional sequencing cost, generate new, modified reference
sequence that is closer to the actual sequenced genome and has a full coverage. In this paper, we describe our approach and test its
implementation called RECORD. We evaluate RECORD on both simulated and real data. We made our software publicly available on sourceforge. Conclusion. Our tests show that on closely related sequences RECORD outperforms more general assisted-assembly software
Two new zinc(II) acetates with 3- and 4-aminopyridine
The synthesis and characterization of two new zinc(II) coordination compounds with 3- and 4-aminopyridine are reported. They were obtained after adding a water solution of · or dissolving solid · in methanol solutions of 3- and 4-aminopyridine. The products were characterized structurally by single-crystal X-ray diffraction analysis. Colourless crystals of the compound synthesized by the reaction of · and 3-aminopyridine (3-apy), are built of trinuclear complex molecules with the formula (1). The molecules consists of two terminal atoms, coordinated tetrahedrally, and one central atom, coordinated octahedrally. Colourless crystals, obtained by the reaction of · with 4-aminopyridine (4-apy), consist of a mononuclear complex (2). Hydrogen-bonding interactions in the crystal structures of both complexes are reported.Sintetizirali in karakterizirali smo novi cinkovi koordinacijski spojini s 3- in 4-aminopiridinom. Dobili smo ju z dodajanjem metanolne raztopine · v vodno raztopino 3-aminopiridina oziroma raztapljanjem · v metanolni raztopini 4-aminopiridina. Produkta sta bila okarakterizirana z rentgensko strukturno analizo monokristalov. Brezbarvni kristali, pridobljeni z reakcijo med · in 3-aminopiridinom, so zgrajeni iz trijedrnih koordinacijskih molekul s kemijsko formulo (1). Molekula je sestavljena iz dveh terminalnih cinkovih ionov, ki sta tetraedično koordinirana, in enega centralnega iona, ki je oktaedrično koordiniran. Brezbarvni kristali, dobljeni z reakcijo med · in 4-aminopiridinom, sestojijo iz enojedrnih koordinacijskih molekul s kemijsko formulo (2). Poročamo tudi o vodikovih vezeh v kristalnih strukturah obeh spojin
Applying dynamic Bayesian networks to perturbed gene expression data
BACKGROUND: A central goal of molecular biology is to understand the regulatory mechanisms of gene transcription and protein synthesis. Because of their solid basis in statistics, allowing to deal with the stochastic aspects of gene expressions and noisy measurements in a natural way, Bayesian networks appear attractive in the field of inferring gene interactions structure from microarray experiments data. However, the basic formalism has some disadvantages, e.g. it is sometimes hard to distinguish between the origin and the target of an interaction. Two kinds of microarray experiments yield data particularly rich in information regarding the direction of interactions: time series and perturbation experiments. In order to correctly handle them, the basic formalism must be modified. For example, dynamic Bayesian networks (DBN) apply to time series microarray data. To our knowledge the DBN technique has not been applied in the context of perturbation experiments. RESULTS: We extend the framework of dynamic Bayesian networks in order to incorporate perturbations. Moreover, an exact algorithm for inferring an optimal network is proposed and a discretization method specialized for time series data from perturbation experiments is introduced. We apply our procedure to realistic simulations data. The results are compared with those obtained by standard DBN learning techniques. Moreover, the advantages of using exact learning algorithm instead of heuristic methods are analyzed. CONCLUSION: We show that the quality of inferred networks dramatically improves when using data from perturbation experiments. We also conclude that the exact algorithm should be used when it is possible, i.e. when considered set of genes is small enough
Comparison between Suitable Priors for Additive Bayesian Networks
Additive Bayesian networks are types of graphical models that extend the
usual Bayesian generalized linear model to multiple dependent variables through
the factorisation of the joint probability distribution of the underlying
variables. When fitting an ABN model, the choice of the prior of the parameters
is of crucial importance. If an inadequate prior - like a too weakly
informative one - is used, data separation and data sparsity lead to issues in
the model selection process. In this work a simulation study between two weakly
and a strongly informative priors is presented. As weakly informative prior we
use a zero mean Gaussian prior with a large variance, currently implemented in
the R-package abn. The second prior belongs to the Student's t-distribution,
specifically designed for logistic regressions and, finally, the strongly
informative prior is again Gaussian with mean equal to true parameter value and
a small variance. We compare the impact of these priors on the accuracy of the
learned additive Bayesian network in function of different parameters. We
create a simulation study to illustrate Lindley's paradox based on the prior
choice. We then conclude by highlighting the good performance of the
informative Student's t-prior and the limited impact of the Lindley's paradox.
Finally, suggestions for further developments are provided.Comment: 8 pages, 4 figure
Recommended from our members
Nucleotide-resolution DNA double-strand breaks mapping by next-generation sequencing
We present a genome-wide method to map DNA double-strand breaks (DSBs) at nucleotide resolution by direct in situ breaks labeling, enrichment on streptavidin, and next-generation sequencing (BLESS). We comprehensively validated and tested BLESS using different human and mouse cells, DSBs-inducing agents, and sequencing platforms. BLESS was able to detect telomere ends, Sce endonuclease-induced DSBs, and complex genome-wide DSBs landscapes. As a proof of principle, we characterized the genomic landscape of sensitivity to replication stress in human cells, and identified over two thousand non-uniformly distributed aphidicolin-sensitive regions (ASRs) overrepresented in genes and enriched in satellite repeats. ASRs were also enriched in regions rearranged in human cancers, with many cancer-associated genes exhibiting high sensitivity to replication stress. Our method is suitable for genome-wide mapping of DSBs in various cells and experimental conditions with a specificity and resolution unachievable by current techniques
Listen to genes : dealing with microarray data in the frequency domain
Background: We present a novel and systematic approach to analyze temporal microarray data. The approach includes
normalization, clustering and network analysis of genes.
Methodology: Genes are normalized using an error model based uniform normalization method aimed at identifying and
estimating the sources of variations. The model minimizes the correlation among error terms across replicates. The
normalized gene expressions are then clustered in terms of their power spectrum density. The method of complex Granger
causality is introduced to reveal interactions between sets of genes. Complex Granger causality along with partial Granger
causality is applied in both time and frequency domains to selected as well as all the genes to reveal the interesting
networks of interactions. The approach is successfully applied to Arabidopsis leaf microarray data generated from 31,000
genes observed over 22 time points over 22 days. Three circuits: a circadian gene circuit, an ethylene circuit and a new
global circuit showing a hierarchical structure to determine the initiators of leaf senescence are analyzed in detail.
Conclusions: We use a totally data-driven approach to form biological hypothesis. Clustering using the power-spectrum
analysis helps us identify genes of potential interest. Their dynamics can be captured accurately in the time and frequency
domain using the methods of complex and partial Granger causality. With the rise in availability of temporal microarray
data, such methods can be useful tools in uncovering the hidden biological interactions. We show our method in a step by
step manner with help of toy models as well as a real biological dataset. We also analyse three distinct gene circuits of
potential interest to Arabidopsis researchers
Bayesian approaches to reverse engineer cellular systems: a simulation study on nonlinear Gaussian networks
BACKGROUND. Reverse engineering cellular networks is currently one of the most challenging problems in systems biology. Dynamic Bayesian networks (DBNs) seem to be particularly suitable for inferring relationships between cellular variables from the analysis of time series measurements of mRNA or protein concentrations. As evaluating inference results on a real dataset is controversial, the use of simulated data has been proposed. However, DBN approaches that use continuous variables, thus avoiding the information loss associated with discretization, have not yet been extensively assessed, and most of the proposed approaches have dealt with linear Gaussian models. RESULTS. We propose a generalization of dynamic Gaussian networks to accommodate nonlinear dependencies between variables. As a benchmark dataset to test the new approach, we used data from a mathematical model of cell cycle control in budding yeast that realistically reproduces the complexity of a cellular system. We evaluated the ability of the networks to describe the dynamics of cellular systems and their precision in reconstructing the true underlying causal relationships between variables. We also tested the robustness of the results by analyzing the effect of noise on the data, and the impact of a different sampling time. CONCLUSION. The results confirmed that DBNs with Gaussian models can be effectively exploited for a first level analysis of data from complex cellular systems. The inferred models are parsimonious and have a satisfying goodness of fit. Furthermore, the networks not only offer a phenomenological description of the dynamics of cellular systems, but are also able to suggest hypotheses concerning the causal interactions between variables. The proposed nonlinear generalization of Gaussian models yielded models characterized by a slightly lower goodness of fit than the linear model, but a better ability to recover the true underlying connections between variables.Italian Ministry of University and Scientific Research; National Institutes of Health & National Human Genome Research Institute (HG003354-01A2); Collegio Ghislieri, Pavia Italy fellowshi
Comparative analysis of cis-regulation following stroke and seizures in subspaces of conserved eigensystems
- …