    Bayesian variable selection and data integration for biological regulatory networks

    A substantial focus of research in molecular biology is gene regulatory networks: the set of transcription factors and target genes which control the involvement of different biological processes in living cells. Previous statistical approaches for identifying gene regulatory networks have used gene expression data, ChIP binding data or promoter sequence data, but each of these resources provides only partial information. We present a Bayesian hierarchical model that integrates all three data types in a principled variable selection framework. The gene expression data are modeled as a function of the unknown gene regulatory network, which has an informed prior distribution based upon both ChIP binding and promoter sequence data. We also present a variable weighting methodology for the principled balancing of multiple sources of prior information. We apply our procedure to the discovery of gene regulatory relationships in Saccharomyces cerevisiae (yeast), for which we can use several external sources of information to validate our results. Our inferred relationships show greater biological relevance on the external validation measures than previous data integration methods. Our model also estimates synergistic and antagonistic interactions between transcription factors, many of which are validated by previous studies. We also evaluate the results from our procedure for weighting multiple sources of prior information. Finally, we discuss our methodology in the context of previous approaches to data integration and Bayesian variable selection. Comment: Published at http://dx.doi.org/10.1214/07-AOAS130 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    A Posterior Probability Approach for Gene Regulatory Network Inference in Genetic Perturbation Data

    Inferring gene regulatory networks is an important problem in systems biology. However, these networks can be hard to infer from experimental data because of the inherent variability in biological data as well as the large number of genes involved. We propose a fast, simple method for inferring regulatory relationships between genes from knockdown experiments in the NIH LINCS dataset by calculating posterior probabilities that incorporate prior information. We show that the method is able to find previously identified edges from TRANSFAC and JASPAR, and we discuss the merits and limitations of this approach.
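The abstract above does not spell out how the posterior probabilities are computed. As a minimal sketch of the general idea only, and not the authors' method, the following applies Bayes' rule to a single candidate edge. The two-Gaussian likelihood, the function names, and the default values `effect_mean=-2.0` and `sd=1.0` are all assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def edge_posterior(z_score, prior_edge, effect_mean=-2.0, sd=1.0):
    """Posterior probability of a regulatory edge given one knockdown z-score.

    Assumption (hypothetical model): if the edge exists, the target's z-score
    after knockdown is N(effect_mean, sd); otherwise it is N(0, sd).
    prior_edge is the prior probability of the edge, e.g. from a database.
    """
    like_edge = normal_pdf(z_score, effect_mean, sd)
    like_null = normal_pdf(z_score, 0.0, sd)
    numerator = like_edge * prior_edge
    return numerator / (numerator + like_null * (1.0 - prior_edge))
```

Under these assumptions, a z-score halfway between the two means carries no evidence, so the posterior equals the prior; more negative z-scores push the posterior toward 1.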

    A model for gene deregulation detection using expression data

    In tumoral cells, gene regulation mechanisms are severely altered, and these modifications in the regulations may be characteristic of different subtypes of cancer. However, these alterations do not necessarily induce differential expression between the subtypes. To address this problem, we propose a statistical methodology to identify the misregulated genes given a reference network and gene expression data. Our model is based on a regulatory process in which all genes are allowed to be deregulated. We derive an EM algorithm where the hidden variables correspond to the status (under/over/normally expressed) of the genes and where the E-step is solved using a message-passing algorithm. Our procedure provides posterior probabilities of deregulation in a given sample for each gene. We assess the performance of our method by numerical experiments on simulations and on a bladder cancer data set.

    Weighted-Lasso for Structured Network Inference from Time Course Data

    We present a weighted-Lasso method to infer the parameters of a first-order vector auto-regressive model that describes time course expression data generated by directed gene-to-gene regulation networks. These networks are assumed to have a prior internal structure of connectivity which drives the inference method. This prior structure can be either derived from prior biological knowledge or inferred by the method itself. We illustrate the performance of this structure-based penalization both on synthetic data and on two canonical regulatory networks: first the yeast cell cycle regulation network, by analyzing Spellman et al.'s dataset, and second the E. coli S.O.S. DNA repair network, by analyzing data from U. Alon's lab.
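To make the idea of a weighted Lasso on a first-order vector auto-regressive model concrete, here is a minimal sketch, not the authors' implementation. It assumes the model x_t = A x_{t-1} + noise, fits each row of A separately by plain coordinate descent, and takes a user-supplied weight matrix `W` in which a smaller `W[i, j]` means the edge j -> i is penalized less. The function names, the objective scaling, and the fixed iteration count are illustrative assumptions.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the Lasso proximal step)."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def weighted_lasso(Z, y, weights, lam, n_iter=200):
    """Minimize (1/2n)||y - Z b||^2 + lam * sum_j weights[j] * |b_j|
    by cyclic coordinate descent."""
    n, p = Z.shape
    beta = np.zeros(p)
    col_norm = (Z ** 2).sum(axis=0) / n
    residual = y - Z @ beta
    for _ in range(n_iter):
        for j in range(p):
            if col_norm[j] == 0.0:
                continue  # an all-zero column cannot explain anything
            residual += Z[:, j] * beta[j]  # remove coordinate j's contribution
            rho = Z[:, j] @ residual / n
            beta[j] = soft_threshold(rho, lam * weights[j]) / col_norm[j]
            residual -= Z[:, j] * beta[j]  # add the updated contribution back
    return beta


def fit_var1(X, W, lam):
    """Estimate A in x_t = A x_{t-1} + noise from a (T, p) time course X.

    W is a (p, p) matrix of penalty weights; row i of the result holds the
    estimated influences of all genes on gene i.
    """
    past, future = X[:-1], X[1:]
    rows = [weighted_lasso(past, future[:, i], W[i], lam) for i in range(X.shape[1])]
    return np.vstack(rows)
```

In this sketch the prior structure enters only through `W`: lowering the weight of an edge makes it more likely to survive the penalty, which is one simple way to encode prior connectivity.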

    Sparse regulatory networks

    In many organisms the expression levels of each gene are controlled by the activation levels of known "Transcription Factors" (TF). A problem of considerable interest is that of estimating the "Transcription Regulation Networks" (TRN) relating the TFs and genes. While the expression levels of genes can be observed, the activation levels of the corresponding TFs are usually unknown, greatly increasing the difficulty of the problem. Based on previous experimental work, it is often the case that partial information about the TRN is available. For example, certain TFs may be known to regulate a given gene or in other cases a connection may be predicted with a certain probability. In general, the biology of the problem indicates there will be very few connections between TFs and genes. Several methods have been proposed for estimating TRNs. However, they all suffer from problems such as unrealistic assumptions about prior knowledge of the network structure or computational limitations. We propose a new approach that can directly utilize prior information about the network structure in conjunction with observed gene expression data to estimate the TRN. Our approach uses $L_1$ penalties on the network to ensure a sparse structure. This has the advantage of being computationally efficient as well as making many fewer assumptions about the network structure. We use our methodology to construct the TRN for E. coli and show that the estimate is biologically sensible and compares favorably with previous estimates. Comment: Published at http://dx.doi.org/10.1214/10-AOAS350 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org