2,030 research outputs found

    Nonparametric Bayesian inference for perturbed and orthologous gene regulatory networks

    Get PDF
    Motivation: The generation of time series transcriptomic datasets collected under multiple experimental conditions has proven to be a powerful approach for disentangling complex biological processes, allowing for the reverse engineering of gene regulatory networks (GRNs). Most methods for reverse engineering GRNs from multiple datasets assume that each of the time series were generated from networks with identical topology. In this study, we outline a hierarchical, non-parametric Bayesian approach for reverse engineering GRNs using multiple time series that can be applied in a number of novel situations including: (i) where different, but overlapping sets of transcription factors are expected to bind in the different experimental conditions; that is, where switching events could potentially arise under the different treatments and (ii) for inference in evolutionary related species in which orthologous GRNs exist. More generally, the method can be used to identify context-specific regulation by leveraging time series gene expression data alongside methods that can identify putative lists of transcription factors or transcription factor targets. Results: The hierarchical inference outperforms related (but non-hierarchical) approaches when the networks used to generate the data were identical, and performs comparably even when the networks used to generate data were independent. The method was subsequently used alongside yeast one hybrid and microarray time series data to infer potential transcriptional switches in Arabidopsis thaliana response to stress. The results confirm previous biological studies and allow for additional insights into gene regulation under various abiotic stresses. Availability: The methods outlined in this article have been implemented in Matlab and are available on request

    Learning transcriptional regulatory networks from high throughput gene expression data using continuous three-way mutual information

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Probability based statistical learning methods such as mutual information and Bayesian networks have emerged as a major category of tools for reverse engineering mechanistic relationships from quantitative biological data. In this work we introduce a new statistical learning strategy, MI3 that addresses three common issues in previous methods simultaneously: (1) handling of continuous variables, (2) detection of more complex three-way relationships and (3) better differentiation of causal versus confounding relationships. With these improvements, we provide a more realistic representation of the underlying biological system.</p> <p>Results</p> <p>We test the MI3 algorithm using both synthetic and experimental data. In the synthetic data experiment, MI3 achieved an absolute sensitivity/precision of 0.77/0.83 and a relative sensitivity/precision both of 0.99. In addition, MI3 significantly outperformed the control methods, including Bayesian networks, classical two-way mutual information and a discrete version of MI3. We then used MI3 and control methods to infer a regulatory network centered at the MYC transcription factor from a published microarray dataset. Models selected by MI3 were numerically and biologically distinct from those selected by control methods. Unlike control methods, MI3 effectively differentiated true causal models from confounding models. MI3 recovered major MYC cofactors, and revealed major mechanisms involved in MYC dependent transcriptional regulation, which are strongly supported by literature. The MI3 network showed that limited sets of regulatory mechanisms are employed repeatedly to control the expression of large number of genes.</p> <p>Conclusion</p> <p>Overall, our work demonstrates that MI3 outperforms the frequently used control methods, and provides a powerful method for inferring mechanistic relationships underlying biological and other complex systems. The MI3 method is implemented in R in the "mi3" package, available under the GNU GPL from <url>http://sysbio.engin.umich.edu/~luow/downloads.php</url> and from the R package archive CRAN.</p

    Unsupervised learning of transcriptional regulatory networks via latent tree graphical models

    Get PDF
    Gene expression is a readily-observed quantification of transcriptional activity and cellular state that enables the recovery of the relationships between regulators and their target genes. Reconstructing transcriptional regulatory networks from gene expression data is a problem that has attracted much attention, but previous work often makes the simplifying (but unrealistic) assumption that regulator activity is represented by mRNA levels. We use a latent tree graphical model to analyze gene expression without relying on transcription factor expression as a proxy for regulator activity. The latent tree model is a type of Markov random field that includes both observed gene variables and latent (hidden) variables, which factorize on a Markov tree. Through efficient unsupervised learning approaches, we determine which groups of genes are co-regulated by hidden regulators and the activity levels of those regulators. Post-processing annotates many of these discovered latent variables as specific transcription factors or groups of transcription factors. Other latent variables do not necessarily represent physical regulators but instead reveal hidden structure in the gene expression such as shared biological function. We apply the latent tree graphical model to a yeast stress response dataset. In addition to novel predictions, such as condition-specific binding of the transcription factor Msn4, our model recovers many known aspects of the yeast regulatory network. These include groups of co-regulated genes, condition-specific regulator activity, and combinatorial regulation among transcription factors. The latent tree graphical model is a general approach for analyzing gene expression data that requires no prior knowledge of which possible regulators exist, regulator activity, or where transcription factors physically bind

    HB-PLS: A statistical method for identifying biological process or pathway regulators by integrating Huber loss and Berhu penalty with partial least squares regression

    Get PDF
    Gene expression data features high dimensionality, multicollinearity, and non-Gaussian distribution noise, posing hurdles for identification of true regulatory genes controlling a biological process or pathway. In this study, we integrated the Huber loss function and the Berhu penalty (HB) into partial least squares (PLS) framework to deal with the high dimension and multicollinearity property of gene expression data, and developed a new method called HB-PLS regression to model the relationships between regulatory genes and pathway genes. To solve the Huber-Berhu optimization problem, an accelerated proximal gradient descent algorithm with at least 10 times faster than the general convex optimization solver (CVX), was developed. Application of HB-PLS to recognize pathway regulators of lignin biosynthesis and photosynthesis in Arabidopsis thaliana led to the identification of many known positive pathway regulators that had previously been experimentally validated. As compared to sparse partial least squares (SPLS) regression, an efficient method for variable selection and dimension reduction in handling multicollinearity, HB-PLS has higher efficacy in identifying more positive known regulators, a much higher but slightly less sensitivity/(1-specificity) in ranking the true positive known regulators to the top of the output regulatory gene lists for the two aforementioned pathways. In addition, each method could identify some unique regulators that cannot be identified by the other methods. Our results showed that the overall performance of HB-PLS slightly exceeds that of SPLS but both methods are instrumental for identifying real pathway regulators from high-throughput gene expression data, suggesting that integration of statistics, machine leaning and convex optimization can result in a method with high efficacy and is worth further exploration

    TGMI: an efficient algorithm for identifying pathway regulators through evaluation of triple-gene mutual interaction

    Get PDF
    Despite their important roles, the regulators for most metabolic pathways and biological processes remain elusive. Presently, the methods for identifying metabolic pathway and biological process regulators are intensively sought after. We developed a novel algorithm called triple-gene mutual interaction (TGMI) for identifying these regulators using high-throughput gene expression data. It first calculated the regulatory interactions among triple gene blocks (two pathway genes and one transcription factor (TF)), using conditional mutual information, and then identifies significantly interacted triple genes using a newly identified novel mutual interaction measure (MIM), which was substantiated to reflect strengths of regulatory interactions within each triple gene block. The TGMI calculated the MIM for each triple gene block and then examined its statistical significance using bootstrap. Finally, the frequencies of all TFs present in all significantly interacted triple gene blocks were calculated and ranked. We showed that the TFs with higher frequencies were usually genuine pathway regulators upon evaluating multiple pathways in plants, animals and yeast. Comparison of TGMI with several other algorithms demonstrated its higher accuracy. Therefore, TGMI will be a valuable tool that can help biologists to identify regulators of metabolic pathways and biological processes from the exploded high-throughput gene expression data in public repositories

    Modelling signaling networks underlying plant defence

    Get PDF
    Transcriptional reprogramming plays a significant role in governing plant responses to pathogens. The underlying regulatory networks are complex and dynamic, responding to numerous input signals. Most network modelling studies to date have used large-scale expression data sets from public repositories but defence network models with predictive ability have also been inferred from single time series data sets, and sophisticated biological insights generated from focused experiments containing multiple network perturbations. Using multiple network inference methods, or combining network inference with additional data, such as promoter motifs, can enhance the ability of the model to predict gene function or regulatory relationships. Network topology can highlight key signaling components and provides a systems level understanding of plant defence
    corecore