A constraint optimization framework for discovery of cellular signaling and regulatory networks
Thesis (Ph.D.), Massachusetts Institute of Technology, Computational and Systems Biology Program, 2011. Cataloged from PDF version of thesis. Includes bibliographical references.
Cellular signaling and regulatory networks underlie fundamental biological processes such as growth, differentiation, and response to the environment. Although there are now various high-throughput methods for studying these processes, knowledge of them remains fragmentary. Typically, the majority of hits identified by transcriptional, proteomic, and genetic assays lie outside of the expected pathways. In addition, not all components of the regulatory networks can be exposed in one experiment because of systematic biases in the assays. These unexpected and hidden components of the cellular response are often the most interesting, because they can provide new insights into biological processes and potentially reveal new therapeutic approaches. However, they are also the most difficult to interpret. We present a technique, based on the Steiner tree problem, that uses a probabilistic protein-protein interaction network and high-confidence measurements and predictions of protein-DNA interactions to determine how these hits are organized into functionally coherent pathways, revealing many components of the cellular response that are not readily apparent in the original data. We report the results of applying this method to (1) phosphoproteomic and transcriptional data from the pheromone response in yeast, and (2) phosphoproteomic, DNase I hypersensitivity sequencing, and mRNA profiling data from U87MG glioblastoma cells overexpressing the variant III mutant of the epidermal growth factor receptor (EGFRvIII). In both cases the method identifies changes in diverse cellular processes that extend far beyond the expected pathways.
Analysis of the connectivity of the EGFRvIII network, and of the transcriptional regulators that link observed changes in protein phosphorylation to differential expression, suggests several intriguing hypotheses that may lead to improved therapeutic strategies for glioblastoma.
by Shao-shan Carol Huang. Ph.D.
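The Steiner tree idea can be illustrated with the classic shortest-path heuristic of Kou, Markowsky and Berman: connect the observed hits (terminals) through the cheapest intermediate proteins. This is a minimal sketch of that textbook heuristic, not the method used in this work, whose exact formulation is not given here. Converting interaction probabilities to edge costs with c = -log(p) is an assumption made so that a cheap path corresponds to a high product of confidences; the toy network and node names are hypothetical.

```python
import heapq
import math
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Graph = Dict[str, Dict[str, float]]
Edge = FrozenSet[str]


def build_graph(interactions: List[Tuple[str, str, float]]) -> Graph:
    """Undirected graph; HYPOTHETICAL cost transform c = -log(p) for confidence p."""
    graph: Graph = {}
    for a, b, p in interactions:
        cost = -math.log(p)
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def dijkstra(graph: Graph, source: str) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Shortest-path distances and predecessor links from `source`."""
    dist, prev = {source: 0.0}, {}
    heap, done = [(0.0, source)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in graph[u].items():
            if d + w < dist.get(v, math.inf):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, prev


def prim_mst(nodes: Set[str], weight: Callable[[str, str], float]) -> List[Tuple[str, str]]:
    """Prim's MST over `nodes`; `weight(u, v)` gives the edge cost (inf if absent)."""
    order = sorted(nodes)
    in_tree, edges = {order[0]}, []
    while len(in_tree) < len(order):
        _, u, v = min((weight(a, b), a, b) for a in sorted(in_tree)
                      for b in order if b not in in_tree)
        edges.append((u, v))
        in_tree.add(v)
    return edges


def steiner_tree(graph: Graph, terminals: List[str]) -> Set[Edge]:
    """Shortest-path Steiner tree heuristic (Kou-Markowsky-Berman style)."""
    paths = {t: dijkstra(graph, t) for t in terminals}
    # 1. MST of the metric closure: terminals joined by shortest-path distances.
    closure = prim_mst(set(terminals), lambda u, v: paths[u][0].get(v, math.inf))
    # 2. Expand every closure edge back into its underlying shortest path.
    sub_edges: Set[Edge] = set()
    for u, v in closure:
        prev, node = paths[u][1], v
        while node != u:
            sub_edges.add(frozenset((prev[node], node)))
            node = prev[node]
    # 3. MST of the expanded subgraph removes any cycles the paths created.
    sub_nodes = set().union(*sub_edges) if sub_edges else set(terminals)
    tree = {frozenset(e) for e in prim_mst(
        sub_nodes,
        lambda u, v: graph[u][v] if frozenset((u, v)) in sub_edges else math.inf)}
    # 4. Repeatedly prune leaves that are not terminals (they only add cost).
    while True:
        degree: Dict[str, int] = {}
        for e in tree:
            for n in e:
                degree[n] = degree.get(n, 0) + 1
        leaves = {n for n, d in degree.items() if d == 1 and n not in terminals}
        if not leaves:
            return tree
        tree = {e for e in tree if not (e & leaves)}


def tree_cost(graph: Graph, tree: Set[Edge]) -> float:
    """Total edge cost of a tree."""
    return sum(graph[u][v] for u, v in map(tuple, tree))


if __name__ == "__main__":
    toy = build_graph([("A", "H", 0.9), ("B", "H", 0.9), ("C", "H", 0.9),
                       ("A", "B", 0.2), ("B", "C", 0.2)])
    print(sorted(sorted(e) for e in steiner_tree(toy, ["A", "B", "C"])))
    # [['A', 'H'], ['B', 'H'], ['C', 'H']]
```

On the toy network, the hub H is not a terminal but enters the tree because routing through it is much cheaper than the low-confidence direct A-B and B-C links, which is the kind of hidden component the abstract refers to.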
SteinerNet: a web server for integrating ‘omic’ data to discover hidden components of response pathways
High-throughput technologies including transcriptional profiling, proteomics and reverse genetic screens provide detailed molecular descriptions of cellular responses to perturbations. However, it is difficult to integrate these diverse data to reconstruct biologically meaningful signaling networks. Previously, we have established a framework for integrating transcriptional, proteomic and interactome data by searching for the solution to the prize-collecting Steiner tree problem. Here, we present a web server, SteinerNet, to make this method available in a user-friendly format for a broad range of users with data from any species. At a minimum, a user only needs to provide a set of experimentally detected proteins and/or genes, and the server will search for connections among these data in the provided interactomes for yeast, human, mouse, Drosophila melanogaster and Caenorhabditis elegans. More advanced users can upload their own interactome data as well. The server provides interactive visualization of the resulting optimal network and downloadable files detailing the analysis and results. We believe that SteinerNet will be useful for researchers who would like to integrate their high-throughput data for a specific condition or cellular response and to find biologically meaningful pathways. SteinerNet is accessible at http://fraenkel.mit.edu/steinernet.
National Institutes of Health (U.S.) (U54-CA112967); National Institutes of Health (U.S.) (R01-GM089903); National Science Foundation (Award Number DB1-0821391)
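The prize-collecting variant trades off two quantities: the prizes of experimentally detected proteins or genes that are left out of the tree, and the cost of the edges used to connect those that are kept. The sketch below only scores a candidate tree under one common form of this objective (forfeited prizes plus beta times total edge cost); it does not search for the optimal tree. Where the weighting parameter beta sits, and how prizes and costs are derived, are assumptions here and may differ from SteinerNet's implementation. All values are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Edge = Tuple[str, str]


def is_tree(nodes: Set[str], edges: Iterable[Edge]) -> bool:
    """True if `edges` (all within `nodes`) connect every node with no cycles."""
    edges = list(edges)
    if len(edges) != len(nodes) - 1:
        return False
    parent = {n: n for n in nodes}

    def find(x: str) -> str:
        while parent[x] != x:
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge would close a cycle
        parent[ru] = rv
    return True  # n - 1 acyclic edges means a spanning tree


def pcst_objective(prizes: Dict[str, float], costs: Dict[FrozenSet[str], float],
                   tree_nodes: Set[str], tree_edges: Iterable[Edge],
                   beta: float = 1.0) -> float:
    """Prizes forfeited by excluded nodes plus beta times the tree's edge cost."""
    tree_edges = list(tree_edges)
    if not is_tree(tree_nodes, tree_edges):
        raise ValueError("candidate subnetwork is not a tree")
    forfeited = sum(p for node, p in prizes.items() if node not in tree_nodes)
    edge_cost = sum(costs[frozenset(e)] for e in tree_edges)
    return forfeited + beta * edge_cost


if __name__ == "__main__":
    prizes = {"A": 2.0, "B": 2.0, "C": 0.5}  # hidden node H carries no prize
    costs = {frozenset(("A", "H")): 0.3, frozenset(("H", "B")): 0.3,
             frozenset(("H", "C")): 1.0}
    small = ({"A", "H", "B"}, [("A", "H"), ("H", "B")])
    large = ({"A", "H", "B", "C"}, [("A", "H"), ("H", "B"), ("H", "C")])
    for beta in (1.0, 0.4):
        print(beta, pcst_objective(prizes, costs, *small, beta=beta),
              pcst_objective(prizes, costs, *large, beta=beta))
```

With beta = 1, the low-prize node C (prize 0.5) is not worth its connecting edge (cost 1.0), so the smaller tree scores better (1.1 vs 1.6); lowering beta to 0.4 makes edges cheap enough that including C wins (0.64 vs 0.74).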
Orchestration of the stilbene synthase gene family and their regulators by subgroup 2 MYB genes
The control of plant specialised metabolism is exerted by transcription factors and co-regulators acting on cis-regulatory DNA sequences of pathway structural genes, determining when, where, and how metabolites accumulate. Stilbenoids, produced within the phenylpropanoid pathway, are a particularly interesting case for studying the transcriptional control of metabolism, as their ability to inhibit infection by the coronaviruses MERS-CoV and SARS-CoV has recently been demonstrated in vitro. Integrative omic studies in grapevine (Vitis vinifera L.), including gene co-expression networks, have previously highlighted several transcription factors (TFs) from different gene families as potential modulators of stilbenoid accumulation, offering an ideal framework for gene function characterisation using genome-wide approaches. In the context of non-model plant species, DNA affinity purification sequencing (DAP-Seq) is a novel and potentially powerful tool for the analysis of uncharacterised regulators; however, it has not yet been applied in fruit crops. Accordingly, as a proof of concept, we tested the binding of two previously characterised R2R3-MYB TFs, MYB14 and MYB15, to their known targets in the stilbene pathway, obtaining 5,222 and 4,502 binding events assigned to 4,038 and 3,645 genes, respectively. Bound genes (putative targets) were overlapped with aggregated gene-centred co-expression networks, yielding shared and exclusive high-confidence targets (HCTs) and suggesting high, but not complete, redundancy between the two TFs. Our results show that, in addition to the few previously known stilbene synthase (STS) targets, these regulators bind to almost half of the STS family, as well as to other phenylpropanoid- and stilbenoid-related genes. We also suggest that they may be involved in other processes, such as the circadian rhythm or biotin synthesis.
We searched the activated transcriptomes of grapevine plants transiently overexpressing MYB15 and observed a strong activation of its high-confidence targets, validating our methodological approach. Our results also show that MYB15 seems to play a role in regulating other stilbenoid-related TFs such as WRKY03.
This work was supported by Grant PGC2018-099449-A-I00 and by the Ramón y Cajal program grant RYC-2017-23645, both awarded to J.T.M., and by the FPI scholarship PRE2019-088044 granted to L.O., from the Ministerio de Ciencia, Innovación y Universidades (MCIU, Spain), Agencia Estatal de Investigación (AEI, Spain), and Fondo Europeo de Desarrollo Regional (FEDER, European Union). C.Z. is supported by China Scholarship Council (CSC) no. 201906300087. This article is based upon work from COST Action CA 17111 INTEGRAPE, supported by COST (European Cooperation in Science and Technology). Data have been processed and uploaded to public repositories according to the FAIR principles.
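The overlap step can be pictured as a set intersection: genes bound in DAP-Seq are kept as high-confidence targets only if they also appear in the TF's gene-centred co-expression network, and comparing the two TFs' targets gives shared and exclusive sets. In this minimal sketch, a gene-centred network is assumed to be the top-k genes by Pearson correlation in a single expression matrix; the study aggregates many datasets, so its networks are built differently. Gene names and values are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coexpression_neighbors(expr: Dict[str, List[float]], tf: str, k: int) -> Set[str]:
    """Gene-centred network (assumed form): the k genes most correlated with `tf`."""
    scored = sorted(((pearson(expr[tf], profile), gene)
                     for gene, profile in expr.items() if gene != tf), reverse=True)
    return {gene for _, gene in scored[:k]}


def high_confidence_targets(bound: Set[str], neighbors: Set[str]) -> Set[str]:
    """Bound genes (putative targets) that are also co-expressed with the TF."""
    return bound & neighbors


def compare_targets(hct_a: Set[str], hct_b: Set[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Targets shared by both TFs, exclusive to A, and exclusive to B."""
    return hct_a & hct_b, hct_a - hct_b, hct_b - hct_a
```

Partial redundancy between two TFs then shows up as a non-empty shared set alongside non-empty exclusive sets.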
First Plant Cell Atlas symposium report
The Plant Cell Atlas (PCA) community hosted a virtual symposium on single-cell and spatial omics technologies on December 9 and 10, 2021. The conference gathered almost 500 academic, industry, and government leaders to identify the needs and directions of the PCA community and to explore how establishing a data synthesis center would address these needs and accelerate progress. This report details the presentations and discussions focused on the possibility of a data synthesis center for the PCA and the expected impacts of such a center on advancing science and technology globally. Community discussions focused on topics such as data analysis tools and annotation standards; computational expertise and cyberinfrastructure; modes of community organization and engagement; methods for ensuring a broad reach in the PCA community; recruitment, training, and nurturing of new talent; and the overall impact of the PCA initiative. These targeted discussions helped participants gauge whether the PCA might be a vehicle for formulating a data synthesis center. The conversations also explored how online tools can be leveraged to broaden the reach of the PCA (e.g., online contests, virtual networking, and social media stakeholder engagement) and to decrease the costs of conducting research (e.g., virtual REU opportunities). Major recommendations for the future of the PCA included establishing standards, creating dashboards for easy and intuitive access to data, and engaging with a broad community of stakeholders. The discussions also identified the following as essential to the PCA’s success: identifying homologous cell-type markers and their biocuration, publishing datasets and computational pipelines, using online tools for communication (such as Slack), and user-friendly data visualization and data sharing.
In conclusion, the development of a data synthesis center will help the PCA community achieve these goals by providing a centralized repository for existing and new data, a platform for sharing tools, and new analytical approaches through collaborative, multidisciplinary efforts. A data synthesis center will also help the PCA reach milestones such as community-supported data evaluation metrics, thereby accelerating the plant research necessary for human and environmental health.
Direct regulation of shikimate, early phenylpropanoid, and stilbenoid pathways by subgroup 2 R2R3-MYBs in grapevine
The stilbenoid pathway is responsible for the production of resveratrol in grapevine (Vitis vinifera L.). A few transcription factors (TFs) have been identified as regulators of this pathway, but the extent of this control has not been studied in depth. Here we show how DNA affinity purification sequencing (DAP-Seq) enables genome-wide interrogation of TF-binding sites in grape. We obtained 5190 and 4443 binding events assigned to 4041 and 3626 genes for MYB14 and MYB15, respectively (approximately 40% of peaks located within −10 kb of transcription start sites). DAP-Seq of MYB14/MYB15 was combined with aggregate gene co-expression networks (GCNs) built from more than 1400 transcriptomic datasets from leaves, fruits, and flowers to narrow down bound genes to a set of high-confidence targets. The analysis of MYB14, MYB15, and MYB13, a third, uncharacterized member of Subgroup 2 (S2), showed that in addition to the few previously known stilbene synthase (STS) targets, these regulators bind to 30 of the 47 STS family genes. Moreover, all three MYBs bind to several PAL, C4H, and 4CL genes, as well as to shikimate pathway genes, the WRKY03 stilbenoid co-regulator, and candidate resveratrol-modifying genes, among which ROMT2-3 were validated enzymatically. A high proportion of DAP-Seq-bound genes were induced in the activated transcriptomes of grapevine leaves transiently overexpressing MYB15, validating our methodological approach for delimiting TF targets. Overall, Subgroup 2 R2R3-MYBs appear to play a key role in binding and directly regulating several primary and secondary metabolic steps, leading to increased flux towards stilbenoid production.
The integration of DAP-Seq and reciprocal GCNs offers a rapid framework for gene function characterization using genome-wide approaches in non-model plant species and stands as a valid first approach for identifying gene regulatory networks of specialized metabolism.
This work was supported by Grant PGC2018-099449-A-I00 and by the Ramón y Cajal program (grant RYC-2017-23645), both awarded to JTM, and by the FPI scholarship (PRE2019-088044) granted to LO, from the Ministerio de Ciencia, Innovación y Universidades (MCIU, Spain), Agencia Estatal de Investigación (AEI, Spain), and Fondo Europeo de Desarrollo Regional (FEDER, European Union). CZ is supported by China Scholarship Council (CSC; no. 201906300087). KG and ZR were supported by the Slovenian Research Agency (grants P4-0165 and Z7-1888). SCH is partially supported by the National Science Foundation (grant PGRP IOS-1916804). This article is based upon work from COST Action CA 17111 INTEGRAPE, supported by COST (European Cooperation in Science and Technology).
Peer reviewed
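Assigning binding events to genes requires a peak-to-gene rule. The sketch below assumes the simplest one, nearest TSS, using strand-aware signed distances so that negative values are upstream, and reports the fraction of peaks falling between 10 kb upstream and the TSS. The assignment rule and the window boundaries are assumptions and are not necessarily those used in the study; the coordinates are made up.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    tss: int     # transcription start site coordinate
    strand: str  # "+" or "-"


def signed_distance(summit: int, gene: Gene) -> int:
    """Peak summit position relative to the TSS; negative means upstream."""
    offset = summit - gene.tss
    return offset if gene.strand == "+" else -offset


def assign_peaks(peaks: List[Tuple[str, int]], genes: List[Gene]) -> List[Tuple[str, int]]:
    """Assign each (chrom, summit) peak to the gene with the nearest TSS."""
    by_chrom: Dict[str, List[Gene]] = {}
    for gene in genes:
        by_chrom.setdefault(gene.chrom, []).append(gene)
    starts: Dict[str, List[int]] = {}
    for chrom, chrom_genes in by_chrom.items():
        chrom_genes.sort(key=lambda g: g.tss)
        starts[chrom] = [g.tss for g in chrom_genes]
    assigned = []
    for chrom, summit in peaks:
        if chrom not in by_chrom:
            continue  # no annotated genes on this sequence
        i = bisect_left(starts[chrom], summit)
        # The nearest TSS is one of the two flanking the summit position.
        candidates = by_chrom[chrom][max(i - 1, 0):i + 1]
        nearest = min(candidates, key=lambda g: abs(summit - g.tss))
        assigned.append((nearest.gene_id, signed_distance(summit, nearest)))
    return assigned


def fraction_in_window(assigned: List[Tuple[str, int]], upstream: int = 10_000,
                       downstream: int = 0) -> float:
    """Fraction of assigned peaks with -upstream <= distance <= downstream."""
    if not assigned:
        return 0.0
    hits = sum(1 for _, d in assigned if -upstream <= d <= downstream)
    return hits / len(assigned)
```

The set of gene IDs in the assignments gives the bound (putative target) genes that a co-expression filter would then narrow down.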
Linking Proteomic and Transcriptional Data through the Interactome and Epigenome Reveals a Map of Oncogene-induced Signaling
Cellular signal transduction generally involves cascades of post-translational protein modifications that rapidly catalyze changes in protein-DNA interactions and gene expression. High-throughput measurements are improving our ability to study each of these stages individually, but do not capture the connections between them. Here we present an approach for building a network of physical links among these data that can be used to prioritize targets for pharmacological intervention. Our method recovers the critical missing links between proteomic and transcriptional data by relating changes in chromatin accessibility to changes in expression, and then uses these links to connect proteomic and transcriptome data. We applied our approach to integrate epigenomic, phosphoproteomic and transcriptome changes induced by the variant III mutation of the epidermal growth factor receptor (EGFRvIII) in a cell line model of glioblastoma multiforme (GBM). To test the relevance of the network, we used small molecules to target highly connected nodes implicated by the network model that were not detected by the experimental data in isolation, and we found that a large fraction of these agents alter cell viability. Among these are two compounds, ICG-001, targeting CREB binding protein (CREBBP), and PKF118–310, targeting β-catenin (CTNNB1), which have not been tested previously for effectiveness against GBM. At the level of transcriptional regulation, we used chromatin immunoprecipitation sequencing (ChIP-Seq) to experimentally determine the genome-wide binding locations of p300, a transcriptional co-regulator highly connected in the network. Analysis of p300 target genes suggested a role for p300 in tumorigenesis.
We propose that this general method, in which experimental measurements are used as constraints for building regulatory networks from the interactome while taking into account noise and missing data, should be applicable to a wide range of high-throughput datasets.
National Science Foundation (U.S.) (DB1-0821391); National Institutes of Health (U.S.) (Grant U54-CA112967); National Institutes of Health (U.S.) (Grant R01-GM089903); National Institutes of Health (U.S.) (P30-ES002109)
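The prioritization step, targeting highly connected nodes that the network model implicates but that the measurements did not detect, can be approximated by ranking such hidden nodes by degree. This is a minimal sketch; degree is an assumed proxy for "highly connected", and the node names and edges are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional, Set, Tuple


def rank_hidden_nodes(edges: Iterable[Tuple[str, str]], observed: Set[str],
                      top: Optional[int] = None) -> List[str]:
    """Rank network nodes absent from the experimental data, most connected first."""
    degree: Counter = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    hidden = [node for node in degree if node not in observed]
    # Sort by descending degree, breaking ties alphabetically for reproducibility.
    hidden.sort(key=lambda node: (-degree[node], node))
    return hidden[:top]
```

In a real analysis the edges would come from the solved network rather than the full interactome, so that only nodes the model actually uses are ranked.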
Robust estimation of bacterial cell count from optical density
Optical density (OD) is widely used to estimate the density of cells in liquid culture, but it cannot be compared between instruments without a standardized calibration protocol and is challenging to relate to actual cell count. We address this with an interlaboratory study comparing three simple, low-cost, and highly accessible OD calibration protocols across 244 laboratories, applied to eight strains of constitutive GFP-expressing E. coli. Based on our results, we recommend calibrating OD to estimated cell count using serial dilution of silica microspheres, which produces highly precise calibration (95.5% of residuals <1.2-fold), is easily assessed for quality control, also assesses the instrument's effective linear range, and can be combined with fluorescence calibration to obtain units of Molecules of Equivalent Fluorescein (MEFL) per cell. This allows direct comparison and data fusion with flow cytometry measurements: in our study, fluorescence-per-cell measurements showed only a 1.07-fold mean difference between plate reader and flow cytometry data.
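The microsphere calibration reduces to estimating a particles-per-OD conversion factor from a serial dilution, then checking each point's fold error. The sketch fits a least-squares line through the origin on blank-subtracted OD; this fitting choice, the blank value, and the dilution numbers are assumptions for illustration and are not the study's exact procedure. In practice, points outside the instrument's effective linear range would be excluded before fitting.

```python
from typing import List


def fit_particles_per_od(particles: List[float], od: List[float], blank: float) -> float:
    """Least-squares slope through the origin: particles ~ k * (OD - blank)."""
    net = [o - blank for o in od]
    return sum(x * n for x, n in zip(net, particles)) / sum(x * x for x in net)


def fold_residuals(particles: List[float], od: List[float], blank: float,
                   k: float) -> List[float]:
    """Fold error of each calibration point (always >= 1)."""
    out = []
    for n, o in zip(particles, od):
        predicted = k * (o - blank)
        out.append(max(predicted / n, n / predicted))
    return out


def mefl_per_cell(mefl: float, od: float, blank: float, k: float) -> float:
    """Convert a well's calibrated fluorescence (MEFL) into MEFL per estimated cell."""
    return mefl / (k * (od - blank))


if __name__ == "__main__":
    particles = [3.0e8, 1.5e8, 7.5e7, 3.75e7]  # hypothetical 2-fold dilution series
    od = [1.04, 0.54, 0.29, 0.165]              # hypothetical readings, blank 0.04
    k = fit_particles_per_od(particles, od, blank=0.04)
    print(k, fold_residuals(particles, od, 0.04, k))
```

Counting how many fold residuals fall under 1.2 gives a precision summary of the same kind as the one reported in the abstract.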
Vision, challenges and opportunities for a Plant Cell Atlas
With growing populations and pressing environmental problems, future economies will be increasingly plant-based. Now is the time to reimagine plant science as a critical component of fundamental science, agriculture, environmental stewardship, energy, technology and healthcare. This effort requires a conceptual and technological framework to identify and map all cell types, and to comprehensively annotate the localization and organization of molecules at cellular and tissue levels. This framework, called the Plant Cell Atlas (PCA), will be critical for understanding and engineering plant development, physiology and environmental responses. A workshop was convened to discuss the purpose and utility of such an initiative, resulting in a roadmap that acknowledges the current knowledge gaps and technical challenges, and underscores how the PCA initiative can help to overcome them.
Data archive: CICT for single cell RNA-seq network inference
This archive contains benchmarking input data and results for inferring gene regulatory networks (GRNs) from single-cell gene expression data with the Causal Inference with Composition of Transactions (CICT) method and a selected set of published methods. It accompanies the manuscript "Robust discovery of gene regulatory networks from single-cell gene expression data by Causal Inference Using Composition of Transactions" (Shojaee and Huang, Brief Bioinform 2023. DOI: 10.1093/bib/bbad370). The CICT code is available in the GitHub repo (https://github.com/hlab1/scRNAseqWithCICT/).
The original CICT algorithm was described in Shojaee et al. (arXiv:1608.02658, 2016). The benchmarked methods were those included in the BEELINE benchmarking pipeline (Pratapa et al., Nat Methods 2020), to which we added DEEPDRIM (Chen et al., Brief Bioinform 2021), SCENIC (Aibar et al., Nat Methods 2017), Inferelator 3.0 (Gibbs et al., Bioinformatics 2022), and CellOracle (Kamimoto et al., Nature 2023). The output directories (subdirectories within each dataset) are:
* CICT_ewMIshrink_RFmaxdepth10_RFntrees20/: CICT for simulated data
* CICT_v2/: CICT for experimental data
* CELLORACLEDB/: CellOracle for experimental data
* DEEPDRIM72_ewMIshrink_RFmaxdepth10_RFntrees20/: DEEPDRIM for simulated data
* DEEPDRIM72_v2/: DEEPDRIM for experimental data
* INFERELATOR38_ewMIshrink_RFmaxdepth10_RFntrees20/: Inferelator-Prior for simulated data
* INFERELATOR38_v2/: Inferelator-Prior for experimental data
* INFERELATOR34_ewMIshrink_RFmaxdepth10_RFntrees20/: Inferelator-NoPrior for simulated data
* INFERELATOR34_v2/: Inferelator-NoPrior for experimental data
* GENIE3/: GENIE3
* GRNBOOST2/: GRNBOOST2
* LEAP/: LEAP
* PIDC/: PIDC
* PPCOR/: PPCOR
* SCENICDB/: SCENIC for experimental data
* SCNS/: SCNS
* SCODE/: SCODE
* SCRIBE/: SCRIBE
* SINCERITIES/: SINCERITIES
* SINGE/: SINGE
* RANDOM/: RANDOM
The methods were benchmarked against two kinds of scRNA-seq datasets:
* Simulated datasets produced by the SERGIO simulator from a synthetic network (Dibaeinia et al., Cell Systems 2020), including complete datasets and datasets with dropouts (shape parameter k = 6.5 and rate parameter q = 10, 30, 50, 70, 80).
* Experimental datasets compiled by the BEELINE pipeline, evaluated at three levels (L0, L1, L2) with three types of ground-truth networks.
  * Evaluation levels:
    * L0: 500 highly varying genes plus TFs
    * L1: 1000 highly varying genes plus TFs
    * L2: 500 highly varying genes plus TFs, plus 500 randomly selected genes excluding the 1000 highly varying genes of L1
  * Types of ground truth:
    * Cell-type-specific ChIP-seq ground truth (L0, L1, L2)
    * Non-specific ChIP-seq ground truth (L0_ns, L1_ns, L2_ns)
    * Loss-of-function/gain-of-function ground truth (L0_lofgof, L1_lofgof, L2_lofgof)
The directory structure follows the BEELINE benchmarking pipeline. For complete details, please see the BEELINE documentation (https://murali-group.github.io/Beeline/) and GitHub repo (https://github.com/Murali-group/Beeline).
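The three evaluation levels can be sketched as set operations on a variance ranking. This minimal sketch assumes per-gene variances and the TF list are already computed, and that the random L2 genes also exclude TFs; BEELINE's own gene-selection procedure may differ in these details, and the default sizes simply mirror the numbers listed above.

```python
import random
from typing import Dict, Iterable, Set


def top_variable(variances: Dict[str, float], n: int) -> Set[str]:
    """The n genes with the highest variance (ties broken by gene name)."""
    ranked = sorted(variances, key=lambda g: (-variances[g], g))
    return set(ranked[:n])


def build_levels(variances: Dict[str, float], tfs: Iterable[str], n_small: int = 500,
                 n_large: int = 1000, n_random: int = 500,
                 seed: int = 0) -> Dict[str, Set[str]]:
    """Gene sets for evaluation levels L0, L1 and L2."""
    tfs = set(tfs)
    hv_small = top_variable(variances, n_small)
    hv_large = top_variable(variances, n_large)
    # Assumption: random L2 genes exclude the L1 highly varying genes and the TFs.
    pool = sorted(g for g in variances if g not in hv_large and g not in tfs)
    extra = set(random.Random(seed).sample(pool, min(n_random, len(pool))))
    return {"L0": hv_small | tfs, "L1": hv_large | tfs, "L2": hv_small | tfs | extra}
```

Seeding the random generator keeps the L2 gene set reproducible across benchmark runs.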
Integrating Proteomic, Transcriptional, and Interactome Data Reveals Hidden Components of Signaling and Regulatory Networks
Cellular signaling and regulatory networks underlie fundamental biological processes such as growth, differentiation, and response to the environment. Although there are now various high-throughput methods for studying these processes, knowledge of them remains fragmentary. Typically, the majority of hits identified by transcriptional, proteomic, and genetic assays lie outside of the expected pathways. These unexpected components of the cellular response are often the most interesting, because they can provide new insights into biological processes and potentially reveal new therapeutic approaches. However, they are also the most difficult to interpret. We present a technique, based on the Steiner tree problem, that uses previously reported protein-protein and protein-DNA interactions to determine how these hits are organized into functionally coherent pathways, revealing many components of the cellular response that are not readily apparent in the original data. Applied simultaneously to phosphoproteomic and transcriptional data for the yeast pheromone response, it identifies changes in diverse cellular processes that extend far beyond the expected pathways.