18 research outputs found

    NetProphet 2.0: Mapping transcription factor networks by exploiting scalable data resources

    MOTIVATION: Cells process information, in part, through transcription factor (TF) networks, which control the rates at which individual genes produce their products. A TF network map is a graph that indicates which TFs bind and directly regulate each gene. Previous work has described network mapping algorithms that rely exclusively on gene expression data and 'integrative' algorithms that exploit a wide range of data sources including chromatin immunoprecipitation sequencing (ChIP-seq) of many TFs, genome-wide chromatin marks, and binding specificities for many TFs determined in vitro. However, such resources are available only for a few major model systems and cannot be easily replicated for new organisms or cell types. RESULTS: We present NetProphet 2.0, a 'data light' algorithm for TF network mapping, and show that it is more accurate at identifying direct targets of TFs than other, similarly data light algorithms. In particular, it improves on the accuracy of NetProphet 1.0, which used only gene expression data, by exploiting three principles. First, combining multiple approaches to network mapping from expression data can improve accuracy relative to the constituent approaches. Second, TFs with similar DNA binding domains bind similar sets of target genes. Third, even a noisy, preliminary network map can be used to infer DNA binding specificities from promoter sequences and these inferred specificities can be used to further improve the accuracy of the network map. AVAILABILITY AND IMPLEMENTATION: Source code and comprehensive documentation are freely available at https://github.com/yiming-kang/NetProphet_2.0. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
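
The first principle in this abstract, combining several expression-based mapping approaches, can be illustrated with a generic rank-averaging sketch. This is not the NetProphet 2.0 procedure, which the abstract does not spell out; the function names, the edge keys, and the example scores below are all hypothetical and chosen only for illustration.

```python
def rank_scores(scores):
    """Map each (TF, gene) edge to its rank, where 1 is the strongest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {edge: rank for rank, edge in enumerate(ordered, start=1)}


def combine_by_mean_rank(*score_maps):
    """Average the per-method ranks of every edge scored by all methods.

    Assumption: each input maps (TF, gene) -> confidence score, with larger
    values meaning stronger evidence. A lower mean rank is a stronger edge.
    """
    shared = set(score_maps[0])
    for scores in score_maps[1:]:
        shared &= set(scores)
    per_method = [rank_scores({e: s[e] for e in shared}) for s in score_maps]
    return {e: sum(r[e] for r in per_method) / len(per_method) for e in shared}


# Hypothetical scores from two different expression-based methods.
method_a = {("TF1", "g1"): 0.9, ("TF1", "g2"): 0.2, ("TF2", "g1"): 0.5}
method_b = {("TF1", "g1"): 3.0, ("TF1", "g2"): 2.5, ("TF2", "g1"): 0.1}

combined = combine_by_mean_rank(method_a, method_b)
best_edge = min(combined, key=combined.get)
print(best_edge, combined[best_edge])
```

Working on ranks rather than raw scores sidesteps the fact that different methods report confidence on different scales.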

    Mapping functional transcription factor networks from gene expression data

    A critical step in understanding how a genome functions is determining which transcription factors (TFs) regulate each gene. Accordingly, extensive effort has been devoted to mapping TF networks. In Saccharomyces cerevisiae, protein–DNA interactions have been identified for most TFs by ChIP-chip, and expression profiling has been done on strains deleted for most TFs. These studies revealed that there is little overlap between the genes whose promoters are bound by a TF and those whose expression changes when the TF is deleted, leaving us without a definitive TF network for any eukaryote and without an efficient method for mapping functional TF networks. This paper describes NetProphet, a novel algorithm that improves the efficiency of network mapping from gene expression data. NetProphet exploits a fundamental observation about the nature of TF networks: The response to disrupting or overexpressing a TF is strongest on its direct targets and dissipates rapidly as it propagates through the network. Using S. cerevisiae data, we show that NetProphet can predict thousands of direct, functional regulatory interactions, using only gene expression data. The targets that NetProphet predicts for a TF are at least as likely to have sites matching the TF's binding specificity as the targets implicated by ChIP. Unlike most ChIP targets, the NetProphet targets also show evidence of functional regulation. This suggests a surprising conclusion: The best way to begin mapping direct, functional TF-promoter interactions may not be by measuring binding. We also show that NetProphet yields new insights into the functions of several yeast TFs, including a well-studied TF, Cbf1, and a completely unstudied TF, Eds1

    Model-driven mapping of transcriptional networks reveals the circuitry and dynamics of virulence regulation

    Key steps in understanding a biological process include identifying genes that are involved and determining how they are regulated. We developed a novel method for identifying transcription factors (TFs) involved in a specific process and used it to map regulation of the key virulence factor of a deadly fungus—its capsule. The map, built from expression profiles of 41 TF mutants, includes 20 TFs not previously known to regulate virulence attributes. It also reveals a hierarchy comprising executive, midlevel, and “foreman” TFs. When grouped by temporal expression pattern, these TFs explain much of the transcriptional dynamics of capsule induction. Phenotypic analysis of TF deletion mutants revealed complex relationships among virulence factors and virulence in mice. These resources and analyses provide the first integrated, systems-level view of capsule regulation and biosynthesis. Our methods dramatically improve the efficiency with which transcriptional networks can be analyzed, making genomic approaches accessible to laboratories focused on specific physiological processes

    Cryptococcus neoformans Dual GDP-mannose transporters and their role in biology and virulence

    Cryptococcus neoformans is an opportunistic yeast responsible for lethal meningoencephalitis in humans. This pathogen elaborates a polysaccharide capsule, which is its major virulence factor. Mannose constitutes over one-half of the capsule mass and is also extensively utilized in cell wall synthesis and in glycosylation of proteins and lipids. The activated mannose donor for most biosynthetic reactions, GDP-mannose, is made in the cytosol, although it is primarily consumed in secretory organelles. This compartmentalization necessitates specific transmembrane transporters to make the donor available for glycan synthesis. We previously identified two cryptococcal GDP-mannose transporters, Gmt1 and Gmt2. Biochemical studies of each protein expressed in Saccharomyces cerevisiae showed that both are functional, with similar kinetics and substrate specificities in vitro. We have now examined these proteins in vivo and demonstrate that cells lacking Gmt1 show significant phenotypic differences from those lacking Gmt2 in terms of growth, colony morphology, protein glycosylation, and capsule phenotypes. Some of these observations may be explained by differential expression of the two genes, but others suggest that the two proteins play overlapping but nonidentical roles in cryptococcal biology. Furthermore, gmt1 gmt2 double mutant cells, which are unexpectedly viable, exhibit severe defects in capsule synthesis and protein glycosylation and are avirulent in mouse models of cryptococcosis

    Computational Analysis Reveals a Key Regulator of Cryptococcal Virulence and Determinant of Host Response

    Cryptococcus neoformans is a ubiquitous, opportunistic fungal pathogen that kills over 600,000 people annually. Here, we report integrated computational and experimental investigations of the role and mechanisms of transcriptional regulation in cryptococcal infection. Major cryptococcal virulence traits include melanin production and the development of a large polysaccharide capsule upon host entry; shed capsule polysaccharides also impair host defenses. We found that both transcription and translation are required for capsule growth and that Usv101 is a master regulator of pathogenesis, regulating melanin production, capsule growth, and capsule shedding. It does this by directly regulating genes encoding glycoactive enzymes and genes encoding three other transcription factors that are essential for capsule growth: GAT201, RIM101, and SP1. Murine infection with cryptococci lacking Usv101 significantly alters the kinetics and pathogenesis of disease, with extended survival and, unexpectedly, death by pneumonia rather than meningitis. Our approaches and findings will inform studies of other pathogenic microbes

    A community effort to identify and correct mislabeled samples in proteogenomic studies

    Sample mislabeling or misannotation has been a long-standing problem in scientific research, particularly prevalent in large-scale, multi-omic studies due to the complexity of multi-omic workflows. There exists an urgent need for implementing quality controls to automatically screen for and correct sample mislabels or misannotations in multi-omic studies. Here, we describe a crowdsourced precisionFDA NCI-CPTAC Multi-omics Enabled Sample Mislabeling Correction Challenge, which provides a framework for systematic benchmarking and evaluation of mislabel identification and correction methods for integrative proteogenomic studies. The challenge received a large number of submissions from domestic and international data scientists, with highly variable performance observed across the submitted methods. Post-challenge collaboration between the top-performing teams and the challenge organizers has created an open-source software, COSMO, with demonstrated high accuracy and robustness in mislabeling identification and correction in simulated and real multi-omic datasets

    Towards accurate indel calling for oncopanel sequencing through an international pipeline competition at precisionFDA

    Accurately calling indels with next-generation sequencing (NGS) data is critical for clinical application. The precisionFDA team collaborated with the U.S. Food and Drug Administration’s (FDA’s) National Center for Toxicological Research (NCTR) and successfully completed the NCTR Indel Calling from Oncopanel Sequencing Data Challenge to evaluate the performance of indel calling pipelines. Top performers were selected based on precision, recall, and F1-score. Many other pipelines performed close to the top performers, producing a top cluster of performers. Performance was significantly higher in high confidence regions and coding regions, and significantly lower in low complexity regions. Oncopanel capture and other issues may have affected the recall rate. Indels with higher variant allele frequency (VAF) may generally be called with higher confidence. Many of the indel calling pipelines performed well. Some performed well across all three oncopanels, while others were better for a specific oncopanel. Indel calling performance can be further improved by restricting calls to high confidence intervals (HCIs) and coding regions, and by excluding low complexity regions (LCRs). Certain VAF cut-offs could be applied depending on the application.
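
The three ranking metrics named in this abstract have standard definitions, and the effect of restricting calls to chosen intervals can be shown with a small sketch. The variant tuples, the interval format (inclusive coordinates), and the example data are assumptions made for illustration; this is not the challenge's actual evaluation code.

```python
def precision_recall_f1(called, truth):
    """Return (precision, recall, F1) for two sets of variant calls."""
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def restrict(variants, intervals):
    """Keep variants whose position lies in any (chrom, start, end) interval.

    Assumption: a variant is (chrom, pos, ref, alt) and interval bounds
    are inclusive.
    """
    return {
        v for v in variants
        if any(v[0] == chrom and start <= v[1] <= end
               for chrom, start, end in intervals)
    }


# Hypothetical truth set and pipeline output.
truth = {("chr1", 100, "A", "AT"), ("chr1", 200, "GC", "G"),
         ("chr2", 50, "T", "TA")}
called = {("chr1", 100, "A", "AT"), ("chr1", 200, "GC", "G"),
          ("chr1", 300, "C", "CA"), ("chr2", 75, "G", "GT")}

overall = precision_recall_f1(called, truth)

# Hypothetical high confidence interval covering part of chr1.
hci = [("chr1", 50, 250)]
within_hci = precision_recall_f1(restrict(called, hci), restrict(truth, hci))
print(overall, within_hci)
```

In this toy data the two false positives and the one missed indel all fall outside the interval, so every metric rises once both sets are restricted to it.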
