26 research outputs found

    The Limitations of Simple Gene Set Enrichment Analysis Assuming Gene Independence

    Since its first publication in 2003, the Gene Set Enrichment Analysis (GSEA) method, based on the Kolmogorov-Smirnov statistic, has been heavily used, modified, and also questioned. Recently, a simplified approach that uses a one-sample t-test score to assess enrichment and ignores gene-gene correlations was proposed by Irizarry et al. (2009) as a serious contender. The argument criticizes GSEA's nonparametric nature and its use of an empirical null distribution as unnecessary and hard to compute. We refute these claims by careful consideration of the assumptions of the simplified method and its results, including a comparison with GSEA's on a large benchmark set of 50 datasets. Our results provide strong empirical evidence that gene-gene correlations cannot be ignored, because of the significant variance inflation they produce in the enrichment scores, and should be taken into account when estimating gene set enrichment significance. In addition, we discuss the challenges that the complex correlation structure and multi-modality of gene sets pose more generally for gene set enrichment methods. Comment: Submitted to Statistical Methods in Medical Research
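
    The variance inflation mentioned in this abstract can be illustrated with a toy model. The sketch below is not the paper's benchmark: it assumes unit-variance gene scores that all share one pairwise correlation rho (an equicorrelated model chosen only for illustration), and the function names are hypothetical. Under that assumption the variance of the mean score of an N-gene set is (1 + (N - 1) * rho) / N rather than the 1 / N expected under independence.

```python
import random
import statistics


def variance_inflation_factor(n_genes: int, mean_corr: float) -> float:
    """Inflation of Var(mean score) relative to independent genes.

    Assumes unit-variance scores with a common pairwise correlation.
    """
    return 1.0 + (n_genes - 1) * mean_corr


def simulate_set_mean_variance(n_genes: int, mean_corr: float,
                               n_trials: int = 4000, seed: int = 0) -> float:
    """Empirical variance of the gene-set mean under the toy model."""
    rng = random.Random(seed)
    shared_weight = mean_corr ** 0.5          # loading on the shared factor
    own_weight = (1.0 - mean_corr) ** 0.5     # loading on gene-specific noise
    means = []
    for _ in range(n_trials):
        shared = rng.gauss(0.0, 1.0)          # factor common to all genes
        scores = [shared_weight * shared + own_weight * rng.gauss(0.0, 1.0)
                  for _ in range(n_genes)]
        means.append(sum(scores) / n_genes)
    return statistics.pvariance(means)


if __name__ == "__main__":
    n, rho = 50, 0.1                          # illustrative values only
    naive = 1.0 / n                           # variance assumed under independence
    observed = simulate_set_mean_variance(n, rho)
    print("theoretical inflation:", variance_inflation_factor(n, rho))
    print("simulated inflation:  ", observed / naive)
```

    Even a modest average correlation inflates the variance several-fold in this model, so a test that assumes independent genes would overstate significance.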

    Network-Based Analysis of Affected Biological Processes in Type 2 Diabetes Models

    Type 2 diabetes mellitus is a complex disorder associated with multiple genetic, epigenetic, developmental, and environmental factors. Animal models of type 2 diabetes differ based on diet, drug treatment, and gene knockouts, and yet all display the clinical hallmarks of hyperglycemia and insulin resistance in peripheral tissue. The recent advances in gene-expression microarray technologies present an unprecedented opportunity to study type 2 diabetes mellitus at a genome-wide scale and across different models. To date, a key challenge has been to identify the biological processes or signaling pathways that play significant roles in the disorder. Here, using a network-based analysis methodology, we identified two sets of genes, associated with insulin signaling and a network of nuclear receptors, which are recurrent in a statistically significant number of diabetes and insulin resistance models and transcriptionally altered across diverse tissue types. We additionally identified a network of protein–protein interactions between members from the two gene sets that may facilitate signaling between them. Taken together, the results illustrate the benefits of integrating high-throughput microarray studies, together with protein–protein interaction networks, in elucidating the underlying biological processes associated with a complex disorder

    Characterizing genomic alterations in cancer by complementary functional associations.

    Systematic efforts to sequence the cancer genome have identified large numbers of mutations and copy number alterations in human cancers. However, elucidating the functional consequences of these variants, and their interactions to drive or maintain oncogenic states, remains a challenge in cancer research. We developed REVEALER, a computational method that identifies combinations of mutually exclusive genomic alterations correlated with functional phenotypes, such as the activation or gene dependency of oncogenic pathways or sensitivity to a drug treatment. We used REVEALER to uncover complementary genomic alterations associated with the transcriptional activation of β-catenin and NRF2, MEK-inhibitor sensitivity, and KRAS dependency. REVEALER successfully identified both known and new associations, demonstrating the power of combining functional profiles with extensive characterization of genomic alterations in cancer genomes

    Over-expression of the vitamin D receptor (VDR) induces skeletal muscle hypertrophy

    Objective: The Vitamin D receptor (VDR) has been positively associated with skeletal muscle mass, function and regeneration. Mechanistic studies have focused upon loss of the receptor, with in vivo whole-body knockout models demonstrating reduced myofiber size and function, and impaired muscle development. To understand the mechanistic role that upregulation of the VDR plays in muscle mass and health, we studied the impact of VDR over-expression (OE) in vivo, before exploring the importance of VDR expression upon muscle hypertrophy in humans. Methods: Wistar rats underwent in vivo electrotransfer (IVE) to over-express the VDR in Tibialis anterior (TA) muscle for 10 days, before comprehensive physiological and metabolic profiling to characterise the influence of VDR-OE on muscle protein synthesis (MPS), anabolic signalling and satellite cell activity. Stable isotope tracer (D2O) techniques were used to assess sub-fraction protein synthesis, alongside RNA-Seq analysis. Finally, human participants underwent 20 weeks of resistance exercise training, with body composition and transcriptomic analysis. Results: Muscle VDR-OE yielded total protein and RNA accretion, manifesting in increased myofibre area, i.e. hypertrophy. The observed increases in MPS were associated with enhanced anabolic signalling reflecting translational efficiency (e.g. mTOR-signalling), with no effects upon protein breakdown markers being observed. Additionally, RNA-Seq illustrated marked extracellular matrix (ECM) remodelling, while satellite cell content, markers of proliferation and associated cell-cycle-related gene-sets were up-regulated. Finally, induction of VDR mRNA correlated with muscle hypertrophy in humans following long-term resistance exercise-type training. Conclusion: VDR-OE stimulates muscle hypertrophy ostensibly via heightened protein synthesis, translational efficiency, ribosomal expansion and up-regulation of ECM remodelling-related gene-sets. Furthermore, VDR expression is a robust marker of the hypertrophic response to resistance exercise in humans. The VDR is a viable target for muscle maintenance through testable Vitamin D molecules, both active molecules and analogs

    Role of intrinsic DNA binding specificity in defining target genes of the mammalian transcription factor PDX1

    PDX1 is a homeodomain transcription factor essential for pancreatic development and mature beta cell function. Homeodomain proteins typically recognize short TAAT DNA motifs in vitro: this binding displays paradoxically low specificity and affinity, given the extremely high specificity of action of these proteins in vivo. To better understand how PDX1 selects target genes in vivo, we have examined the interaction of PDX1 with natural and artificial binding sites. Comparison of PDX1 binding sites in several target promoters revealed an evolutionarily conserved pattern of nucleotides flanking the TAAT core. Using competitive in vitro DNA binding assays, we defined three groups of binding sites displaying high, intermediate and low affinity. Transfection experiments revealed a striking correlation between the ability of each sequence to activate transcription in cultured beta cells, and its ability to bind PDX1 in vitro. Site selection from a pool of oligonucleotides (sequence NNNTAATNNN) revealed a non-random preference for particular nucleotides at the flanking locations, resembling natural PDX1 binding sites. Taken together, the data indicate that the intrinsic DNA binding specificity of PDX1, in particular the bases adjacent to TAAT, plays an important role in determining the spectrum of target genes
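
    The site-selection idea in this abstract (a fixed TAAT core with variable flanks, NNNTAATNNN) can be sketched in code. The example below is only an illustration, not the authors' analysis: the function names and the two input sequences in the usage block are hypothetical, and the flank width of three bases is taken from the NNNTAATNNN pattern. It extracts each core with its flanks and tallies which nucleotides occur at each flanking position.

```python
import re
from collections import Counter

CORE = "TAAT"   # homeodomain core motif named in the abstract
FLANK = 3       # flanking bases per side, as in NNNTAATNNN


def find_sites(seq: str, core: str = CORE, flank: int = FLANK) -> list[str]:
    """Return every core occurrence together with its complete flanks."""
    seq = seq.upper()
    sites = []
    # A lookahead match has zero width, so overlapping cores are all found.
    for match in re.finditer(f"(?=({core}))", seq):
        start = match.start() - flank
        end = match.start() + len(core) + flank
        if start >= 0 and end <= len(seq):   # skip cores lacking a full flank
            sites.append(seq[start:end])
    return sites


def flank_counts(sites: list[str], core_len: int = len(CORE),
                 flank: int = FLANK) -> list[Counter]:
    """Count bases at each flanking position; core positions stay empty."""
    counts = [Counter() for _ in range(2 * flank + core_len)]
    for site in sites:
        for i, base in enumerate(site):
            if flank <= i < flank + core_len:
                continue                     # the core itself is invariant
            counts[i][base] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy sequences, not real PDX1 target promoters.
    toy_sequences = ["GGCTAATGACC", "TTACCTAATGGTA"]
    all_sites = [s for seq in toy_sequences for s in find_sites(seq)]
    for position, counter in enumerate(flank_counts(all_sites)):
        print(position, dict(counter))
```

    A strongly skewed count at a flanking position would correspond to the non-random nucleotide preference the abstract describes; real data would also need a background model to judge significance.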

    The Molecular Signatures Database (MSigDB) hallmark gene set collection.

    The Molecular Signatures Database (MSigDB) is one of the most widely used and comprehensive databases of gene sets for performing gene set enrichment analysis. Since its creation, MSigDB has grown beyond its roots in metabolic disease and cancer to include >10,000 gene sets. These better represent a wider range of biological processes and diseases, but the utility of the database is reduced by increased redundancy across, and heterogeneity within, gene sets. To address this challenge, here we use a combination of automated approaches and expert curation to develop a collection of "hallmark" gene sets as part of MSigDB. Each hallmark in this collection consists of a "refined" gene set, derived from multiple "founder" sets, that conveys a specific biological state or process and displays coherent expression. The hallmarks effectively summarize most of the relevant information of the original founder sets and, by reducing both variation and redundancy, provide more refined and concise inputs for gene set enrichment analysis

    Quality Matters: Biocuration Experts on the Impact of Duplication and Other Data Quality Issues in Biological Databases

    Biological databases represent an extraordinary collective volume of work. Diligently built up over decades and comprising many millions of contributions from the biomedical research community, biological databases provide worldwide access to a massive number of records (also known as entries) [1]. Starting from individual laboratories, genomes are sequenced, assembled, annotated, and ultimately submitted to primary nucleotide databases such as GenBank [2], European Nucleotide Archive (ENA) [3], and DNA Data Bank of Japan (DDBJ) [4] (collectively known as the International Nucleotide Sequence Database Collaboration, INSDC). Protein records, which are the translations of these nucleotide records, are deposited into central protein databases such as the UniProt KnowledgeBase (UniProtKB) [5] and the Protein Data Bank (PDB) [6]. Sequence records are further accumulated into different databases for more specialized purposes: RFam [7] and PFam [8] for RNA and protein families, respectively; DictyBase [9] and PomBase [10] for model organisms; as well as ArrayExpress [11] and Gene Expression Omnibus (GEO) [12] for gene expression profiles. These databases are selected as examples; the list is not intended to be exhaustive. However, they are representative of biological databases that have been named in the “golden set” of the 24th Nucleic Acids Research database issue (in 2016). The introduction of that issue highlights the databases that “consistently served as authoritative, comprehensive, and convenient data resources widely used by the entire community and offer some lessons on what makes a successful database” [13]. In addition, the associated information about sequences is also propagated into non-sequence databases, such as PubMed (https://www.ncbi.nlm.nih.gov/pubmed/) for scientific literature or Gene Ontology (GO) [14] for function annotations. These databases in turn benefit individual studies, many of which use these publicly available records as the basis for their own research

    Compendium of Immune Signatures Identifies Conserved and Species-Specific Biology in Response to Inflammation

    Gene-expression profiling has become a mainstay in immunology, but subtle changes in gene networks related to biological processes are hard to discern when comparing various datasets. For instance, conservation of the transcriptional response to sepsis in mouse models and human disease remains controversial. To improve transcriptional analysis in immunology, we created ImmuneSigDB: a manually annotated compendium of ∼5,000 gene-sets from diverse cell states, experimental manipulations, and genetic perturbations in immunology. Analysis using ImmuneSigDB identified signatures induced in activated myeloid cells and differentiating lymphocytes that were highly conserved between humans and mice. Sepsis triggered conserved patterns of gene expression in humans and mouse models. However, we also identified species-specific biological processes in the sepsis transcriptional response: although both species upregulated phagocytosis-related genes, a mitosis signature was specific to humans. ImmuneSigDB enables granular analysis of transcriptomic data to improve biological understanding of immune processes of the human and mouse immune systems

    GSKB: A gene set database for pathway analysis in mouse

    Interpretation of high-throughput genomics data based on biological pathways constitutes a constant challenge, partly because of the lack of supporting pathway databases. In this study, we created a functional genomics knowledgebase in mouse, which includes 33,261 pathways and gene sets compiled from 40 sources such as Gene Ontology, KEGG, GeneSetDB, PANTHER, and microRNA and transcription factor target genes. In addition, we manually collected and curated 8,747 lists of differentially expressed genes from 2,526 published gene expression studies to enable the detection of similarity to previously reported gene expression signatures. These two types of data constitute a Gene Set Knowledgebase (GSKB), which can be readily used by various pathway analysis software such as gene set enrichment analysis (GSEA). Using our knowledgebase, we were able to detect the correct microRNA (miR-29) pathway that was suppressed using antisense oligonucleotides and confirmed its role in inhibiting fibrogenesis, which might involve upregulation of transcription factor SMAD3. The knowledgebase can be queried as a source of published gene lists for further meta-analysis. Through meta-analysis of 56 published gene lists related to retina cells, we revealed two fundamentally different types of gene expression changes. One is related to the stress and inflammatory responses blamed for causing blindness in many diseases; the other is associated with visual perception by normal retina cells. GSKB is available online at http://ge-lab.org/gs/, and also as a Bioconductor package (gskb, https://bioconductor.org/packages/gskb/). This database enables in-depth interpretation of mouse genomics data both in terms of known pathways and the context of thousands of published expression signatures