3,383 research outputs found

    Concordant integrative gene set enrichment analysis of multiple large-scale two-sample expression data sets

    Get PDF
    Background Gene set enrichment analysis (GSEA) is an important approach to the analysis of coordinate expression changes at a pathway level. Although many statistical and computational methods have been proposed for GSEA, the issue of a concordant integrative GSEA of multiple expression data sets has not been well addressed. Among different related data sets collected for the same or similar study purposes, it is important to identify pathways or gene sets with concordant enrichment. Methods We categorize the underlying true states of differential expression into three representative categories: no change, positive change and negative change. Due to data noise, what we observe from experiments may not indicate the underlying truth. Although these categories are not observed in practice, they can be considered in a mixture model framework. Then, we define the mathematical concept of concordant gene set enrichment and calculate its related probability based on a three-component multivariate normal mixture model. The related false discovery rate can be calculated and used to rank different gene sets. Results We used three published lung cancer microarray gene expression data sets to illustrate our proposed method. One analysis based on the first two data sets was conducted to compare our result with a previous published result based on a GSEA conducted separately for each individual data set. This comparison illustrates the advantage of our proposed concordant integrative gene set enrichment analysis. Then, with a relatively new and larger pathway collection, we used our method to conduct an integrative analysis of the first two data sets and also all three data sets. Both results showed that many gene sets could be identified with low false discovery rates. A consistency between both results was also observed. A further exploration based on the KEGG cancer pathway collection showed that a majority of these pathways could be identified by our proposed method. Conclusions This study illustrates that we can improve detection power and discovery consistency through a concordant integrative analysis of multiple large-scale two-sample gene expression data sets

    Detecting discordance enrichment among a series of two-sample genome-wide expression data sets

    Get PDF
    Background With the current microarray and RNA-seq technologies, two-sample genome-wide expression data have been widely collected in biological and medical studies. The related differential expression analysis and gene set enrichment analysis have been frequently conducted. Integrative analysis can be conducted when multiple data sets are available. In practice, discordant molecular behaviors among a series of data sets can be of biological and clinical interest. Methods In this study, a statistical method is proposed for detecting discordance gene set enrichment. Our method is based on a two-level multivariate normal mixture model. It is statistically efficient with linearly increased parameter space when the number of data sets is increased. The model-based probability of discordance enrichment can be calculated for gene set detection. Results We apply our method to a microarray expression data set collected from forty-five matched tumor/non-tumor pairs of tissues for studying pancreatic cancer. We divided the data set into a series of non-overlapping subsets according to the tumor/non-tumor paired expression ratio of gene PNLIP (pancreatic lipase, recently shown it association with pancreatic cancer). The log-ratio ranges from a negative value (e.g. more expressed in non-tumor tissue) to a positive value (e.g. more expressed in tumor tissue). Our purpose is to understand whether any gene sets are enriched in discordant behaviors among these subsets (when the log-ratio is increased from negative to positive). We focus on KEGG pathways. The detected pathways will be useful for our further understanding of the role of gene PNLIP in pancreatic cancer research. Among the top list of detected pathways, the neuroactive ligand receptor interaction and olfactory transduction pathways are the most significant two. Then, we consider gene TP53 that is well-known for its role as tumor suppressor in cancer research. The log-ratio also ranges from a negative value (e.g. more expressed in non-tumor tissue) to a positive value (e.g. more expressed in tumor tissue). We divided the microarray data set again according to the expression ratio of gene TP53. After the discordance enrichment analysis, we observed overall similar results and the above two pathways are still the most significant detections. More interestingly, only these two pathways have been identified for their association with pancreatic cancer in a pathway analysis of genome-wide association study (GWAS) data. Conclusions This study illustrates that some disease-related pathways can be enriched in discordant molecular behaviors when an important disease-related gene changes its expression. Our proposed statistical method is useful in the detection of these pathways. Furthermore, our method can also be applied to genome-wide expression data collected by the recent RNA-seq technology

    Breaking the paradigm: Dr Insight empowers signature-free, enhanced drug repurposing

    Get PDF
    Motivation: Transcriptome-based computational drug repurposing has attracted considerable interest by bringing about faster and more cost-effective drug discovery. Nevertheless, key limitations of the current drug connectivity-mapping paradigm have been long overlooked, including the lack of effective means to determine optimal query gene signatures. Results: The novel approach Dr Insight implements a frame-breaking statistical model for the ‘hand-shake’ between disease and drug data. The genome-wide screening of concordantly expressed genes (CEGs) eliminates the need for subjective selection of query signatures, added to eliciting better proxy for potential disease-specific drug targets. Extensive comparisons on simulated and real cancer datasets have validated the superior performance of Dr Insight over several popular drug-repurposing methods to detect known cancer drugs and drug–target interactions. A proof-of-concept trial using the TCGA breast cancer dataset demonstrates the application of Dr Insight for a comprehensive analysis, from redirection of drug therapies, to a systematic construction of disease-specific drug-target networks

    Multi-omics integration reveals molecular networks and regulators of psoriasis.

    Get PDF
    BackgroundPsoriasis is a complex multi-factorial disease, involving both genetic susceptibilities and environmental triggers. Genome-wide association studies (GWAS) and epigenome-wide association studies (EWAS) have been carried out to identify genetic and epigenetic variants that are associated with psoriasis. However, these loci cannot fully explain the disease pathogenesis.MethodsTo achieve a comprehensive mechanistic understanding of psoriasis, we conducted a systems biology study, integrating multi-omics datasets including GWAS, EWAS, tissue-specific transcriptome, expression quantitative trait loci (eQTLs), gene networks, and biological pathways to identify the key genes, processes, and networks that are genetically and epigenetically associated with psoriasis risk.ResultsThis integrative genomics study identified both well-characterized (e.g., the IL17 pathway in both GWAS and EWAS) and novel biological processes (e.g., the branched chain amino acid catabolism process in GWAS and the platelet and coagulation pathway in EWAS) involved in psoriasis. Finally, by utilizing tissue-specific gene regulatory networks, we unraveled the interactions among the psoriasis-associated genes and pathways in a tissue-specific manner and detected potential key regulatory genes in the psoriasis networks.ConclusionsThe integration and convergence of multi-omics signals provide deeper and comprehensive insights into the biological mechanisms associated with psoriasis susceptibility

    Age-related accrual of methylomic variability is linked to fundamental ageing mechanisms

    Get PDF
    Background: Epigenetic change is a hallmark of ageing but its link to ageing mechanisms in humans remains poorly understood. While DNA methylation at many CpG sites closely tracks chronological age, DNA methylation changes relevant to biological age are expected to gradually dissociate from chronological age, mirroring the increased heterogeneity in health status at older ages. Results: Here, we report on the large-scale identification of 6366 age-related variably methylated positions (aVMPs) identified in 3295 whole blood DNA methylation profiles, 2044 of which have a matching RNA-seq gene expression profile. aVMPs are enriched at polycomb repressed regions and, accordingly, methylation at those positions is associated with the expression of genes encoding components of polycomb repressive complex 2 (PRC2) in trans. Further analysis revealed trans-associations for 1816 aVMPs with an additional 854 genes. These trans-associated aVMPs are characterized by either an age-related

    Genome-wide analyses of ADHD identify 27 risk loci, refine the genetic architecture and implicate several cognitive domains

    Get PDF
    Attention-deficit hyperactivity disorder (ADHD) is a prevalent neurodevelopmental disorder with a major genetic component. Here, we present a genome-wide association study meta-analysis of ADHD comprising 38,691 individuals with ADHD and 186,843 controls. We identified 27 genome-wide significant loci, highlighting 76 potential risk genes enriched among genes expressed particularly in early brain development. Overall, ADHD genetic risk was associated with several brain-specific neuronal subtypes and midbrain dopaminergic neurons. In exome-sequencing data from 17,896 individuals, we identified an increased load of rare protein-truncating variants in ADHD for a set of risk genes enriched with probable causal common variants, potentially implicating SORCS3 in ADHD by both common and rare variants. Bivariate Gaussian mixture modeling estimated that 84–98% of ADHD-influencing variants are shared with other psychiatric disorders. In addition, common-variant ADHD risk was associated with impaired complex cognition such as verbal reasoning and a range of executive functions, including attention
    • 

    corecore