Concordant integrative gene set enrichment analysis of multiple large-scale two-sample expression data sets
Background
Gene set enrichment analysis (GSEA) is an important approach to the analysis of coordinate expression changes at a pathway level. Although many statistical and computational methods have been proposed for GSEA, the issue of a concordant integrative GSEA of multiple expression data sets has not been well addressed. Among different related data sets collected for the same or similar study purposes, it is important to identify pathways or gene sets with concordant enrichment.
Methods
We categorize the underlying true states of differential expression into three representative categories: no change, positive change and negative change. Because of noise in the data, what we observe in experiments may not reflect the underlying truth. Although these categories are not observed in practice, they can be modeled within a mixture model framework. We then define the mathematical concept of concordant gene set enrichment and calculate its related probability based on a three-component multivariate normal mixture model. The related false discovery rate can be calculated and used to rank different gene sets.
Results
We used three published lung cancer microarray gene expression data sets to illustrate our proposed method. One analysis based on the first two data sets was conducted to compare our result with a previously published result based on a GSEA conducted separately for each individual data set. This comparison illustrates the advantage of our proposed concordant integrative gene set enrichment analysis. Then, with a relatively new and larger pathway collection, we used our method to conduct an integrative analysis of the first two data sets and also of all three data sets. Both analyses showed that many gene sets could be identified with low false discovery rates, and the two results were consistent with each other. A further exploration based on the KEGG cancer pathway collection showed that a majority of these pathways could be identified by our proposed method.
Conclusions
This study illustrates that we can improve detection power and discovery consistency through a concordant integrative analysis of multiple large-scale two-sample gene expression data sets.
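The concordance probability and FDR-based ranking described in the Methods of the abstract above can be illustrated with a small Python sketch. It is not the authors' implementation and makes several hypothetical assumptions: each data set's statistic follows a unit-variance normal component centered at 0, +mu or -mu (mu = 2 here), the three states have independent prior weights in each data set, a gene set's score is the mean gene-level probability of concordant change, and the FDR of each top-k list is estimated as the average of (1 - score) over that list. The function names and parameter values are illustrative only.

```python
import itertools
import math

import numpy as np


def normal_pdf(x, mean, sd=1.0):
    """Density of N(mean, sd^2) evaluated elementwise at x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def concordance_posterior(z, mu=2.0, weights=(0.8, 0.1, 0.1)):
    """Posterior probability that each gene is up in all data sets or down in all data sets.

    z: array of shape (n_genes, n_datasets) of per-gene test statistics.
    mu: hypothetical mean shift of the positive/negative components.
    weights: hypothetical prior weights of (no change, positive, negative) per data set.
    """
    z = np.asarray(z, dtype=float)
    n_genes, n_sets = z.shape
    means = {0: 0.0, 1: mu, -1: -mu}
    prior = {0: weights[0], 1: weights[1], -1: weights[2]}
    total = np.zeros(n_genes)
    concordant = np.zeros(n_genes)
    # Enumerate all 3^K joint states; data sets are independent given the joint state.
    for states in itertools.product((0, 1, -1), repeat=n_sets):
        joint = np.ones(n_genes)
        for k, s in enumerate(states):
            joint *= prior[s] * normal_pdf(z[:, k], means[s])
        total += joint
        if all(s == 1 for s in states) or all(s == -1 for s in states):
            concordant += joint
    return concordant / total


def rank_gene_sets(gene_sets, gene_index, post):
    """Rank gene sets by mean concordance probability; estimate the FDR of each top-k list."""
    scores = {name: float(np.mean(post[[gene_index[g] for g in genes]]))
              for name, genes in gene_sets.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    results, cum_null = [], 0.0
    for k, name in enumerate(ranked, start=1):
        cum_null += 1.0 - scores[name]  # expected "false" mass among the top k
        results.append((name, scores[name], cum_null / k))
    return results


# Toy example: 4 genes observed in 2 data sets (hypothetical values).
z = np.array([[3.0, 2.8], [-2.5, -3.1], [0.1, -0.2], [3.0, -3.0]])
genes = ["A", "B", "C", "D"]
index = {g: i for i, g in enumerate(genes)}
post = concordance_posterior(z)
results = rank_gene_sets({"set1": ["A", "B"], "set2": ["C", "D"]}, index, post)
print(results)
```

Enumerating all 3^K joint states keeps the sketch close to the mixture-model definition, but its cost grows exponentially with the number of data sets K; under the independent-prior assumption used here the posterior factorizes, so per-data-set posteriors could simply be multiplied instead.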
Detecting discordance enrichment among a series of two-sample genome-wide expression data sets
Background
With the current microarray and RNA-seq technologies, two-sample genome-wide expression data have been widely collected in biological and medical studies. The related differential expression analysis and gene set enrichment analysis have been frequently conducted. Integrative analysis can be conducted when multiple data sets are available. In practice, discordant molecular behaviors among a series of data sets can be of biological and clinical interest.
Methods
In this study, we propose a statistical method for detecting discordance gene set enrichment. Our method is based on a two-level multivariate normal mixture model. It is statistically efficient because its parameter space grows only linearly as the number of data sets increases. The model-based probability of discordance enrichment can be calculated for gene set detection.
Results
We apply our method to a microarray expression data set collected from forty-five matched tumor/non-tumor pairs of tissues for studying pancreatic cancer. We divided the data set into a series of non-overlapping subsets according to the tumor/non-tumor paired expression ratio of gene PNLIP (pancreatic lipase, which has recently been shown to be associated with pancreatic cancer). The log-ratio ranges from negative values (i.e. higher expression in non-tumor tissue) to positive values (i.e. higher expression in tumor tissue). Our purpose is to determine whether any gene sets are enriched in discordant behaviors among these subsets as the log-ratio increases from negative to positive. We focus on KEGG pathways. The detected pathways will be useful for further understanding the role of gene PNLIP in pancreatic cancer. Among the top-ranked detected pathways, the neuroactive ligand receptor interaction and olfactory transduction pathways are the two most significant. Then, we consider gene TP53, which is well known for its role as a tumor suppressor in cancer research. Its log-ratio also ranges from negative values (i.e. higher expression in non-tumor tissue) to positive values (i.e. higher expression in tumor tissue). We divided the microarray data set again according to the expression ratio of gene TP53. After the discordance enrichment analysis, we observed broadly similar results, and the same two pathways remained the most significant detections. More interestingly, only these two pathways have been identified for their association with pancreatic cancer in a pathway analysis of genome-wide association study (GWAS) data.
Conclusions
This study illustrates that some disease-related pathways can be enriched in discordant molecular behaviors when an important disease-related gene changes its expression. Our proposed statistical method is useful in the detection of these pathways. It can also be applied to genome-wide expression data collected with the more recent RNA-seq technology.
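The model-based probability of discordance enrichment described in the Methods of the abstract above can be approximated with a simplified Python sketch. It is not the authors' two-level model: it hypothetically assumes that each subset's latent state (no change, positive, negative) is independent a priori, with unit-variance normal components centered at 0, +mu and -mu. Under that assumption a gene is called discordant unless all subsets share the same state, and a pathway's score is taken to be the mean over its genes. The names `state_posteriors`, `discordance_probability` and `pathway_discordance` and all parameter values are illustrative.

```python
import numpy as np

STATES = (0, 1, -1)  # no change, positive change, negative change


def state_posteriors(z, mu=2.0, weights=(0.8, 0.1, 0.1)):
    """Per-subset posterior P(state | z) for each gene, shape (n_genes, n_subsets, 3).

    mu and weights are hypothetical; the states are ordered as in STATES.
    """
    z = np.asarray(z, dtype=float)[..., None]
    means = np.array([0.0, mu, -mu])
    # Prior weight times unnormalized N(mean, 1) density; the 1/sqrt(2*pi) factor cancels.
    unnorm = np.asarray(weights) * np.exp(-0.5 * (z - means) ** 2)
    return unnorm / unnorm.sum(axis=-1, keepdims=True)


def discordance_probability(z, **kwargs):
    """P(not all subsets share the same latent state) for each gene."""
    post = state_posteriors(z, **kwargs)
    # P(all same) = sum over states of the product of per-subset posteriors.
    same = np.prod(post, axis=1).sum(axis=-1)
    return 1.0 - same


def pathway_discordance(z, pathways, gene_index, **kwargs):
    """Mean gene-level discordance probability for each pathway."""
    p = discordance_probability(z, **kwargs)
    return {name: float(p[[gene_index[g] for g in genes]].mean())
            for name, genes in pathways.items()}


# Toy example: 3 genes across 3 ordered subsets (hypothetical values).
z = np.array([[-3.0, 0.0, 3.0], [2.5, 2.8, 3.1], [0.1, -0.2, 0.0]])
genes = ["G1", "G2", "G3"]
index = {g: i for i, g in enumerate(genes)}
d = discordance_probability(z)
scores = pathway_discordance(z, {"pw1": ["G1"], "pw2": ["G2", "G3"]}, index)
print(d, scores)
```

In this toy input, G1 moves from negative to positive change across the subsets and scores as highly discordant, while G3 stays near zero in every subset and scores low.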
Breaking the paradigm: Dr Insight empowers signature-free, enhanced drug repurposing
Motivation: Transcriptome-based computational drug repurposing has attracted considerable interest by enabling faster and more cost-effective drug discovery. Nevertheless, key limitations of the current drug connectivity-mapping paradigm have long been overlooked, including the lack of effective means to determine optimal query gene signatures. Results: The novel approach Dr Insight implements a frame-breaking statistical model for the "hand-shake" between disease and drug data. The genome-wide screening of concordantly expressed genes (CEGs) eliminates the need for subjective selection of query signatures and also provides a better proxy for potential disease-specific drug targets. Extensive comparisons on simulated and real cancer datasets have validated the superior performance of Dr Insight over several popular drug-repurposing methods in detecting known cancer drugs and drug–target interactions. A proof-of-concept trial using the TCGA breast cancer dataset demonstrates the application of Dr Insight for a comprehensive analysis, from redirection of drug therapies to a systematic construction of disease-specific drug-target networks.
Multi-omics integration reveals molecular networks and regulators of psoriasis.
Background
Psoriasis is a complex multi-factorial disease, involving both genetic susceptibilities and environmental triggers. Genome-wide association studies (GWAS) and epigenome-wide association studies (EWAS) have been carried out to identify genetic and epigenetic variants that are associated with psoriasis. However, these loci cannot fully explain the disease pathogenesis.
Methods
To achieve a comprehensive mechanistic understanding of psoriasis, we conducted a systems biology study, integrating multi-omics datasets including GWAS, EWAS, tissue-specific transcriptome, expression quantitative trait loci (eQTLs), gene networks, and biological pathways to identify the key genes, processes, and networks that are genetically and epigenetically associated with psoriasis risk.
Results
This integrative genomics study identified both well-characterized (e.g., the IL17 pathway in both GWAS and EWAS) and novel biological processes (e.g., the branched chain amino acid catabolism process in GWAS and the platelet and coagulation pathway in EWAS) involved in psoriasis. Finally, by utilizing tissue-specific gene regulatory networks, we unraveled the interactions among the psoriasis-associated genes and pathways in a tissue-specific manner and detected potential key regulatory genes in the psoriasis networks.
Conclusions
The integration and convergence of multi-omics signals provide deeper and more comprehensive insights into the biological mechanisms associated with psoriasis susceptibility.
Age-related accrual of methylomic variability is linked to fundamental ageing mechanisms
Background: Epigenetic change is a hallmark of ageing but its link to ageing mechanisms in humans remains poorly understood. While DNA methylation at many CpG sites closely tracks chronological age, DNA methylation changes relevant to biological age are expected to gradually dissociate from chronological age, mirroring the increased heterogeneity in health status at older ages. Results: Here, we report the large-scale identification of 6366 age-related variably methylated positions (aVMPs) in 3295 whole blood DNA methylation profiles, 2044 of which have a matching RNA-seq gene expression profile. aVMPs are enriched at polycomb repressed regions and, accordingly, methylation at those positions is associated with the expression of genes encoding components of polycomb repressive complex 2 (PRC2) in trans. Further analysis revealed trans-associations for 1816 aVMPs with an additional 854 genes. These trans-associated aVMPs are characterized by either an age-related
Chromatin dysregulation and DNA methylation at transcription start sites associated with transcriptional repression in cancers.
Although promoter-associated CpG islands have been established as targets of DNA methylation changes in cancer, previous studies suggest that epigenetic dysregulation outside the promoter region may be more closely associated with transcriptional changes. Here we examine DNA methylation, chromatin marks, and transcriptional alterations to define the relationship between transcriptional modulation and spatial changes in chromatin structure. Using human papillomavirus-related oropharyngeal carcinoma as a model, we show aberrant enrichment of repressive H3K9me3 at the transcriptional start site (TSS) with methylation-associated, tumor-specific gene silencing. Further analysis identifies a hypermethylated subtype that shows a functional convergence on MYC targets and an association with CREBBP/EP300 mutation. The tumor-specific shift to transcriptional repression associated with DNA methylation at TSSs was confirmed in multiple tumor types. Our data may indicate a common underlying epigenetic dysregulation in cancer associated with broad enrichment of repressive chromatin marks and aberrant DNA hypermethylation at TSSs in combination with MYC network activation.
Genome-wide analyses of ADHD identify 27 risk loci, refine the genetic architecture and implicate several cognitive domains
Attention-deficit hyperactivity disorder (ADHD) is a prevalent neurodevelopmental disorder with a major genetic component. Here, we present a genome-wide association study meta-analysis of ADHD comprising 38,691 individuals with ADHD and 186,843 controls. We identified 27 genome-wide significant loci, highlighting 76 potential risk genes enriched among genes expressed particularly in early brain development. Overall, ADHD genetic risk was associated with several brain-specific neuronal subtypes and midbrain dopaminergic neurons. In exome-sequencing data from 17,896 individuals, we identified an increased load of rare protein-truncating variants in ADHD for a set of risk genes enriched with probable causal common variants, potentially implicating SORCS3 in ADHD by both common and rare variants. Bivariate Gaussian mixture modeling estimated that 84–98% of ADHD-influencing variants are shared with other psychiatric disorders. In addition, common-variant ADHD risk was associated with impaired complex cognition such as verbal reasoning and a range of executive functions, including attention.