Weighted set enrichment of gene expression data

Publication venue: BioMed Central
Publication date: 23/10/2013

Spectral gene set enrichment (SGSE)

Author: Frost H. Robert
Li Zhigang
Moore Jason H.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/05/2014

Motivation: Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables with performance strongly dependent on the clustering algorithm and number of clusters. Results: We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes the statistical association between gene sets and principal components (PCs) using our principal component gene set enrichment (PCGSE) method. The overall statistical association between each gene set and the spectral structure of the data is then computed by combining the PC-level p-values using the weighted Z-method with weights set to the PC variance scaled by Tracy-Widom test p-values. Using simulated data, we show that the SGSE algorithm can accurately recover spectral features from noisy data. To illustrate the utility of our method on real data, we demonstrate the superior performance of the SGSE method relative to standard cluster-based techniques for testing the association between MSigDB gene sets and the variance structure of microarray gene expression data. Availability: http://cran.r-project.org/web/packages/PCGSE/index.html Contact: [email protected] or [email protected]
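
The combination step named in the abstract above, the weighted Z-method, can be sketched in a few lines. This is a minimal illustration and not the PCGSE package's code: the function name `weighted_z` and the clipping constant are assumptions, and the weights are taken as given (how SGSE derives them from PC variances and Tracy-Widom p-values is not reproduced here).

```python
from math import sqrt
from statistics import NormalDist


def weighted_z(p_values, weights, eps=1e-12):
    """Combine one-sided p-values with the weighted Z-method.

    Each p-value is mapped to a standard-normal score, the scores are
    averaged with the supplied weights, and the combined score is mapped
    back to a p-value. `eps` (an assumed guard) keeps p-values strictly
    inside (0, 1) so the inverse CDF is defined.
    """
    if not p_values or len(p_values) != len(weights):
        raise ValueError("p_values and weights must be non-empty and equal in length")
    norm = NormalDist()
    scores = [norm.inv_cdf(1.0 - min(max(p, eps), 1.0 - eps)) for p in p_values]
    numerator = sum(w * z for w, z in zip(weights, scores))
    denominator = sqrt(sum(w * w for w in weights))
    if denominator == 0.0:
        raise ValueError("at least one weight must be non-zero")
    return 1.0 - norm.cdf(numerator / denominator)


if __name__ == "__main__":
    # Hypothetical PC-level p-values and weights for a single gene set.
    print(weighted_z([0.01, 0.40, 0.90], [5.0, 2.0, 0.5]))
```

Because the weights enter linearly in the numerator, a small p-value on a heavily weighted component dominates the combined result, while the same p-value on a lightly weighted component has little effect.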

arXiv.org e-Print Archive

CiteSeerX

Springer - Publisher Connector

PubMed Central

Dartmouth Digital Commons (Dartmouth College)

Pathway Distiller - multisource biological pathway consolidation

Author: Anguiano Zachry
Bishop Alexander J. R.
Chen Yidong
Dashnamoorthy Ravi
Doderer Mark S.
Suresh Uthra
Publication venue: eScholarship@UMassChan
Publication date: 26/10/2012

BACKGROUND: One method to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to identify overrepresentation or enrichment for biological pathways. Because pathways are able to functionally describe the set of genes, much effort has been made to collect curated biological pathways into publicly accessible databases. When combining disparate databases, highly related or redundant pathways exist, making their consolidation into pathway concepts essential. This will facilitate unbiased, comprehensive yet streamlined analysis of experiments that result in large gene sets. METHODS: After gene set enrichment finds representative pathways for large gene sets, pathways are consolidated into representative pathway concepts. Three complementary but different methods of pathway consolidation are explored. Enrichment Consolidation combines the set of the pathways enriched for the signature gene list through iterative combining of enriched pathways with other pathways with similar signature gene sets; Weighted Consolidation utilizes a Protein-Protein Interaction network based gene-weighting approach that finds clusters of both enriched and non-enriched pathways limited to the experiments' resultant gene list; and finally the de novo Consolidation method uses several measurements of pathway similarity to find static pathway clusters independent of any given experiment. RESULTS: We demonstrate that the three consolidation methods provide unified yet different functional insights of a resultant gene set derived from a genome-wide profiling experiment. Results from the methods are presented, demonstrating their applications in biological studies and comparing with a pathway web-based framework that also combines several pathway databases. Additionally, a web-based consolidation framework that encompasses all three methods discussed in this paper, Pathway Distiller (http://cbbiweb.uthscsa.edu/PathwayDistiller), is established to allow researchers access to the methods and example microarray data described in this manuscript, and the ability to analyze their own gene list by using our unique consolidation methods. CONCLUSIONS: By combining several pathway systems, implementing different but complementary pathway consolidation methods, and providing a user-friendly web-accessible tool, we have enabled users to extract functional explanations of their genome-wide experiments.
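
The overrepresentation step that the abstract above starts from is commonly computed as a hypergeometric tail probability. The sketch below assumes that test; the abstract does not say which enrichment statistic Pathway Distiller uses, and the function name and arguments are hypothetical.

```python
from math import comb


def overrepresentation_p(universe, pathway, gene_list_size, overlap):
    """Probability of seeing at least `overlap` pathway genes in a gene list.

    Assumes the gene list is a uniform random draw of `gene_list_size`
    genes from `universe` genes, of which `pathway` belong to the pathway
    (one-sided hypergeometric test).
    """
    if not (0 <= pathway <= universe and 0 <= gene_list_size <= universe):
        raise ValueError("pathway and gene list sizes must fit inside the universe")
    upper = min(pathway, gene_list_size)
    if not 0 <= overlap <= upper:
        raise ValueError("overlap cannot exceed the pathway or gene list size")
    total = comb(universe, gene_list_size)
    tail = sum(
        comb(pathway, k) * comb(universe - pathway, gene_list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 genes, a 150-gene pathway,
    # a 400-gene signature list, 12 genes in common.
    print(overrepresentation_p(20000, 150, 400, 12))
```

Exact integer arithmetic via `math.comb` avoids rounding problems in the tail sum; for many pathways, the resulting p-values would still need a multiple-testing correction.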

eScholarship@UMMS

Pathway Distiller - multisource biological pathway consolidation

Author: Doderer Mark S.
Anguiano Zachry
Suresh Uthra
Dashnamoorthy Ravi
Bishop Alexander J. R.
Chen Yidong
Publication venue: eScholarship@UMMS
Publication date: 01/01/2005

Crossref

Springer - Publisher Connector

eScholarship@UMMS

Transcriptomic signatures of neuronal differentiation and their association with risk genes for autism spectrum and related neuropsychiatric disorders.

Author: Chiocchetti AG
Cocchi E
de la Torre-Ubieta L
Freitag CM
Fulda S
Geschwind DH
Haslinger D
Lindlar S
Rothämel T
Stein JL
Waltes R
Publication venue: eScholarship, University of California
Publication date: 01/08/2016

Genes for autism spectrum disorders (ASDs) are also implicated in fragile X syndrome (FXS), intellectual disabilities (ID) or schizophrenia (SCZ), and converge on neuronal function and differentiation. The SH-SY5Y neuroblastoma cell line, the most widely used system to study neurodevelopment, is currently discussed for its applicability to model cortical development. We implemented an optimal neuronal differentiation protocol of this system and evaluated neurodevelopment at the transcriptomic level using the CoNTeXT framework, a machine-learning algorithm based on human post-mortem brain data estimating developmental stage and regional identity of transcriptomic signatures. Our improved model, in contrast to currently used SH-SY5Y models, does capture early neurodevelopmental processes with high fidelity. We applied regression modelling, dynamic time warping analysis, parallel independent component analysis and weighted gene co-expression network analysis to identify activated gene sets and networks. Finally, we tested and compared these sets for enrichment of risk genes for neuropsychiatric disorders. We confirm a significant overlap of genes implicated in ASD with FXS, ID and SCZ. However, counterintuitive to this observation, we report that risk genes affect pathways specific for each disorder during early neurodevelopment. Genes implicated in ASD, ID, FXS and SCZ were enriched among the positive regulators, but only ID-implicated genes were also negative regulators of neuronal differentiation. ASD and ID genes were involved in dendritic branching modules, but only ASD risk genes were implicated in histone modification or axonal guidance. Only ID genes were over-represented among cell cycle modules. We conclude that the underlying signatures are disorder-specific and that the shared genetic architecture results in overlaps across disorders such as ID in ASD. Thus, adding developmental network context to genetic analyses will aid in differentiating the pathophysiology of neuropsychiatric disorders.

PubMed Central

eScholarship - University of California

Predicting glioblastoma prognosis networks using weighted gene co-expression network analysis on TCGA data

Author: A Eberharter
B Zhang
Cun-Quan Zhang
J Zhang
JE Phillips
Kun Huang
LJ van 't Veer
M Buyse
M Fuchs
M Newman
MA Pujana
MJ van de Vijver
P Langfelder
S Horvath
Y Ou
Yang Xiang
Publication venue: BioMed Central
Publication date: 01/01/2012

Background: Using gene co-expression analysis, researchers were able to predict clusters of genes with consistent functions that are relevant to cancer development and prognosis. We applied a weighted gene co-expression network (WGCN) analysis algorithm on glioblastoma multiforme (GBM) data obtained from the TCGA project and predicted a set of gene co-expression networks which are related to GBM prognosis. Methods: We modified the Quasi-Clique Merger algorithm (QCM algorithm) into the edge-covering Quasi-Clique Merger algorithm (eQCM) for mining weighted sub-networks in WGCN. Each sub-network is considered a set of features to separate patients into two groups using the K-means algorithm. Survival times of the two groups are compared using the log-rank test and Kaplan-Meier curves. Simulations using random sets of genes are carried out to determine the thresholds for log-rank test p-values for network selection. Sub-networks with p-values less than their corresponding thresholds were further merged into clusters based on overlap ratios (>50%). The functions for each cluster are analyzed using gene ontology enrichment analysis. Results: Using the eQCM algorithm, we identified 8,124 sub-networks in the WGCN, out of which 170 sub-networks show p-values less than their corresponding thresholds. They were then merged into 16 clusters. Conclusions: We identified 16 gene clusters associated with GBM prognosis using the eQCM algorithm. Our results not only confirmed previous findings including the importance of cell cycle and immune response in GBM, but also suggested important epigenetic events in GBM development and prognosis.
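
The merging of selected sub-networks into clusters by overlap ratio, described in the abstract above, can be illustrated with a small greedy routine. Everything specific here is an assumption: the abstract does not define the overlap ratio, so the sketch uses shared genes divided by the size of the smaller set, and the merge order is simply the input order.

```python
def overlap_ratio(a, b):
    """Shared genes divided by the size of the smaller set (assumed definition)."""
    return len(a & b) / min(len(a), len(b))


def merge_by_overlap(gene_sets, threshold=0.5):
    """Greedily merge gene sets whose overlap ratio exceeds `threshold`.

    After every merge the scan restarts, because the enlarged cluster
    may now overlap sets it did not overlap before.
    """
    clusters = [set(s) for s in gene_sets if s]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if overlap_ratio(clusters[i], clusters[j]) > threshold:
                    clusters[i] |= clusters[j]
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters


if __name__ == "__main__":
    # Hypothetical sub-networks given as sets of gene symbols.
    subnetworks = [{"A", "B", "C"}, {"B", "C", "D"}, {"X", "Y"}]
    print(merge_by_overlap(subnetworks))
```

Restarting the scan after each merge keeps the logic simple at quadratic-or-worse cost, which is acceptable for a few hundred sub-networks such as the 170 reported above.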

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The Research Repository @ WVU (West Virginia University)

Maximal information component analysis: a novel non-linear network analysis method.

Author: Bennett Brian
Lusis Aldons J
Orozco Luz D
Rau Christoph D
Weiss James
Wisniewski Nicholas
Publication venue: eScholarship, University of California
Publication date: 01/01/2013

Background: Network construction and analysis algorithms provide scientists with the ability to sift through high-throughput biological outputs, such as transcription microarrays, for small groups of genes (modules) that are relevant for further research. Most of these algorithms ignore the important role of non-linear interactions in the data, and the ability for genes to operate in multiple functional groups at once, despite clear evidence for both of these phenomena in observed biological systems. Results: We have created a novel co-expression network analysis algorithm that incorporates both of these principles by combining the information-theoretic association measure of the maximal information coefficient (MIC) with an Interaction Component Model. We evaluate the performance of this approach on two datasets collected from a large panel of mice, one from macrophages and the other from liver by comparing the two measures based on a measure of module entropy, Gene Ontology (GO) enrichment, and scale-free topology (SFT) fit. Our algorithm outperforms a widely used co-expression analysis method, weighted gene co-expression network analysis (WGCNA), in the macrophage data, while returning comparable results in the liver dataset when using these criteria. We demonstrate that the macrophage data has more non-linear interactions than the liver dataset, which may explain the increased performance of our method, termed Maximal Information Component Analysis (MICA) in that case. Conclusions: In making our network algorithm more accurately reflect known biological principles, we are able to generate modules with improved relevance, particularly in networks with confounding factors such as gene by environment interactions.

Directory of Open Access Journals

PubMed Central

Frontiers - Publisher Connector

eScholarship - University of California

Gene Expression Meta-Analysis Reveals Concordance in Gene Activation, Pathway, and Cell-Type Enrichment in Dermatomyositis Target Tissues.

Author: Kim Susan
Neely Jessica
Paranjpe Manish
Rychkov Dmitry
Sirota Marina
Waterfield Michael
Publication venue: eScholarship, University of California
Publication date: 01/12/2019

Objective: We conducted a comprehensive gene expression meta-analysis in dermatomyositis (DM) muscle and skin tissues to identify shared disease-relevant genes and pathways across tissues. Methods: Six publicly available data sets from DM muscle and two from skin were identified. Meta-analysis was performed by first processing data sets individually then cross-study normalization and merging creating tissue-specific gene expression matrices for subsequent analysis. Complementary single-gene and network analyses using Significance Analysis of Microarrays (SAM) and Weighted Gene Co-expression Network Analysis (WGCNA) were conducted to identify genes significantly associated with DM. Cell-type enrichment was performed using xCell. Results: There were 544 differentially expressed genes (FC ≥ 1.3, q < 0.05) in muscle and 300 in skin. There were 94 shared upregulated genes across tissues enriched in type I and II interferon (IFN) signaling and major histocompatibility complex (MHC) class I antigen-processing pathways. In a network analysis, we identified eight significant gene modules in muscle and seven in skin. The most highly correlated modules were enriched in pathways consistent with the single-gene analysis. Additional pathways uncovered by WGCNA included T-cell activation and T-cell receptor signaling. In the cell-type enrichment analysis, both tissues were highly enriched in activated dendritic cells and M1 macrophages. Conclusion: There is striking similarity in gene expression across DM target tissues with enrichment of type I and II IFN pathways, MHC class I antigen-processing, T-cell activation, and antigen-presenting cells. These results suggest IFN-γ may contribute to the global IFN signature in DM, and altered auto-antigen presentation through the class I MHC pathway may be important in disease pathogenesis.

eScholarship - University of California

Observation weights unlock bulk RNA-seq tools for zero inflation and single-cell applications

Author: Clement Lieven
Dudoit Sandrine
Love Michael I
Perraudeau Fanny
Risso Davide
Robinson Mark D
Soneson Charlotte
Van den Berge Koen
Vert Jean-Philippe
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Dropout events in single-cell RNA sequencing (scRNA-seq) cause many transcripts to go undetected and induce an excess of zero read counts, leading to power issues in differential expression (DE) analysis. This has triggered the development of bespoke scRNA-seq DE methods to cope with zero inflation. Recent evaluations, however, have shown that dedicated scRNA-seq tools provide no advantage compared to traditional bulk RNA-seq tools. We introduce a weighting strategy, based on a zero-inflated negative binomial model, that identifies excess zero counts and generates gene- and cell-specific weights to unlock bulk RNA-seq DE pipelines for zero-inflated data, boosting performance for scRNA-seq.
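
One natural way to turn a zero-inflated negative binomial (ZINB) model into observation weights, as the abstract above describes, is the posterior probability that a count came from the negative binomial component and not from the excess-zero component. The sketch below assumes that form and assumes the parameters (`mu`, `theta`, `pi_zero`) have already been estimated; the names are illustrative and no fitting is shown.

```python
from math import exp, lgamma, log


def nb_pmf(y, mu, theta):
    """Negative binomial probability of count y with mean mu and size theta."""
    log_p = (
        lgamma(y + theta) - lgamma(theta) - lgamma(y + 1)
        + theta * log(theta / (theta + mu))
        + y * log(mu / (theta + mu))
    )
    return exp(log_p)


def zinb_weight(y, mu, theta, pi_zero):
    """Posterior probability that count y belongs to the NB component.

    pi_zero is the assumed probability of an excess zero. Positive counts
    can only come from the NB component, so their weight is 1.
    """
    if mu <= 0 or theta <= 0 or not 0.0 <= pi_zero < 1.0:
        raise ValueError("need mu > 0, theta > 0 and 0 <= pi_zero < 1")
    if y > 0:
        return 1.0
    nb_part = (1.0 - pi_zero) * nb_pmf(0, mu, theta)
    return nb_part / (pi_zero + nb_part)


if __name__ == "__main__":
    # Hypothetical parameters for one gene in one cell.
    print(zinb_weight(0, mu=5.0, theta=2.0, pi_zero=0.4))
    print(zinb_weight(3, mu=5.0, theta=2.0, pi_zero=0.4))
```

A zero observed where the NB model expects a high count receives a weight near 0 and is effectively down-weighted in a bulk DE pipeline, while a zero that the NB model already explains keeps a weight near 1.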

Ghent University Academic Bibliography

Directory of Open Access Journals

Carolina Digital Repository

eScholarship - University of California

Archivsystem Ask23

ZORA

HAL-MINES ParisTech

Archivio istituzionale della ricerca - Università di Padova