Spectral gene set enrichment (SGSE)
Motivation: Gene set testing is typically performed in a supervised context
to quantify the association between groups of genes and a clinical phenotype.
In many cases, however, a gene set-based interpretation of genomic data is
desired in the absence of a phenotype variable. Although methods exist for
unsupervised gene set testing, they predominantly compute enrichment relative
to clusters of the genomic variables with performance strongly dependent on the
clustering algorithm and number of clusters. Results: We propose a novel
method, spectral gene set enrichment (SGSE), for unsupervised competitive
testing of the association between gene sets and empirical data sources. SGSE
first computes the statistical association between gene sets and principal
components (PCs) using our principal component gene set enrichment (PCGSE)
method. The overall statistical association between each gene set and the
spectral structure of the data is then computed by combining the PC-level
p-values using the weighted Z-method with weights set to the PC variance scaled
by Tracy-Widom test p-values. Using simulated data, we show that the SGSE
algorithm can accurately recover spectral features from noisy data. To
illustrate the utility of our method on real data, we demonstrate the superior
performance of the SGSE method relative to standard cluster-based techniques
for testing the association between MSigDB gene sets and the variance structure
of microarray gene expression data. Availability:
http://cran.r-project.org/web/packages/PCGSE/index.html Contact:
[email protected] or [email protected]
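The combination step described in this abstract can be sketched in a few lines of Python. This is a minimal illustration of the weighted Z-method (Stouffer/Liptak), not the PCGSE package's implementation. The function names `weighted_z_combine` and `sgse_weights` are hypothetical, and the weight formula (PC variance multiplied by one minus the Tracy-Widom p-value) is an assumption about what "scaled by" means here. The Tracy-Widom p-values themselves are taken as given inputs.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def weighted_z_combine(p_values, weights):
    """Combine one-sided p-values with the weighted Z-method."""
    if not p_values or len(p_values) != len(weights):
        raise ValueError("p_values and weights must be non-empty and equal length")
    eps = 1e-15  # keep p strictly inside (0, 1) so inv_cdf is defined
    z_scores = [
        _STD_NORMAL.inv_cdf(1.0 - min(max(p, eps), 1.0 - eps)) for p in p_values
    ]
    denominator = sqrt(sum(w * w for w in weights))
    if denominator == 0.0:
        raise ValueError("at least one weight must be non-zero")
    combined_z = sum(w * z for w, z in zip(weights, z_scores)) / denominator
    return 1.0 - _STD_NORMAL.cdf(combined_z)


def sgse_weights(pc_variances, tw_p_values):
    """Assumed weighting: PC variance down-weighted by its Tracy-Widom p-value."""
    return [v * (1.0 - p) for v, p in zip(pc_variances, tw_p_values)]


# Hypothetical PC-level p-values for one gene set across three PCs.
pc_level_p = [0.01, 0.20, 0.70]
weights = sgse_weights(pc_variances=[5.0, 2.0, 0.5], tw_p_values=[0.001, 0.04, 0.90])
combined_p = weighted_z_combine(pc_level_p, weights)
```

Under this weighting, a PC whose variance is not significant by the Tracy-Widom test contributes almost nothing to the combined p-value, so the result is driven by the PCs that carry real spectral structure.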
Principal component gene set enrichment (PCGSE)
Motivation: Although principal component analysis (PCA) is widely used for
the dimensional reduction of biomedical data, interpretation of PCA results
remains daunting. Most existing methods attempt to explain each principal
component (PC) in terms of a small number of variables by generating
approximate PCs with few non-zero loadings. Although useful when just a few
variables dominate the population PCs, these methods are often inadequate for
characterizing the PCs of high-dimensional genomic data. For genomic data,
reproducible and biologically meaningful PC interpretation requires methods
based on the combined signal of functionally related sets of genes. While gene
set testing methods have been widely used in supervised settings to quantify
the association of groups of genes with clinical outcomes, these methods have
seen only limited application for testing the enrichment of gene sets relative
to sample PCs. Results: We describe a novel approach, principal component gene
set enrichment (PCGSE), for computing the statistical association between gene
sets and the PCs of genomic data. The PCGSE method performs a two-stage
competitive gene set test using the correlation between each gene and each PC
as the gene-level test statistic with flexible choice of both the gene set test
statistic and the method used to compute the null distribution of the gene set
statistic. Using simulated data with simulated gene sets and real gene
expression data with curated gene sets, we demonstrate that biologically
meaningful and computationally efficient results can be obtained from a simple
parametric version of the PCGSE method that performs a correlation-adjusted
two-sample t-test between the gene-level test statistics for gene set members
and genes not in the set. Availability:
http://cran.r-project.org/web/packages/PCGSE/index.html Contact:
[email protected] or [email protected]
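The two-stage test described in this abstract can be sketched as follows. This is a simplified illustration, not the PCGSE package's code: the names `pearson` and `gene_set_t_statistic` are hypothetical, the PC scores are assumed to have been computed elsewhere, and the correlation adjustment that the abstract mentions is deliberately omitted, so the result is a plain pooled-variance two-sample t-statistic.

```python
from math import sqrt
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def gene_set_t_statistic(expression, pc_scores, gene_set):
    """Stage 1: gene-level statistic = correlation of each gene with the PC.
    Stage 2: pooled two-sample t-statistic, set members vs. all other genes.
    NOTE: no adjustment for inter-gene correlation is applied in this sketch.
    """
    gene_stats = {g: pearson(values, pc_scores) for g, values in expression.items()}
    inside = [s for g, s in gene_stats.items() if g in gene_set]
    outside = [s for g, s in gene_stats.items() if g not in gene_set]
    n_in, n_out = len(inside), len(outside)
    if n_in < 2 or n_out < 2:
        raise ValueError("need at least two genes inside and outside the set")
    df = n_in + n_out - 2
    pooled_var = ((n_in - 1) * variance(inside) + (n_out - 1) * variance(outside)) / df
    t_stat = (mean(inside) - mean(outside)) / sqrt(pooled_var * (1 / n_in + 1 / n_out))
    return t_stat, df


# Toy data: four genes measured on four samples, plus scores for one PC.
pc_scores = [1.0, 2.0, 3.0, 4.0]
expression = {
    "g1": [1.0, 2.0, 3.0, 4.5],
    "g2": [2.0, 4.0, 6.0, 7.0],
    "g3": [4.0, 3.0, 2.0, 1.5],
    "g4": [1.0, 3.0, 2.0, 0.0],
}
t_stat, df = gene_set_t_statistic(expression, pc_scores, gene_set={"g1", "g2"})
```

The test is competitive because set members are compared against the genes outside the set. Turning the t-statistic into a p-value requires a t-distribution with `df` degrees of freedom, which this sketch leaves to a statistics library.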