240 research outputs found

Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models

Author: A Subramanian
B Schölkopf
D Eisenberg
D Liu
D Zhang
Dawei Liu
Debashis Ghosh
G Kimeldorf
JJ Goeman
KD Dahlquist
M Raponi
N Breslow
P Grosu
P McCullagh
R Davies
S Dhanasekaran
S le Cessie
SG Self
SW Doniger
V Vapnik
Xihong Lin
Z Wei
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: Growing interest in biological pathways has called for new statistical methods for modeling and testing a genetic pathway effect on a health outcome. Because genes within a pathway tend to interact with each other and relate to the outcome in a complicated way, nonparametric methods are more desirable. The kernel machine method provides a convenient, powerful and unified method for multi-dimensional parametric and nonparametric modeling of the pathway effect. Results: In this paper we propose a logistic kernel machine regression model for binary outcomes. This model relates the disease risk to covariates parametrically, and to genes within a genetic pathway parametrically or nonparametrically using kernel machines. The nonparametric genetic pathway effect allows for possible interactions among the genes within the same pathway and a complicated relationship between the genetic pathway and the outcome. We show that kernel machine estimation of the model components can be formulated using a logistic mixed model. Estimation can hence proceed within a mixed model framework using standard statistical software. A score test based on a Gaussian process approximation is developed to test for the genetic pathway effect. The methods are illustrated using a prostate cancer data set and evaluated using simulations. An extension to continuous and discrete outcomes using generalized kernel machine models and its connection with generalized linear mixed models is discussed. Conclusion: Logistic kernel machine regression and its extension, generalized kernel machine regression, provide a novel and flexible statistical tool for modeling pathway effects on discrete and continuous outcomes. Their close connection to mixed models and their attractive performance give them promising, wide applications in bioinformatics and other biomedical areas.

Crossref

Harvard University - DASH

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Collection Of Biostatistics Research Archive

Harvard Dataverse Network

A comparative study on gene-set analysis methods for assessing differential expression associated with the survival phenotype

Author: A Rosenwald
A Subramanian
AA Alizadeh
AJ Adewale
AL Boulesteix
AP Crijns
E Bair
H Binder
HK Dressman
I Dinu
J Gui
Jinheum Kim
JJ Goeman
K Jung
L Tian
Q Liu
R Tibshirani
Seungyeoun Lee
Sunho Lee
SY Kim
TR Golub
TS Furey
VK Mootha
X Chen
Y Benjamini
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: Many gene-set analysis methods have been previously proposed and compared through simulation studies and analysis of real datasets for binary phenotypes. We focused on the survival phenotype and compared the performances of Gene Set Enrichment Analysis (GSEA), Global Test (GT), Wald-type Test (WT) and Global Boost Test (GBST) methods in a simulation study and on two ovarian cancer data sets. We considered two versions of GSEA by allowing different weights: GSEA1 uses equal weights, yielding results similar to the Kolmogorov-Smirnov test, while GSEA2's weights are based on the correlation between genes and the phenotype. Results: We compared GSEA1, GSEA2, GT, WT and GBST in a simulation study with various settings for the correlation structure of the genes and the association parameter between the survival outcome and the genes. Simulation results indicated that GT, WT and GBST consistently have higher power than GSEA1 and GSEA2 across all scenarios. However, the power of the five tests depends on the combination of correlation structure and association parameter. For the ovarian cancer data set, using the FDR threshold of q Conclusion: Simulation studies and a real data example indicate that GT, WT and GBST tend to have high power, whereas GSEA1 and GSEA2 have lower power. We also found that, when survival is positively associated with genes, the power of the five tests is much higher when genes are correlated than when genes are independent. There appears to be a synergistic effect in detecting significant gene sets when significant genes have within-class correlation and the association between survival and genes is positive or negative (i.e., one-direction correlation).

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Testing the additional predictive value of high-dimensional molecular data

Author: AL Boulesteix
Anne-Laure Boulesteix
C Truntzer
G Tutz
H Binder
H Höing
J Fridlyand
J Friedman
J Goeman
JJ Goeman
LJ van't Veer
M Schmidberger
O Gevaert
P Bühlmann
P Eden
R Tibshirani
S Chiaretti
T Golub
T Hothorn
Torsten Hothorn
X Li
Y Freund
Y Sun
Publication venue: BioMed Central
Publication date: 01/09/2009

While high-dimensional molecular data such as microarray gene expression data have been used for disease outcome prediction or diagnosis for about ten years in biomedical research, the question of the additional predictive value of such data, given that classical predictors are already available, has long been under-considered in the bioinformatics literature. We suggest an intuitive permutation-based testing procedure for assessing the additional predictive value of high-dimensional molecular data. Our method combines two well-known statistical tools: logistic regression and boosting regression. We give clear advice for the choice of the only method parameter (the number of boosting iterations). In simulations, our novel approach is found to have very good power in different settings, e.g. few strong predictors or many weak predictors. For illustrative purposes, it is applied to two publicly available cancer data sets. Our simple and computationally efficient approach can be used to globally assess the additional predictive power of a large number of candidate predictors given that a few clinical covariates or a known prognostic index are already available.
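
The permutation idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract's statistic comes from logistic regression combined with boosting, whereas the `max_abs_covariance` statistic below is a toy stand-in, clinical covariates are not modeled at all, and every function and parameter name is hypothetical. The sketch only shows the generic step of shuffling the rows of the molecular data against a fixed outcome to build a null distribution.

```python
import random


def max_abs_covariance(molecular, outcome):
    """Toy statistic (an assumption, not the paper's boosting-based one):
    largest absolute covariance between any molecular feature and the outcome."""
    n = len(outcome)
    mean_y = sum(outcome) / n
    best = 0.0
    for j in range(len(molecular[0])):
        column = [row[j] for row in molecular]
        mean_x = sum(column) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(column, outcome)) / n
        best = max(best, abs(cov))
    return best


def permutation_p_value(molecular, outcome, statistic, n_perm=200, seed=0):
    """Permutation p-value: shuffle whole rows of the molecular data while the
    outcome stays fixed, and count how often the permuted statistic reaches
    the observed one."""
    rng = random.Random(seed)
    observed = statistic(molecular, outcome)
    rows = list(molecular)  # shallow copy, so the caller's list is not reordered
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(rows)
        if statistic(rows, outcome) >= observed:
            exceed += 1
    # The +1 terms count the observed data as one of the permutations.
    return (exceed + 1) / (n_perm + 1)


# Hypothetical data: feature 0 tracks the outcome, feature 1 is near-constant.
outcome = [0] * 10 + [1] * 10
molecular = [[float(y), 0.1 * (i % 2)] for i, y in enumerate(outcome)]
p_value = permutation_p_value(molecular, outcome, max_abs_covariance)
```

In a setting like the one the abstract describes, the clinical covariates would presumably stay attached to the outcome while only the molecular rows are permuted; that detail is an assumption here and is left out of the sketch.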

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Open Access LMU

Microarray-based gene set analysis: a comparison of current methods

Author: A Nikitin
A Subramanian
G Smyth
GK Smyth
H Hotelling
H Jeong
I Dinu
J Goeman
J Rougemont
J Stuart
JC Gower
JJ Goeman
KD Dahlquist
L Tian
M Ashburner
M Kanehisa
Michael A Black
Q Liu
R Gentleman
S Song
Sarah Song
SW Kong
TR Golub
U Mansmann
VG Tusher
VK Mootha
W Huber
WT Barry
Y Benjamini
Publication venue: BioMed Central
Publication date: 01/01/2008

BACKGROUND: The analysis of gene sets has become a popular topic in recent times, with researchers attempting to improve the interpretability and reproducibility of their microarray analyses through the inclusion of supplementary biological information. While a number of options for gene set analysis exist, no consensus has yet been reached regarding which methodology performs best, and under what conditions. The goal of this work was to examine the performance characteristics of a collection of existing gene set analysis methods, on both simulated and real microarray data sets. Of particular interest was the potential utility gained through the incorporation of inter-gene correlation into the analysis process. RESULTS: Each of six gene set analysis methods was applied to both simulated and publicly available microarray data sets. Overall, the various methodologies were all found to be better at detecting gene sets that moved from non-active (i.e., genes not expressed) to active states (or vice versa), rather than those that simply changed their level of activity. Methods which incorporate correlation structures were found to provide increased ability to detect altered gene sets in some settings. CONCLUSION: Based on the results obtained through the analysis of simulated data, it is clear that the performance of gene set analysis methods is strongly influenced by the features of the data set in question, and that methods which incorporate correlation structures into the analysis process tend to achieve better performance, relative to methods which rely on univariate test statistics.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Queensland eSpace

Similar gene expression profiles of sporadic, PGL2-, and SDHD-linked paragangliomas suggest a common pathway to tumorigenesis

Author: AG van der mey
AGL Vandermey
Andel GL Van der Mey
AP Gimenez-Roqueplo
BE Baysal
Cees J Cornelisse
Cor WRJ Cremers
D Astuti
DE Benn
EC Mariman
EE Lack
EF Hensen
Erik F Hensen
FM Vanbaars
GK Smyth
H Dannenberg
H Ogata
HP Neumann
Jan Oosting
Jelle J Goeman
JJ Goeman
JP Bayley
K Hirota
KS Choi
MA Selak
P Pigny
Pancras CW Hogendoorn
PB Dekker
PE Taschner
Peter Devilee
PJ Pollard
PL Dahia
PM Struycken
RC Gentleman
S Niemann
S Pounds
T Manoli
VK Mootha
WH Van Houtum
Y Benjamini
ZJ Wu
Publication venue: BioMed Central
Publication date: 01/01/2009

BACKGROUND: Paragangliomas of the head and neck are highly vascular and usually clinically benign tumors arising in the paraganglia of the autonomic nervous system. A significant number of cases (10-50%) are proven to be familial. Multiple genes encoding subunits of the mitochondrial succinate-dehydrogenase (SDH) complex are associated with hereditary paraganglioma: SDHB, SDHC and SDHD. Furthermore, a hereditary paraganglioma family has been identified with linkage to the PGL2 locus on 11q13. No SDH genes are known to be located in the 11q13 region, and the exact gene defect has not yet been identified in this family. METHODS: We have performed an RNA expression microarray study in sporadic, SDHD- and PGL2-linked head and neck paragangliomas in order to identify potential differences in gene expression leading to tumorigenesis in these genetically defined paraganglioma subgroups. We have focused our analysis on pathways and functional gene-groups that are known to be associated with SDH function and paraganglioma tumorigenesis, i.e. metabolism, hypoxia, and angiogenesis related pathways. We also evaluated gene clusters of interest on chromosome 11 (i.e. the PGL2 locus on 11q13 and the imprinted region 11p15). RESULTS: We found remarkable similarity in overall gene expression profiles of SDHD-linked, PGL2-linked and sporadic paraganglioma. The supervised analysis on pathways implicated in PGL tumor formation also did not reveal significant differences in gene expression between these paraganglioma subgroups. Moreover, we were not able to detect differences in gene-expression of chromosome 11 regions of interest (i.e. 11q23, 11q13, 11p15). CONCLUSION: The similarity in gene-expression profiles suggests that PGL2, like SDHD, is involved in the functionality of the SDH complex, and that tumor formation in these subgroups involves the same pathways as in SDH linked paragangliomas. We were not able to clarify the exact identity of PGL2 on 11q13. The lack of differential gene-expression of chromosome 11 genes might indicate that chromosome 11 loss, as demonstrated in SDHD-linked paragangliomas, is an important feature in the formation of paragangliomas regardless of their genetic background.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Radboud Repository

Outcome-related metabolomic patterns from 1H/31P NMR after mild hypothermia treatments of oxygen–glucose deprivation in a neonatal brain slice model of asphyxia

Author: Eriksson L
Goeman JJ
Hastie T
Hikari AI Yoshihara
Jia Liu
Lawrence Litt
Leist M
Mark JS Kelly
Mark R Segal
Thomas L James
Tibshirani R
Willker W
Publication venue: Nature Publishing Group
Publication date: 01/02/2011

Human clinical trials using 72 hours of mild hypothermia (32°C–34°C) after neonatal asphyxia have found substantially improved neurologic outcomes. As temperature changes differently modulate numerous metabolite fluxes and concentrations, we hypothesized that 1H/31P nuclear magnetic resonance (NMR) spectroscopy of intracellular metabolites can distinguish different insults, treatments, and recovery stages. Three groups of superfused neonatal rat brain slices underwent 45 minutes of oxygen–glucose deprivation (OGD) and then were: treated for 3 hours with mild hypothermia (32°C) that began with OGD, or similarly treated with hypothermia after a 15-minute delay, or not treated (normothermic control group, 37°C). Hypothermia was followed by 3 hours of normothermic recovery. Slices collected at different predetermined times were processed, respectively, for 14.1 Tesla NMR analysis, enzyme-linked immunosorbent assay (ELISA) cell-death quantification, and superoxide production. Forty-nine NMR-observable metabolites underwent a multivariate analysis. Separated clustering in scores plots was found for treatment and outcome groups. Final ATP (adenosine triphosphate) levels, severely decreased at normothermia, were restored equally by immediate and delayed hypothermia. Cell death was decreased by immediate hypothermia, but was substantially, and equally, greater with normothermia and delayed hypothermia. Potentially important biomarkers in the 1H spectra included PCr-1H (phosphocreatine in the 1H spectrum), ATP-1H (adenosine triphosphate in the 1H spectrum), and ADP-1H (adenosine diphosphate in the 1H spectrum). The findings suggest a potential role for metabolomic monitoring during therapeutic hypothermia.

Crossref

PubMed Central

eScholarship - University of California

Classes of Multiple Decision Functions Strongly Controlling FWER and FDR

Author: B Efron
CR Genovese
E Roquain
EA Peña
Edsel A. Peña
G Blanchard
G Kang
H Finner
J Scott
J Storey
JD Habiger
JJ Goeman
JL Doob
Joshua D. Habiger
K Roeder
M Bogdan
M Guindani
P Müller
PH Westfall
S Dudoit
S Holm
SK Sarkar
W Hoeffding
W Sun
W Wu
Wensong Wu
Y Benjamini
Z Šidák
Publication venue: Springer Science and Business Media LLC
Publication date: 15/07/2010

This paper provides two general classes of multiple decision functions where each member of the first class strongly controls the family-wise error rate (FWER), while each member of the second class strongly controls the false discovery rate (FDR). These classes offer the possibility that an optimal multiple decision function with respect to a pre-specified criterion, such as the missed discovery rate (MDR), could be found within these classes. Such multiple decision functions can be utilized in multiple testing, specifically, but not limited to, the analysis of high-dimensional microarray data sets.
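
To make the FWER/FDR distinction concrete, the following Python sketch implements two classical procedures: Holm's step-down method (FWER control) and the Benjamini-Hochberg step-up method (FDR control, under independence or positive dependence of the p-values). These are stand-ins chosen for illustration; they are not the classes of decision functions proposed in the paper, and the function names and example p-values are hypothetical.

```python
def holm(p_values, alpha=0.05):
    """Holm step-down procedure: compare the k-th smallest p-value with
    alpha / (m - k + 1) and stop at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):  # rank is 0-based here
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break
    return reject


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: find the largest k such that the
    k-th smallest p-value is <= alpha * k / m, then reject the k smallest."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff = rank
    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject


# Hypothetical p-values for five hypotheses.
p_values = [0.5, 0.001, 0.03, 0.012, 0.02]
holm_decisions = holm(p_values)
bh_decisions = benjamini_hochberg(p_values)
```

On the same input the FDR procedure typically rejects at least as many hypotheses as the FWER procedure, which reflects the weaker error criterion it controls.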

arXiv.org e-Print Archive

Crossref

Use of the gamma method for self-contained gene-set analysis of SNP data

Author: A Subramanian
Ann M Moyer
AR Gallant
B Efron
BL Fridley
Brooke L Fridley
DB Allison
DV Zaykin
Gregory D Jenkins
I Dinu
JJ Goeman
Joanna M Biernacka
K Wang
K Yu
L Li
LA Hindorff
Liewei Wang
LS Chen
MC Whitlock
N Niu
O De la Cruz
P Holmans
P Scheet
RA Fisher
RC Elston
SA McCarroll
WJ Gauderman
Publication venue: Nature Publishing Group

Gene-set analysis (GSA) evaluates the overall evidence of association between a phenotype and all genotyped single nucleotide polymorphisms (SNPs) in a set of genes, as opposed to testing for association between a phenotype and each SNP individually. We propose using the Gamma Method (GM) to combine gene-level P-values for assessing the significance of GS association. We performed simulations to compare the GM with several other self-contained GSA strategies, including both one-step and two-step GSA approaches, in a variety of scenarios. We denote a 'one-step' GSA approach to be one in which all SNPs in a GS are used to derive a test of GS association without consideration of gene-level effects, and a 'two-step' approach to be one in which all genotyped SNPs in a gene are first used to evaluate association of the phenotype with all measured variation in the gene and then the gene-level tests of association are aggregated to assess the GS association with the phenotype. The simulations suggest that, overall, two-step methods provide higher power than one-step approaches and that combining gene-level P-values using the GM with a soft truncation threshold between 0.05 and 0.20 is a powerful approach for conducting GSA, relative to the competing approaches assessed. We also applied all of the considered GSA methods to data from a pharmacogenomic study of cisplatin, and obtained evidence suggesting that the glutathione metabolism GS is associated with cisplatin drug response.
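
The second step of a two-step approach, aggregating gene-level P-values into one gene-set P-value, can be sketched in Python with Fisher's combination method. Fisher's method is a stand-in here, not the Gamma Method itself: it is commonly described as the special case with shape parameter 1, and the soft truncation threshold tuning discussed in the abstract is not reproduced. The sketch also assumes independent gene-level P-values, and the function name and inputs are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic T = -2 * sum(log p_i) follows a chi-square distribution with
    2k degrees of freedom under the null. For even degrees of freedom the
    survival function has the closed form
        exp(-T/2) * sum_{i=0}^{k-1} (T/2)^i / i!
    so no external statistics library is needed.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is T / 2
    term = 1.0   # (T/2)^0 / 0!
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i  # builds (T/2)^i / i! incrementally
        total += term
    return math.exp(-half_stat) * total


# Hypothetical gene-level p-values for a gene set with three genes.
gene_set_p = fisher_combined_p([0.04, 0.20, 0.01])
```

With a single gene the combined value reduces to that gene's own P-value, which is a quick sanity check on the closed form.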

Crossref

PubMed Central

Investigating the effect of paralogs on microarray gene-set analysis

Abstract. Background: In order to interpret the results obtained from a microarray experiment, researchers often shift focus from analysis of individual differentially expressed genes to analyses of sets of genes. These gene-set analysis (GSA) methods use previously accumulated biological knowledge to group genes into sets and then aim to rank these gene sets in a way that reflects their relative importance in the experimental situation in question. We suspect that the presence of paralogs affects the ability of GSA methods to accurately identify the most important sets of genes for subsequent research. Results: We show that paralogs, which typically have high sequence identity and similar molecular functions, also exhibit high correlation in their expression patterns. We investigate this correlation as a potential confounding factor common to current GSA methods using Indygene (http://www.cbio.uct.ac.za/indygene), a web tool that reduces a supplied list of genes so that it includes no pairwise paralogy relationships above a specified sequence similarity threshold. We use the tool to reanalyse previously published microarray datasets and determine the potential utility of accounting for the presence of paralogs. Conclusions: The Indygene tool efficiently removes paralogy relationships from a given dataset and we found that such a reduction, performed prior to GSA, has the ability to generate significantly different results that often represent novel and plausible biological hypotheses. This was demonstrated for three different GSA approaches when applied to the reanalysis of previously published microarray datasets and suggests that the redundancy and non-independence of paralogs is an important consideration when dealing with GSA methodologies.

Cape Town University OpenUCT

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Access to Research at National University of Ireland, Galway

Cross-laboratory evaluation of multiplex bead assays including independent common reference standards for immunological monitoring of observational and interventional human studies

Author: De Paus RA
Dockrell HM
Drittij AMFH
Goeman JJ
Ho MM
Joosten SA
Khatri B
McShane H
Ottenhoff THM
Smith SG
Van Meijgaarden KE
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2018

© 2018 van Meijgaarden et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Background: Multiplex assays are increasingly applied to analyze multicomponent signatures of human immune responses, including the dynamics of cytokine and chemokine production, in observational as well as interventional studies following treatment or vaccination. However, relatively limited information is available on the performance of the different available multiplex kits, and comparative evaluations addressing this important issue are lacking. Study design: To fill this knowledge gap we performed a technical comparison of multiplex bead assays from 4 manufacturers, each represented by 3 different lots, and with the assays performed by 3 different laboratories. To cross compare kits directly, spiked samples, biological samples and a newly made reference standard were included in all assays. Analyses were performed on 324 standard curves to allow for evaluation of the quality of the standard curves and the subsequent interpretation of biological specimens. Results: Manufacturer was the factor which contributed most to the observed variation whereas variation in lots, laboratory or type of detection reagent contributed minimally. Inclusion of a common reference standard allowed us to overcome observed differences in cytokine and chemokine levels between manufacturers. Conclusions: We strongly recommend using multiplex assays from the same manufacturer within a single study and across studies that are likely to compare results in a quantitative manner. Incorporation of common reference standards, and application of the same analysis method in assays can overcome many analytical biases and thus could bridge comparison of independent immune profiling (e.g. vaccine immunogenicity) studies. With these recommendations taken into account, the multiplex bead assays performed as described here are useful tools in capturing complex human immune-signatures in observational and interventional studies. Funding: FP7 EURIPRED (FP7-INFRA-2012 Grant Agreement No. 312661 to HMcS, HMD, THMO, MMH) and EC HORIZON2020 TBVAC2020 (Grant Agreement No. 643381EC to HMcS, HMD, THMO).

LSHTM Research Online

Directory of Open Access Journals

Oxford University Research Archive

Leiden University Scholarly Publications

Brunel University Research Archive

FigShare

LSHTM Data Compass