311 research outputs found
Does pathway analysis make it easier for common variants to tag rare ones?
Analyzing sequencing data is difficult because of the low frequency of rare variants, which may result in low power to detect associations. We consider pathway analysis to detect multiple common and rare variants jointly and to investigate whether analysis at the pathway level provides an alternative strategy for identifying susceptibility genes. Available pathway analysis methods for data from genome-wide association studies might not be efficient because these methods are designed to detect common variants. Here, we investigate the performance of several existing pathway analysis methods for sequencing data. In particular, we consider the global test, which does not consider linkage disequilibrium between the variants in a gene. We improve the performance of the global test by assigning larger weights to rare variants, as proposed in the weighted-sum approach. We conclude that straightforward application of pathway analysis is not satisfactory; hence, when common and rare variants are jointly analyzed, larger weights should be assigned to rare variants.
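The idea of giving rare variants larger weights can be illustrated with a minimal sketch. It assumes an inverse-standard-deviation weight, 1/sqrt(q(1-q)), on a smoothed minor-allele frequency q, which is one common form of weighted-sum weighting; the abstract does not state the exact formula used, and the function names `rare_variant_weights` and `weighted_burden` are hypothetical.

```python
import math

def rare_variant_weights(genotypes):
    """Return one weight per variant; rarer variants get larger weights.

    genotypes: list of per-individual lists of minor-allele counts (0, 1 or 2).
    Assumption: weight = 1 / sqrt(q * (1 - q)) with a smoothed frequency q.
    """
    n = len(genotypes)
    n_variants = len(genotypes[0])
    weights = []
    for j in range(n_variants):
        minor_count = sum(row[j] for row in genotypes)
        # Add-one smoothing keeps q strictly between 0 and 1.
        q = (minor_count + 1) / (2 * n + 2)
        weights.append(1.0 / math.sqrt(q * (1.0 - q)))
    return weights

def weighted_burden(genotypes, weights):
    """Per-individual weighted sum of minor-allele counts."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]

# Toy data: variant 0 is rarer than variant 1 in these four individuals.
toy = [[0, 1], [0, 1], [1, 1], [0, 0]]
w = rare_variant_weights(toy)
scores = weighted_burden(toy, w)
```

In this toy data the rarer first variant receives the larger weight, so a single copy of it contributes more to an individual's score than a copy of the more common second variant.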
Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models
BACKGROUND: Growing interest in biological pathways has called for new statistical methods for modeling and testing the effect of a genetic pathway on a health outcome. Because genes within a pathway tend to interact with each other and relate to the outcome in a complicated way, nonparametric methods are more desirable. The kernel machine method provides a convenient, powerful and unified method for multi-dimensional parametric and nonparametric modeling of the pathway effect. RESULTS: In this paper we propose a logistic kernel machine regression model for binary outcomes. This model relates the disease risk to covariates parametrically, and to genes within a genetic pathway parametrically or nonparametrically using kernel machines. The nonparametric genetic pathway effect allows for possible interactions among the genes within the same pathway and a complicated relationship between the genetic pathway and the outcome. We show that kernel machine estimation of the model components can be formulated using a logistic mixed model. Estimation can hence proceed within a mixed model framework using standard statistical software. A score test based on a Gaussian process approximation is developed to test for the genetic pathway effect. The methods are illustrated using a prostate cancer data set and evaluated using simulations. An extension to continuous and discrete outcomes using generalized kernel machine models and its connection with generalized linear mixed models is discussed. CONCLUSION: Logistic kernel machine regression and its extension, generalized kernel machine regression, provide a novel and flexible statistical tool for modeling pathway effects on discrete and continuous outcomes. Their close connection to mixed models and attractive performance make them promising for wide application in bioinformatics and other biomedical areas.
A comparative study on gene-set analysis methods for assessing differential expression associated with the survival phenotype
BACKGROUND: Many gene-set analysis methods have been previously proposed and compared through simulation studies and analysis of real datasets for binary phenotypes. We focused on the survival phenotype and compared the performances of Gene Set Enrichment Analysis (GSEA), Global Test (GT), Wald-type Test (WT) and Global Boost Test (GBST) methods in a simulation study and on two ovarian cancer data sets. We considered two versions of GSEA by allowing different weights: GSEA1 uses equal weights, yielding results similar to the Kolmogorov-Smirnov test; while GSEA2's weights are based on the correlation between genes and the phenotype. RESULTS: We compared GSEA1, GSEA2, GT, WT and GBST in a simulation study with various settings for the correlation structure of the genes and the association parameter between the survival outcome and the genes. Simulation results indicated that GT, WT and GBST consistently have higher power than GSEA1 and GSEA2 across all scenarios. However, the power of the five tests depends on the combination of correlation structure and association parameter. For the ovarian cancer data set, using the FDR threshold of q … CONCLUSION: Simulation studies and a real data example indicate that GT, WT and GBST tend to have high power, whereas GSEA1 and GSEA2 have lower power. We also found that the power of the five tests is much higher when genes are correlated than when genes are independent, when survival is positively associated with genes. It seems that there is a synergistic effect in detecting significant gene sets when significant genes have within-class correlation and the association between survival and genes is positive or negative (i.e., one-direction correlation).
Testing the additional predictive value of high-dimensional molecular data
While high-dimensional molecular data such as microarray gene expression data have been used for disease outcome prediction or diagnosis purposes for about ten years in biomedical research, the question of the additional predictive value of such data given that classical predictors are already available has long been under-considered in the bioinformatics literature.
We suggest an intuitive permutation-based testing procedure for assessing the additional predictive value of high-dimensional molecular data. Our method combines two well-known statistical tools: logistic regression and boosting regression. We give clear advice for the choice of the only method parameter (the number of boosting iterations). In simulations, our novel approach is found to have very good power in different settings, e.g. few strong predictors or many weak predictors. For illustrative purposes, it is applied to two publicly available cancer data sets.
Our simple and computationally efficient approach can be used to globally assess the additional predictive power of a large number of candidate predictors given that a few clinical covariates or a known prognostic index are already available.
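The permutation idea in this abstract can be sketched generically: the molecular rows are shuffled across individuals while outcomes and clinical covariates stay fixed, and the observed statistic is compared with its permutation distribution. The sketch below is only an illustration under stated assumptions. The `statistic` argument is a placeholder for any measure of added predictive value; the toy statistic here is a simple difference in means, not the logistic-regression-plus-boosting statistic of the paper, and all function names are hypothetical.

```python
import random

def permutation_p_value(statistic, outcome, clinical, molecular,
                        n_perm=1000, seed=0):
    """Permutation p-value for the added value of the molecular data.

    statistic(outcome, clinical, molecular) -> float, larger = more added value.
    Only the molecular rows are permuted; outcome and clinical stay paired.
    """
    rng = random.Random(seed)
    observed = statistic(outcome, clinical, molecular)
    rows = list(molecular)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(rows)  # break the link between molecular data and outcome
        if statistic(outcome, clinical, rows) >= observed:
            exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    return (exceed + 1) / (n_perm + 1)

def toy_statistic(outcome, clinical, molecular):
    """Toy stand-in: absolute class difference in the mean of feature 0."""
    ones = [m[0] for y, m in zip(outcome, molecular) if y == 1]
    zeros = [m[0] for y, m in zip(outcome, molecular) if y == 0]
    return abs(sum(ones) / len(ones) - sum(zeros) / len(zeros))

# Toy data in which the single molecular feature separates the two classes.
y = [0] * 10 + [1] * 10
clin = [[0.0]] * 20
mol = [[0.0]] * 10 + [[1.0]] * 10
p = permutation_p_value(toy_statistic, y, clin, mol, n_perm=200)
```

Permuting only the molecular block preserves the relationship between the clinical covariates and the outcome, which is what makes the test target the additional, rather than the total, predictive value.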
Microarray-based gene set analysis: a comparison of current methods
BACKGROUND: The analysis of gene sets has become a popular topic in recent times, with researchers attempting to improve the interpretability and reproducibility of their microarray analyses through the inclusion of supplementary biological information. While a number of options for gene set analysis exist, no consensus has yet been reached regarding which methodology performs best, and under what conditions. The goal of this work was to examine the performance characteristics of a collection of existing gene set analysis methods, on both simulated and real microarray data sets. Of particular interest was the potential utility gained through the incorporation of inter-gene correlation into the analysis process. RESULTS: Each of six gene set analysis methods was applied to both simulated and publicly available microarray data sets. Overall, the various methodologies were all found to be better at detecting gene sets that moved from non-active (i.e., genes not expressed) to active states (or vice versa), rather than those that simply changed their level of activity. Methods which incorporate correlation structures were found to provide increased ability to detect altered gene sets in some settings. CONCLUSION: Based on the results obtained through the analysis of simulated data, it is clear that the performance of gene set analysis methods is strongly influenced by the features of the data set in question, and that methods which incorporate correlation structures into the analysis process tend to achieve better performance, relative to methods which rely on univariate test statistics
Globaltest and GOEAST: two different approaches for Gene Ontology analysis
BACKGROUND: Gene set analysis is a commonly used method for analysing microarray data by considering groups of functionally related genes instead of individual genes. Here we present the use of two gene set analysis approaches: Globaltest and GOEAST. Globaltest is a method for testing whether sets of genes are significantly associated with a variable of interest. GOEAST is a freely accessible web-based tool to test GO term enrichment within given gene sets. The two approaches were applied in the analysis of gene lists obtained from three different contrasts in a microarray experiment conducted to study the host reactions in broilers following Eimeria infection. RESULTS: The Globaltest identified significantly associated gene sets in one of the three contrasts made in the microarray experiment whereas the functional analysis of the differentially expressed genes using GOEAST revealed enriched GO terms in all three contrasts. CONCLUSION: Globaltest and GOEAST gave different results, probably due to the different algorithms and the different criteria used for evaluating the significance of GO terms.
Similar gene expression profiles of sporadic, PGL2-, and SDHD-linked paragangliomas suggest a common pathway to tumorigenesis
BACKGROUND: Paragangliomas of the head and neck are highly vascular and usually clinically benign tumors arising in the paraganglia of the autonomic nervous system. A significant number of cases (10-50%) are proven to be familial. Multiple genes encoding subunits of the mitochondrial succinate-dehydrogenase (SDH) complex are associated with hereditary paraganglioma: SDHB, SDHC and SDHD. Furthermore, a hereditary paraganglioma family has been identified with linkage to the PGL2 locus on 11q13. No SDH genes are known to be located in the 11q13 region, and the exact gene defect has not yet been identified in this family. METHODS: We have performed an RNA expression microarray study in sporadic, SDHD- and PGL2-linked head and neck paragangliomas in order to identify potential differences in gene expression leading to tumorigenesis in these genetically defined paraganglioma subgroups. We have focused our analysis on pathways and functional gene-groups that are known to be associated with SDH function and paraganglioma tumorigenesis, i.e. metabolism, hypoxia, and angiogenesis related pathways. We also evaluated gene clusters of interest on chromosome 11 (i.e. the PGL2 locus on 11q13 and the imprinted region 11p15). RESULTS: We found remarkable similarity in overall gene expression profiles of SDHD-linked, PGL2-linked and sporadic paraganglioma. The supervised analysis on pathways implicated in PGL tumor formation also did not reveal significant differences in gene expression between these paraganglioma subgroups. Moreover, we were not able to detect differences in gene-expression of chromosome 11 regions of interest (i.e. 11q23, 11q13, 11p15). CONCLUSION: The similarity in gene-expression profiles suggests that PGL2, like SDHD, is involved in the functionality of the SDH complex, and that tumor formation in these subgroups involves the same pathways as in SDH linked paragangliomas. We were not able to clarify the exact identity of PGL2 on 11q13. The lack of differential gene-expression of chromosome 11 genes might indicate that chromosome 11 loss, as demonstrated in SDHD-linked paragangliomas, is an important feature in the formation of paragangliomas regardless of their genetic background.
Outcome-related metabolomic patterns from 1H/31P NMR after mild hypothermia treatments of oxygen–glucose deprivation in a neonatal brain slice model of asphyxia
Human clinical trials using 72 hours of mild hypothermia (32°C–34°C) after neonatal asphyxia have found substantially improved neurologic outcomes. As temperature changes differently modulate numerous metabolite fluxes and concentrations, we hypothesized that 1H/31P nuclear magnetic resonance (NMR) spectroscopy of intracellular metabolites can distinguish different insults, treatments, and recovery stages. Three groups of superfused neonatal rat brain slices underwent 45 minutes oxygen–glucose deprivation (OGD) and then were: treated for 3 hours with mild hypothermia (32°C) that began with OGD, or similarly treated with hypothermia after a 15-minute delay, or not treated (normothermic control group, 37°C). Hypothermia was followed by 3 hours of normothermic recovery. Slices collected at different predetermined times were processed, respectively, for 14.1 Tesla NMR analysis, enzyme-linked immunosorbent assay (ELISA) cell-death quantification, and superoxide production. Forty-nine NMR-observable metabolites underwent a multivariate analysis. Separated clustering in scores plots was found for treatment and outcome groups. Final ATP (adenosine triphosphate) levels, severely decreased at normothermia, were restored equally by immediate and delayed hypothermia. Cell death was decreased by immediate hypothermia, but was equally substantially greater with normothermia and delayed hypothermia. Potentially important biomarkers in the 1H spectra included PCr-1H (phosphocreatine in the 1H spectrum), ATP-1H (adenosine triphosphate in the 1H spectrum), and ADP-1H (adenosine diphosphate in the 1H spectrum). The findings suggest a potential role for metabolomic monitoring during therapeutic hypothermia
A pathway-based association analysis model using common and rare variants
How various genetic effects in combination affect susceptibility to certain disease states continues to be a major area of methodological research. Various rare variant models have been proposed, in response to a common failure to either identify or validate biologically driven causal genetic variants in genome-wide association studies. Adopting the idea that multiple rare variants may effectively produce a combined effect equal to a single common variant effect through common linkage with this variant, we construct a pathway-based genetic association analysis model using both common and rare variants. This genetic model is applied to the disease status of unrelated individuals in replication 1 from Genetic Analysis Workshop 17. In this simulated example, we were able to identify several pathways that were potentially associated with the disease status and found that common variants showed a stronger genetic effect than rare variants.
Classes of Multiple Decision Functions Strongly Controlling FWER and FDR
This paper provides two general classes of multiple decision functions where
each member of the first class strongly controls the family-wise error rate
(FWER), while each member of the second class strongly controls the false
discovery rate (FDR). These classes offer the possibility that an optimal
multiple decision function with respect to a pre-specified criterion, such as
the missed discovery rate (MDR), could be found within these classes. Such
multiple decision functions can be utilized in multiple testing, specifically,
but not limited to, the analysis of high-dimensional microarray data sets.
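For readers unfamiliar with the two error rates, the sketch below shows two textbook procedures: Bonferroni, which controls the FWER, and the Benjamini-Hochberg step-up rule, which controls the FDR under independence or positive dependence of the p-values. These are standard examples chosen for illustration only; they are not the classes of multiple decision functions proposed in the paper, and the function names are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m (controls the FWER)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Step-up rule: find the largest rank k with p_(k) <= alpha * k / m
    and reject the k hypotheses with the smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k = rank  # keep the largest rank that passes its threshold
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject

# Toy p-values for four hypotheses.
p = [0.001, 0.02, 0.03, 0.5]
fwer_decisions = bonferroni_reject(p)
fdr_decisions = benjamini_hochberg_reject(p)
```

On these toy p-values Bonferroni rejects only the first hypothesis, while Benjamini-Hochberg rejects the first three, which illustrates the usual trade-off: FDR control is less strict and therefore misses fewer true discoveries.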