An R package implementation of multifactor dimensionality reduction
Background: A breadth of high-dimensional data is now available with unprecedented numbers of genetic markers, and data-mining approaches to variable selection are increasingly being used to uncover associations, including potential gene-gene and gene-environment interactions. One of the most commonly used data-mining methods for case-control data is Multifactor Dimensionality Reduction (MDR), which has shown success in both simulations and real data applications. Additional software implementations in other programming languages can make the method available and useful to a broader range of users. Results: We introduce a package for the R statistical language that implements the Multifactor Dimensionality Reduction (MDR) method for nonparametric variable selection of interactions. The package provides an alternative implementation for R users, with great flexibility and utility for both data analysis and research. The 'MDR' package is freely available online at http://www.r-project.org/. We also provide data examples to illustrate the use and functionality of the package. Conclusions: MDR is a frequently used data-mining method for identifying potential gene-gene interactions, and alternative implementations will further increase its use. We introduce a flexible software package for R users.
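To make "nonparametric variable selection of interactions" concrete, here is a minimal sketch of the core MDR step as it is commonly described: for a pair of SNPs, each multilocus genotype cell is labelled high risk if its case:control ratio exceeds the overall ratio, collapsing two factors into one binary attribute that is then scored by balanced accuracy. This is not the 'MDR' R package's API; the function names, the 0/1/2 genotype coding, and the toy data are hypothetical, and the sketch scores on the training data only, whereas real MDR analyses add cross-validation.

```python
from collections import Counter
from itertools import combinations

def mdr_pair(genotypes, status, snp_a, snp_b):
    """Label each two-SNP genotype cell high/low risk and score the pooled model.

    genotypes: one list per subject, SNPs coded 0/1/2 (hypothetical coding)
    status:    1 for a case, 0 for a control
    Returns (dict cell -> is_high_risk, balanced accuracy on the same data).
    """
    n_case = sum(status)
    n_ctrl = len(status) - n_case
    threshold = n_case / n_ctrl  # overall case:control ratio in the sample
    cases, ctrls = Counter(), Counter()
    for g, s in zip(genotypes, status):
        cell = (g[snp_a], g[snp_b])
        (cases if s == 1 else ctrls)[cell] += 1
    high = {}
    for cell in set(cases) | set(ctrls):
        c, k = cases[cell], ctrls[cell]
        # high risk if the cell's case:control ratio exceeds the overall ratio;
        # a cell with cases but no controls is treated as high risk
        high[cell] = c > 0 if k == 0 else c / k > threshold
    tp = tn = 0
    for g, s in zip(genotypes, status):
        predicted_case = high[(g[snp_a], g[snp_b])]
        if s == 1 and predicted_case:
            tp += 1
        elif s == 0 and not predicted_case:
            tn += 1
    balanced_accuracy = 0.5 * (tp / n_case + tn / n_ctrl)
    return high, balanced_accuracy

def best_pair(genotypes, status):
    """Exhaustively search all SNP pairs for the highest balanced accuracy."""
    n_snps = len(genotypes[0])
    return max(combinations(range(n_snps), 2),
               key=lambda pair: mdr_pair(genotypes, status, *pair)[1])

# Toy data (hypothetical): subjects are cases exactly when SNP0 == SNP1,
# a purely epistatic pattern with no marginal effect of either SNP.
genotypes = [[0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 0],
             [0, 0, 0], [1, 1, 1], [0, 1, 0], [1, 0, 1]]
status = [1 if g[0] == g[1] else 0 for g in genotypes]
print(best_pair(genotypes, status))  # (0, 1)
```

Because the pooling is done per genotype cell rather than through a parametric model, the interacting pair is recovered even though neither SNP alone separates cases from controls.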
A comparison of internal validation techniques for multifactor dimensionality reduction
Background: It is hypothesized that common, complex diseases may be due to complex interactions between genetic and environmental factors, which are difficult to detect in high-dimensional data using traditional statistical approaches. Multifactor Dimensionality Reduction (MDR) is the most commonly used data-mining method to detect epistatic interactions. For any data-mining method, internal validation procedures are important for obtaining prediction estimates that guard against model over-fitting and reduce potential false-positive findings. Currently, MDR uses cross-validation for internal validation. In this study, we evaluate a three-way split (3WS) of the data, combined with a post-hoc pruning procedure, as an alternative to cross-validation for internal model validation that reduces computation time without impairing performance. We compare the power of MDR with 5- and 10-fold cross-validation against MDR with 3WS to detect true disease-causing loci across a range of single-locus and epistatic disease models. Additionally, we analyze an HIV immunogenetics dataset to compare the two strategies on real data. Results: MDR with 3WS is computationally approximately five times faster than MDR with 5-fold cross-validation. Before pruning, the power to find exactly the true disease loci, without any false-positive loci, is higher with 5-fold cross-validation than with 3WS. However, the power to find the true disease-causing loci together with false-positive loci is equivalent for the two approaches. With a pruning procedure after the 3WS, the power of the 3WS approach to detect only the exact disease loci is equivalent to that of MDR with cross-validation. In the real data application, the cross-validation and 3WS analyses indicate the same two-locus model. Conclusions: Our results show that the two internal validation methods perform equivalently when pruning procedures are used. The specific pruning procedure should be chosen with the trade-off in mind between identifying all relevant genetic effects at the cost of including false positives, and missing important genetic factors. This implies that 3WS may be a powerful and computationally efficient approach to screen for epistatic effects, and it could be used to identify candidate interactions in large-scale genetic studies.
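The difference in computation time between the two validation strategies comes from how often the model search is run. The sketch below contrasts the index bookkeeping: k-fold cross-validation repeats the search once per fold, while a three-way split shuffles once and uses separate partitions for building, selecting, and evaluating models. The equal-thirds proportions and the function names are assumptions for illustration, not the settings used in the study.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle subject indices and deal them into k folds of near-equal size.

    Each fold serves once as the held-out set, so the model search runs k times.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def three_way_split(n, fractions=(1/3, 1/3, 1/3), seed=0):
    """Shuffle once and cut into training / validation / testing partitions.

    Candidate models are built on training, the final model is selected on
    validation, and its prediction accuracy is estimated on testing.
    The testing partition takes whatever remains after the first two cuts.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * fractions[0])
    n_valid = round(n * fractions[1])
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]

folds = kfold_indices(10, 5)
train, valid, test = three_way_split(9)
print(len(folds), [len(f) for f in folds])  # 5 [2, 2, 2, 2, 2]
print(len(train), len(valid), len(test))    # 3 3 3
```

Under this reading, 5-fold cross-validation performs five model searches where the three-way split performs one, which is in line with the roughly fivefold speed-up reported above.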
Methylation of Leukocyte DNA and Ovarian Cancer: Relationships with Disease Status and Outcome
Genome-wide interrogation of DNA methylation (DNAm) in blood-derived leukocytes has become feasible with the advent of CpG genotyping arrays. In epithelial ovarian cancer (EOC), one report found substantial DNAm differences between cases and controls; however, many of these disease-associated CpGs were attributed to differences in white blood cell type distributions. We examined blood-based DNAm in 336 EOC cases and 398 controls; to evaluate association with case status and overall survival, we included only high-quality CpG loci that showed no evidence of association with white blood cell type distributions.
A Targeted Genetic Association Study of Epithelial Ovarian Cancer Susceptibility
BACKGROUND:
Genome-wide association studies have identified several common susceptibility alleles for epithelial ovarian cancer (EOC). To further understand EOC susceptibility, we examined previously ungenotyped candidate variants, including uncommon variants and those residing within known susceptibility loci.
RESULTS:
At nine of eleven previously published EOC susceptibility regions (2q31, 3q25, 5p15, 8q21, 8q24, 10p12, 17q12, 17q21.31, and 19p13), novel variants were identified that were more strongly associated with risk than previously reported variants. Beyond known susceptibility regions, no variants were found to be associated with EOC risk at genome-wide statistical significance (p < 5 × 10^-8), nor were any significant after Bonferroni correction for 17,000 variants (p < 3 × 10^-6).
METHODS:
A customized genotyping array was used to assess over 17,000 variants in coding, non-coding, regulatory, and known susceptibility regions in 4,973 EOC cases and 5,640 controls from 13 independent studies. Susceptibility for EOC overall and for select histotypes was evaluated using logistic regression adjusted for age, study site, and population substructure.
CONCLUSION:
Given the novel variants identified within the 2q31, 3q25, 5p15, 8q21, 8q24, 10p12, 17q12, 17q21.31, and 19p13 regions, larger follow-up genotyping studies, using imputation where necessary, are needed for fine-mapping and confirmation of low-frequency variants that fall below statistical significance.
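The Bonferroni threshold quoted in the results follows from dividing a family-wise significance level by the number of tests. The abstract does not state the level, so the sketch assumes the conventional alpha = 0.05; with 17,000 tests this gives about 2.9 × 10^-6, which rounds to the reported 3 × 10^-6 and is far less stringent than the genome-wide 5 × 10^-8.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

GENOME_WIDE = 5e-8  # conventional genome-wide significance threshold
threshold = bonferroni_threshold(0.05, 17_000)  # alpha = 0.05 is an assumption
print(f"{threshold:.2e}")       # 2.94e-06
print(threshold > GENOME_WIDE)  # True
```

Because the array targets a limited set of variants rather than the whole genome, the Bonferroni cut-off is a natural second, more lenient bar; no variant outside the known regions cleared either one.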
Bipolar disorder with binge eating behavior: a genome-wide association study implicates PRR5-ARHGAP8
Bipolar disorder (BD) is associated with binge eating behavior (BE), and both conditions are heritable. Previously, using data from the Genetic Association Information Network (GAIN) study of BD, we performed genome-wide association (GWA) analyses of BD with BE comorbidity. Here, utilizing data from the Mayo Clinic BD Biobank (969 BD cases, 777 controls), we performed a GWA analysis of a BD subtype defined by BE, and case-only analysis comparing BD subjects with and without BE. We then performed a meta-analysis of the Mayo and GAIN results. The meta-analysis provided genome-wide significant evidence of association between single nucleotide polymorphisms (SNPs) in PRR5-ARHGAP8 and BE in BD cases (rs726170 OR=1.91, P=3.05E-08). In the meta-analysis comparing cases with BD with comorbid BE vs. non-BD controls, a genome-wide significant association was observed at SNP rs111940429 in an intergenic region near PPP1R2P5 (p=1.21E-08). PRR5-ARHGAP8 is a read-through transcript resulting in a fusion protein of PRR5 and ARHGAP8. PRR5 encodes a subunit of mTORC2, a serine/threonine kinase that participates in food intake regulation, while ARHGAP8 encodes a member of the RhoGAP family of proteins that mediate cross-talk between Rho GTPases and other signaling pathways. Without BE information in controls, it is not possible to determine whether the observed association reflects a risk factor for BE in general, risk for BE in individuals with BD, or risk of a subtype of BD with BE. The effect of PRR5-ARHGAP8 on BE risk thus warrants further investigation
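The abstract does not say which meta-analysis model was used to combine the Mayo and GAIN results. A common choice for combining per-SNP GWAS estimates is fixed-effect inverse-variance weighting of log odds ratios, sketched below as an assumption; the per-study estimates in the example are hypothetical and are not values from either study.

```python
import math

def fixed_effect_meta(log_ors, ses):
    """Inverse-variance fixed-effect pooling of per-study log odds ratios.

    Returns (pooled OR, SE of the pooled log OR, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]  # precision of each study
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return math.exp(pooled), pooled_se, p

# Hypothetical per-study estimates for one SNP (not values from the study).
or_pooled, se_pooled, p = fixed_effect_meta([math.log(2.0), math.log(1.8)], [0.20, 0.25])
print(f"OR={or_pooled:.2f}, P={p:.2e}")
```

The pooled standard error is smaller than either study's, which is why combining two cohorts can push a signal past the genome-wide threshold when neither reaches it alone.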
CYP2C8*3 predicts benefit/risk profile in breast cancer patients receiving neoadjuvant paclitaxel
Paclitaxel is one of the most frequently used chemotherapeutic agents for the treatment of breast cancer. Using a candidate gene approach, we hypothesized that polymorphisms in genes relevant to the metabolism and transport of paclitaxel are associated with treatment efficacy and toxicity. Patient and tumor characteristics and treatment outcomes were collected prospectively for breast cancer patients treated with paclitaxel-containing regimens in the neoadjuvant setting. Treatment response was measured before and after each phase of treatment by clinical tumor measurement and categorized according to RECIST criteria, while toxicity data were collected from physician notes. The primary endpoint was achievement of clinical complete response (cCR); secondary endpoints included clinical response rate (complete response + partial response) and grade 3+ peripheral neuropathy. The genotypes and haplotypes assessed were CYP1B1*3, CYP2C8*3, CYP3A4*1B/CYP3A5*3C, and ABCB1*2. A total of 111 patients were included in this study. Overall, the cCR rate to the paclitaxel component was 30.1 %. CYP2C8*3 carriers (23/111, 20.7 %) had a higher rate of cCR (55 % vs. 23 %; OR = 3.92 [95 % CI: 1.46–10.48], corrected p = 0.046). In the secondary toxicity analysis, we observed a trend toward greater risk of severe neuropathy (22 % vs. 8 %; OR = 3.13 [95 % CI: 0.89–11.01], uncorrected p = 0.075) in subjects carrying the CYP2C8*3 variant. Other polymorphisms interrogated were not significantly associated with response or toxicity. Patients carrying CYP2C8*3 are more likely to achieve clinical complete response to neoadjuvant paclitaxel treatment, but may also be at increased risk of severe peripheral neurotoxicity.
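The odds ratios and 95 % confidence intervals above compare carriers with non-carriers. The sketch shows the standard 2x2-table odds ratio with a Wald interval on the log scale. The counts are hypothetical, chosen only to illustrate the arithmetic; the abstract does not give the underlying table, and its reported estimates may come from a different model and include a multiple-testing correction, so this does not reproduce OR = 3.92.

```python
import math

def odds_ratio_wald_ci(a, b, c, d, z=1.959963984540054):
    """Odds ratio and Wald 95% CI for a 2x2 table.

    a: carriers with the outcome      b: carriers without it
    c: non-carriers with the outcome  d: non-carriers without it
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE on the log scale
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, chosen only to illustrate the arithmetic.
print(odds_ratio_wald_ci(10, 10, 20, 60))
```

An interval whose lower bound stays above 1, as for cCR here, indicates a significant positive association; the neuropathy interval (0.89–11.01) spans 1, which is why that result is described only as a trend.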
Methylation Signature Implicated in Immuno-Suppressive Activities in Tubo-Ovarian High-Grade Serous Carcinoma
BACKGROUND: Better understanding of prognostic factors in tubo-ovarian high-grade serous carcinoma (HGSC) is critical, as diagnosis confers an aggressive disease course. Variation in tumor DNA methylation shows promise for predicting outcome, yet prior studies were largely platform-specific and unable to evaluate multiple molecular features. METHODS: We analyzed genome-wide DNA methylation in 1,040 frozen HGSC tumors, including 325 reported previously, seeking a multi-platform quantitative methylation signature, which we evaluated in relation to clinical features, tumor characteristics, time to recurrence/death, extent of CD8+ tumor-infiltrating lymphocytes (TIL), gene expression molecular subtypes, and gene expression of the ATP-binding cassette transporter TAP1. RESULTS: The methylation signature was associated with shorter time to recurrence, independent of clinical factors (N = 715 new set, hazard ratio (HR), 1.65; 95% confidence interval (CI), 1.10-2.46; P = 0.015; N = 325 published set, HR, 2.87; 95% CI, 2.17-3.81; P = 2.2 × 10^-13), and remained prognostic after adjustment for gene expression molecular subtype and TAP1 expression (N = 599; HR, 2.22; 95% CI, 1.66-2.95; P = 4.1 × 10^-8). The methylation signature was inversely related to CD8+ TIL levels (P = 2.4 × 10^-7) and TAP1 expression (P = 0.0011) and was associated with gene expression molecular subtype (P = 5.9 × 10^-4) in covariate-adjusted analysis. CONCLUSIONS: Multi-center analysis identified a novel quantitative tumor methylation signature of HGSC, applicable to numerous commercially available platforms, that indicates shorter time to recurrence/death after adjusting for other factors. Along with immune cell composition analysis, these results suggest a role for DNA methylation in the immunosuppressive microenvironment. IMPACT: This work aids in identifying targetable epigenomic processes and in stratifying patients for whom tailored treatment may be most beneficial.
DNA Methylation Profiles of Ovarian Clear Cell Carcinoma
BACKGROUND: Ovarian clear cell carcinoma (OCCC) is a rare ovarian cancer histotype that tends to be resistant to standard platinum-based chemotherapeutics. We sought to better understand the role of DNA methylation in the clinical and biological subclassification of OCCC. METHODS: We interrogated genome-wide methylation using DNA from fresh frozen tumors from 271 cases, applied non-smooth non-negative matrix factorization (nsNMF) clustering, and evaluated clinical associations and biological pathways. RESULTS: We identified two approximately equally sized clusters that were associated with several clinical features. Compared to Cluster 2 (N=137), Cluster 1 cases (N=134) presented at a more advanced stage, were less likely to be of Asian ancestry, and tended to have poorer outcomes, including macroscopic residual disease following primary debulking surgery (p-values < 0.10). Subset analyses of targeted tumor sequencing and immunohistochemical data revealed that Cluster 1 tumors showed TP53 mutation and abnormal p53 expression, whereas Cluster 2 tumors showed aneuploidy and ARID1A/PIK3CA mutation (p-values < 0.05). Cluster-defining CpGs included 1,388 CpGs residing within 200 bp of the transcription start sites of 977 genes; 38% of these genes (N=369) were differentially expressed across clusters in the transcriptomic subset analysis (p-values < 10^-4). Differentially expressed genes were enriched for six immune-related pathways, including interferon alpha and gamma responses (p-values < 10^-6). CONCLUSIONS: DNA methylation clusters in OCCC correlate with disease features and with gene expression patterns in immune pathways. IMPACT: This work serves as a foundation for integrative analyses that better characterize the complex biology of OCCC, in an effort to improve the potential for developing targeted therapeutics.
Grammatical evolution decision trees for detecting gene-gene interactions
Background: A fundamental goal of human genetics is the discovery of polymorphisms that predict common, complex diseases. It is hypothesized that complex diseases are due to a myriad of factors, including environmental exposures and complex genetic risk models such as gene-gene interactions. Such epistatic models present an important analytical challenge, requiring that methods perform not only statistical modeling but also variable selection to generate testable genetic model hypotheses. This challenge is amplified by recent advances in genotyping technology, as the number of potential predictor variables is rapidly increasing. Methods: Decision trees are a highly successful, easily interpretable data-mining method that is typically optimized with a hierarchical model-building approach, which limits its potential to identify interacting effects. To overcome this limitation, we use evolutionary computation, specifically grammatical evolution, to build decision trees that detect and model gene-gene interactions. In the current study, we introduce the Grammatical Evolution Decision Trees (GEDT) method and software and evaluate this approach on simulated data representing gene-gene interaction models with a range of effect sizes. We compare the performance of the method to a traditional decision tree algorithm and a random search approach, and demonstrate its improved performance in detecting purely epistatic interactions. Results: Our simulations demonstrate that GEDT has high power to detect even very moderate genetic risk models, with and without main effects. Conclusions: GEDT, while still in its initial stages of development, is a promising new approach for identifying gene-gene interactions in genetic association studies.
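Grammatical evolution evolves integer strings (codons) and maps each one through a grammar into a program, here a decision tree. The sketch shows that genotype-to-phenotype mapping step in its textbook form: each non-terminal consumes one codon and picks production `codon % number_of_choices`, and the genome wraps around a limited number of times. The grammar, the SNP names, and the function name are hypothetical, not GEDT's actual grammar; the evolutionary search itself (selection, crossover, mutation, fitness scoring) is omitted.

```python
# Hypothetical grammar: each non-terminal maps to its list of productions.
GRAMMAR = {
    "<tree>": [["<leaf>"],
               ["(if ", "<snp>", " == ", "<geno>", " then ", "<tree>", " else ", "<tree>", ")"]],
    "<leaf>": [["case"], ["control"]],
    "<snp>": [["SNP0"], ["SNP1"], ["SNP2"]],
    "<geno>": [["0"], ["1"], ["2"]],
}

def ge_map(codons, start="<tree>", max_wraps=2):
    """Map an integer genome to a decision-tree string, leftmost derivation first.

    Each non-terminal consumes one codon and picks production codon % n_choices.
    The genome is reused (wrapped) up to max_wraps times; returns None if the
    derivation still has not finished, i.e. the individual is invalid.
    """
    limit = len(codons) * (max_wraps + 1)
    stack, output, used = [start], [], 0
    while stack:
        symbol = stack.pop()
        if symbol not in GRAMMAR:  # terminal: emit it
            output.append(symbol)
            continue
        if used >= limit:
            return None
        choices = GRAMMAR[symbol]
        production = choices[codons[used % len(codons)] % len(choices)]
        used += 1
        stack.extend(reversed(production))  # leftmost symbol ends on top
    return "".join(output)

print(ge_map([1, 0, 2, 0, 0, 0, 1]))  # (if SNP0 == 2 then case else control)
```

Because the tree is built from a flat codon string rather than by greedy split-by-split optimization, crossover and mutation can introduce several interacting SNPs at once, which is the property the abstract credits for detecting purely epistatic effects.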
Genome-wide interaction analysis of menopausal hormone therapy use and breast cancer risk among 62,370 women
Use of menopausal hormone therapy (MHT) is associated with increased risk for breast cancer. However, the underlying mechanisms and the interaction of MHT use with genetic variants are not fully understood. We conducted a genome-wide interaction analysis between MHT use and genetic variants for breast cancer risk in 27,585 cases and 34,785 controls from 26 observational studies. All women were post-menopausal and of European ancestry. Multivariable logistic regression models were used to test for multiplicative interactions between genetic variants and current MHT use. We considered interaction p-values < 5 × 10^-8 as genome-wide significant and p-values < 1 × 10^-5 as suggestive. Linkage disequilibrium (LD)-based clumping was performed to identify independent candidate variants. None of the 9.7 million genetic variants tested for interactions with MHT use reached genome-wide significance. Only 213 variants, representing 18 independent loci, had p-values < 1 × 10^-5. The strongest evidence was found for rs4674019 (p-value = 2.27 × 10^-7), which showed a genome-wide significant interaction (p-value = 3.8 × 10^-8) with current MHT use when the analysis was restricted to population-based studies. Limiting the analyses to combined estrogen–progesterone MHT use or to estrogen receptor (ER)-positive cases did not identify any genome-wide significant interactions. In this large genome-wide SNP-MHT interaction study of breast cancer, we found no strong support for common genetic variants modifying the effect of MHT on breast cancer risk. These results suggest that common genetic variation has limited impact on the observed MHT–breast cancer risk association.
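LD-based clumping is how 213 suggestive variants collapse into 18 independent loci. A minimal greedy version: take the most significant remaining variant as a lead, absorb every remaining candidate in LD with it above an r² cut-off, and repeat. The r² cut-off of 0.5 and the variant names are assumptions (the abstract gives neither); the index threshold mirrors the suggestive p < 1 × 10^-5. Tools such as PLINK's clumping also restrict clumps to a physical distance window, which this simplified sketch omits.

```python
def ld_clump(pvals, r2, p_index=1e-5, r2_max=0.5):
    """Greedy LD-based clumping (simplified sketch).

    pvals: variant -> interaction p-value
    r2:    frozenset({v1, v2}) -> pairwise LD r^2; missing pairs count as 0
    Returns lead variant -> list of variants clumped with it.
    """
    candidates = sorted((v for v, p in pvals.items() if p < p_index),
                        key=lambda v: pvals[v])  # most significant first
    clumps, assigned = {}, set()
    for lead in candidates:
        if lead in assigned:
            continue
        members = [v for v in candidates
                   if v not in assigned and v != lead
                   and r2.get(frozenset((lead, v)), 0.0) >= r2_max]
        clumps[lead] = members
        assigned.add(lead)
        assigned.update(members)
    return clumps

# Hypothetical variants and LD values, for illustration only.
pvals = {"v1": 2e-7, "v2": 4e-6, "v3": 8e-6, "v4": 0.01}
r2 = {frozenset(("v1", "v2")): 0.8, frozenset(("v2", "v3")): 0.9}
print(ld_clump(pvals, r2))  # {'v1': ['v2'], 'v3': []}
```

Note the greedy behaviour: v3 is in strong LD with v2, but v2 has already been absorbed into v1's clump, and v3 is not in LD with v1, so v3 starts its own locus.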
- …