
Prediction Accuracy of SNP Epistasis Models Generated by Multifactor Dimensionality Reduction and Stepwise Penalized Logistic Regression

Abstract

Conventional statistical modeling techniques used to detect high-order interactions between single-nucleotide polymorphisms (SNPs) run into problems of high dimensionality, because a very large number of interactions must be evaluated using sparse data. Statisticians have developed novel methods, including Multifactor Dimensionality Reduction (MDR), Generalized Multifactor Dimensionality Reduction (GMDR), and stepwise Penalized Logistic Regression (stepPLR), to analyze SNP epistasis associated with the development or outcome of genetic disease. Because published results on the relative performance of these three methods are inconsistent, this thesis used data from the large GenIMS study to compare SNP epistasis models on their accuracy in predicting 90-day mortality. The methods were compared using prediction accuracy, sensitivity, specificity, model consistency, chi-square tests, sign tests, and biological plausibility. Testing accuracies were generally higher for GMDR than for MDR, whereas stepPLR performed poorly because its models predicted that all subjects would be alive at 90 days. Stepwise PLR did, however, identify the IL-1A SNPs IL1A_M889, rs1894399, rs1878319, and rs2856837 as significant predictors of 90-day mortality, each adjusted for the other SNPs in the model. The model also included a borderline-significant second-order interaction between rs28556838 and rs3783520 associated with 90-day mortality in this cohort of patients hospitalized with community-acquired pneumonia (CAP). The public health importance of this thesis is that the relative risk for CAP may be higher for individuals carrying a particular set of SNPs across different genes. Being able to predict which patients will experience a poor outcome may lead to more effective prevention strategies or to treatment at earlier stages. Identifying significant SNP interactions can also expand scientific knowledge of the biological mechanisms that affect disease outcomes.
Overall, GMDR yielded higher prediction accuracies than MDR, and MDR outperformed stepPLR in building SNP epistasis models of 90-day mortality in the GenIMS cohort.
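The abstract notes that stepPLR performed poorly because its models predicted every subject to be alive at 90 days. The short sketch below shows why that failure mode calls for reporting sensitivity and specificity alongside raw prediction accuracy. It assumes death by day 90 is coded as the positive class (1). The cohort is hypothetical toy data with 10 deaths among 100 subjects, not GenIMS results, and the helper names `confusion_counts` and `summarize` are illustrative only. Balanced accuracy is added as an extra comparison column and is not taken from the abstract.

```python
"""Illustrative only: how a degenerate "everyone survives" classifier scores
on accuracy, sensitivity, and specificity. Labels are hypothetical toy data."""

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FN, TN, FP), treating 1 = died by day 90 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp


def summarize(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Compute accuracy, sensitivity, specificity, and balanced accuracy."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    # Guard against empty classes so the ratios never divide by zero.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Averaging the two class-specific rates means always predicting
        # the majority class cannot score above 0.5.
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical cohort: 10 deaths (1) followed by 90 survivors (0).
y_true = [1] * 10 + [0] * 90

# A model that predicts every subject is alive at 90 days.
all_alive = [0] * len(y_true)

print(summarize(y_true, all_alive))
# {'accuracy': 0.9, 'sensitivity': 0.0, 'specificity': 1.0, 'balanced_accuracy': 0.5}
```

In this toy setting the all-alive model reaches 90% accuracy simply because survivors dominate, yet it identifies none of the deaths. This is why accuracy alone can make a degenerate model look competitive when outcomes are imbalanced.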
