6 research outputs found

    Spatial rank-based multifactor dimensionality reduction to detect gene–gene interactions for multivariate phenotypes

    Background: Identifying interaction effects between genes is one of the main tasks of genome-wide association studies aiming to shed light on the biological mechanisms underlying complex diseases. Multifactor dimensionality reduction (MDR) is a popular approach for detecting gene–gene interactions that has been extended in various forms to handle binary and continuous phenotypes. However, only a few multivariate MDR methods are available for multiple related phenotypes. Current approaches use Hotelling's T² statistic to evaluate interaction models, but it is well known that Hotelling's T² statistic is highly sensitive to heavily skewed distributions and outliers. Results: We propose a robust approach based on nonparametric statistics such as spatial signs and ranks. The new multivariate rank-based MDR (MR-MDR) is mainly suitable for analyzing multiple continuous phenotypes and is less sensitive to skewed distributions and outliers. MR-MDR uses fuzzy k-means clustering to classify multi-locus genotypes into two groups. It then calculates a spatial rank-sum statistic as an evaluation measure and selects the interaction model with the largest statistic as the best. Our novel idea lies in adopting nonparametric statistics as an evaluation measure for robust inference. We adopt tenfold cross-validation to avoid overfitting. Intensive simulation studies were conducted to compare the performance of MR-MDR with current methods. Application of MR-MDR to a real dataset from a Korean genome-wide association study demonstrated that it successfully identified genetic interactions associated with four phenotypes related to kidney function. The R code for conducting MR-MDR is available at https://github.com/statpark/MR-MDR Conclusions: Intensive simulation studies comparing MR-MDR with several current methods showed that the performance of MR-MDR was outstanding for skewed distributions. Additionally, for symmetric distributions, MR-MDR showed comparable power. Therefore, we conclude that MR-MDR is a useful multivariate nonparametric approach that can be used regardless of the phenotype distribution, the correlations between phenotypes, and sample size. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2013M3A9C4078158, NRF-2021R1A2C1007788).
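    The spatial signs and ranks named in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' R implementation (that is at the GitHub URL given in the abstract); the function names `spatial_sign` and `spatial_ranks` and the toy data are assumptions made here for illustration only. The sketch shows one reason such ranks are robust: every rank vector has norm below 1, so a single extreme observation cannot dominate a rank-sum.

```python
import numpy as np

def spatial_sign(v):
    """Map each vector to its unit-length direction; zero vectors stay zero."""
    norms = np.linalg.norm(v, axis=-1, keepdims=True)
    safe = np.where(norms > 0, norms, 1.0)  # avoid division by zero
    return v / safe

def spatial_ranks(x):
    """Spatial rank of each row: mean spatial sign of its differences to all rows."""
    diffs = x[:, None, :] - x[None, :, :]    # shape (n, n, p)
    return spatial_sign(diffs).mean(axis=1)  # shape (n, p)

# Hypothetical toy data: 5 subjects, 2 phenotypes; the last subject is an outlier.
phenotypes = np.array([
    [0.1, 0.2],
    [0.0, -0.1],
    [-0.2, 0.1],
    [0.3, 0.0],
    [50.0, 80.0],
])
ranks = spatial_ranks(phenotypes)
rank_norms = np.linalg.norm(ranks, axis=1)
print(ranks)
print(rank_norms)  # bounded, even for the outlier
```

    A rank-sum statistic for a group of subjects would then be built from the rows of `ranks` belonging to that group; how MR-MDR scales and compares those sums is not specified in the abstract and is left out here.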

    Gene-gene interaction analysis in high-dimensional genomic data

    Thesis (Ph.D.) -- Seoul National University Graduate School: Interdisciplinary Program in Bioinformatics, 2015. 2. Park Taesung. With the development of high-throughput genotyping and sequencing technology, there is growing evidence of association between genetic variants and common complex traits. Although thousands of genetic variants have been discovered, these genetic markers have been shown to explain only a very small proportion of the underlying genetic variance of complex traits. Gene-gene interaction (GGI) analysis and rare variant analysis are expected to unveil a large portion of the unexplained heritability of complex traits. GGI analysis faces several practical issues. First, to conduct GGI analysis with high-dimensional genomic data, GGI methods require efficient computation and high accuracy. Second, it is hard to detect GGI for rare variants because of their sparsity. Third, analysing GGI at genome-wide scale carries a heavy computational burden because a huge search space must be explored, which requires a much greater number of tests to find the optimal GGI. For k variants, there are k(k-1)/2 combinations for second-order interactions and C(k, n) = k!/(n!(k-n)!) combinations for n-order interactions. The number of possible interaction models increases exponentially as the interaction order or the number of variants increases. Fourth, although the biological interpretation of GGI is important, GGI is hard to interpret because of its complexity. To overcome these four main issues in GGI analysis with high-dimensional genomic data, four novel methods are proposed. First, to provide an efficient GGI method, we propose IGENT, an Information theory-based GEnome-wide gene-gene iNTeraction method. IGENT is an efficient algorithm for identifying genome-wide GGI and gene-environment interaction (GEI). For detecting significant GGIs at genome-wide scale, it is important to reduce the computational burden significantly. IGENT uses information gain (IG) and evaluates its significance without resampling. 
In our simulation studies, the power of IGENT is shown to be better than or equivalent to that of BOOST. The proposed method successfully detected GGI for bipolar disorder in the Wellcome Trust Case Control Consortium (WTCCC) data and for age-related macular degeneration (AMD). Second, for GGI analysis of rare variants, we propose a new gene-gene interaction method in the framework of multifactor dimensionality reduction (MDR) analysis. The proposed method consists of two steps. The first step is to collapse the rare variants in a specific region such as a gene. The second step is to perform MDR analysis on the collapsed rare variants. The proposed method is applied to whole-exome sequencing data from a Korean population to identify causal gene-gene interactions among rare variants for type 2 diabetes (T2D). Third, to increase computational performance for GGI at genome-wide scale, we developed CUDA (Compute Unified Device Architecture) based genome-wide association MDR (cuGWAM) software using efficient hardware accelerators. In our simulation studies, cuGWAM performed better than CPU-based MDR methods and other GPU-based methods. Fourth, to efficiently provide statistical interpretation and biological evidence of gene-gene interactions, we developed VisEpis, a tool for visualizing gene-gene interactions in genetic association analysis and for mapping epistatic interactions to biological evidence from public interaction databases. Using an interaction network and a circular plot, VisEpis lets users explore the interaction network integrated with biological evidence at the epigenetic regulation, splicing, transcription, translation, and post-translation levels. 
To aid interpretation of statistical interactions at the genotype level, VisEpis provides checkerboard, pairwise checkerboard, forest, funnel, and ring charts.

    Table of contents:
    Abstract
    Contents
    List of Figures
    List of Tables
    1 Introduction
        1.1 Background of high-dimensional genomic data
            1.1.1 History of genome-wide association studies (GWAS)
            1.1.2 Association studies of massively parallel sequencing (MPS)
            1.1.3 Missing heritability and proposed alternative methods
        1.2 Purpose and novelty of this study
        1.3 Outline of the thesis
    2 Overview of gene-gene interaction
        2.1 Definition of gene-gene interaction
        2.2 Practical issues of gene-gene interaction
        2.3 Overview of gene-gene interaction methods
            2.3.1 Regression-based gene-gene interaction methods
            2.3.2 Multifactor dimensionality reduction (MDR)
            2.3.3 Gene-gene interaction methods using machine learning methods
            2.3.3 Entropy-based gene-gene interaction methods
    3 Entropy-based gene-gene interaction
        3.1 Introduction
        3.2 Methods
            3.2.1 Entropy-based gene-gene interaction analysis
            3.2.2 Exhaustive searching approach and stepwise selection approach
            3.2.3 Simulation setting
            3.2.4 Genome-wide data for bipolar disorder (BD)
            3.2.5 Genome-wide data for age-related macular degeneration (AMD)
        3.3 Results
            3.3.1 Simulation results
            3.3.2 Analysis of WTCCC bipolar disorder (BD) data
            3.3.3 Analysis of age-related macular degeneration (AMD) data
        3.4 Discussion
        3.5 Conclusion
    4 Gene-gene interaction for rare variants
        4.1 Introduction
        4.2 Methods
            4.2.1 Collapsing-based gene-gene interaction
            4.2.2 Simulation setting
        4.3 Results
            4.3.1 Simulation study
            4.3.2 Real data analysis of the type 2 diabetes data
        4.4 Discussion and Conclusion
    5 Computation enhancement for gene-gene interaction
        5.1 Introduction
        5.2 Methods
            5.2.1 MDR implementation
            5.2.2 Implementation using high-performance computation based on GPU
            5.2.3 Environment of performance comparison
        5.3 Results
            5.3.1 Computational improvement
        5.4 Discussion
        5.5 Conclusion
    6 Visualization for gene-gene interaction interpretation
        6.1 Introduction
        6.2 Methods
            6.2.1 Interaction mapping procedure
            6.2.1 Checkerboard plot
            6.2.2 Forest and funnel plot
        6.3 Case study
            6.3.1 Interpretation of gene-gene interaction in WTCCC bipolar disorder data
            6.3.2 Interpretation of gene-gene interaction in age-related macular degeneration (AMD) data
        6.4 Conclusion
    7 Summary and Conclusion
    Bibliography
    Abstract (Korean)
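    Two ideas from the thesis abstract above, the size of the interaction search space and information gain as an interaction score, can be sketched in a few lines of Python. This is a generic textbook illustration and not the IGENT algorithm: the abstract does not give IGENT's exact definition or its significance test, and the helper names `entropy` and `information_gain` and the four-subject toy data are assumptions introduced here.

```python
from collections import Counter
from math import comb, log2

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(genotypes, phenotype):
    """IG = H(phenotype) - H(phenotype | genotype)."""
    n = len(phenotype)
    groups = {}
    for g, y in zip(genotypes, phenotype):
        groups.setdefault(g, []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(phenotype) - conditional

# Search-space size for k = 1000 variants (hypothetical k).
pairs = comb(1000, 2)    # second-order models: k(k-1)/2
triples = comb(1000, 3)  # third-order models

# Hypothetical XOR-like toy data: neither SNP is informative alone,
# but the joint genotype determines the phenotype.
snp_a = [0, 0, 1, 1]
snp_b = [0, 1, 0, 1]
status = [0, 1, 1, 0]
ig_a = information_gain(snp_a, status)
ig_joint = information_gain(list(zip(snp_a, snp_b)), status)
print(pairs, triples, ig_a, ig_joint)
```

    The toy data is chosen so that a single-SNP scan would find nothing while the pairwise score is maximal, which is the situation that motivates exhaustive pairwise search despite its cost.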

    Examining interactions among SNPs that can explain the prognostic variability in colorectal cancer

    Background: Colorectal cancer is a significant medical burden worldwide and in Newfoundland and Labrador. Examining the relationships of SNP interactions with survival outcomes can help identify new prognostic markers for this disease. Objectives: To examine associations between colorectal cancer survival outcomes and interactions of SNPs from MMP family and VEGF interactome genes using data-reduction methods. Methods: Two data-reduction software programs, Cox-MDR and GMDR 0.9, were applied to the data of patients from the Newfoundland Familial Colorectal Cancer Registry. Eight datasets were investigated: one for the MMP gene SNPs (201 SNPs), and seven for the VEGF interaction networks (total 1,517 SNPs). Significance of interaction models was assessed using permutation testing. Associations between significant interaction models and clinical outcomes were confirmed using multivariable regression methods. Results: For the MMP dataset, two multi-SNP models and one single-SNP model were identified, while fifteen novel multi-SNP models and thirteen single-SNP models were identified for the VEGF interaction network datasets. All but one of these models were able to distinguish patients based on their outcome risk in multivariable regression models (p-value range: 0.03 – 2.2E-9). Conclusion: This research demonstrated that novel genetic interactions associated with outcome risk in colorectal cancer can be found using data-reduction methods. This proves the utility of these methods in prognostic research.
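    The permutation testing mentioned in the abstract above follows a general pattern that can be sketched in Python. This is only an illustration of the idea: the study used Cox-MDR and GMDR on survival outcomes, whereas the sketch below uses a plain difference in group means on made-up numeric outcomes. The names `permutation_p_value` and `mean_difference`, the group labels, and the data are all assumptions introduced here.

```python
import random

def mean_difference(groups, outcomes):
    """Absolute difference in mean outcome between group 1 and group 0."""
    high = [y for g, y in zip(groups, outcomes) if g == 1]
    low = [y for g, y in zip(groups, outcomes) if g == 0]
    return abs(sum(high) / len(high) - sum(low) / len(low))

def permutation_p_value(statistic, groups, outcomes, n_perm=1000, seed=0):
    """Share of label-shuffled datasets whose statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = statistic(groups, outcomes)
    shuffled = list(outcomes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between group and outcome
        if statistic(groups, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0

# Hypothetical data: 10 "high-risk" and 10 "low-risk" patients.
groups = [1] * 10 + [0] * 10
outcomes = [5.0 + 0.1 * i for i in range(10)] + [1.0 + 0.1 * i for i in range(10)]
p_value = permutation_p_value(mean_difference, groups, outcomes)
print(p_value)
```

    Shuffling the outcomes while keeping the group labels fixed simulates the null hypothesis of no association, so no distributional assumption about the statistic is needed.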

    Gene-environment and gene-gene interactions in myopia

    Motivated by the release of the UK Biobank data and the lack of documented gene-environment (GxE) and gene-gene (GxG) interactions in myopia, I sought to apply various statistical tools to provide a quantitative assessment of the interplay between environmental and genetic risk factors shaping refractive error. The comparison between the two different risk measurement scales with which GxE interactions can be identified suggested that the additive risk scale can lead to a more informative perspective about refractive error aetiology. The evaluation of two indirect methods for detecting genetic variants affecting refractive error via interaction effects suggested the enrichment of GxG and GxE among the variants that display marginal SNP effects. For genetic variants already known from prior GWAS to influence refractive error, genetic effect sizes were highly non-uniform; individuals from the tails of the refractive error distribution (i.e. high myopes and hyperopes) displayed much larger effects compared to individuals in the middle of the distribution (i.e. emmetropes). Prediction of refractive error using GxE interactions indicated that although some of the variance of refractive error could be explained by a risk score constructed using interaction effects, the contribution of GxE was already accounted for by a risk score constructed using marginal SNP effects only. Although a handful of candidate genes were identified using the multifactor dimensionality reduction technique, none displayed compelling evidence of involvement in a GxG interaction. There was, however, suggestive evidence that the candidate genes constitute a genetic interaction network which is regulated by the hub gene ZMAT4. In summary, the analyses reported in this thesis provide further support for the challenging nature of definitively identifying loci involved in GxE and GxG interactions. The thesis provides several guidelines that future studies could take into account to obtain more insightful results regarding the extent of interactions in refractive error.
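    The point that an interaction can look different depending on the risk scale can be shown with a small worked example in Python. The abstract names only the additive scale; that the other scale is the multiplicative one is an assumption here, as are the function name `interaction_on_both_scales` and the relative-risk values, which are invented for illustration and not taken from the thesis. The additive measure used is the standard relative excess risk due to interaction (RERI).

```python
def interaction_on_both_scales(rr_gene, rr_env, rr_both):
    """Return (multiplicative interaction ratio, RERI) from relative risks.

    rr_gene: risk with the genetic factor only, relative to neither factor
    rr_env:  risk with the environmental factor only
    rr_both: risk with both factors present
    """
    multiplicative = rr_both / (rr_gene * rr_env)  # 1.0 means no interaction
    reri = rr_both - rr_gene - rr_env + 1.0        # 0.0 means no interaction
    return multiplicative, reri

# Hypothetical relative risks: the joint risk equals the product of the
# single-factor risks, yet exceeds what additivity would predict.
mult, reri = interaction_on_both_scales(2.0, 3.0, 6.0)
print(mult, reri)
```

    With these numbers the multiplicative ratio reports no interaction while the RERI is positive, so the same data support opposite conclusions depending on the scale chosen.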