246 research outputs found

    A robustness study of parametric and non-parametric tests in model-based multifactor dimensionality reduction for epistasis detection

    Background: Applying a statistical method implies identifying underlying (model) assumptions and checking their validity in the particular context. One of these contexts is association modeling for epistasis detection. Here, depending on the technique used, violation of model assumptions may result in increased type I error, power loss, or biased parameter estimates. Remedial measures for violated underlying conditions or assumptions include data transformation or selecting a more relaxed modeling or testing strategy. Model-Based Multifactor Dimensionality Reduction (MB-MDR) for epistasis detection relies on association testing between a trait and a factor consisting of multilocus genotype information. For quantitative traits, the framework is essentially Analysis of Variance (ANOVA) that decomposes the variability in the trait amongst the different factors. In this study, we assess through simulations, the cumulative effect of deviations from normality and homoscedasticity on the overall performance of quantitative Model-Based Multifactor Dimensionality Reduction (MB-MDR) to detect 2-locus epistasis signals in the absence of main effects. Methodology: Our simulation study focuses on pure epistasis models with varying degrees of genetic influence on a quantitative trait. Conditional on a multilocus genotype, we consider quantitative trait distributions that are normal, chi-square or Student's t with constant or non-constant phenotypic variances. All data are analyzed with MB-MDR using the built-in Student's t-test for association, as well as a novel MB-MDR implementation based on Welch's t-test. Traits are either left untransformed or are transformed into new traits via logarithmic, standardization or rank-based transformations, prior to MB-MDR modeling. Results: Our simulation results show that MB-MDR controls type I error and false positive rates irrespective of the association test considered. 
Empirical power estimates for MB-MDR with Welch's t-tests are generally lower than those for MB-MDR with Student's t-tests. Trait transformations involving ranks tend to yield higher power than the other data transformations considered. Conclusions: When performing MB-MDR screening for gene-gene interactions with quantitative traits, we recommend first rank-transforming traits to normality and then applying MB-MDR modeling with Student's t-tests as internal tests for association.
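
    The ingredients named in this abstract (a rank-based transformation to normality, Student's t-test, and Welch's t-test) can be illustrated with a minimal standard-library sketch. This is not the MB-MDR implementation; the function names `rank_to_normal`, `student_t`, and `welch_t`, the 0.5 rank offset, and the toy data are assumptions made for illustration only.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def rank_to_normal(values, offset=0.5):
    """Rank-based inverse normal transform (ties receive their average rank).

    The offset of 0.5 is one common convention and is an assumption here.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - offset) / n) for r in ranks]


def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance, equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def welch_t(a, b):
    """Welch's t statistic and its Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Toy trait values: one multilocus genotype cell versus the remaining samples.
cell = [1.0, 2.0, 3.0, 4.0]
rest = [2.0, 3.0, 4.0, 5.0]
t_student = student_t(cell, rest)
t_welch, df_welch = welch_t(cell, rest)
z_scores = rank_to_normal([3.0, 1.0, 2.0])
```

    With equal group sizes and equal sample variances, as in the toy data, the two statistics coincide; they diverge when the variances or group sizes differ, which is the heteroscedastic setting the simulation study examines.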

    A novel method to identify high order gene-gene interactions in genome-wide association studies: Gene-based MDR

    Background: Because common complex diseases are affected by multiple genes and environmental factors, it is essential to investigate gene-gene and/or gene-environment interactions to understand the genetic architecture of complex diseases. After the great success of large-scale genome-wide association (GWA) studies using high-density single nucleotide polymorphism (SNP) chips, the study of gene-gene interactions has become the next challenge. Multifactor dimensionality reduction (MDR) analysis has been widely used for gene-gene interaction analysis. In practice, however, it is not easy to perform high-order gene-gene interaction analyses via MDR at the genome-wide level, because doing so requires exploring a huge search space and incurs a computational burden due to high dimensionality. Results: We propose a dimension-reduction analysis, Gene-MDR, for fast and efficient high-order gene-gene interaction analysis. The proposed Gene-MDR method is composed of two-step applications of MDR: within- and between-gene MDR analyses. First, within-gene MDR analysis summarizes each gene effect via MDR analysis by combining multiple SNPs from the same gene. Second, between-gene MDR analysis performs interaction analysis using the summarized gene effects from the within-gene MDR analysis. We apply the Gene-MDR method to bipolar disorder (BD) GWA data from the Wellcome Trust Case Control Consortium (WTCCC). The results demonstrate that Gene-MDR is capable of detecting high-order gene-gene interactions associated with BD. Conclusion: By reducing the dimension of genome-wide data from the SNP level to the gene level, Gene-MDR efficiently identifies high-order gene-gene interactions. Therefore, Gene-MDR can provide a key to understanding complex disease etiology.

    Gene-Gene Interaction Analysis for the Accelerated Failure Time Model Using a Unified Model-Based Multifactor Dimensionality Reduction Method

    Although a large number of genetic variants have been identified as associated with common diseases through genome-wide association studies, there still exist limitations in explaining the missing heritability. One approach to solving this missing heritability problem is to investigate gene-gene interactions rather than single loci. For gene-gene interaction analysis, the multifactor dimensionality reduction (MDR) method has been widely applied, since the constructive induction algorithm of MDR efficiently reduces high-order dimensions into one dimension by classifying multi-level genotypes into high- and low-risk groups. The MDR method has been extended to various phenotypes and has been improved to provide a significance test for gene-gene interactions. In this paper, we propose a simple method, called accelerated failure time (AFT) UM-MDR, in which the idea of a unified model-based MDR is extended to the survival phenotype by incorporating AFT-MDR into the classification step. The proposed AFT UM-MDR method is compared with AFT-MDR through simulation studies, and a short discussion is given.
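
    The constructive induction step mentioned in this abstract, which classifies multilocus genotype cells into high- and low-risk groups, can be sketched for the classical case-control setting. This is a hedged illustration only: it does not cover the AFT or UM-MDR extensions, and the names `mdr_risk_labels` and `reduce_to_one_dimension`, the default threshold (the overall case-to-control ratio), and the toy data are assumptions.

```python
from collections import Counter


def mdr_risk_labels(genotypes, is_case, threshold=None):
    """Label each multilocus genotype cell "high" or "low" risk.

    genotypes: one tuple of genotype codes per sample (e.g. (0, 1) for two SNPs).
    is_case:   one bool per sample (True = case, False = control).
    threshold: case:control ratio above which a cell is high risk; defaults
               to the overall case:control ratio (an assumed convention).
    """
    cases = Counter(g for g, c in zip(genotypes, is_case) if c)
    controls = Counter(g for g, c in zip(genotypes, is_case) if not c)
    n_cases, n_controls = sum(cases.values()), sum(controls.values())
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")
    if threshold is None:
        threshold = n_cases / n_controls
    labels = {}
    for cell in set(cases) | set(controls):
        # Cross-multiplied comparison avoids dividing by zero controls.
        high = cases[cell] > threshold * controls[cell]
        labels[cell] = "high" if high else "low"
    return labels


def reduce_to_one_dimension(genotypes, labels):
    """Replace each sample's multilocus genotype with its cell's risk label."""
    return [labels[g] for g in genotypes]


# Toy data: two SNPs, eight samples.
genotypes = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1)]
is_case = [True, True, False, False, False, False, True, False]
labels = mdr_risk_labels(genotypes, is_case)
reduced = reduce_to_one_dimension(genotypes, labels)
```

    The two-dimensional genotype table collapses to a single binary variable, which is the dimensionality reduction that gives MDR its name.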

    EFMDR-Fast: An Application of Empirical Fuzzy Multifactor Dimensionality Reduction for Fast Execution

    Gene-gene interaction is a key factor in explaining missing heritability. Many methods have been proposed to identify gene-gene interactions. Multifactor dimensionality reduction (MDR) is a well-known method for detecting gene-gene interactions by reducing the genotypes of single-nucleotide polymorphism combinations to a binary variable with a value of high risk or low risk. This method has been extended in many ways to serve specific objectives. Among those extensions, fuzzy-MDR uses fuzzy set theory for the membership of high risk or low risk and increases the detection rates of gene-gene interactions. Fuzzy-MDR is extended in empirical fuzzy MDR (EFMDR), which uses a maximum likelihood estimator as a new membership function. However, EFMDR is relatively slow because it is implemented in the R scripting language. Therefore, in this study, we implemented EFMDR using Rcpp (C++ package) for faster execution. Our faster implementation of EFMDR, called EFMDR-Fast, is about 800 times faster than EFMDR written in R script only.

    Gene-gene interaction analysis for high-dimensional genomic data

    Doctoral thesis, Seoul National University Graduate School, Interdisciplinary Program in Bioinformatics, February 2015. Taesung Park. With the development of high-throughput genotyping and sequencing technology, there is growing evidence of association between genetic variants and common complex traits. Although thousands of genetic variants have been discovered, such genetic markers have been shown to explain only a very small proportion of the underlying genetic variance of complex traits. Gene-gene interaction (GGI) analysis and rare variant analysis are expected to unveil a large portion of the unexplained heritability of complex traits. In GGI, there are several practical issues. First, in order to conduct GGI analysis with high-dimensional genomic data, GGI methods require efficient computation and high accuracy. Second, it is hard to detect GGI for rare variants due to their sparsity. Third, analyzing GGI on a genome-wide scale suffers from a computational burden because a huge search space must be explored. It requires a much greater number of tests to find the optimal GGI. For k variants, we have k(k-1)/2 combinations for two-order interactions, and kCn combinations for n-order interactions. The number of possible interaction models increases exponentially as the interaction order or the number of variants increases. Fourth, though the biological interpretation of GGI is important, GGI is hard to interpret due to its complexity. In order to overcome these four main issues in GGI analysis with high-dimensional genomic data, four novel methods are proposed. First, to provide an efficient GGI method, we propose IGENT, Information theory-based GEnome-wide gene-gene iNTeraction method. IGENT is an efficient algorithm for identifying genome-wide GGI and gene-environment interaction (GEI). For detecting significant GGIs on a genome-wide scale, it is important to reduce the computational burden significantly. IGENT uses information gain (IG) and evaluates its significance without resampling. 
Through our simulation studies, the power of IGENT is shown to be better than or equivalent to that of BOOST. The proposed method successfully detected GGI for bipolar disorder in the Wellcome Trust Case Control Consortium (WTCCC) data and for age-related macular degeneration (AMD). Second, for GGI analysis of rare variants, we propose a new gene-gene interaction method in the framework of multifactor dimensionality reduction (MDR) analysis. The proposed method consists of two steps. The first step is to collapse the rare variants in a specific region such as a gene. The second step is to perform MDR analysis on the collapsed rare variants. The proposed method is applied to whole-exome sequencing data from a Korean population to identify causal gene-gene interactions among rare variants for type 2 diabetes (T2D). Third, to increase computational performance for GGI on a genome-wide scale, we developed CUDA (Compute Unified Device Architecture) based genome-wide association MDR (cuGWAM) software using efficient hardware accelerators. In our simulation studies, cuGWAM performed better than CPU-based MDR methods and other GPU-based methods. Fourth, to efficiently provide the statistical interpretation and biological evidence of gene-gene interactions, we developed VisEpis, a tool for visualizing gene-gene interactions in genetic association analysis and mapping epistatic interactions to biological evidence from public interaction databases. Using an interaction network and a circular plot, VisEpis lets users explore the interaction network integrated with biological evidence at the epigenetic regulation, splicing, transcription, translation, and post-translation levels. 
To aid the interpretation of statistical interactions at the genotype level, VisEpis provides checkerboard, pairwise checkerboard, forest, funnel, and ring charts.
    Contents:
    Abstract
    Contents
    List of Figures
    List of Tables
    1 Introduction
    1.1 Background of high-dimensional genomic data
    1.1.1 History of genome-wide association studies (GWAS)
    1.1.2 Association studies of massively parallel sequencing (MPS)
    1.1.3 Missing heritability and proposed alternative methods
    1.2 Purpose and novelty of this study
    1.3 Outline of the thesis
    2 Overview of gene-gene interaction
    2.1 Definition of gene-gene interaction
    2.2 Practical issues of gene-gene interaction
    2.3 Overview of gene-gene interaction methods
    2.3.1 Regression-based gene-gene interaction methods
    2.3.2 Multifactor dimensionality reduction (MDR)
    2.3.3 Gene-gene interaction methods using machine learning methods
    2.3.3 Entropy-based gene-gene interaction methods
    3 Entropy-based Gene-gene interaction
    3.1 Introduction
    3.2 Methods
    3.2.1 Entropy-based gene-gene interaction analysis
    3.2.2 Exhaustive searching approach and Stepwise selection approach
    3.2.3 Simulation setting
    3.2.4 Genome-wide data for Bipolar disorder (BD)
    3.2.5 Genome-wide data for Age-related macular degeneration (AMD)
    3.3 Results
    3.3.1 Simulation results
    3.3.2 Analysis of WTCCC bipolar disorder (BD) data
    3.3.3 Analysis of age-related macular degeneration (AMD) data
    3.4 Discussion
    3.5 Conclusion
    4 Gene-gene interaction for rare variants
    4.1 Introduction
    4.2 Methods
    4.2.1 Collapsing-based gene-gene interaction
    4.2.2 Simulation setting
    4.3 Results
    4.3.1 Simulation study
    4.3.2 Real data analysis of the Type 2 diabetes data
    4.4 Discussion and Conclusion
    5 Computation enhancement for gene-gene interaction
    5.1 Introduction
    5.2 Methods
    5.2.1 MDR implementation
    5.2.2 Implementation using high-performance computation based on GPU
    5.2.3 Environment of performance comparison
    5.3 Results
    5.3.1 Computational improvement
    5.4 Discussion
    5.5 Conclusion
    6 Visualization for gene-gene interaction interpretation
    6.1 Introduction
    6.2 Methods
    6.2.1 Interaction mapping procedure
    6.2.1 Checker board plot
    6.2.2 Forest and funnel plot
    6.3 Case study
    6.3.1 Interpretation of gene-gene interaction in WTCCC bipolar disorder data
    6.3.2 Interpretation of gene-gene interaction in Age-related macular degeneration (AMD) data
    6.4 Conclusion
    7 Summary and Conclusion
    Bibliography
    Abstract (Korean)
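
    The thesis abstract above counts k(k-1)/2 pairwise combinations among k variants; more generally, the number of order-n combinations is the binomial coefficient C(k, n). A small sketch of how quickly this search space grows follows. The function name `n_interaction_models` and the example variant counts are assumptions chosen for illustration.

```python
from math import comb


def n_interaction_models(n_variants, order):
    """Number of distinct variant subsets of the given interaction order."""
    return comb(n_variants, order)


# Pairwise count agrees with the closed form k(k-1)/2.
k = 1000
pairs = n_interaction_models(k, 2)
closed_form = k * (k - 1) // 2

# Growth with interaction order for a hypothetical panel of 1000 variants.
for order in (2, 3, 4):
    print(order, n_interaction_models(k, order))
```

    Each additional interaction order multiplies the number of candidate models by roughly k divided by the order, which is why exhaustive high-order searches become infeasible on a genome-wide scale.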

    Discovering Higher-order SNP Interactions in High-dimensional Genomic Data

    In this thesis, a method based on multifactor dimensionality reduction and associative classification is employed to identify higher-order SNP interactions, in order to enhance the understanding of the genetic architecture of complex diseases. Further, this thesis explores the application of deep learning techniques to provide new clues for interaction analysis. The performance of the deep learning method is maximized by unifying deep neural networks with a random forest to identify reliable interactions in the presence of noise.

    Accelerating epistasis analysis in human genetics with consumer graphics hardware

    BACKGROUND: Human geneticists are now capable of measuring more than one million DNA sequence variations from across the human genome. The new challenge is to develop computationally feasible methods capable of analyzing these data for associations with common human disease, particularly in the context of epistasis. Epistasis describes the situation where multiple genes interact in a complex non-linear manner to determine an individual's disease risk and is thought to be ubiquitous for common diseases. Multifactor Dimensionality Reduction (MDR) is an algorithm capable of detecting epistasis. An exhaustive analysis with MDR is often computationally expensive, particularly for high order interactions. This challenge has previously been met with parallel computation and expensive hardware. The option we examine here exploits commodity hardware designed for computer graphics. In modern computers Graphics Processing Units (GPUs) have more memory bandwidth and computational capability than Central Processing Units (CPUs) and are well suited to this problem. Advances in the video game industry have led to an economy of scale creating a situation where these powerful components are readily available at very low cost. Here we implement and evaluate the performance of the MDR algorithm on GPUs. Of primary interest are the time required for an epistasis analysis and the price to performance ratio of available solutions. FINDINGS: We found that using MDR on GPUs consistently increased performance per machine over both a feature rich Java software package and a C++ cluster implementation. The performance of a GPU workstation running a GPU implementation reduces computation time by a factor of 160 compared to an 8-core workstation running the Java implementation on CPUs. This GPU workstation performs similarly to 150 cores running an optimized C++ implementation on a Beowulf cluster. 
Furthermore this GPU system provides extremely cost effective performance while leaving the CPU available for other tasks. The GPU workstation containing three GPUs costs $2000 while obtaining similar performance on a Beowulf cluster requires 150 CPU cores which, including the added infrastructure and support cost of the cluster system, cost approximately $82,500. CONCLUSION: Graphics hardware based computing provides a cost effective means to perform genetic analysis of epistasis using MDR on large datasets without the infrastructure of a computing cluster.