
281 research outputs found

KNN-MDR: a learning approach for improving interactions mapping performances in genome wide association studies

Author: Frédéric Farnir
Sinan Abo Alchamlat
Publication venue: Springer Nature
Publication date: 21/03/2017

Background: Finding epistatic interactions in large association studies such as genome-wide association studies (GWAS), given the large volume of genomic data now available, is a challenging and largely unsolved problem. Few previous studies could handle genome-wide data because of the difficulty of searching a combinatorially explosive search space and of statistically evaluating epistatic interactions with a limited number of samples. Our work is a contribution to this field. We propose a novel approach combining K-Nearest Neighbors (KNN) and Multifactor Dimensionality Reduction (MDR) methods for detecting gene-gene interactions as a possible alternative to existing algorithms, especially in situations where the number of involved determinants is high. After describing the approach, we compare our method (KNN-MDR) with a set of the best-performing existing methods (i.e., MDR, BOOST, BHIT, MegaSNPHunter and AntEpiSeeker) at detecting interactions in simulated data as well as real genome-wide data. Results: Experimental results on both simulated data and real genome-wide data show that KNN-MDR has interesting properties in terms of accuracy and power, and that, in many cases, it significantly outperforms its recent competitors. Conclusions: The presented methodology (KNN-MDR) is valuable in the context of loci and interactions mapping and can be seen as an interesting addition to the arsenal used in complex traits analyses.
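For readers unfamiliar with the MDR baseline named in this abstract, the following is a minimal sketch of the classic two-locus MDR scoring step, not of KNN-MDR itself (the abstract does not describe how KNN is combined with MDR). The function name `mdr_balanced_accuracy`, the 0/1 coding of case status, and the omission of cross-validation are assumptions made for illustration.

```python
from collections import Counter


def mdr_balanced_accuracy(snp_a, snp_b, status):
    """Score one SNP pair with the classic MDR pooling rule.

    snp_a, snp_b: genotype codes per individual (e.g. 0, 1, 2).
    status: 1 for case, 0 for control (assumed coding).
    Returns the balanced accuracy of the high/low-risk classifier.
    """
    cases, controls = Counter(), Counter()
    for a, b, y in zip(snp_a, snp_b, status):
        if y == 1:
            cases[(a, b)] += 1
        else:
            controls[(a, b)] += 1

    n_cases = sum(cases.values())
    n_controls = sum(controls.values())
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")

    # A genotype cell is "high risk" when its case:control ratio
    # exceeds the overall case:control ratio of the data set.
    threshold = n_cases / n_controls
    high_risk = {
        cell
        for cell in set(cases) | set(controls)
        if cases[cell] > threshold * controls[cell]
    }

    sensitivity = sum(cases[c] for c in high_risk) / n_cases
    specificity = sum(
        n for c, n in controls.items() if c not in high_risk
    ) / n_controls
    return 0.5 * (sensitivity + specificity)
```

In a full MDR analysis this score would be computed for every candidate SNP combination inside a cross-validation loop, which is where the combinatorial cost mentioned in the abstract comes from.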

Crossref

Springer - Publisher Connector

PubMed Central

Open Repository and Bibliography - Liège

FigShare

Spatial rank-based multifactor dimensionality reduction to detect gene–gene interactions for multivariate phenotypes

Author: Jeong Hoe-Bin
Lee Jong-Hyun
Park Mira
Park Taesung
Publication venue: Springer Science and Business Media LLC
Publication date: 10/10/2021

Background: Identifying interaction effects between genes is one of the main tasks of genome-wide association studies aiming to shed light on the biological mechanisms underlying complex diseases. Multifactor dimensionality reduction (MDR) is a popular approach for detecting gene–gene interactions that has been extended in various forms to handle binary and continuous phenotypes. However, only a few multivariate MDR methods are available for multiple related phenotypes. Current approaches use Hotelling's T2 statistic to evaluate interaction models, but it is well known that Hotelling's T2 statistic is highly sensitive to heavily skewed distributions and outliers. Results: We propose a robust approach based on nonparametric statistics such as spatial signs and ranks. The new multivariate rank-based MDR (MR-MDR) is mainly suitable for analyzing multiple continuous phenotypes and is less sensitive to skewed distributions and outliers. MR-MDR uses fuzzy k-means clustering and classifies multi-locus genotypes into two groups. MR-MDR then calculates a spatial rank-sum statistic as an evaluation measure and selects the interaction model with the largest statistic. Our novel idea lies in adopting nonparametric statistics as an evaluation measure for robust inference. We adopt tenfold cross-validation to avoid overfitting. Intensive simulation studies were conducted to compare the performance of MR-MDR with current methods. Application of MR-MDR to a real dataset from a Korean genome-wide association study demonstrated that it successfully identified genetic interactions associated with four phenotypes related to kidney function. The R code for conducting MR-MDR is available at https://github.com/statpark/MR-MDR Conclusions: Intensive simulation studies comparing MR-MDR with several current methods showed that the performance of MR-MDR was outstanding for skewed distributions. Additionally, for symmetric distributions, MR-MDR showed comparable power. Therefore, we conclude that MR-MDR is a useful multivariate nonparametric approach that can be used regardless of the phenotype distribution, the correlations between phenotypes, and sample size. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2013M3A9C4078158, NRF-2021R1A2C1007788).

SNU Open Repository and Archive

Identifying Gene-Gene Interactions that are Highly Associated with Body Mass Index Using Quantitative Multifactor Dimensionality Reduction (QMDR)

Author: De Rishika
Drenos Fotios
Holzinger Emily R.
Verma Shefali S.
Publication venue: Dartmouth Digital Commons
Publication date: 14/12/2015

Despite heritability estimates of 40–70% for obesity, less than 2% of its variation is explained by the Body Mass Index (BMI) associated loci identified so far. Epistasis, or gene-gene interaction, is a plausible source to explain portions of the missing heritability of BMI. Using genotypic data from 18,686 individuals across five study cohorts (ARIC, CARDIA, FHS, CHS, MESA), we filtered SNPs (Single Nucleotide Polymorphisms) using two parallel approaches. SNPs were filtered either on the strength of their main effects of association with BMI, or on the number of knowledge sources supporting a specific SNP-SNP interaction in the context of BMI. Filtered SNPs were then analyzed for interactions that are highly associated with BMI using QMDR (Quantitative Multifactor Dimensionality Reduction). QMDR is a nonparametric, genetic model-free method that detects non-linear interactions associated with a quantitative trait.
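As a rough illustration of how an MDR-style method can be adapted to a quantitative trait such as BMI, the sketch below pools genotype cells by comparing each cell's trait mean with the overall mean and scores the resulting two groups with a t-type statistic. This is an assumption-laden sketch: the abstract does not spell out the QMDR procedure, the function name `qmdr_t_statistic` is hypothetical, and the choice of Welch's unequal-variance form is made here only for simplicity.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance


def qmdr_t_statistic(snp_a, snp_b, trait):
    """Score one SNP pair against a quantitative trait.

    Cells whose mean trait value exceeds the overall mean are pooled
    into a "high" group, the rest into a "low" group; the score is a
    Welch-type t statistic between the two groups (assumed choice).
    """
    cells = defaultdict(list)
    for a, b, y in zip(snp_a, snp_b, trait):
        cells[(a, b)].append(y)

    overall = mean(trait)
    high, low = [], []
    for values in cells.values():
        (high if mean(values) > overall else low).extend(values)

    # A t statistic needs at least two observations per group.
    if len(high) < 2 or len(low) < 2:
        return 0.0

    se = sqrt(variance(high) / len(high) + variance(low) / len(low))
    if se == 0:
        return float("inf")
    return (mean(high) - mean(low)) / se
```

A larger statistic indicates that the pooled genotype groups separate the trait values more clearly; a real analysis would evaluate this under cross-validation and permutation testing.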

Dartmouth Digital Commons (Dartmouth College)

Gene-gene interaction analysis in high-dimensional genomic data

Author: 권민석
Publication venue: Seoul National University Graduate School
Publication date: 01/02/2015

Doctoral thesis -- Seoul National University Graduate School: Interdisciplinary Program in Bioinformatics, 2015. 2. Park Taesung. With the development of high-throughput genotyping and sequencing technology, there is growing evidence of association between genetic variants and common complex traits. Although thousands of genetic variants have been discovered, such genetic markers have been shown to explain only a very small proportion of the underlying genetic variance of complex traits. Gene-gene interaction (GGI) analysis and rare variant analysis are expected to unveil a large portion of the unexplained heritability of complex traits. In GGI analysis, there are several practical issues. First, to conduct GGI analysis with high-dimensional genomic data, GGI methods require efficient computation and high accuracy. Second, it is hard to detect GGI for rare variants because of their sparsity. Third, analysing GGI at genome-wide scale suffers from a computational burden because a huge search space must be explored. A much greater number of tests is required to find the optimal GGI. For k variants, there are k(k-1)/2 combinations for second-order interactions and C(k, n) combinations for n-order interactions. The number of possible interaction models increases exponentially as the interaction order or the number of variants increases. Fourth, though the biological interpretation of GGI is important, GGI is hard to interpret because of its complexity. To overcome these four main issues in GGI analysis with high-dimensional genomic data, four novel methods are proposed. First, to provide an efficient GGI method, we propose IGENT, an Information theory-based GEnome-wide gene-gene iNTeraction method. IGENT is an efficient algorithm for identifying genome-wide GGI and gene-environment interaction (GEI). For detecting significant GGIs at genome-wide scale, it is important to reduce the computational burden significantly. IGENT uses information gain (IG) and evaluates its significance without resampling. Through our simulation studies, the power of IGENT is shown to be better than or equivalent to that of BOOST. The proposed method successfully detected GGI for bipolar disorder in the Wellcome Trust Case Control Consortium (WTCCC) data and for age-related macular degeneration (AMD). Second, for GGI analysis of rare variants, we propose a new gene-gene interaction method in the framework of multifactor dimensionality reduction (MDR) analysis. The proposed method consists of two steps. The first step is to collapse the rare variants in a specific region such as a gene. The second step is to perform MDR analysis on the collapsed rare variants. The proposed method is applied to whole exome sequencing data of a Korean population to identify causal gene-gene interactions of rare variants for type 2 diabetes (T2D). Third, to increase computational performance for GGI at genome-wide scale, we developed CUDA (Compute Unified Device Architecture) based genome-wide association MDR (cuGWAM) software using efficient hardware accelerators. In our simulation studies, cuGWAM performs better than CPU-based MDR methods and other GPU-based methods. Fourth, to efficiently provide the statistical interpretation of and biological evidence for gene-gene interactions, we developed VisEpis, a tool for visualizing gene-gene interactions in genetic association analysis and for mapping epistatic interactions to biological evidence from public interaction databases. Using an interaction network and a circular plot, VisEpis lets users explore the interaction network integrated with biological evidence at the epigenetic regulation, splicing, transcription, translation and post-translation levels. 
To support statistical interpretation of interactions at the genotype level, VisEpis provides checkerboard, pairwise checkerboard, forest, funnel and ring charts.
Contents:
1 Introduction: 1.1 Background of high-dimensional genomic data (1.1.1 History of genome-wide association studies (GWAS); 1.1.2 Association studies of massively parallel sequencing (MPS); 1.1.3 Missing heritability and proposed alternative methods); 1.2 Purpose and novelty of this study; 1.3 Outline of the thesis
2 Overview of gene-gene interaction: 2.1 Definition of gene-gene interaction; 2.2 Practical issues of gene-gene interaction; 2.3 Overview of gene-gene interaction methods (2.3.1 Regression-based gene-gene interaction methods; 2.3.2 Multifactor dimensionality reduction (MDR); 2.3.3 Gene-gene interaction methods using machine learning methods; 2.3.3 Entropy-based gene-gene interaction methods)
3 Entropy-based gene-gene interaction: 3.1 Introduction; 3.2 Methods (3.2.1 Entropy-based gene-gene interaction analysis; 3.2.2 Exhaustive searching approach and stepwise selection approach; 3.2.3 Simulation setting; 3.2.4 Genome-wide data for bipolar disorder (BD); 3.2.5 Genome-wide data for age-related macular degeneration (AMD)); 3.3 Results (3.3.1 Simulation results; 3.3.2 Analysis of WTCCC bipolar disorder (BD) data; 3.3.3 Analysis of age-related macular degeneration (AMD) data); 3.4 Discussion; 3.5 Conclusion
4 Gene-gene interaction for rare variants: 4.1 Introduction; 4.2 Methods (4.2.1 Collapsing-based gene-gene interaction; 4.2.2 Simulation setting); 4.3 Results (4.3.1 Simulation study; 4.3.2 Real data analysis of the type 2 diabetes data); 4.4 Discussion and Conclusion
5 Computation enhancement for gene-gene interaction: 5.1 Introduction; 5.2 Methods (5.2.1 MDR implementation; 5.2.2 Implementation using high-performance computation based on GPU; 5.2.3 Environment of performance comparison); 5.3 Results (5.3.1 Computational improvement); 5.4 Discussion; 5.5 Conclusion
6 Visualization for gene-gene interaction interpretation: 6.1 Introduction; 6.2 Methods (6.2.1 Interaction mapping procedure; 6.2.1 Checkerboard plot; 6.2.2 Forest and funnel plot); 6.3 Case study (6.3.1 Interpretation of gene-gene interaction in WTCCC bipolar disorder data; 6.3.2 Interpretation of gene-gene interaction in age-related macular degeneration (AMD) data); 6.4 Conclusion
7 Summary and Conclusion
Bibliography
Abstract (Korean)
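The search-space counts and the information-gain measure mentioned in the thesis abstract above can be illustrated with a short sketch. It uses the textbook definition IG = H(status) - H(status | genotype pair); whether IGENT uses exactly this form, and how it assesses significance without resampling, is not stated in the abstract, so the function names and the 0/1 status coding here are assumptions.

```python
from collections import Counter
from math import comb, log2


def n_interaction_models(k, order):
    """Number of distinct variant combinations of a given order, C(k, order)."""
    return comb(k, order)


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def information_gain(snp_a, snp_b, status):
    """Reduction in status entropy after conditioning on the genotype pair."""
    n = len(status)
    groups = {}
    for a, b, y in zip(snp_a, snp_b, status):
        groups.setdefault((a, b), []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(status) - conditional
```

For example, `n_interaction_models(500_000, 2)` evaluates to 124,999,750,000 candidate pairs, which illustrates why exhaustive pairwise scans at genome-wide scale motivate the GPU and information-theoretic shortcuts the thesis describes.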

SNU Open Repository and Archive

KNN-MDR: a learning approach for improving interactions mapping performances in genome wide association studies

Author: BN Howie
C Yang
CF Li
CL Koo
DD Boos
DF Schwarz
F Gunther
Frédéric Farnir
FV Lishout
G De los Campos
G Fang
H-J Ban
J Gui
J Marchini
J Millstein
J Shang
J Wang
JM Mahachie John
JM Ver Hoef
K Shchetynsky
L Chen
L Hua
M Aci
M Manceau
M Manuguerra
M Ritchie
M Ritchie
MC Wu
MD Ritchie
MG Usai
MY Park
P Hall
PM Visscher
R Collins
R Upstill-Goddard
S Prabhu
S Purcell
Sinan Abo Alchamlat
SJ Winham
SK Musani
T Cattaert
T Cattaert
TA Manolio
Wellcome Trust Case Control C
WH Wei
X Wan
X Wan
X Wu
X-Y Lou
Y Wang
YH Fang
Publication venue: Springer Science and Business Media LLC

Crossref


GenEpi: gene-based epistasis discovery using machine learning.

Author: Alzheimer’s Disease Neuroimaging Initiative
Chang Yu-Chuan
Chen Chien-Yu
Giacomini Kathleen M
Hong Ming-Yi
Hsieh Ping-Han
Oyang Yen-Jen
Tung Yi-An
Wu June-Tai
Yee Sook Wah
Publication venue: eScholarship, University of California
Publication date: 01/02/2020

Background: Genome-wide association studies (GWAS) provide a powerful means to identify associations between genetic variants and phenotypes. However, GWAS techniques for detecting epistasis, the interactions between genetic variants associated with phenotypes, are still limited. We believe that developing an efficient and effective GWAS method to detect epistasis will be key to discovering sophisticated pathogenesis, which is especially important for complex diseases such as Alzheimer's disease (AD). Results: This study presents GenEpi, a computational package that uncovers epistasis associated with phenotypes using the proposed machine learning approach. GenEpi identifies both within-gene and cross-gene epistasis through a two-stage modeling workflow. In both stages, GenEpi adopts two-element combinatorial encoding when producing features and constructs the prediction models by L1-regularized regression with stability selection. On simulated data, GenEpi outperforms other widely used methods at detecting the ground-truth epistasis. For real data, this study uses AD as an example to show the capability of GenEpi in finding disease-related variants and variant interactions that have both biological meaning and predictive power. Conclusions: The results on simulated data and AD demonstrate that GenEpi can detect epistasis associated with phenotypes effectively and efficiently. The released package can be generalized to facilitate studies of many complex diseases in the near future.

eScholarship - University of California

Discovering Higher-order SNP Interactions in High-dimensional Genomic Data

Author: Uppu Suneetha
Publication venue: Curtin University
Publication date: 01/01/2018

In this thesis, a method based on multifactor dimensionality reduction and associative classification is employed to identify higher-order SNP interactions and so improve the understanding of the genetic architecture of complex diseases. The thesis further explores the application of deep learning techniques, which provide new clues for interaction analysis. The performance of the deep learning method is maximized by unifying deep neural networks with a random forest to obtain reliable interactions in the presence of noise.

espace@Curtin

Detecting purely epistatic multi-locus interactions by an omnibus permutation test on ensembles of two-locus analyses

Author: Assawamakin Anunchai
Chaiyaratana Nachol
Limwongse Chanin
Piroonratana Theera
Sinsomros Saravudh
Wongseree Waranyu
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Purely epistatic multi-locus interactions cannot generally be detected via single-locus analysis in case-control studies of complex diseases. Recently, many two-locus and multi-locus analysis techniques have been shown to be promising for epistasis detection. However, exhaustive multi-locus analysis requires prohibitively large computational effort when problems involve large-scale or genome-wide data. Furthermore, there is no explicit proof that a combination of multiple two-locus analyses can lead to the correct identification of multi-locus interactions. Results: The proposed 2LOmb algorithm performs an omnibus permutation test on ensembles of two-locus analyses. The algorithm consists of four main steps: two-locus analysis, a permutation test, global p-value determination and a progressive search for the best ensemble. 2LOmb is benchmarked against an exhaustive two-locus analysis technique, a set association approach, a correlation-based feature selection (CFS) technique and a tuned ReliefF (TuRF) technique. The simulation results indicate that 2LOmb produces a low false-positive error. Moreover, 2LOmb has the best performance in terms of its ability to identify all causative single nucleotide polymorphisms (SNPs) and a low number of output SNPs in purely epistatic two-, three- and four-locus interaction problems. The interaction models constructed from the 2LOmb outputs via a multifactor dimensionality reduction (MDR) method are also included to confirm epistasis detection. 2LOmb is subsequently applied to a type 2 diabetes mellitus (T2D) data set, which was obtained as part of the UK genome-wide genetic epidemiology study by the Wellcome Trust Case Control Consortium (WTCCC). After primary screening for SNPs that are located within or near 372 candidate genes and exhibit no marginal single-locus effects, the T2D data set is reduced to 7,065 SNPs from 370 genes. The 2LOmb search in the reduced T2D data reveals that four intronic SNPs in PGM1 (phosphoglucomutase 1), two intronic SNPs in LMX1A (LIM homeobox transcription factor 1, alpha), two intronic SNPs in PARK2 (Parkinson disease (autosomal recessive, juvenile) 2, parkin) and three intronic SNPs in GYS2 (glycogen synthase 2 (liver)) are associated with the disease. The 2LOmb result suggests that there is no interaction between each pair of the identified genes that can be described by purely epistatic two-locus interaction models. Moreover, there are no interactions between these four genes that can be described by purely epistatic multi-locus interaction models with marginal two-locus effects. The findings provide an alternative explanation for the aetiology of T2D in a UK population. Conclusion: An omnibus permutation test on ensembles of two-locus analyses can detect purely epistatic multi-locus interactions with marginal two-locus effects. The study also reveals that SNPs from large-scale or genome-wide case-control data which are discarded after single-locus analysis detects no association can still be useful for genetic epidemiology studies.
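The idea of a permutation test over an ensemble of two-locus analyses can be sketched generically as follows. This is not the 2LOmb algorithm: the choice of a Pearson chi-square per SNP pair, the use of the maximum statistic across the ensemble, and the names `chi_square` and `permutation_p_value` are all assumptions for illustration, and the progressive search for the best ensemble is not reproduced.

```python
import random
from collections import Counter


def chi_square(pair_genotypes, status):
    """Pearson chi-square of genotype-pair cells versus case/control status."""
    n = len(status)
    cell_counts = Counter(pair_genotypes)
    class_counts = Counter(status)
    joint = Counter(zip(pair_genotypes, status))
    stat = 0.0
    for g, n_g in cell_counts.items():
        for y, n_y in class_counts.items():
            expected = n_g * n_y / n
            stat += (joint[(g, y)] - expected) ** 2 / expected
    return stat


def permutation_p_value(pairs, status, n_perm=200, seed=0):
    """Global p-value for the largest two-locus statistic in an ensemble.

    pairs: one sequence of (genotype_a, genotype_b) tuples per SNP pair.
    status: case/control labels, shuffled to build the null distribution.
    """
    rng = random.Random(seed)
    observed = max(chi_square(p, status) for p in pairs)
    shuffled = list(status)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max(chi_square(p, shuffled) for p in pairs) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

Shuffling the status labels once per permutation and recomputing every pair's statistic preserves the correlation between SNP pairs, which is why a single global p-value can be read off the ensemble rather than correcting each pair separately.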

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Interaction Between Allelic Variations in Vitamin D Receptor and Retinoid X Receptor Genes on Metabolic Traits

Author: Andrews Peter
Berry Diane J
Cavadino Alana
Mangino Massimo
Moore Jason H
Vimaleswaran Karani S
Publication venue: Dartmouth Digital Commons
Publication date: 19/03/2014

Low vitamin D status has been shown to be a risk factor for several metabolic traits such as obesity, diabetes and cardiovascular disease. The biological actions of 1,25-dihydroxyvitamin D are mediated through the vitamin D receptor (VDR), which heterodimerizes with retinoid X receptor, gamma (RXRG). Hence, we examined the potential interactions between the tagging polymorphisms in the VDR (22 tag SNPs) and RXRG (23 tag SNPs) genes on metabolic outcomes such as body mass index, waist circumference, waist-hip ratio (WHR), high- and low-density lipoprotein (LDL) cholesterol, serum triglycerides, systolic and diastolic blood pressure and glycated haemoglobin in the 1958 British Birth Cohort (1958BC, up to n = 5,231). We used the multifactor dimensionality reduction (MDR) program as a non-parametric test to examine potential interactions between the VDR and RXRG gene polymorphisms in the 1958BC. We used data from the Northern Finland Birth Cohort 1966 (NFBC66, up to n = 5,316) and Twins UK (up to n = 3,943) to replicate our initial findings from 1958BC.

Dartmouth Digital Commons (Dartmouth College)