High performance computing enabling exhaustive analysis of higher order single nucleotide polymorphism interaction in Genome Wide Association Studies.
Genome-wide association studies (GWAS) are a common approach for the systematic discovery of single nucleotide polymorphisms (SNPs) that are associated with a given disease. The univariate analysis approaches commonly employed may miss important SNP associations that only appear through multivariate analysis in complex diseases. However, multivariate SNP analysis is currently limited by its inherent computational complexity. In this work, we present a computational framework that harnesses supercomputers. Based on our results, we estimate that a three-way interaction analysis of GWAS data with 1.1 million SNPs would require over 5.8 years on the full "Avoca" IBM Blue Gene/Q installation at the Victorian Life Sciences Computation Initiative. This is hundreds of times faster than estimates for other CPU-based methods and four times faster than runtimes estimated for GPU methods, indicating how improvements in the hardware applied to interaction analysis may alter the types of analysis that can be performed. Furthermore, assuming the linear scaling suggested by our results is maintained, the same analysis would take under 3 months on "Sequoia" at the Lawrence Livermore National Laboratory, currently the largest IBM Blue Gene/Q supercomputer. Given that the implementation used in this study can be further optimised, this runtime means it is becoming feasible to carry out exhaustive analysis of higher-order interactions on large modern GWAS. This research was partially funded by NHMRC grant 1033452 and was supported by Victorian Life Sciences Computation Initiative (VLSCI) grant number 0126 on its Peak Computing Facility at the University of Melbourne, an initiative of the Victorian Government, Australia.
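The runtime figures in this abstract can be sanity-checked with a short calculation. The sketch below counts the SNP triples in an exhaustive three-way scan and rescales the Avoca estimate under the linear-scaling assumption the abstract states. The core counts (65,536 for Avoca, 1,572,864 for Sequoia) are assumptions taken from public descriptions of the two machines, not from the abstract, and the function names are hypothetical.

```python
from math import comb

# Figures quoted in the abstract.
N_SNPS = 1_100_000    # SNPs in the GWAS data set
AVOCA_YEARS = 5.8     # estimated three-way runtime on Avoca (a lower bound: "over 5.8 years")

# ASSUMPTION: core counts are not given in the abstract.
AVOCA_CORES = 65_536
SEQUOIA_CORES = 1_572_864


def n_combinations(n_snps: int, order: int) -> int:
    """Number of distinct SNP sets of the given order in an exhaustive scan."""
    return comb(n_snps, order)


def scaled_runtime_years(base_years: float, base_cores: int, target_cores: int) -> float:
    """Rescale a runtime to another machine, assuming perfectly linear scaling."""
    return base_years * base_cores / target_cores


if __name__ == "__main__":
    triples = n_combinations(N_SNPS, 3)
    sequoia_years = scaled_runtime_years(AVOCA_YEARS, AVOCA_CORES, SEQUOIA_CORES)
    print(f"SNP triples to test: {triples:.3e}")
    print(f"Estimated Sequoia runtime: {sequoia_years * 12:.1f} months")
```

Under these assumptions the machine ratio is 24, which puts the rescaled estimate at roughly 2.9 months, in line with the "under 3 months" figure quoted above.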
Discovering Higher-order SNP Interactions in High-dimensional Genomic Data
In this thesis, a multifactor dimensionality reduction-based method built on associative classification is employed to identify higher-order SNP interactions and so improve the understanding of the genetic architecture of complex diseases. Further, this thesis explores the application of deep learning techniques, which provide new clues for interaction analysis. The performance of the deep learning method is maximized by combining deep neural networks with a random forest to identify reliable interactions in the presence of noise.
Association Analysis of Additive Effects and Epistasis Between Human Candidate Malaria Protective Genes
Malaria is a major cause of childhood death in Africa and host genetic factors play a key role in determining survival from this disease. Although many candidate loci have been identified, there have been difficulties in confirming the significance of some of these loci. To some extent this might be explained by the added complexity of epistasis, or gene-gene interactions. Through this thesis I aimed: (1) to re-appraise a range of candidate malaria-association genes using a large-scale case-control study of severe malaria (SM) in Kilifi, Kenya; (2) to compare different approaches to detecting epistatic interactions; (3) to look for evidence of epistasis between candidate genes in my data set; (4) to examine the haplotype structure and linkage disequilibrium (LD) patterns for two such implicated variants (HbS and α+thalassaemia) and their gene regions, that coexist in the Kilifi population, and (5) to use these exemplars as a starting point for investigating the process of detecting epistasis in SM in a genome-wide association study (GWAS). Out of 71 candidate genes investigated, I observed that polymorphisms affecting various aspects of red blood cells (including HBB, HBA, G6PD, FREM3, INPP4B, ATP2B4 and ABO) were among those associated with the strongest signals of differential susceptibility to SM. Because of their prominence in malaria, HbS and α+thalassaemia were used to illustrate interaction analysis at the GWAS level. This included looking at the structure of the genomic regions surrounding the genes. As expected, a single haplotype of approximately 200kb was seen surrounding HbS, which then diverged into 2 major haplotypes spanning a further 1Mb either side, an observation that was largely explained by ethnicity. In contrast, no marked LD/haplotype structure was observed in the genomic region surrounding the α+thalassaemia deletion, suggesting that this is a very old polymorphism. 
Through this study, I confirmed the negative epistasis seen between HbS and α+thalassaemia using a study design (case-control) that was different to that used previously (cohort), although this was not among the most significant of the interactions I detected. I searched for pairwise interactions between these two polymorphisms at a genome-wide level using heterozygous and additive models for HbS and α+thalassaemia respectively. For each scan a single region reaching a significance level of 10⁻⁷ was found (STX18 for HbS and MYEOV for α+thalassaemia), and several other novel signals were identified in the 10⁻⁶ to 10⁻⁷ significance region. Further work will be required to validate these signals, and the challenge will be to understand their biological relevance. This is now becoming possible as datasets for many diseases, including malaria, are released into the public domain. As this Kenyan study has shown, with large group sizes and high-quality clinical and genetic data it is possible to begin to explore genetic interactions in a disease setting.
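To make the "heterozygous and additive models" mentioned above concrete, the following is a minimal sketch of one conventional way to code genotypes for a pairwise interaction test. It is an illustration under stated assumptions, not the thesis's own pipeline: genotypes are assumed to be given as counts (0, 1 or 2) of the variant allele, the interaction covariate is assumed to be the product of the two coded values, and all function names are hypothetical.

```python
def additive(variant_count: int) -> int:
    """Additive coding: the covariate is the number of variant alleles (0, 1 or 2)."""
    if variant_count not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2 variant alleles")
    return variant_count


def heterozygous(variant_count: int) -> int:
    """Heterozygous coding: 1 for carriers of exactly one variant allele, else 0."""
    if variant_count not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2 variant alleles")
    return 1 if variant_count == 1 else 0


def interaction_term(hbs_count: int, thal_count: int) -> int:
    """Product term for a pairwise interaction model.

    ASSUMPTION: HbS uses the heterozygous model and alpha+thalassaemia the
    additive model, as the abstract describes; the product form is a common
    convention and is not confirmed by the source.
    """
    return heterozygous(hbs_count) * additive(thal_count)


if __name__ == "__main__":
    # An HbS heterozygote carrying two alpha+thalassaemia deletions.
    print(interaction_term(1, 2))
```

In a regression setting, a term like this would typically enter alongside the two main-effect covariates, with a non-zero interaction coefficient indicating epistasis.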
Design and Implementation of a Computational Platform and a Parallelized Interaction Analysis for Large Scale Genomics Data in Multiple Sclerosis
The multiple sclerosis (MS) genetics research group led by Professor Jan Hillert at Karolinska Institutet focuses on investigating the aetiology of the disease. Samples have been collected routinely from patients visiting the clinic for decades, and large amounts of genetic data are being generated from these samples. The traditional methods of analyzing the data are becoming increasingly inefficient as data sets grow larger, and new approaches are needed to perform the analyses. This thesis gives an introduction to the relevant genetics and discusses possible approaches for enabling more efficient execution of legacy analysis tools, as well as improving a gene-environment and gene-gene interaction analysis. Different computational paradigms are presented, followed by the implementation of a computational platform to support the researchers' existing, and possibly future, analysis needs. The improved interaction analysis application is then implemented and executed in a virtual instance of this platform, and its performance is evaluated against the original reference application.