1,149 research outputs found

    High performance computing enabling exhaustive analysis of higher order single nucleotide polymorphism interaction in Genome Wide Association Studies.

    Genome-wide association studies (GWAS) are a common approach for the systematic discovery of single nucleotide polymorphisms (SNPs) that are associated with a given disease. The univariate analysis approaches commonly employed may miss important SNP associations that, in complex diseases, only appear through multivariate analysis. However, multivariate SNP analysis is currently limited by its inherent computational complexity. In this work, we present a computational framework that harnesses supercomputers. Based on our results, we estimate that a three-way interaction analysis of GWAS data with 1.1 million SNPs would require over 5.8 years on the full "Avoca" IBM Blue Gene/Q installation at the Victorian Life Sciences Computation Initiative. This is hundreds of times faster than estimates for other CPU-based methods and four times faster than runtimes estimated for GPU methods, indicating how improvements in the hardware applied to interaction analysis may alter the types of analysis that can be performed. Furthermore, the same analysis would take under 3 months on the currently largest IBM Blue Gene/Q supercomputer, "Sequoia" at the Lawrence Livermore National Laboratory, assuming linear scaling is maintained, as our results suggest. Given that the implementation used in this study can be further optimised, this runtime means that exhaustive analysis of higher-order interactions is becoming feasible on large modern GWAS. This research was partially funded by NHMRC grant 1033452 and was supported by Victorian Life Sciences Computation Initiative (VLSCI) grant number 0126 on its Peak Computing Facility at the University of Melbourne, an initiative of the Victorian Government, Australia.
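The scale behind these estimates can be checked with a few lines of arithmetic. The sketch below is not taken from the paper: it counts the SNP combinations an exhaustive analysis must visit and applies the linear-scaling assumption the abstract mentions. The rack counts (4 for Avoca, 96 for Sequoia) are assumptions not stated in the abstract, and the function names are hypothetical.

```python
from math import comb

# Figures quoted in the abstract.
N_SNPS = 1_100_000
AVOCA_YEARS = 5.8  # abstract says "over 5.8 years", so this is a lower bound

# Assumption (not stated in the abstract): Blue Gene/Q rack counts.
AVOCA_RACKS = 4
SEQUOIA_RACKS = 96


def n_combinations(n_snps: int, order: int) -> int:
    """Number of unordered SNP sets of the given interaction order."""
    return comb(n_snps, order)


def scaled_runtime_years(base_years: float, base_units: int, target_units: int) -> float:
    """Runtime on a larger machine, assuming perfectly linear scaling."""
    return base_years * base_units / target_units


pairs = n_combinations(N_SNPS, 2)
triples = n_combinations(N_SNPS, 3)
sequoia_months = 12 * scaled_runtime_years(AVOCA_YEARS, AVOCA_RACKS, SEQUOIA_RACKS)

print(f"two-way combinations:   {pairs:.3e}")
print(f"three-way combinations: {triples:.3e}")
print(f"estimated Sequoia runtime: {sequoia_months:.1f} months")
```

Under these assumptions, the jump from two-way to three-way analysis multiplies the search space by several hundred thousand, which is why the choice of hardware dominates feasibility.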

    Discovering Higher-order SNP Interactions in High-dimensional Genomic Data

    In this thesis, a multifactor dimensionality reduction-based method built on associative classification is employed to identify higher-order SNP interactions and so improve understanding of the genetic architecture of complex diseases. The thesis also explores the application of deep learning techniques, which provide new clues for interaction analysis. The performance of the deep learning method is maximized by combining deep neural networks with a random forest, so that reliable interactions can be identified in the presence of noise.

    Design and Implementation of a Computational Platform and a Parallelized Interaction Analysis for Large Scale Genomics Data in Multiple Sclerosis

    The multiple sclerosis (MS) genetics research group led by Professor Jan Hillert at Karolinska Institutet focuses on investigating the aetiology of the disease. Samples have been collected routinely for decades from patients visiting the clinic, and large amounts of genetic data are being generated from these samples. The traditional methods of analyzing the data are becoming increasingly inefficient as data sets grow larger, and new approaches are needed to perform the analyses. This thesis gives an introduction to the relevant genetics and discusses possible approaches for enabling more efficient execution of legacy analysis tools, as well as for improving a gene-environment and gene-gene interaction analysis. Different established computational paradigms are presented, followed by the implementation of a computational platform to support the researchers' existing, and possibly future, analysis needs. The improved interaction analysis application is then implemented and executed in a virtual instance of this platform, and its performance is evaluated against the original reference application.