
    Dissecting genetic interactions in complex traits

    Of central importance in the dissection of the components that govern complex traits is understanding the architecture of natural genetic variation. Genetic interaction, or epistasis, constitutes one aspect of this, but epistatic analysis has been largely avoided in genome wide association studies because of statistical and computational difficulties. This thesis explores both issues in the context of two-locus interactions. Initially, through simulation and deterministic calculations it was demonstrated that not only can epistasis maintain deleterious mutations at intermediate frequencies when under selection, but that it may also have a role in the maintenance of additive variance. Based on the epistatic patterns that are evolutionarily persistent, and the frequencies at which they are maintained, it was shown that exhaustive two dimensional search strategies are the most powerful approaches for uncovering both additive variance and the other genetic variance components that are co-precipitated. However, while these simulations demonstrate encouraging statistical benefits, two dimensional searches are often computationally prohibitive, particularly with the marker densities and sample sizes that are typical of genome wide association studies. To address this issue different software implementations were developed to parallelise the two dimensional triangular search grid across various types of high performance computing hardware. Of these, particularly effective was using the massively-multi-core architecture of consumer level graphics cards. While the performance will continue to improve as hardware improves, at the time of testing the speed was 2-3 orders of magnitude faster than CPU based software solutions that are in current use. Not only does this software enable epistatic scans to be performed routinely at minimal cost, but it is now feasible to empirically explore the false discovery rates introduced by the high dimensionality of multiple testing. 
Through permutation analysis it was shown that the significance threshold for epistatic searches is a function of both marker density and population sample size, and that because of the correlation structure that exists between tests, the threshold estimates currently used are overly stringent. Although the relaxed threshold estimates constitute an improvement in the power of two dimensional searches, detection is still most likely limited to relatively large genetic effects. Through direct calculation it was shown that, in contrast to the additive case where the decay of estimated genetic variance was proportional to falling linkage disequilibrium between causal variants and observed markers, for epistasis this decay was exponential. One way to rescue poorly captured causal variants is to parameterise association tests using haplotypes rather than single markers. A novel statistical method that uses a regularised parameter selection procedure on two-locus haplotypes was developed, and through extensive simulations it was shown that it delivers a substantial gain in power over single-marker-based tests. Ultimately, this thesis seeks to demonstrate that many of the obstacles in epistatic analysis can be ameliorated, and that with the current abundance of genomic data gathered by the scientific community, direct search may be a viable method to qualify the importance of epistasis.
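As a rough illustration of the exhaustive two-locus search described in this abstract, the sketch below walks the triangular grid of marker pairs (i < j) and scores each pair with a one-degree-of-freedom F statistic for a multiplicative interaction term. It is a minimal sketch under stated assumptions, not the thesis software: the function names (`n_pairs`, `interaction_f`, `pairwise_scan`, `simulate`), the simulated data, and the choice of test are all hypothetical, and a full epistatic model would use more than one interaction parameter.

```python
from itertools import combinations

import numpy as np


def n_pairs(m):
    """Number of cells in the triangular grid of m markers (pairs with i < j)."""
    return m * (m - 1) // 2


def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def interaction_f(xi, xj, y):
    """F statistic (1 df) for adding an xi*xj term to an additive two-marker model."""
    n = len(y)
    ones = np.ones(n)
    additive = np.column_stack([ones, xi, xj])
    full = np.column_stack([ones, xi, xj, xi * xj])
    rss_additive = rss(additive, y)
    rss_full = rss(full, y)
    return (rss_additive - rss_full) / (rss_full / (n - 4))


def pairwise_scan(G, y):
    """Score every marker pair (i < j); G is an (individuals x markers) genotype matrix."""
    m = G.shape[1]
    return {
        (i, j): interaction_f(G[:, i], G[:, j], y)
        for i, j in combinations(range(m), 2)
    }


def simulate(n=500, m=8, seed=0):
    """Hypothetical data: genotypes coded 0/1/2, with one planted interaction (markers 2 and 5)."""
    rng = np.random.default_rng(seed)
    G = rng.binomial(2, 0.5, size=(n, m)).astype(float)
    y = 2.0 * G[:, 2] * G[:, 5] + rng.normal(size=n)
    return G, y


if __name__ == "__main__":
    G, y = simulate()
    scores = pairwise_scan(G, y)
    best = max(scores, key=scores.get)
    print("pairs tested:", len(scores), "top pair:", best)
```

The number of tests grows quadratically with marker count (`n_pairs(500_000)` is roughly 1.25 × 10^11), which is why the abstract turns to parallel hardware. Each pair is scored independently, so the grid can be split across workers without coordination.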

    High performance computing enabling exhaustive analysis of higher order single nucleotide polymorphism interaction in Genome Wide Association Studies.

    Genome-wide association studies (GWAS) are a common approach for systematic discovery of single nucleotide polymorphisms (SNPs) which are associated with a given disease. Univariate analysis approaches commonly employed may miss important SNP associations that only appear through multivariate analysis in complex diseases. However, multivariate SNP analysis is currently limited by its inherent computational complexity. In this work, we present a computational framework that harnesses supercomputers. Based on our results, we estimate that a three-way interaction analysis on 1.1 million SNP GWAS data would require over 5.8 years on the full "Avoca" IBM Blue Gene/Q installation at the Victorian Life Sciences Computation Initiative. This is hundreds of times faster than estimates for other CPU-based methods and four times faster than runtimes estimated for GPU methods, indicating how the improvement in the level of hardware applied to interaction analysis may alter the types of analysis that can be performed. Furthermore, the same analysis would take under 3 months on the currently largest IBM Blue Gene/Q supercomputer, "Sequoia" at the Lawrence Livermore National Laboratory, assuming linear scaling is maintained as our results suggest. Given that the implementation used in this study can be further optimised, this runtime means it is becoming feasible to carry out exhaustive analysis of higher order interaction studies on large modern GWAS. This research was partially funded by NHMRC grant 1033452 and was supported by a Victorian Life Sciences Computation Initiative (VLSCI) grant number 0126 on its Peak Computing Facility at the University of Melbourne, an initiative of the Victorian Government, Australia.
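The scale behind these runtime estimates can be checked with a short calculation. The sketch below counts the three-way SNP combinations for 1.1 million SNPs and applies the linear-scaling assumption mentioned in the abstract. The core counts for "Avoca" and "Sequoia" are assumed figures that the abstract does not state, and the function names are hypothetical.

```python
from math import comb

# Assumed core counts (not given in the abstract): 4 vs. 96 Blue Gene/Q racks.
AVOCA_CORES = 65_536
SEQUOIA_CORES = 1_572_864


def n_combinations(n_snps, order):
    """Number of distinct SNP sets of the given interaction order."""
    return comb(n_snps, order)


def scaled_runtime_years(runtime_years, cores_from, cores_to):
    """Runtime on a different machine size, assuming perfectly linear scaling."""
    return runtime_years * cores_from / cores_to


if __name__ == "__main__":
    triples = n_combinations(1_100_000, 3)
    months = 12 * scaled_runtime_years(5.8, AVOCA_CORES, SEQUOIA_CORES)
    print(f"three-way combinations: {triples:.3e}")
    print(f"estimated Sequoia runtime: {months:.1f} months")
```

Under these assumed core counts the machine ratio is 24, so 5.8 years scales to about 2.9 months, in line with the "under 3 months" figure quoted above.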

    Scalable Data Analysis on MapReduce-based Systems

    Ph.D. (Doctor of Philosophy)

    Discovering Higher-order SNP Interactions in High-dimensional Genomic Data

    In this thesis, a multifactor dimensionality reduction based method on associative classification is employed to identify higher-order SNP interactions for enhancing the understanding of the genetic architecture of complex diseases. Further, this thesis explores the application of deep learning techniques, providing new clues for interaction analysis. The performance of the deep learning method is maximized by unifying deep neural networks with a random forest to obtain reliable interactions in the presence of noise.

    Synthetic sickness or lethality points at candidate combination therapy targets in glioblastoma

    Synthetic lethal interactions in cancer hold the potential for successful combined therapies, which would avoid the difficulties of single molecule-targeted treatment. Identification of interactions that are specific for human tumors is an open problem in cancer research. This work aims at deciphering synthetic sick or lethal interactions directly from somatic alteration, expression and survival data of cancer patients. To this end, we look for pairs of genes and their alterations or expression levels that are "avoided" by tumors and "beneficial" for patients. Thus, candidates for synthetic sickness or lethality (SSL) interaction are identified as such gene pairs whose combination of states is under-represented in the data. Our main methodological contribution is a quantitative score that allows ranking of the candidate SSL interactions according to evidence found in patient survival. Applying this analysis to glioblastoma data, we collect 1,956 synthetic sick or lethal partners for 85 abundantly altered genes, most of which show extensive copy number variation across the patient cohort. We rediscover and interpret the known interaction between TP53 and PLK1, as well as provide insight into the mechanism behind EGFR interacting with AKT2, but not AKT1 or AKT3. Cox model analysis determines 274 of the identified interactions as having a significant impact on overall survival in glioblastoma, which is more informative than a standard survival predictor based on the patient's age.

    On quantitative issues pertaining to the detection of epistatic genetic architectures

    Converging empirical evidence portrays epistasis (i.e., gene-gene interaction) as a ubiquitous property of genetic architectures and a protagonist in complex trait variability. While researchers employ sophisticated technologies to detect epistasis, the scarcity of robust instances of detection in human populations is striking. To evaluate the empirical issues pertaining to epistatic detection, we analytically characterize the statistical detection problem and elucidate two candidate explanations. The first examines whether population-level manifestations of epistasis arising in nature are small; consequently, for sample sizes employed in research, the power delivered by detectors may be disadvantageously small. The second considers whether gene-environmental association generates bias in estimates of genotypic values, diminishing the power of detection. By simulation study, we adjudicate the merits of both explanations and the power to detect epistasis under four digenic architectures. In agreement with both explanations, our findings implicate small epistatic effect sizes and gene-environmental association as mechanisms that obscure the detection of epistasis.