    An efficient and flexible software tool for genome-wide association interaction studies

    The human genome consists of approximately 3.2 billion base pairs, of which about 62 million can vary from one individual to another. These particular base pairs are called single nucleotide polymorphisms (SNPs). It is well known that particular combinations of SNP values dramatically increase the risk of contracting certain types of disease, such as Crohn's disease, Alzheimer's disease, diabetes and cancer, to name just a few. However, many discoveries remain to be made, and specialized software is required for this task. It has been shown that individual SNPs cannot account for much of the heritability on their own. Therefore, this PhD thesis is dedicated to interaction studies, the purpose of which is to identify pairs of SNPs and/or environmental factors that might regulate susceptibility to the disease under investigation. Model-Based Multifactor Dimensionality Reduction (MB-MDR) is a powerful and flexible methodology for interaction analysis that minimizes the number of false discoveries. Before this thesis, the only available implementation was an R package that took days to analyze a dataset composed of just hundreds of SNPs. However, a typical dataset contains hundreds of thousands or millions of SNPs, even after data cleaning and quality control. The aim of this thesis is to write software able to analyze such datasets within a few days with the MB-MDR methodology. In other words, the goal is to be 10^8 times faster than the R package, while remaining powerful and flexible and keeping the number of false discoveries low. Several contributions were needed to reach this goal and are presented in this thesis. First, new software was written from scratch in C++, in order to optimize every single computation instead of relying on overly generic functions, as was the case for the R package. Second, the methodology itself was improved, irrespective of the programming language.
Indeed, MB-MDR relies on the maxT algorithm (introduced by Westfall & Young in 1993) to assess the significance of the results, and this algorithm can be customized for interaction analysis. A first major contribution of this PhD work, called Van Lishout's implementation of maxT, was introduced in 2011. The parallel version of this algorithm makes it possible to analyze a dataset composed of hundreds of thousands of SNPs within a few days. The most important contribution of this thesis, called the gammaMAXT algorithm, was introduced in 2014. Its parallel version makes it possible to analyze a dataset composed of one million SNPs within one day. In this thesis, we also propose a new viewpoint to handle population stratification and correct for covariates. Many simulated and real-life data analyses are provided to highlight the flexibility of the software and its ability to find interesting results from a biological point of view. The latest version, called mbmdr-4.4.1.out, can be downloaded freely at http://www.statgen.ulg.ac.be with the corresponding documentation.
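
As a rough illustration of the permutation idea behind maxT, the sketch below computes single-step maxT adjusted p-values in Python. It is not Van Lishout's implementation or gammaMAXT, and it does not reproduce the MB-MDR test statistic: the function names, the toy mean-difference statistic and the default of 199 permutations are all assumptions made for this example.

```python
import random


def maxt_adjusted_pvalues(stat_fn, labels, n_perm=199, seed=0):
    """Single-step maxT: compare each observed statistic with the
    permutation distribution of the maximum statistic over all tests."""
    rng = random.Random(seed)
    observed = stat_fn(labels)
    exceed = [0] * len(observed)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute the trait, keep the markers fixed
        max_stat = max(stat_fn(shuffled))
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    # The +1 terms count the observed data as one of the permutations.
    return [(count + 1) / (n_perm + 1) for count in exceed]


def make_mean_difference_stat(features):
    """Hypothetical statistic: absolute case/control difference in means,
    one value per feature column."""
    def stat_fn(labels):
        stats = []
        for column in features:
            cases = [x for x, y in zip(column, labels) if y == 1]
            controls = [x for x, y in zip(column, labels) if y == 0]
            stats.append(abs(sum(cases) / len(cases)
                             - sum(controls) / len(controls)))
        return stats
    return stat_fn


# Toy data: the first column separates cases from controls, the second does not.
toy_labels = [1] * 10 + [0] * 10
toy_features = [[1] * 10 + [0] * 10, [0, 1] * 10]
toy_pvalues = maxt_adjusted_pvalues(make_mean_difference_stat(toy_features),
                                    toy_labels)
```

Because every test is compared against the same maximum, one pass over the permutations adjusts all tests at once. The cost is one full recomputation of all statistics per permutation, which is why the number of tests dominates the running time on genome-wide data.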

    Lower-Order Effects Adjustment in Quantitative Traits Model-Based Multifactor Dimensionality Reduction

    Identifying gene-gene interactions or gene-environment interactions in studies of human complex diseases remains a big challenge in genetic epidemiology. An additional challenge, often forgotten, is to account for important lower-order genetic effects. These may hamper the identification of genuine epistasis. If lower-order genetic effects contribute to the genetic variance of a trait, identified statistical interactions may simply be due to a signal boost of these effects. In this study, we restrict attention to quantitative traits and bi-allelic SNPs as genetic markers. Moreover, our interaction study focuses on 2-way SNP-SNP interactions. Via simulations, we assess the performance of different corrective measures for lower-order genetic effects in Model-Based Multifactor Dimensionality Reduction epistasis detection, using additive and co-dominant coding schemes. Performance is evaluated in terms of power and familywise error rate. Our simulations indicate that correcting for lower-order effects reduces empirical power estimates and, likewise, familywise error rates. Easy-to-use automatic SNP selection procedures, SNP selection based on “top” findings, or SNP selection based on a p-value criterion for interesting main effects result in reduced power but also almost zero false positive rates. Always accounting for main effects in the SNP-SNP pair under investigation during Model-Based Multifactor Dimensionality Reduction analysis adequately controls false positive epistasis findings. This is particularly true when adopting a co-dominant corrective coding scheme. In conclusion, automatic search procedures to identify lower-order effects to correct for during epistasis screening should be avoided. The same is true for procedures that adjust for lower-order effects prior to Model-Based Multifactor Dimensionality Reduction and involve using residuals as the new trait.
We advocate “on-the-fly” adjustment for lower-order effects when screening for SNP-SNP interactions using Model-Based Multifactor Dimensionality Reduction analysis.

    Genome-wide environmental interaction analysis using multidimensional data reduction principles to identify asthma pharmacogenetic loci in relation to corticosteroid therapy

    Genome-wide gene-environment (GxE) and gene-gene (GxG) interaction studies share many challenges because of the common genetic component they involve. Genome-wide environmental interaction (GWEI) studies may therefore benefit from the abundance of methodologies available for genome-wide epistasis detection. One of these is Model-Based Multifactor Dimensionality Reduction (MB-MDR), which does not make any assumption about the genetic inheritance model. MB-MDR involves reducing a high-dimensional GxE space to GxE factor levels that exhibit either high, low, or no evidence for association with disease outcome. In contrast to logistic regression and random forests, MB-MDR can be used to detect GxE interactions in the absence of any main effects or when sample sizes are too small to model all main and GxE interaction effects. In this ongoing study, we demonstrate the opportunities and challenges of MB-MDR for genome-wide GxE interaction analysis and analyze the difference in prebronchodilator FEV1 following 8 weeks of inhaled corticosteroid therapy for 565 pediatric Caucasian CAMP participants (ages 5-12) from the SHARE project.
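
The reduction step described above can be pictured with a small Python sketch that labels each genotype-by-exposure cell as high (H), low (L), or no evidence (O). It is only an illustration under stated assumptions: the cell-versus-rest chi-square test, the 0.10 threshold and all function names are choices made for this example, not a description of the MB-MDR software.

```python
from collections import defaultdict

# Critical value of a chi-square distribution with 1 degree of freedom
# at the 0.10 level (assumed threshold for this sketch).
CHI2_CRIT_1DF_P10 = 2.706


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: treat as no evidence
    return n * (a * d - b * c) ** 2 / denom


def label_cells(genotypes, exposures, cases, crit=CHI2_CRIT_1DF_P10):
    """Label each (genotype, exposure) cell H, L or O by comparing its
    case/control split with that of all other cells pooled together."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [controls, cases]
    for g, e, y in zip(genotypes, exposures, cases):
        counts[(g, e)][y] += 1
    total_cases = sum(cases)
    total_controls = len(cases) - total_cases
    labels = {}
    for cell, (ctrl, case) in counts.items():
        rest_case = total_cases - case
        rest_ctrl = total_controls - ctrl
        if chi2_2x2(case, ctrl, rest_case, rest_ctrl) < crit:
            labels[cell] = "O"
            continue
        cell_rate = case / (case + ctrl)
        rest_rate = rest_case / (rest_case + rest_ctrl)
        labels[cell] = "H" if cell_rate > rest_rate else "L"
    return labels


# Toy data: one cell with cases only, one with controls only, one balanced.
toy_genotypes = [2] * 20 + [0] * 20 + [1] * 20
toy_exposures = [1] * 20 + [0] * 20 + [0] * 20
toy_cases = [1] * 20 + [0] * 20 + [1] * 10 + [0] * 10
toy_cell_labels = label_cells(toy_genotypes, toy_exposures, toy_cases)
```

After labeling, the many genotype-by-exposure cells collapse into a single three-level factor, so the final association test has the same form whatever the underlying inheritance pattern.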

    Comparison of genetic association strategies in the presence of rare alleles

    In the quest for the missing heritability of most complex diseases, rare variants have received increased attention. Advances in large-scale sequencing have led to a shift from the common disease/common variant hypothesis to the common disease/rare variant hypothesis or have at least reopened the debate about the relevance and importance of rare variants for gene discoveries. The investigation of modeling and testing approaches to identify significant disease/rare variant associations is in full swing. New methods to better deal with parameter estimation instabilities, convergence problems, or multiple testing corrections in the presence of rare variants or effect modifiers of rare variants are in their infancy. Using a recently developed semiparametric strategy to detect causal variants, we investigate the performance of the model-based multifactor dimensionality reduction (MB-MDR) technique in terms of power and family-wise error rate (FWER) control in the presence of rare variants, using population-based and family-based data (FAM-MDR). We compare family-based results obtained from MB-MDR analyses to screening findings from a quantitative trait Pedigree-based association test (PBAT). Population-based data were further examined using penalized regression models. We restrict attention to all available single-nucleotide polymorphisms on chromosome 4 and consider Q1 as the outcome of interest. The considered family-based methods identified marker C4S4935 in the VEGFC gene with estimated power not exceeding 0.35 (FAM-MDR), when FWER was kept under control. The considered population-based methods gave rise to highly inflated FWERs (up to 90% for PBAT screening).

    Single-player games: introduction to a new solving method

    In many games, the machine has become stronger than the best human players. Machines have already beaten the human World Champion in famous games like Checkers, Chess, Scrabble and Othello. However, mankind has not been humbled by chips in all games. The best human players are still stronger than computers in games like Go, Poker, Chinese Chess and Hex. In this thesis, we focus on a new way to model single-player games in order to improve the performance of the machine. We demonstrate this technique on the game of Sokoban.

    Monte-Carlo Tree Search in Backgammon

    Monte-Carlo Tree Search is a new method which has been applied successfully to many games. However, it has never been tested on two-player perfect-information games with a chance factor. Backgammon is the reference game of this category. Today’s best Backgammon programs are based on reinforcement learning and are stronger than the best human players. These programs have played millions of offline games to learn to evaluate a position. Our approach consists rather in playing online simulated games to learn how to play correctly in the current position.
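
To make the idea of learning from online simulated games concrete, here is a simplified Python sketch: flat Monte-Carlo move selection with the UCB1 rule at the root. It is not the authors' Backgammon program and it does not grow a search tree as full Monte-Carlo Tree Search does. The dice-based toy playout, the function names and the budget of 2000 simulations are assumptions made for this example.

```python
import math
import random


def choose_move(moves, simulate, n_sim=2000, seed=0):
    """Spend n_sim simulated games on the candidate moves, favouring
    moves with a high win rate or few visits (UCB1), then return the
    most visited move together with the visit counts."""
    rng = random.Random(seed)
    wins = {m: 0 for m in moves}
    visits = {m: 0 for m in moves}
    for t in range(1, n_sim + 1):
        untried = [m for m in moves if visits[m] == 0]
        if untried:
            move = untried[0]  # try every move once first
        else:
            move = max(
                moves,
                key=lambda m: wins[m] / visits[m]
                + math.sqrt(2 * math.log(t) / visits[m]),
            )
        wins[move] += simulate(move, rng)  # 1 for a win, 0 for a loss
        visits[move] += 1
    return max(moves, key=lambda m: visits[m]), visits


def toy_playout(move, rng):
    """Hypothetical game with a chance factor: 'safe' wins unless the die
    shows 1, while 'risky' wins only when the die shows 6."""
    roll = rng.randint(1, 6)
    if move == "safe":
        return 1 if roll >= 2 else 0
    return 1 if roll == 6 else 0


best_move, visit_counts = choose_move(["safe", "risky"], toy_playout)
```

No evaluation function is learned offline here: all knowledge about the position comes from the simulated outcomes, and the dice rolls inside each playout are how the chance factor enters the estimate.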