85 research outputs found

    Lower-Order Effects Adjustment in Quantitative Traits Model-Based Multifactor Dimensionality Reduction

    Get PDF
    Identifying gene-gene interactions or gene-environment interactions in studies of human complex diseases remains a big challenge in genetic epidemiology. An additional challenge, often forgotten, is to account for important lower-order genetic effects. These may hamper the identification of genuine epistasis. If lower-order genetic effects contribute to the genetic variance of a trait, identified statistical interactions may simply be due to a signal boost of these effects. In this study, we restrict attention to quantitative traits and bi-allelic SNPs as genetic markers. Moreover, our interaction study focuses on 2-way SNP-SNP interactions. Via simulations, we assess the performance of different corrective measures for lower-order genetic effects in Model-Based Multifactor Dimensionality Reduction epistasis detection, using additive and co-dominant coding schemes. Performance is evaluated in terms of power and familywise error rate. Our simulations indicate that empirical power estimates are reduced with correction of lower-order effects, likewise familywise error rates. Easy-to-use automatic SNP selection procedures, SNP selection based on “top” findings, or SNP selection based on p-value criterion for interesting main effects result in reduced power but also almost zero false positive rates. Always accounting for main effects in the SNP-SNP pair under investigation during Model-Based Multifactor Dimensionality Reduction analysis adequately controls false positive epistasis findings. This is particularly true when adopting a co-dominant corrective coding scheme. In conclusion, automatic search procedures to identify lower-order effects to correct for during epistasis screening should be avoided. The same is true for procedures that adjust for lower-order effects prior to Model-Based Multifactor Dimensionality Reduction and involve using residuals as the new trait. We advocate using “on-the-fly” lower-order effects adjusting when screening for SNP-SNP interactions using Model-Based Multifactor Dimensionality Reduction analysis

    Genome-wide environmental interaction analysis using multidimensional data reduction principles to identify asthma pharmacogenetic loci in relation to corticosteroid therapy

    Full text link
    Genome-wide gene-environment (GxE) and gene-gene (GxG) interaction studies share a lot of challenges via the common genetic component they involve. GWEI studies may therefore benefit from the abundance of methodologies that are available in the context of genome-wide epistasis detection methods. One of these is Model-Based Multifactor Dimensionality Reduction (MB-MDR), which does not make any assumption about the genetic inheritance model. MB-MDR involves reducing a high-dimensional GxE space to GxE factor levels that either exhibit high or low or no evidence for their association to disease outcome. In contrast to logistic regression and random forests, MB-MDR can be used to detect GxE interactions in the absence of any main effects or when sample sizes are too small to be able to model all main and GxE interaction effects. In this ongoing study, we demonstrate the opportunities and challenges of MB-MDR for genome-wide GxE interaction analysis and analyzed the difference in prebronchodilator FEV1 following 8 weeks of inhaled corticosteroid therapy, for 565 pediatric Caucasian CAMP (ages 5-12) from the SHARE project

    Molecular Reclassification of Crohn's Disease by Cluster Analysis of Genetic Variants

    Get PDF
    Background Crohn's Disease (CD) has a heterogeneous presentation, and is typically classified according to extent and location of disease. The genetic susceptibility to CD is well known and genome-wide association scans (GWAS) and meta-analysis thereof have identified over 30 susceptibility loci. Except for the association between ileal CD and NOD2 mutations, efforts in trying to link CD genetics to clinical subphenotypes have not been very successful. We hypothesized that the large number of confirmed genetic variants enables (better) classification of CD patients. Methodology/Principal Findings To look for genetic-based subgroups, genotyping results of 46 SNPs identified from CD GWAS were analyzed by Latent Class Analysis (LCA) in CD patients and in healthy controls. Six genetic-based subgroups were identified in CD patients, which were significantly different from the five subgroups found in healthy controls. The identified CD-specific clusters are therefore likely to contribute to disease behavior. We then looked at whether we could relate the genetic-based subgroups to the currently used clinical parameters. Although modest differences in prevalence of disease location and behavior could be observed among the CD clusters, Random Forest analysis showed that patients could not be allocated to one of the 6 genetic-based subgroups based on the typically used clinical parameters alone. This points to a poor relationship between the genetic-based subgroups and the used clinical subphenotypes. Conclusions/Significance This approach serves as a first step to reclassify Crohn's disease. The used technique can be applied to other common complex diseases as well, and will help to complete patient characterization, in order to evolve towards personalized medicine. </sec

    FAM-MDR: A Flexible Family-Based Multifactor Dimensionality Reduction Technique to Detect Epistasis Using Related Individuals

    Get PDF
    We propose a novel multifactor dimensionality reduction method for epistasis detection in small or extended pedigrees, FAM-MDR. It combines features of the Genome-wide Rapid Association using Mixed Model And Regression approach (GRAMMAR) with Model-Based MDR (MB-MDR). We focus on continuous traits, although the method is general and can be used for outcomes of any type, including binary and censored traits. When comparing FAM-MDR with Pedigree-based Generalized MDR (PGMDR), which is a generalization of Multifactor Dimensionality Reduction (MDR) to continuous traits and related individuals, FAM-MDR was found to outperform PGMDR in terms of power, in most of the considered simulated scenarios. Additional simulations revealed that PGMDR does not appropriately deal with multiple testing and consequently gives rise to overly optimistic results. FAM-MDR adequately deals with multiple testing in epistasis screens and is in contrast rather conservative, by construction. Furthermore, simulations show that correcting for lower order (main) effects is of utmost importance when claiming epistasis. As Type 2 Diabetes Mellitus (T2DM) is a complex phenotype likely influenced by gene-gene interactions, we applied FAM-MDR to examine data on glucose area-under-the-curve (GAUC), an endophenotype of T2DM for which multiple independent genetic associations have been observed, in the Amish Family Diabetes Study (AFDS). This application reveals that FAM-MDR makes more efficient use of the available data than PGMDR and can deal with multi-generational pedigrees more easily. In conclusion, we have validated FAM-MDR and compared it to PGMDR, the current state-of-the-art MDR method for family data, using both simulations and a practical dataset. FAM-MDR is found to outperform PGMDR in that it handles the multiple testing issue more correctly, has increased power, and efficiently uses all available information

    Comparison of genetic association strategies in the presence of rare alleles

    Get PDF
    In the quest for the missing heritability of most complex diseases, rare variants have received increased attention. Advances in large-scale sequencing have led to a shift from the common disease/common variant hypothesis to the common disease/rare variant hypothesis or have at least reopened the debate about the relevance and importance of rare variants for gene discoveries. The investigation of modeling and testing approaches to identify significant disease/rare variant associations is in full motion. New methods to better deal with parameter estimation instabilities, convergence problems, or multiple testing corrections in the presence of rare variants or effect modifiers of rare variants are in their infancy. Using a recently developed semiparametric strategy to detect causal variants, we investigate the performance of the model-based multifactor dimensionality reduction (MB-MDR) technique in terms of power and family-wise error rate (FWER) control in the presence of rare variants, using population-based and family-based data (FAM-MDR). We compare family-based results obtained from MB-MDR analyses to screening findings from a quantitative trait Pedigree-based association test (PBAT). Population-based data were further examined using penalized regression models. We restrict attention to all available single-nucleotide polymorphisms on chromosome 4 and consider Q1 as the outcome of interest. The considered family-based methods identified marker C4S4935 in the VEGFC gene with estimated power not exceeding 0.35 (FAM-MDR), when FWER was kept under control. The considered population-based methods gave rise to highly inflated FWERs (up to 90% for PBAT screening)

    Genomic Association Screening Methodology for High-Dimensional and Complex Data Structures: Detecting n-Order Interactions

    Full text link
    We developed a data-mining method, Model-Based Multifactor Dimensionality Reduction (MB-MDR) to detect epistatic interactions for different types of traits. MB-MDR enables the fast identification of gene-gene interactions among 1000nds of SNPs, without the need to make restrictive assumptions about the genetic modes of inheritance. This thesis primarily focused on applying Model-Based Multifactor Dimensionality Reduction for quantitative traits, its performance and application to a variety of data problems. We carried out several simulation studies to evaluate quantitative MB-MDR in terms of power and type I error, when data are noisy, non-normal or skewed and when important main effects are present. Firstly, we assessed the performance of MB-MDR in the presence of noisy data. The error sources considered were missing genotypes, genotyping error, phenotypic mixtures and genetic heterogeneity. Results from this study showed that MB-MDR is least affected by presence of small percentages of missing data and genotyping errors but much affected in the presence of phenotypic mixtures and genetic heterogeneity. This is in line with a similar study performed for binary traits. Although both Multifactor Dimensionality Reduction (MDR) and MB-MDR are data reduction techniques with a common basis, their ways of deriving significant interactions are substantially different. Nevertheless, effects on power of introducing error sources were quite similar. Irrespective of the trait under consideration, epistasis screening methodologies such as MB-MDR and MDR mainly suffer from the presence of phenotypic mixtures and genetic heterogeneity. Secondly, we extensively addressed the issue of adjusting for lower-order genetic effects during epistasis screening, using different adjustment strategies for SNPs in the functional SNP-SNP interaction pair, and/or for additional important SNPs. Since, in this thesis, we restrict attention to 2-locus interactions only, adjustment for lower-order effects always (and only) implies adjustment for main genetic effects. Unfortunately most data dimensionality reduction techniques based on MDR do not explicitly require that lower-order effects are included in the ‘model’ when investigating higher-order effects (a prerequisite for most traditional, especially regression-based, methods). However, epistasis results may be hampered by the presence of significant lower-order effects. Results from this study showed hugely increased type I errors when main effects were not taken into account or were not properly accounted for. We observed that additive coding (the most commonly used coding in practice) in main effects adjustment does not remove all of the potential main effects that deviate from additive genetic variance. In addition, also adjusting for main effects prior to MB-MDR (via a regression framework), whatever coding is adopted, does not control type I error in all scenarios. From this study, we concluded that correction for lower-order effects should preferentially be done via codominant coding, to reduce the chance of false positive epistasis findings. The recommended way of performing an MB-MDR epistasis screening is to always adjust the analysis for lower-order effects of the SNPs under investigation, “on-the-fly”. This correction avoids overcorrection for other SNPs, which are not part of the interacting SNP pair under study. Thirdly, we assessed the cumulative effect of trait deviations from normality and homoscedasticity on the overall performance of quantitative MB-MDR to detect 2-locus epistasis signals in the absence of main effects. Although MB-MDR itself is a non-parametric method, in the sense that no assumptions are made regarding genetic modes of inheritance, the data reduction part in MB-MDR relies on association tests. In particular, for quantitative traits, the default MB-MDR way is to use the Student’s t-test (steps 1 and 2 of MB-MDR). Also when correcting for lower-order effects during quantitative MB-MDR analysis, we intrinsically maneuver within a regression framework. Since the Student’s t-statistic is the square root of the ANOVA F-statistic. Hence, along these lines, for MB-MDR to give valid results, ANOVA assumptions have to be met. Therefore, we simulated data from normal and non-normal distributions, with constant and non-constant variances, and performed association tests via the student’s t-test as well as the unequal variance t-test, commonly known as the Welch’s t-test. At first somewhat surprising, the results of this study showed that MB-MDR maintains adequate type I errors, irrespective of data distribution or association test used. On the other hand, MB-MDR give rise to lower power results for non-normal data compared to normal data. With respect to the association tests used within MB-MDR, in most cases, Welch’s t-test led to lower power compared to student’s t-test. To maintain the balance between power and type I error, we concluded that when performing MB-MDR analysis with quantitative traits, one ideally first rank-transforms traits to normality and then applies MB-MDR modeling with Student’s t-test as choice of association test. Clearly, before embarking on using a method in practice, there is a need to extensively check the applicability of the method to the data at hand. This is a common practice in biostatistics, but often a forgotten standard operating procedure in genetic epidemiology, in particular in GWAI studies. In addition to the presentation of extensive simulation studies, we also presented some MB-MDR applications to real-life data problems. These analyses involved MB-MDR analyses on quantitative as well as binary complex disease traits, primarily in the context of asthma/allergy and Crohn’s disease. In two of the presented analyses, MB-MDR confirmed logistic regression and transmission disequilibrium test (TDT) results. Part of the aforementioned methodological developments was initiated on the basis of observations of MB-MDR behavior on real-life data. Both the practical and theoretical components of this thesis confirm our belief in the potential of MB-MDR as a promising and versatile tool for the identification of epistatic effects, irrespective of the design (family-based or unrelated individuals) and irrespective of the targeted disease trait (binary, continuous, censored, categorical, multivariate). A thorough characterization of the different faces of MB-MDR this versatility gives rise to is work in progress
    corecore