18 research outputs found

    Modifications to the Patient Rule-Induction Method that utilize non-additive combinations of genetic and environmental effects to define partitions that predict ischemic heart disease

    This article extends the Patient Rule-Induction Method (PRIM) for modeling cumulative incidence of disease developed by Dyson et al. (Genet Epidemiol 31:515–527) to include the simultaneous consideration of non-additive combinations of predictor variables, a significance test of each combination, an adjustment for multiple testing and a confidence interval for the estimate of the cumulative incidence of disease in each partition. We employ the partitioning algorithm component of the Combinatorial Partitioning Method to construct combinations of predictors, permutation testing to assess the significance of each combination, theoretical arguments for incorporating a multiple testing adjustment and bootstrap resampling to produce the confidence intervals. An illustration of this revised PRIM utilizing a sample of 2,258 European male participants from the Copenhagen City Heart Study is presented that assesses the utility of genetic variants in predicting the presence of ischemic heart disease beyond the established risk factors. Genet. Epidemiol. 2009. © 2008 Wiley-Liss, Inc. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/62158/1/20383_ftp.pd
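
    The abstract above combines permutation testing (significance of a partition) with bootstrap resampling (confidence interval for the cumulative incidence inside it). The following is a minimal sketch of those two generic techniques only, not the authors' revised PRIM: the function names, the toy data, the one-sided test and the percentile interval are all assumptions made for illustration, and the multiple testing adjustment is not reproduced.

```python
import random

def incidence(outcomes):
    """Fraction of individuals with disease (1 = case, 0 = non-case)."""
    return sum(outcomes) / len(outcomes)

def permutation_p_value(outcomes, in_partition, n_perm=2000, seed=0):
    """One-sided permutation p-value for elevated incidence inside a partition.

    Hypothetical helper: outcomes are shuffled across individuals, and the
    first `size` shuffled outcomes stand in for a random partition of the
    same size as the observed one.
    """
    rng = random.Random(seed)
    size = sum(in_partition)
    observed = incidence([y for y, m in zip(outcomes, in_partition) if m])
    shuffled = list(outcomes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if incidence(shuffled[:size]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)

def bootstrap_ci(partition_outcomes, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the incidence inside one partition."""
    rng = random.Random(seed)
    n = len(partition_outcomes)
    stats = sorted(
        incidence(rng.choices(partition_outcomes, k=n)) for _ in range(n_boot)
    )
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy data (made up): 40 individuals in the partition, 20 of them cases;
# 160 individuals outside it, 16 of them cases.
outcomes = [1] * 20 + [0] * 20 + [1] * 16 + [0] * 144
in_partition = [True] * 40 + [False] * 160

p_value = permutation_p_value(outcomes, in_partition)
ci = bootstrap_ci(outcomes[:40])
print(p_value, ci)
```

    The percentile interval is the simplest bootstrap variant; other variants would be just as defensible here.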

    Multilocus analysis of SNP and metabolic data within a given pathway

    BACKGROUND: Complex traits, which are under the influence of multiple and possibly interacting genes, have become a subject of new statistical methodological research. One of the greatest challenges facing human geneticists is the identification and characterization of susceptibility genes for common multifactorial diseases and their association with different quantitative phenotypic traits. RESULTS: Two types of data from the same metabolic pathway were used in the analysis: categorical measurements of 18 SNPs; and quantitative measurements of plasma levels of several steroids and their precursors. Using the combinatorial partitioning method we tested various thresholds for each metabolic trait and each individual SNP locus. One SNP in CYP19, 3'UTR, two SNPs in CYP1B1 (R48G and A119S) and one in CYP1A1 (T461N) were significantly differently distributed between the high and low level metabolic groups. The leave-one-out cross-validation method showed that 6 SNPs in concert predict the phenotype correctly in 65% of cases. Further, we used pattern recognition, computing the p-value by Monte Carlo simulation, to identify sets of SNPs and physiological characteristics such as age and weight that contribute to a given metabolic level. Since the SNPs detected by both methods reside either in the same gene (CYP1B1) or in 3 different genes in immediate vicinity on chromosome 15 (CYP19, CYP11 and CYP1A1), we investigated the possibility that they form intragenic and intergenic haplotypes, which may jointly account for a higher activity in the pathway. We identified such haplotypes associated with metabolic levels. CONCLUSION: The methods reported here may enable the study of multiple low-penetrance genetic factors that together determine various quantitative phenotypic traits. Our preliminary data suggest that several genes coding for proteins involved in a common pathway, that happen to be located on common chromosomal areas and may form intragenic haplotypes, together account for a higher activity of the whole pathway.

    The challenge for genetic epidemiologists: how to analyze large numbers of SNPs in relation to complex diseases

    Genetic epidemiologists have taken up the challenge to identify genetic polymorphisms involved in the development of diseases. Many have collected data on large numbers of genetic markers but are not familiar with available methods to assess their association with complex diseases. Statistical methods have been developed for analyzing the relation of large numbers of genetic and environmental predictors to disease or disease-related variables in genetic association studies. In this commentary we discuss logistic regression analysis, neural networks, including the parameter decreasing method (PDM) and genetic programming optimized neural networks (GPNN), and several non-parametric methods, which include the set association approach, combinatorial partitioning method (CPM), restricted partitioning method (RPM), multifactor dimensionality reduction (MDR) method and the random forests approach. The relative strengths and weaknesses of these methods are highlighted. Logistic regression and neural networks can handle only a limited number of predictor variables, depending on the number of observations in the dataset. Therefore, they are less useful than the non-parametric methods to approach association studies with large numbers of predictor variables. GPNN on the other hand may be a useful approach to select and model important predictors, but its ability to select the important effects in the presence of large numbers of predictors needs to be examined. Both the set association approach and random forests approach are able to handle a large number of predictors and are useful in reducing these predictors to a subset of predictors with an important contribution to disease. The combinatorial methods give more insight into combination patterns for sets of genetic and/or environmental predictor variables that may be related to the outcome variable. As the non-parametric methods have different strengths and weaknesses we conclude that to approach genetic association studies using the case-control design, the application of a combination of several methods, including the set association approach, MDR and the random forests approach, will likely be a useful strategy to find the important genes and interaction patterns involved in complex diseases.

    Practical and Theoretical Considerations in Study Design for Detecting Gene-Gene Interactions Using MDR and GMDR Approaches

    Detection of interacting risk factors for complex traits is challenging. The choice of an appropriate method, sample size, and allocation of cases and controls are serious concerns. To provide empirical guidelines for planning such studies and data analyses, we investigated the performance of the multifactor dimensionality reduction (MDR) and generalized MDR (GMDR) methods under various experimental scenarios. We developed the mathematical expectation of accuracy and used it as an indicator parameter to perform a gene-gene interaction study. We then examined the statistical power of GMDR and MDR within the plausible range of accuracy (0.50–0.65) reported in the literature. The GMDR with covariate adjustment had a power of >80% in a case-control design with a sample size of ≥2000, with theoretical accuracy ranging from 0.56 to 0.62. However, when the accuracy was <0.56, a sample size of ≥4000 was required to have sufficient power. In our simulations, the GMDR outperformed the MDR under all models with accuracy ranging from 0.56 to 0.62 for a sample size of 1000–2000. However, the two methods performed similarly when the accuracy was outside this range or the sample was significantly larger. We conclude that with adjustment of a covariate, GMDR performs better than MDR and a sample size of 1000–2000 is reasonably large for detecting gene-gene interactions in the range of effect size reported by the current literature, whereas a larger sample size is required for more subtle interactions with accuracy <0.56.

    Improved branch and bound algorithm for detecting SNP-SNP interactions in breast cancer

    BACKGROUND: Single nucleotide polymorphisms (SNPs) in genes derived from distinct pathways are associated with breast cancer risk. Identifying possible SNP-SNP interactions in genome-wide case–control studies is an important task when investigating genetic factors that influence common complex traits; the effects of SNP-SNP interaction need to be characterized. Furthermore, observations of the complex interplay (interactions) between SNPs for high-dimensional combinations are still computationally and methodologically challenging. An improved branch and bound algorithm with feature selection (IBBFS) is introduced to identify SNP combinations with a maximal difference of allele frequencies between the case and control groups in breast cancer, i.e., the high/low risk combinations of SNPs. RESULTS: Data from 220 real breast cancer cases and 334 real controls are used to test IBBFS and identify significant SNP combinations. We used the odds ratio (OR) as a quantitative measure to estimate the associated cancer risk of multiple SNP combinations to identify the complex biological relationships underlying the progression of breast cancer, i.e., the most likely SNP combinations. Experimental results show that the estimated odds ratio of the best SNP combination with genotypes is significantly smaller than 1 (between 0.165 and 0.657) for specific SNP combinations of the tested SNPs in the low risk groups. In the high risk groups, the estimated odds ratios of the predicted SNP combinations with genotypes are significantly greater than 1 (between 2.384 and 6.167) for specific SNP combinations of the tested SNPs. CONCLUSIONS: This study proposes an effective high-speed method to analyze SNP-SNP interactions in breast cancer association studies. A number of important SNPs are found to be significant for the high/low risk group. They can thus be considered potential predictors for breast cancer association.
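
    The abstract above scores each SNP combination by its odds ratio between cases and controls. As a hedged illustration of that measure alone (not of the IBBFS search), the sketch below computes an odds ratio with a Woolf (log-scale) 95% confidence interval from a 2×2 table. Only the totals of 220 cases and 334 controls come from the abstract; the carrier counts and the function name are made up for the example, and the choice of interval is an assumption.

```python
import math

def odds_ratio(case_with, case_without, control_with, control_without, z=1.96):
    """Odds ratio and Woolf confidence interval for a 2x2 table.

    Hypothetical helper: "with" means the individual carries the SNP
    combination of interest. All four counts must be positive.
    """
    estimate = (case_with * control_without) / (case_without * control_with)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(
        1 / case_with + 1 / case_without + 1 / control_with + 1 / control_without
    )
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper

# Made-up split: 60 of 220 cases and 40 of 334 controls carry the combination.
estimate, lower, upper = odds_ratio(60, 160, 40, 294)
print(round(estimate, 3), round(lower, 3), round(upper, 3))
```

    An interval lying entirely above 1 corresponds to a high-risk combination in the abstract's terms; one lying entirely below 1 corresponds to a low-risk combination.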

    ATHENA: A knowledge-based hybrid backpropagation-grammatical evolution neural network algorithm for discovering epistasis among quantitative trait Loci

    BACKGROUND: Growing interest and burgeoning technology for discovering genetic mechanisms that influence disease processes have ushered in a flood of genetic association studies over the last decade, yet little heritability in highly studied complex traits has been explained by genetic variation. Non-additive gene-gene interactions, which are not often explored, are thought to be one source of this "missing" heritability. METHODS: Stochastic methods employing evolutionary algorithms have demonstrated promise in being able to detect and model gene-gene and gene-environment interactions that influence human traits. Here we demonstrate modifications to a neural network algorithm in ATHENA (the Analysis Tool for Heritable and Environmental Network Associations) resulting in clear performance improvements for discovering gene-gene interactions that influence human traits. We employed an alternative tree-based crossover, backpropagation for locally fitting neural network weights, and incorporation of domain knowledge obtainable from publicly accessible biological databases for initializing the search for gene-gene interactions. We tested these modifications in silico using simulated datasets. RESULTS: We show that the alternative tree-based crossover modification resulted in a modest increase in the sensitivity of the ATHENA algorithm for discovering gene-gene interactions. The performance increase was highly statistically significant when backpropagation was used to locally fit NN weights. We also demonstrate that using domain knowledge to initialize the search for gene-gene interactions results in a large performance increase, especially when the search space is larger than the search coverage. CONCLUSIONS: We show that a hybrid optimization procedure, alternative crossover strategies, and incorporation of domain knowledge from publicly available biological databases can result in marked increases in sensitivity and performance of the ATHENA algorithm for detecting and modelling gene-gene interactions that influence a complex human trait.

    AN EVALUATION OF GENE INTERACTIONS AFFECTING CARCASS YIELD AND MARBLING IN BEEF CATTLE

    Genotype-specific management of beef cattle in feedlots has the potential to improve carcass uniformity. Gene variants affecting marbling include LEPc.73C>T, ADH1Cc.-64T>C, TG5, and GALR2c.-199T>G while those in CRHc.22C>G, POMCc.288C>T, MC4Rc.856C>G and IGF2c.-292C>T influence lean yield. The purpose of the current study was to assess combinations of marbling gene variants with those associated with lean yield and to investigate the effects of a gene variant in serotonin receptor 1B (HTR1B) on beef carcass traits. Gene variants were initially genotyped in 386 crossbred steers and evaluated for associations with carcass traits (hot carcass weight, average fat, grade fat and rib-eye area). The goal was to select a subset of variants to genotype in 2000 steers (1000 with hormone implants and 1000 without implants) with camera graded carcass data (Vision USDA yield grade, Vision grade marbling, rib-eye area and fat thickness). Seven gene variants were selected to proceed with (TG was discontinued) as they either had an association or were involved in gene interactions affecting a trait. In the implanted steers GALR2 affected rib-eye area (P=0.002), where it exhibited an additive effect (TT=83.74 cm², TG=84.32 cm² and GG=86.90 cm²); however, there was a dominant effect of the T allele for marbling (P=0.0001; TT/TG=397.83 and GG=378.27) and fat (P=0.001; TT/TG=8.38 mm and GG=7.31). This same association with marbling (PT and IGF2c.-292C>T with fat (P=0.05) and a trend with marbling (P=0.07); MC4Rc.856C>G and POMCc.288C>T with marbling (P=0.05); and GALR2c.-199T>G and POMCc.288C>T with rib-eye area (P=0.03). Associations of gene variants with traits were made simpler because some genotypes could be collapsed, as least square means (LSM) were not significantly different, indicating a dominant effect of one allele. The ability to pool genotypes not only simplified the interactions, it also resulted in a larger number of animals with combined genotypes. The gene SNP networks generated using EPISNP support the mode of action between gene variants. For example, the gene interaction that was a 3 by 2 was also determined to be Additive-Dominance. Significant associations were also identified between the HTR1B c.205G>T SNP and carcass average fat (P=0.001), grade fat (P=0.007) and cutability (P=0.001), and a trend was observed with carcass REA (P=0.061). Although finding significance with several economically important carcass traits in crossbred beef breeds is novel, validating the effects of the HTR1B c.205G>T SNP in a larger cattle population would be beneficial.

    Statistical methods in genetics

    In recent years, a very large variety of statistical methodologies, at various levels of complexity, have been put forward to analyse genotype data and detect genetic variations that may be responsible for increasing the susceptibility to disease. This review provides a concise account of a number of selected statistical methods for population-based association mapping, from single-marker tests of association to multi-marker data mining techniques for gene-gene interaction detection.

    Aggregated Quantitative Multifactor Dimensionality Reduction

    We consider the problem of making predictions for quantitative phenotypes based on gene-to-gene interactions among selected Single Nucleotide Polymorphisms (SNPs). Previously, Quantitative Multifactor Dimensionality Reduction (QMDR) has been applied to detect gene-to-gene interactions associated with elevated quantitative phenotypes, by creating a dichotomous predictor from one interaction which has been deemed optimal. We propose an Aggregated Quantitative Multifactor Dimensionality Reduction (AQMDR), which exhaustively considers all k-way interactions among a set of SNPs and replaces the dichotomous predictor from QMDR with a continuous aggregated score. We evaluate this new AQMDR method in a series of simulations for two-way and three-way interactions, comparing the new method with the original QMDR. In simulation, AQMDR yields consistently smaller prediction error than QMDR when more than one significant interaction is present in the simulation model. Theoretical support is provided for the method, and the method is applied to Alzheimer's Disease (AD) data to identify significant interactions between APOE4 and other AD associated SNPs.
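
    To make the "dichotomous predictor" in the abstract above concrete, here is a rough sketch of one common way such a predictor can be built for a single SNP interaction: each multilocus genotype cell is labelled high (1) when its mean phenotype exceeds the overall mean, and low (0) otherwise. This labelling rule, the function name and the toy data are assumptions for illustration; the abstract does not spell out the rule, and the continuous aggregated score that AQMDR substitutes for this predictor is not reproduced here.

```python
from collections import defaultdict

def dichotomize_cells(genotypes, phenotype):
    """Label each individual 1 (high) or 0 (low) from its genotype cell.

    Hypothetical helper: `genotypes` is a list of tuples, one per individual,
    holding the genotype codes of the SNPs in a single interaction.
    """
    overall_mean = sum(phenotype) / len(phenotype)
    cells = defaultdict(list)
    for cell, value in zip(genotypes, phenotype):
        cells[cell].append(value)
    # A cell is "high" when its mean phenotype exceeds the overall mean.
    high_cells = {
        cell for cell, values in cells.items()
        if sum(values) / len(values) > overall_mean
    }
    return [1 if cell in high_cells else 0 for cell in genotypes]

# Toy two-SNP interaction with made-up phenotype values.
genotypes = [(0, 0), (0, 0), (1, 2), (1, 2), (2, 1)]
phenotype = [1.0, 2.0, 5.0, 6.0, 3.0]
labels = dichotomize_cells(genotypes, phenotype)
print(labels)
```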