
    Application of next generation sequencing to CEPH cell lines to discover variants associated with FDA approved chemotherapeutics

    After publication of this work [1], it came to our attention that the author list in the initial version of the manuscript contains an error: the second author should be listed as Ernest T Lam, not Ernest J Lam.

    Neural networks for modeling gene-gene interactions in association studies

    Background: Our aim is to investigate the ability of neural networks to model different two-locus disease models. We conduct a simulation study to compare neural networks with two standard methods, namely logistic regression models and multifactor dimensionality reduction. One hundred data sets are generated for each of six two-locus disease models, which are considered in a low and in a high risk scenario. Two models represent independence, one is a multiplicative model, and three models are epistatic. For each data set, six neural networks (with up to five hidden neurons) and five logistic regression models (the null model, three main effect models, and the full model) with two different codings for the genotype information are fitted. Additionally, the multifactor dimensionality reduction approach is applied. Results: The results show that neural networks are more successful in modeling the structure of the underlying disease model than logistic regression models in most of the investigated situations. In our simulation study, neither logistic regression nor multifactor dimensionality reduction is able to correctly identify biological interaction. Conclusions: Neural networks are a promising tool to handle complex data situations. However, further research is necessary concerning the interpretation of their parameters.
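
    The difference between a multiplicative and an epistatic two-locus model, as named in the abstract above, can be sketched with penetrance tables. This is a minimal illustration only: the penetrance values, the minor allele frequency of 0.5, and all function names are assumptions chosen for the example and are not taken from the study.

```python
def genotype_freqs(maf):
    """Hardy-Weinberg frequencies for 0, 1 and 2 copies of the minor allele."""
    q = maf
    p = 1.0 - q
    return [p * p, 2 * p * q, q * q]


def multiplicative_table(baseline, rr_a, rr_b):
    """3x3 penetrance table where each minor allele multiplies the risk.

    Rows index the genotype at locus A, columns the genotype at locus B.
    baseline, rr_a and rr_b are hypothetical parameters for illustration.
    """
    return [[baseline * rr_a ** i * rr_b ** j for j in range(3)]
            for i in range(3)]


def marginal_penetrance(table, maf_b):
    """Average each locus-A row over the locus-B genotype frequencies."""
    freqs_b = genotype_freqs(maf_b)
    return [sum(pen * f for pen, f in zip(row, freqs_b)) for row in table]


# Hypothetical purely epistatic table: risk depends only on the combination
# of genotypes, so neither locus shows an effect on its own at MAF 0.5.
EPISTATIC = [
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1],
    [0.0, 0.1, 0.0],
]

if __name__ == "__main__":
    mult = multiplicative_table(baseline=0.01, rr_a=2.0, rr_b=2.0)
    print("multiplicative marginals:", marginal_penetrance(mult, 0.5))
    print("epistatic marginals:     ", marginal_penetrance(EPISTATIC, 0.5))
```

    Under these assumed values the multiplicative table gives marginal risks that rise with each copy of the minor allele at locus A, while the epistatic table gives the same marginal risk for every genotype. A method that only inspects one locus at a time would therefore see nothing in the second case.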

    Grammatical evolution decision trees for detecting gene-gene interactions

    Background: A fundamental goal of human genetics is the discovery of polymorphisms that predict common, complex diseases. It is hypothesized that complex diseases are due to a myriad of factors including environmental exposures and complex genetic risk models, including gene-gene interactions. Such epistatic models present an important analytical challenge, requiring that methods perform not only statistical modeling, but also variable selection to generate testable genetic model hypotheses. This challenge is amplified by recent advances in genotyping technology, as the number of potential predictor variables is rapidly increasing. Methods: Decision trees are a highly successful, easily interpretable data-mining method that is typically optimized with a hierarchical model building approach, which limits its potential to identify interacting effects. To overcome this limitation, we utilize evolutionary computation, specifically grammatical evolution, to build decision trees to detect and model gene-gene interactions. In the current study, we introduce the Grammatical Evolution Decision Trees (GEDT) method and software and evaluate this approach on simulated data representing gene-gene interaction models of a range of effect sizes. We compare the performance of the method to a traditional decision tree algorithm and a random search approach and demonstrate the improved performance of the method to detect purely epistatic interactions. Results: The results of our simulations demonstrate that GEDT has high power to detect even very moderate genetic risk models. GEDT has high power to detect interactions with and without main effects. Conclusions: GEDT, while still in its initial stages of development, is a promising new approach for identifying gene-gene interactions in genetic association studies.

    Neural networks for genetic epidemiology: past, present, and future

    During the past two decades, the field of human genetics has experienced an information explosion. The completion of the human genome project and the development of high throughput SNP technologies have created a wealth of data; however, the analysis and interpretation of these data have created a research bottleneck. While technology facilitates the measurement of hundreds or thousands of genes, statistical and computational methodologies are lacking for the analysis of these data. New statistical methods and variable selection strategies must be explored for identifying disease susceptibility genes for common, complex diseases. Neural networks (NN) are a class of pattern recognition methods that have been successfully implemented for data mining and prediction in a variety of fields. The application of NN for statistical genetics studies is an active area of research. Neural networks have been applied in both linkage and association analysis for the identification of disease susceptibility genes.

    Rule based classifier for the analysis of gene-gene and gene-environment interactions in genetic association studies

    Background: Several methods have been presented for the analysis of complex interactions between genetic polymorphisms and/or environmental factors. Despite the available methods, there is still a need for alternative methods, because no single method will perform well in all scenarios. The aim of this work was to evaluate the performance of three selected rule based classifier algorithms, RIPPER, RIDOR and PART, for the analysis of genetic association studies. Methods: Overall, 42 datasets were simulated with three different case-control models, a varying number of subjects (300, 600), SNPs (500, 1500, 3000) and noise (5%, 10%, 20%). The algorithms were applied to each of the datasets with a set of algorithm-specific settings. Results were further investigated with respect to a) the Model, b) the Rules, and c) the Attribute level. Data analysis was performed using WEKA, SAS and PERL. Results: The RIPPER algorithm discovered the true case-control model at least once in >33% of the datasets. The RIDOR and PART algorithms performed poorly for model detection. The RIPPER, RIDOR and PART algorithms discovered the true case-control rules in more than 83%, 83% and 44% of the datasets, respectively. All three algorithms were able to detect the attributes utilized in the respective case-control models in most datasets. Conclusions: The current analyses substantiate the utility of rule based classifiers such as RIPPER, RIDOR and PART for the detection of gene-gene/gene-environment interactions in genetic association studies. These classifiers could provide a valuable new method, complementing existing approaches, in the analysis of genetic association studies. The methods provide an advantage in being able to handle both categorical and continuous variable types. Further, because the outputs of the analyses are easy to interpret, the rule based classifier approach could quickly generate testable hypotheses for additional evaluation. Since the algorithms are computationally inexpensive, they may serve as valuable tools for preselection of attributes to be used in more complex, computationally intensive approaches. Whether used in isolation or in conjunction with other tools, rule based classifiers are an important addition to the armamentarium of tools available for analyses of complex genetic association studies.

    Novel human genetic variants associated with extrapulmonary tuberculosis: a pilot genome wide association study

    Background: Approximately 5-10% of persons infected with M. tuberculosis develop tuberculosis, but the factors associated with disease progression are incompletely understood. Both linkage and association studies have identified human genetic variants associated with susceptibility to pulmonary tuberculosis, but few genetic studies have evaluated extrapulmonary disease. Because extrapulmonary and pulmonary tuberculosis likely have different underlying pathophysiology, identification of genetic mutations associated with extrapulmonary disease is important. Findings: We performed a pilot genome-wide association study among 24 persons with previous extrapulmonary tuberculosis and well-characterized immune defects; 24 pulmonary tuberculosis patients and 57 patients with M. tuberculosis infection served as controls. The Affymetrix GeneChip Human Mapping Xba Array was used for genotyping; after careful quality control, genotypes at 44,175 single nucleotide polymorphisms (SNPs) were available for analysis. Eigenstrat quantified population stratification within our sample; logistic regression, using results of the Eigenstrat analysis as a covariate, identified significant associations between groups. Permutation testing controlled the family-wise error rate for each comparison between groups. Four SNPs were significantly associated with extrapulmonary tuberculosis compared to controls with M. tuberculosis infection: one (rs4893980) in the gene PDE11A, one (rs10488286) in KCND2, and one (rs2026414) in PCDH15; one was in chromosome 7 but not associated with a known gene. Two additional variants were significantly associated with extrapulmonary tuberculosis compared with pulmonary tuberculosis: one (rs340708) in the gene FAM135B and one in chromosome 13 but not associated with a known gene. The function of all four genes affects cell signaling and activity, including in the brain. Conclusions: In this pilot study, we identified 6 novel variants not previously known to be associated with extrapulmonary tuberculosis, including two SNPs more common in persons with extrapulmonary than pulmonary tuberculosis. This provides some support for the hypothesis that the pathogenesis of and genetic predisposition to extrapulmonary tuberculosis differ from those of pulmonary tuberculosis. Further study of these novel SNPs, and more well-powered genome-wide studies of extrapulmonary tuberculosis, are warranted.
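
    The abstract above states that permutation testing controlled the family-wise error rate, without giving the procedure. One common way to do this is the max-statistic permutation test, sketched below in Python. This is an assumption about the general technique, not the authors' implementation: the study used logistic regression with an Eigenstrat covariate, whereas this sketch substitutes a simple difference in mean allele count, and every name and parameter here is hypothetical.

```python
import random


def mean_diff_stats(genotypes, labels):
    """Absolute case-control difference in mean allele count, one per SNP.

    genotypes: list of SNPs, each a list of allele counts (0, 1, 2) per subject.
    labels:    list of 0/1 values per subject (1 = case, 0 = control).
    """
    n_case = sum(labels)
    n_ctrl = len(labels) - n_case
    stats = []
    for snp in genotypes:
        case_sum = sum(g for g, y in zip(snp, labels) if y == 1)
        ctrl_sum = sum(snp) - case_sum
        stats.append(abs(case_sum / n_case - ctrl_sum / n_ctrl))
    return stats


def max_stat_adjusted_pvalues(genotypes, labels, n_perm=1000, seed=0):
    """Family-wise adjusted p-values from the permutation max statistic.

    For each label shuffle, only the largest statistic across all SNPs is
    kept. A SNP's adjusted p-value is the share of shuffles whose maximum
    reaches that SNP's observed statistic (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = mean_diff_stats(genotypes, labels)
    shuffled = list(labels)
    exceed = [0] * len(observed)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        perm_max = max(mean_diff_stats(genotypes, shuffled))
        for k, obs in enumerate(observed):
            if perm_max >= obs:
                exceed[k] += 1
    return [(count + 1) / (n_perm + 1) for count in exceed]


if __name__ == "__main__":
    # Toy data: 10 cases followed by 10 controls, two SNPs.
    labels = [1] * 10 + [0] * 10
    snp_associated = [2] * 10 + [0] * 10   # differs sharply between groups
    snp_null = [0, 1] * 10                 # same mean in both groups
    print(max_stat_adjusted_pvalues([snp_associated, snp_null], labels, 200))
```

    Because each shuffle is compared against the maximum over all SNPs, the resulting p-values already account for the number of SNPs tested, which is why no separate multiple-testing correction is applied afterwards.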

    Application of two machine learning algorithms to genetic association studies in the presence of covariates

    BACKGROUND: Population-based investigations aimed at uncovering genotype-trait associations often involve high-dimensional genetic polymorphism data as well as information on multiple environmental and clinical parameters. Machine learning (ML) algorithms offer a straightforward analytic approach for selecting subsets of these inputs that are most predictive of a pre-defined trait. The performance of these algorithms in the presence of covariates, however, is not well characterized. METHODS AND RESULTS: In this manuscript, we investigate two approaches: Random Forests (RFs) and Multivariate Adaptive Regression Splines (MARS). Through multiple simulation studies, the performance under several underlying models is evaluated. An application to a cohort of HIV-1 infected individuals receiving anti-retroviral therapies is also provided. CONCLUSION: Consistent with more traditional regression modeling theory, our findings highlight the importance of considering the nature of underlying gene-covariate-trait relationships before applying ML algorithms, particularly when there is potential confounding or effect mediation.