
    Spatially Uniform ReliefF (SURF) for computationally-efficient filtering of gene-gene interactions

    Background: Genome-wide association studies are becoming the de facto standard in the genetic analysis of common human diseases. Given the complexity and robustness of biological networks, such diseases are unlikely to be the result of single points of failure. Instead, they likely arise from the joint failure of two or more interacting components. The hope in genome-wide screens is that these points of failure can be linked to single nucleotide polymorphisms (SNPs) that confer disease susceptibility. Detecting interacting variants that lead to disease in the absence of single-gene effects is difficult, however. Methods that exhaustively analyze sets of these variants for interactions are combinatorial in nature, which makes them computationally infeasible. Efficient algorithms that can detect interacting SNPs are needed. ReliefF is one such promising algorithm, although it has a low success rate on noisy datasets when the interaction effect is small. ReliefF has been paired with an iterative approach, Tuned ReliefF (TuRF), which improves the estimation of weights in noisy data but does not fundamentally change the underlying ReliefF algorithm. To improve the sensitivity of studies using these methods to detect small effects, we introduce Spatially Uniform ReliefF (SURF). Results: SURF's ability to detect interactions in this domain is significantly greater than that of ReliefF. Similarly, SURF combined with the TuRF strategy significantly outperforms TuRF alone for SNP selection under an epistasis model. Importantly, this increase in success rate does not require an increase in algorithmic complexity, and it is achieved even though a nuisance parameter is removed from the algorithm. Conclusion: Researchers performing genetic association studies and aiming to discover gene-gene interactions associated with increased disease susceptibility should use SURF in place of ReliefF. For instance, SURF should be used instead of ReliefF to filter a dataset before an exhaustive MDR analysis. This change increases the ability of a study to detect gene-gene interactions. The SURF algorithm is implemented in the open source Multifactor Dimensionality Reduction (MDR) software package available from http://www.epistasis.org.
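The SURF idea summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation shipped in the MDR package. It makes four simplifications:

* Genotypes are discrete codes compared by Hamming distance.
* The neighborhood threshold is taken to be the mean pairwise distance. This is an assumption about how the removed nuisance parameter, presumed here to be ReliefF's neighbor count, is replaced.
* Each unordered pair is scored once.
* Weights are normalized by the number of neighbor pairs.

The `turf` helper is a hypothetical, simplified take on TuRF that drops a fixed number of the lowest-scoring SNPs per pass.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of SNPs at which two genotype vectors differ (0/1 diff per SNP)."""
    return sum(x != y for x, y in zip(a, b))


def surf_weights(X, y):
    """SURF-style weights (sketch): every pair closer than the mean pairwise
    distance counts as neighbors; misses reward and hits penalize differing SNPs."""
    n_snps = len(X[0])
    dist = {(i, j): hamming(X[i], X[j]) for i, j in combinations(range(len(X)), 2)}
    threshold = sum(dist.values()) / len(dist)  # assumed: mean pairwise distance
    weights = [0.0] * n_snps
    used = 0
    for (i, j), d in dist.items():
        if d >= threshold:
            continue  # outside the neighborhood
        used += 1
        sign = 1.0 if y[i] != y[j] else -1.0  # miss (+) or hit (-)
        for a in range(n_snps):
            if X[i][a] != X[j][a]:
                weights[a] += sign
    return [w / used for w in weights] if used else weights


def turf(X, y, keep, drop_per_pass=1):
    """Simplified TuRF loop (hypothetical helper): re-score with SURF and drop the
    lowest-weighted SNPs until `keep` remain. Returns original column indices."""
    active = list(range(len(X[0])))
    while len(active) > keep:
        sub = [[row[a] for a in active] for row in X]
        ranked = sorted(zip(surf_weights(sub, y), active))  # lowest weight first
        n_drop = min(drop_per_pass, len(active) - keep)
        dropped = {a for _, a in ranked[:n_drop]}
        active = [a for a in active if a not in dropped]
    return active


# Toy data: four binary SNPs in a full factorial design. The class is SNP0 XOR SNP1,
# so neither causal SNP has a marginal effect, and SNP2/SNP3 are pure noise.
X = [list(bits) for bits in product((0, 1), repeat=4)]
y = [row[0] ^ row[1] for row in X]
print(surf_weights(X, y))  # [0.2, 0.2, 0.0, 0.0]
print(turf(X, y, keep=2))  # [0, 1]
```

On this toy design, the two interacting SNPs receive positive weight even though neither is associated with the class on its own. Each TuRF pass then removes a noise SNP and re-scores the remainder.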

    Bioinformatics challenges for genome-wide association studies

    Motivation: The sequencing of the human genome has made it possible to identify an informative set of more than 1 million single nucleotide polymorphisms (SNPs) across the genome that can be used to carry out genome-wide association studies (GWASs). The availability of massive amounts of GWAS data has necessitated the development of new biostatistical methods for quality control, imputation, and analysis issues, including multiple testing. This work has been successful and has enabled the discovery of new associations that have been replicated in multiple studies. However, it is now recognized that most SNPs discovered via GWAS have small effects on disease susceptibility and thus may not be suitable for improving health care through genetic testing. One likely explanation for the mixed results of GWAS is that the current biostatistical analysis paradigm is, by design, agnostic or unbiased: it ignores all prior knowledge about disease pathobiology. Further, the linear modeling framework employed in GWAS often considers only one SNP at a time, thus ignoring its genomic and environmental context. There is now a shift away from the biostatistical approach toward a more holistic approach. This approach recognizes the complexity of the genotype–phenotype relationship, which is characterized by significant heterogeneity and by gene–gene and gene–environment interactions. We argue here that bioinformatics has an important role to play in addressing the complexity of the underlying genetic basis of common human diseases. The goal of this review is to identify and discuss those GWAS challenges that will require computational methods.
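To make the one-SNP-at-a-time limitation concrete, the sketch below builds a toy two-SNP model whose class is the XOR of the two genotypes. This is a hypothetical, noise-free dataset, not GWAS data. A Pearson chi-square test on each SNP alone finds no association at all. The same test on the joint two-SNP genotype separates cases from controls perfectly. `chi_square` is a hypothetical helper that uses only the standard library.

```python
from collections import Counter


def chi_square(groups, y):
    """Pearson chi-square statistic for a genotype-group x case/control table."""
    n, n_cases = len(y), sum(y)
    group_sizes = Counter(groups)
    group_cases = Counter(g for g, label in zip(groups, y) if label == 1)
    stat = 0.0
    for g, size in group_sizes.items():
        observed = (group_cases[g], size - group_cases[g])  # cases, controls
        expected = (size * n_cases / n, size * (n - n_cases) / n)
        stat += sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return stat


# 25 copies of each two-SNP genotype combination; the class is SNP_A XOR SNP_B.
snp_a = [a for a in (0, 1) for b in (0, 1) for _ in range(25)]
snp_b = [b for a in (0, 1) for b in (0, 1) for _ in range(25)]
y = [a ^ b for a, b in zip(snp_a, snp_b)]

print(chi_square(snp_a, y))  # 0.0: no marginal signal for SNP_A
print(chi_square(snp_b, y))  # 0.0: no marginal signal for SNP_B
print(chi_square(list(zip(snp_a, snp_b)), y))  # 100.0: the joint genotype is fully informative
```

The statistic on the joint genotype equals the sample size, the maximum for a perfectly separating two-class table. Either single-SNP scan, by contrast, would discard both loci.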

    Identification of gene-gene interactions for Alzheimer's disease using co-operative game theory

    Thesis (Ph.D.), Boston University. The multifactorial nature of Alzheimer's disease (AD) suggests that complex gene-gene interactions are present in AD pathways. Contemporary approaches to detecting such interactions in genome-wide data are mathematically and computationally challenging. We investigated gene-gene interactions for AD using a novel algorithm based on cooperative game theory. The analysis covered 15 genome-wide association study (GWAS) datasets comprising a total of 11,840 AD cases and 10,931 cognitively normal elderly controls from the Alzheimer Disease Genetics Consortium (ADGC). This approach was originally developed for solving multi-dimensional problems in economics and the social sciences. We adapted it to compute a Shapley value statistic that identifies the genetic markers contributing most to coalitions of single nucleotide polymorphisms (SNPs) in predicting AD risk. With each GWAS dataset treated as an independent discovery set, markers were ranked according to their contribution to coalitions formed with other markers. Using a backward elimination strategy, markers with low Shapley values were eliminated and the statistic was recalculated iteratively. We tested all two-way interactions between top Shapley markers in regression models that included the two SNPs (main effects) and a term for their interaction. Models yielding p < 0.05 for the interaction term were evaluated in each of the other datasets, and the results from all datasets were combined by meta-analysis. Statistically significant interactions were observed for multiple marker combinations in the APOE region. My analyses also revealed statistically strong interactions between markers in six regions: CTNNA3-ATP11A (p = 4.1e-07), CSMD1-PRKCQ (p = 3.5e-08), DCC-UNC5CL (p = 5.9e-08), CNTNAP2-RFC3 (p = 1.16e-07), AACS-TSHZ3 (p = 2.64e-07) and CAMK4-MMD (p = 3.3e-07). The Shapley value algorithm outperformed chi-square and ReliefF in detecting known interactions between APOE and GAB2 in a previously published GWAS dataset.
    It was also more accurate than competing filtering methods in identifying simulated epistatic SNPs that are additive in nature, but its accuracy was low in identifying non-linear interactions. The game theory algorithm revealed strong interactions between markers in novel genes with weak main effects. These interactions would have been overlooked if only markers with strong marginal association with AD had been tested. This method will be a valuable tool for identifying gene-gene interactions for complex diseases and other traits.
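The Shapley value ranking and backward elimination described above can be sketched for a handful of markers by exact enumeration. The abstract does not state how the worth of a SNP coalition is measured. `toy_value` below is therefore a hypothetical characteristic function in which two markers are valuable only together. Exact enumeration visits 2^(n-1) coalitions per marker, so a GWAS-scale analysis would need an approximation that this sketch does not attempt.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: each player's marginal contribution, weighted over
    all coalitions of the other players (equivalent to averaging over orderings)."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = Fraction(0)
        for size in range(len(others) + 1):
            weight = Fraction(factorial(size) * factorial(n - size - 1), factorial(n))
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def backward_eliminate(players, value, keep):
    """Repeatedly drop the marker with the lowest Shapley value and recompute."""
    active = list(players)
    while len(active) > keep:
        phi = shapley_values(active, value)
        active.remove(min(active, key=lambda p: phi[p]))
    return active


def toy_value(coalition):
    """Hypothetical worth: 1 only if both interacting markers are present."""
    return 1 if {"snp_A", "snp_B"} <= coalition else 0


markers = ["snp_A", "snp_B", "snp_C"]
print(shapley_values(markers, toy_value))
# {'snp_A': Fraction(1, 2), 'snp_B': Fraction(1, 2), 'snp_C': Fraction(0, 1)}
print(backward_eliminate(markers, toy_value, keep=2))  # ['snp_A', 'snp_B']
```

Neither interacting marker adds anything on its own, yet each receives half of the coalition's worth. A marginal filter would not behave this way. The Shapley values also sum to the worth of the full coalition, a useful sanity check when swapping in a real value function.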

    Application of a spatially-weighted Relief algorithm for ranking genetic predictors of disease

    Background: Identification of genetic variants that are associated with disease is an important goal in elucidating the genetic causes of diseases. The genetic patterns that are associated with common diseases are complex and may involve multiple interacting genetic variants. The Relief family of algorithms is a powerful tool for efficiently identifying genetic variants that are associated with disease, even if the variants have nonlinear interactions without significant main effects. Many variations of Relief have been developed over the past two decades, and several of them have been applied to single nucleotide polymorphism (SNP) data. Results: We developed a new spatially weighted variation of Relief called Sigmoid Weighted ReliefF Star (SWRF*) and applied it to synthetic SNP data. ReliefF and SURF* are two algorithms that have been applied to SNP data for identifying interactions, and compared with both, SWRF* had significantly greater power. Furthermore, we developed a framework called the Modular Relief Framework (MoRF) that can be used to develop novel variations of the Relief algorithm, and we used MoRF to develop the SWRF* algorithm. Conclusions: MoRF allows easy development of new Relief algorithms by specifying different interchangeable functions for the component terms. Using MoRF, we developed a new Relief algorithm called SWRF* that had a greater ability to identify interacting genetic variants in synthetic data than existing Relief algorithms. © 2012 Stokes and Visweswaran; licensee BioMed Central Ltd.
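MoRF's idea of building a Relief variant from interchangeable component functions can be illustrated with a pairwise Relief score whose neighbor-weighting function is a parameter. This is a hypothetical sketch, not the MoRF code. The component functions are:

* A hard cut-off at the mean pairwise distance, which gives a SURF-style neighborhood.
* A logistic curve centered on the mean and scaled by the standard deviation. This is only an illustrative stand-in for the sigmoid in SWRF*, whose exact form and parameters are not given here.
* A uniform weighting that treats every pair as a neighbor, included for contrast.

```python
import math
from itertools import combinations, product


def relief_scores(X, y, neighbor_weight):
    """Pairwise Relief score with a pluggable distance-to-weight function."""
    n_snps = len(X[0])
    dist = {(i, j): sum(a != b for a, b in zip(X[i], X[j]))
            for i, j in combinations(range(len(X)), 2)}
    mean_d = sum(dist.values()) / len(dist)
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in dist.values()) / len(dist))
    scores, total = [0.0] * n_snps, 0.0
    for (i, j), d in dist.items():
        wt = neighbor_weight(d, mean_d, sd_d)
        if wt == 0.0:
            continue  # pair contributes nothing under this weighting
        total += wt
        sign = 1.0 if y[i] != y[j] else -1.0  # miss (+) or hit (-)
        for a in range(n_snps):
            if X[i][a] != X[j][a]:
                scores[a] += sign * wt
    return [s / total for s in scores] if total else scores


def step_weight(d, mean_d, sd_d):
    return 1.0 if d < mean_d else 0.0  # SURF-style hard threshold


def sigmoid_weight(d, mean_d, sd_d):
    scale = sd_d if sd_d > 0 else 1.0  # illustrative choice, not SWRF*'s parameters
    return 1.0 / (1.0 + math.exp((d - mean_d) / scale))


def uniform_weight(d, mean_d, sd_d):
    return 1.0  # every pair counts equally: no locality at all


X = [list(bits) for bits in product((0, 1), repeat=4)]
y = [row[0] ^ row[1] for row in X]  # SNP0 XOR SNP1; SNP2 and SNP3 are noise
for fn in (step_weight, sigmoid_weight, uniform_weight):
    print(fn.__name__, [round(s, 3) for s in relief_scores(X, y, fn)])
```

With the step function, the causal SNPs score 0.2 and the noise SNPs 0.0. The sigmoid also ranks the causal SNPs first. With uniform weighting, every score is 0.0: in this balanced design, hits and misses cancel once locality is ignored. This is why the choice of neighbor-weighting function matters for interaction detection.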

    Predicting the Difficulty of Pure, Strict, Epistatic Models: Metrics for Simulated Model Selection

    Background: Algorithms designed to detect complex genetic disease associations are initially evaluated using simulated datasets. Typical evaluations vary constraints that influence the correct detection of underlying models (e.g., number of loci, heritability, and minor allele frequency). Such studies neglect to account for model architecture (i.e., the unique specification and arrangement of penetrance values comprising the genetic model), which alone can influence the detectability of a model. To design a simulation study that efficiently takes architecture into account, a reliable metric is needed for model selection. Results: We evaluate three metrics as predictors of relative model detection difficulty, derived from previous work: (1) penetrance table variance (PTV), (2) customized odds ratio (COR), and (3) our own Ease of Detection Measure (EDM), calculated from the penetrance values and respective genotype frequencies of each simulated genetic model. We evaluate the reliability of these metrics across three very different data search algorithms, each with the capacity to detect epistatic interactions. We find that a model’s EDM and COR are each stronger predictors of model detection success than heritability. Conclusions: This study formally identifies and evaluates metrics that quantify model detection difficulty. We use these metrics to intelligently select models from a population of potential architectures. This allows for an improved simulation study design that accounts for differences in detection difficulty attributed to model architecture. We implement the calculation and use of EDM and COR in GAMETES, an algorithm that rapidly and precisely generates pure, strict, n-locus epistatic models.
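Two of the simpler quantities mentioned above can be computed directly from a two-locus penetrance table. The sketch rests on the following assumptions:

* Genotype frequencies follow Hardy-Weinberg equilibrium, and the two loci are in linkage equilibrium.
* Penetrance table variance (PTV) is taken to be the plain variance of the nine penetrance values.
* Heritability uses h^2 = sum_g f_g (p_g - K)^2 / (K (1 - K)), where K is the population prevalence.

These definitions are assumed for illustration. The EDM and COR formulas from the paper are not reproduced. The example table is hypothetical and has no single-locus marginal effects, which the computed marginal penetrances confirm.

```python
def hwe_freqs(maf):
    """Genotype frequencies (AA, Aa, aa) under Hardy-Weinberg equilibrium."""
    p, q = 1.0 - maf, maf
    return [p * p, 2 * p * q, q * q]


def penetrance_stats(pen, maf_a, maf_b):
    """Prevalence, heritability, PTV and single-locus marginal penetrances for a
    3x3 penetrance table (rows: locus A genotype, columns: locus B genotype)."""
    fa, fb = hwe_freqs(maf_a), hwe_freqs(maf_b)
    cells = [(fa[i] * fb[j], pen[i][j]) for i in range(3) for j in range(3)]
    prevalence = sum(f * p for f, p in cells)
    heritability = sum(f * (p - prevalence) ** 2 for f, p in cells) / (
        prevalence * (1.0 - prevalence))
    mean_pen = sum(p for _, p in cells) / 9
    ptv = sum((p - mean_pen) ** 2 for _, p in cells) / 9  # assumed: unweighted variance
    # Marginal penetrance of each single-locus genotype (assumes linkage equilibrium).
    marginal_a = [sum(fb[j] * pen[i][j] for j in range(3)) for i in range(3)]
    marginal_b = [sum(fa[i] * pen[i][j] for i in range(3)) for j in range(3)]
    return prevalence, heritability, ptv, marginal_a, marginal_b


# Hypothetical XOR-like table with both minor allele frequencies at 0.5.
table = [[0.1, 0.3, 0.1],
         [0.3, 0.1, 0.3],
         [0.1, 0.3, 0.1]]
K, h2, ptv, marg_a, marg_b = penetrance_stats(table, 0.5, 0.5)
print(f"prevalence={K:.3f} heritability={h2:.4f} PTV={ptv:.5f}")
# prevalence=0.200 heritability=0.0625 PTV=0.00988
print("marginals A:", [round(m, 3) for m in marg_a], "B:", [round(m, 3) for m in marg_b])
# marginals A: [0.2, 0.2, 0.2] B: [0.2, 0.2, 0.2]
```

Every single-locus marginal penetrance equals the prevalence, so a one-locus scan sees nothing. The table still carries nonzero heritability, which only appears when both loci are considered jointly.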