Search CORE

20,684 research outputs found

Mining Pure, Strict Epistatic Interactions from High-Dimensional Datasets: Ameliorating the Curse of Dimensionality

Author: Richard E. Neapolitan
Xia Jiang
Xiaofeng Wang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 12/10/2012
Field of study

Background: The interaction between loci to affect phenotype is called epistasis. It is strict epistasis if no proper subset of the interacting loci exhibits a marginal effect. For many diseases, it is likely that unknown epistatic interactions affect disease susceptibility. A difficulty when mining epistatic interactions from high-dimensional datasets concerns the curse of dimensionality. There are too many combinations of SNPs to perform an exhaustive search. A method that could locate strict epistasis without an exhaustive search can be considered the brass ring of methods for analyzing high-dimensional datasets. Methodology/Findings: A SNP pattern is a Bayesian network representing SNP-disease relationships. The Bayesian score for a SNP pattern is the probability of the data given the pattern, and has been used to learn SNP patterns. We identified a bound for the score of a SNP pattern. The bound provides an upper limit on the Bayesian score of any pattern that could be obtained by expanding a given pattern. We felt that the bound might enable the data to say something about the promise of expanding a 1-SNP pattern even when there are no marginal effects. We tested the bound using simulated datasets and semi-synthetic high-dimensional datasets obtained from GWAS datasets. We found that the bound was able to dramatically reduce the search time for strict epistasis. Using an Alzheimer's dataset, we showed that it is possible to discover an interaction involving the APOE gene based on its score because of its large marginal effect, but that the bound is most effective at discovering interactions without marginal effects. Conclusions/Significance: We conclude that the bound appears to ameliorate the curse of dimensionality in high-dimensional datasets. This is a very consequential result and could be pivotal in our efforts to reveal the dark matter of genetic disease risk from high-dimensional datasets. © 2012 Jiang, Neapolitan

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

D-Scholarship@Pitt

Recommended from our members

GenEpi: gene-based epistasis discovery using machine learning.

Author: Alzheimer’s Disease Neuroimaging Initiative
Chang Yu-Chuan
Chen Chien-Yu
Giacomini Kathleen M
Hong Ming-Yi
Hsieh Ping-Han
Oyang Yen-Jen
Tung Yi-An
Wu June-Tai
Yee Sook Wah
Publication venue: eScholarship, University of California
Publication date: 01/02/2020
Field of study

BackgroundGenome-wide association studies (GWAS) provide a powerful means to identify associations between genetic variants and phenotypes. However, GWAS techniques for detecting epistasis, the interactions between genetic variants associated with phenotypes, are still limited. We believe that developing an efficient and effective GWAS method to detect epistasis will be a key for discovering sophisticated pathogenesis, which is especially important for complex diseases such as Alzheimer's disease (AD).ResultsIn this regard, this study presents GenEpi, a computational package to uncover epistasis associated with phenotypes by the proposed machine learning approach. GenEpi identifies both within-gene and cross-gene epistasis through a two-stage modeling workflow. In both stages, GenEpi adopts two-element combinatorial encoding when producing features and constructs the prediction models by L1-regularized regression with stability selection. The simulated data showed that GenEpi outperforms other widely-used methods on detecting the ground-truth epistasis. As real data is concerned, this study uses AD as an example to reveal the capability of GenEpi in finding disease-related variants and variant interactions that show both biological meanings and predictive power.ConclusionsThe results on simulation data and AD demonstrated that GenEpi has the ability to detect the epistasis associated with phenotypes effectively and efficiently. The released package can be generalized to largely facilitate the studies of many complex diseases in the near future

eScholarship - University of California

Evidence for suppression of immunity as a driver for genomic introgressions and host range expansion in races of Albugo candida, a generalist parasite

Author: Abbott
Altschul
Baack
Barbetti
Baxter
Biga
Blanco
Boni
Borhan
Choi
Choi
Choi
Conesa
Cooper
Diogo
Dodds
Dong
Doolittle
Drummond
Drès
Dujon
Eddy
Ersek
Farrer
Fontaine
Galagan
Greig
Haas
Hacquard
Harper
Haubold
Heath
Hedrick
Hiura
Hobolth
Holub
Huelsenbeck
Jukes
Jurka
Kemen
Kent
Kole
Lamichhaney
Lamour
Langmead
Lebrun
Levesque
Li
Li
Links
Liu
Liu
Lomsadze
Martin
Martin
Martin
McHale
Mckinney
Meena
Mehrabi
Mehta
Moller
Monod
Nagahara
Nichols
Padidam
Pamilo
Parker
Parra
Payen
Petersen
Petrie
Ploch
Posada
Poulin
Pound
Price
Pruitt
Punta
Quinlan
Rimmer
Ruderfer
Saharan
Schulz
Seehausen
Smith
Soanes
Sommer
Stanke
Stukenbrock
Stukenbrock
Stukenbrock
Suzek
Tajima
The Arabidopsis Genome Initiative
Thines
Thines
Thompson
Tyler
Ungerer
Usher
Voglmayr
Walker
Zerbino
Publication venue: 'eLife Sciences Publications, Ltd'
Publication date: 01/01/2015
Field of study

How generalist parasites with wide host ranges can evolve is a central question in parasite evolution. Albugo candida is an obligate biotrophic parasite that consists of many physiological races that each specialize on distinct Brassicaceae host species. By analyzing genome sequence assemblies of five isolates, we show they represent three races that are genetically diverged by ∼1%. Despite this divergence, their genomes are mosaic-like, with ∼25% being introgressed from other races. Sequential infection experiments show that infection by adapted races enables subsequent infection of hosts by normally non-infecting races. This facilitates introgression and the exchange of effector repertoires, and may enable the evolution of novel races that can undergo clonal population expansion on new hosts. We discuss recent studies on hybridization in other eukaryotes such as yeast, Heliconius butterflies, Darwin’s finches, sunflowers and cichlid fishes, and the implications of introgression for pathogen evolution in an agro-ecological environment

Crossref

Copenhagen University Research Information System

PubMed Central

Warwick Research Archives Portal Repository

University of East Anglia digital repository

MPG.PuRe

A fast algorithm for detecting gene-gene interactions in genome-wide association studies

Author: Li Jiahan
Li Runze
Wu Rongling
Zhong Wei
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 03/02/2015
Field of study

With the recent advent of high-throughput genotyping techniques, genetic data for genome-wide association studies (GWAS) have become increasingly available, which entails the development of efficient and effective statistical approaches. Although many such approaches have been developed and used to identify single-nucleotide polymorphisms (SNPs) that are associated with complex traits or diseases, few are able to detect gene-gene interactions among different SNPs. Genetic interactions, also known as epistasis, have been recognized to play a pivotal role in contributing to the genetic variation of phenotypic traits. However, because of an extremely large number of SNP-SNP combinations in GWAS, the model dimensionality can quickly become so overwhelming that no prevailing variable selection methods are capable of handling this problem. In this paper, we present a statistical framework for characterizing main genetic effects and epistatic interactions in a GWAS study. Specifically, we first propose a two-stage sure independence screening (TS-SIS) procedure and generate a pool of candidate SNPs and interactions, which serve as predictors to explain and predict the phenotypes of a complex trait. We also propose a rates adjusted thresholding estimation (RATE) approach to determine the size of the reduced model selected by an independence screening. Regularization regression methods, such as LASSO or SCAD, are then applied to further identify important genetic effects. Simulation studies show that the TS-SIS procedure is computationally efficient and has an outstanding finite sample performance in selecting potential SNPs as well as gene-gene interactions. We apply the proposed framework to analyze an ultrahigh-dimensional GWAS data set from the Framingham Heart Study, and select 23 active SNPs and 24 active epistatic interactions for the body mass index variation. It shows the capability of our procedure to resolve the complexity of genetic control.Comment: Published in at http://dx.doi.org/10.1214/14-AOAS771 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

CiteSeerX

Medical Statistics - Current Developments in Statistical Methodology for Genetic Architecture of Complex Diseases

Author
Publication venue: Oberwolfach-Walke : Mathematisches Forschungsinstitut Oberwolfach
Publication date: 01/01/2003
Field of study

[no abstract available

Repositorium für Naturwissenschaften und Technik