GenEpi: gene-based epistasis discovery using machine learning.

Author: Alzheimer’s Disease Neuroimaging Initiative
Chang Yu-Chuan
Chen Chien-Yu
Giacomini Kathleen M
Hong Ming-Yi
Hsieh Ping-Han
Oyang Yen-Jen
Tung Yi-An
Wu June-Tai
Yee Sook Wah
Publication venue: eScholarship, University of California
Publication date: 01/02/2020

Background: Genome-wide association studies (GWAS) provide a powerful means to identify associations between genetic variants and phenotypes. However, GWAS techniques for detecting epistasis, the interactions between genetic variants associated with phenotypes, are still limited. We believe that developing an efficient and effective GWAS method to detect epistasis will be key to discovering sophisticated pathogenesis, which is especially important for complex diseases such as Alzheimer's disease (AD). Results: This study presents GenEpi, a computational package that uncovers epistasis associated with phenotypes using the proposed machine learning approach. GenEpi identifies both within-gene and cross-gene epistasis through a two-stage modeling workflow. In both stages, GenEpi adopts two-element combinatorial encoding when producing features and constructs the prediction models by L1-regularized regression with stability selection. On simulated data, GenEpi outperforms other widely used methods in detecting the ground-truth epistasis. On real data, this study uses AD as an example to show the capability of GenEpi in finding disease-related variants and variant interactions that have both biological meaning and predictive power. Conclusions: The results on simulated data and AD demonstrate that GenEpi can detect epistasis associated with phenotypes effectively and efficiently. The released package can be generalized to facilitate studies of many complex diseases in the near future.
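
To make the phrase "two-element combinatorial encoding" more concrete, here is a minimal sketch of one plausible reading: every pair of SNPs is expanded into nine binary indicators, one per genotype combination. The 0/1/2 genotype coding, the function name `pairwise_genotype_features`, and the feature naming are assumptions for illustration and are not taken from the GenEpi package. The L1-regularized regression and stability selection steps are not shown.

```python
from itertools import combinations

GENOTYPES = (0, 1, 2)  # assumed coding: number of minor-allele copies


def pairwise_genotype_features(genotypes):
    """Encode every SNP pair as nine binary indicator features.

    genotypes: list of samples, each a list of 0/1/2 calls per SNP.
    Returns (names, rows), where rows[i][k] is 1 when sample i carries
    the genotype combination described by names[k], else 0.
    """
    n_snps = len(genotypes[0])
    names = []
    columns = []
    for a, b in combinations(range(n_snps), 2):
        for ga in GENOTYPES:
            for gb in GENOTYPES:
                names.append(f"snp{a}={ga}&snp{b}={gb}")
                columns.append((a, ga, b, gb))
    rows = [
        [1 if sample[a] == ga and sample[b] == gb else 0
         for a, ga, b, gb in columns]
        for sample in genotypes
    ]
    return names, rows


# Toy input: three samples, three SNPs (illustrative values only).
samples = [[0, 1, 2], [2, 2, 0], [1, 0, 1]]
names, rows = pairwise_genotype_features(samples)
```

Each sample activates exactly one indicator per SNP pair, so a sparse linear model fitted on these columns can single out individual genotype combinations rather than only per-SNP effects.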

eScholarship - University of California

Discrete Algorithms for Analysis of Genotype Data

Author: Brinza Dumitru
Publication venue: ScholarWorks @ Georgia State University
Publication date: 01/01/2007

The accessibility of high-throughput genotyping technology makes genome-wide association studies possible for common complex diseases. When dealing with common diseases, it is necessary to search for and analyze multiple independent causes resulting from interactions of multiple genes scattered over the entire genome. Optimization formulations are introduced for searching for disease-associated risk/resistance factors and for predicting disease susceptibility in a given case-control study. Several discrete methods for disease association search that exploit a greedy strategy and topological properties of case-control studies have been developed. New disease susceptibility prediction methods based on the developed search methods have been validated on datasets from case-control studies for several common diseases. In our experiments, the proposed algorithms compare favorably with existing association search and susceptibility prediction methods.

CiteSeerX

ScholarWorks @ Georgia State University

Algorithms for Computational Genetics Epidemiology

Author: He Jingwu
Publication venue: ScholarWorks @ Georgia State University
Publication date: 01/01/2006

The most intriguing problems in genetic epidemiology are to predict genetic disease susceptibility and to associate single nucleotide polymorphisms (SNPs) with diseases. In such studies, it is necessary to resolve the ambiguities in genetic data. The primary obstacle to ambiguity resolution is that the physical methods for separating two haplotypes from an individual genotype (phasing) are too expensive. Although computational haplotype inference is a well-explored problem, high error rates continue to degrade association accuracy. Secondly, it is essential to use a small subset of informative SNPs (tag SNPs) that accurately represent the rest of the SNPs (tagging). Tagging can achieve budget savings by genotyping only a limited number of SNPs and computationally inferring all other SNPs. Recent successes in high-throughput genotyping technologies have drastically increased the length of available SNP sequences. This raises the importance of informative SNP selection for compacting huge genetic data sets to make fine genotype analysis feasible. Finally, even if complete and accurate data are available, it is unclear whether common statistical methods can determine the susceptibility to complex diseases. The dissertation explores the above computational problems with a variety of methods, including linear algebra, graph theory, linear programming, and greedy methods. The contributions include (1) a significant speed-up of popular phasing tools without compromising their quality, (2) state-of-the-art tagging tools applied to disease association, and (3) a graph-based method for disease tagging and predicting disease susceptibility.
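
The tagging idea can be illustrated with a generic greedy selection, sketched below under stated assumptions. It is not the dissertation's algorithm: it treats a SNP as "represented" by a tag when their squared Pearson correlation reaches a threshold (0.8 here, an assumed value), and it repeatedly picks the SNP that represents the most still-unrepresented SNPs. The function names and the toy genotype columns are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation of two genotype columns (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def greedy_tag_snps(columns, threshold=0.8):
    """Greedy set cover: pick SNPs until every SNP is represented by a tag."""
    n = len(columns)
    # covers[i] = SNPs that SNP i would represent (always including itself)
    covers = [
        {j for j in range(n)
         if i == j or r_squared(columns[i], columns[j]) >= threshold}
        for i in range(n)
    ]
    uncovered = set(range(n))
    tags = []
    while uncovered:
        # most newly covered SNPs wins; ties go to the lowest index
        best = max(range(n), key=lambda i: (len(covers[i] & uncovered), -i))
        tags.append(best)
        uncovered -= covers[best]
    return tags


# Toy data: one list of genotype calls per SNP, across six individuals.
columns = [
    [0, 1, 2, 0, 1, 2],  # snp0
    [0, 1, 2, 0, 1, 2],  # snp1, identical to snp0
    [2, 1, 0, 2, 1, 0],  # snp2, perfectly anti-correlated with snp0
    [0, 0, 1, 2, 2, 1],  # snp3, uncorrelated with snp0
]
tags = greedy_tag_snps(columns)
```

Only the tag SNPs would need to be genotyped; the others would be inferred from their tags, which is where the budget saving described above comes from.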

CiteSeerX

ScholarWorks @ Georgia State University

The challenge for genetic epidemiologists: how to analyze large numbers of SNPs in relation to complex diseases

Author: A Bureau
A Geert Heidema
A Wille
AA Motsinger
BV North
CM Bishop
CS Coffey
Daphne L van der A
DJF De Quervain
DR Cox
Edith JM Feskens
Edwin CM Mariman
IR Dohoo
J Hoh
J Ott
J Xu
JH Moore
Jolanda MA Boer
KL Lunetta
L Li
LW Hahn
MA Province
MD Ritchie
MR Nelson
N Nagelkerke
Nico Nagelkerke
NJ Schork
P Peduzzi
PR Lucek
R Bellman
R Culverhouse
R Tibshirani
RA Wilke
RYL Zee
SM Williams
TA Thornton-Wells
Y Benjamini
Y Tomita
YM Cho
Publication venue: BioMed Central
Publication date: 01/01/2006

Genetic epidemiologists have taken up the challenge of identifying genetic polymorphisms involved in the development of diseases. Many have collected data on large numbers of genetic markers but are not familiar with the available methods to assess their association with complex diseases. Statistical methods have been developed for analyzing the relation of large numbers of genetic and environmental predictors to disease or disease-related variables in genetic association studies. In this commentary we discuss logistic regression analysis; neural networks, including the parameter decreasing method (PDM) and genetic programming optimized neural networks (GPNN); and several non-parametric methods, which include the set association approach, the combinatorial partitioning method (CPM), the restricted partitioning method (RPM), the multifactor dimensionality reduction (MDR) method and the random forests approach. The relative strengths and weaknesses of these methods are highlighted. Logistic regression and neural networks can handle only a limited number of predictor variables, depending on the number of observations in the dataset. Therefore, they are less useful than the non-parametric methods for association studies with large numbers of predictor variables. GPNN, on the other hand, may be a useful approach to select and model important predictors, but its performance in selecting the important effects in the presence of large numbers of predictors needs to be examined. Both the set association approach and the random forests approach are able to handle a large number of predictors and are useful in reducing these predictors to a subset with an important contribution to disease. The combinatorial methods give more insight into combination patterns for sets of genetic and/or environmental predictor variables that may be related to the outcome variable.
As the non-parametric methods have different strengths and weaknesses, we conclude that for genetic association studies using the case-control design, applying a combination of several methods, including the set association approach, MDR and the random forests approach, will likely be a useful strategy to find the important genes and interaction patterns involved in complex diseases.

Maastricht University Research Portal

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Wageningen University & Research Publications

Multifactor-dimensionality reduction versus family-based association tests in detecting susceptibility loci in discordant sib-pair studies

Author: Farrell John
Farrer Lindsay A
Ma Qianli
Meng Yan
Wilcox Marsha A
Yu Yi
Publication venue: BioMed Central
Publication date: 01/01/2005

Complex diseases are generally thought to be under the influence of multiple, and possibly interacting, genes. Many association methods have been developed to identify susceptibility genes assuming a single-gene disease model; these are referred to as single-locus methods. Multilocus methods consider the joint effects of multiple genes and environmental factors. One commonly used method for family-based association analysis is implemented in FBAT. The multifactor-dimensionality reduction method (MDR) is a multilocus method that identifies multiple genetic loci associated with the occurrence of complex disease. Many studies of late-onset complex diseases employ a discordant sib-pair design. We compared FBAT and MDR in their ability to detect susceptibility loci using a discordant sib-pair dataset generated from the simulated data made available to participants in the Genetic Analysis Workshop 14. Using FBAT, we were able to identify the effect of one susceptibility locus. However, the finding was not statistically significant. We were not able to detect any of the interactions using this method. This is probably because the FBAT test is designed to find loci with major effects, not interactions. Using MDR, the best result we obtained identified two interactions. However, neither of these reached statistical significance. This is mainly due to the heterogeneity of the disease trait and noise in the data.
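
As a rough illustration of the core MDR step, the sketch below pools two-locus genotype cells into "high risk" and "low risk" by comparing each cell's case:control ratio with the overall ratio, then scores each SNP pair by how well that single reduced variable classifies the samples. This is a simplified sketch, not the software used in the study: it reports training accuracy only and omits the cross-validation and permutation testing that a full MDR analysis relies on. The function name and toy data are hypothetical, and both cases and controls are assumed to be present.

```python
from collections import Counter
from itertools import combinations


def mdr_best_pair(genotypes, labels):
    """Return (accuracy, (snp_a, snp_b)) for the best two-locus model.

    genotypes: list of samples, each a list of genotype calls per SNP.
    labels: 1 for case, 0 for control (both classes must be present).
    """
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    threshold = n_cases / n_controls  # overall case:control ratio
    best = None
    for a, b in combinations(range(len(genotypes[0])), 2):
        cases, controls = Counter(), Counter()
        for sample, label in zip(genotypes, labels):
            cell = (sample[a], sample[b])
            (cases if label == 1 else controls)[cell] += 1
        # A cell is high risk when its case:control ratio exceeds the
        # threshold; written multiplicatively to avoid dividing by zero.
        high_risk = {c for c in cases if cases[c] > threshold * controls[c]}
        correct = sum(
            ((sample[a], sample[b]) in high_risk) == (label == 1)
            for sample, label in zip(genotypes, labels)
        )
        accuracy = correct / len(labels)
        if best is None or accuracy > best[0]:
            best = (accuracy, (a, b))
    return best


# Toy data: the label is the XOR of snp0 and snp1; snp2 is noise.
genotypes = [
    [0, 0, 1], [0, 1, 0], [1, 0, 1], [1, 1, 0],
    [0, 0, 0], [0, 1, 1], [1, 0, 0], [1, 1, 1],
]
labels = [0, 1, 1, 0, 0, 1, 1, 0]
result = mdr_best_pair(genotypes, labels)
```

In the toy data neither snp0 nor snp1 is associated with the label on its own, yet the pair classifies every sample correctly, which is the kind of pure interaction a single-locus test would miss.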

Boston University Institutional Repository (OpenBU)

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Are plants with anti-cancer activity resistant to crown gall? : A test of hypothesis

Author: BT. Ramesha
G. Ravikanth
KN Ganeshaiah
R Srirama
R. Uma Shaanker
Publication venue
Publication date: 22/12/2007

The crown gall tumour assay (CGTA) is one of several bench-top bioassays recommended for the rapid screening of plants with anti-cancer activity. The rationale for the use of the bioassay is that the tumorigenic mechanism initiated in plant tissues by _Agrobacterium tumefaciens_ is in many ways similar to that in animals. Several plant species with anti-cancer activity have already been discovered using this bioassay. However, to date no explicit test of an association between the anti-cancer activity of plants and their resistance to crown gall formation has been reported. Demonstrating such an association could have exploratory potential when searching for plants with anti-cancer activity. In this paper, we determined whether a statistically significant association between crown gall resistance and anti-cancer activity exists in plants found in existing published data sets. Our results indicate that plants with anti-cancer activity have a higher proportion of species resistant to crown gall formation than a random selection of plants. We discuss the implications of our results, especially for prospecting for newer sources of anti-cancer activity in plants.

Nature Precedings

ePrints@ATREE

Allele-specific network reveals combinatorial interaction that transcends small effects in psoriasis GWAS

Author: Climer Sharlee
Templeton Alan R
Zhang Weixiong
Publication venue: Digital Commons@Becker
Publication date: 01/01/2014

Hundreds of genetic markers have shown associations with various complex diseases, yet the “missing heritability” remains alarmingly elusive. Combinatorial interactions may account for a substantial portion of this missing heritability, but their discovery has been impeded by computational complexity and genetic heterogeneity. We present BlocBuster, a novel systems-level approach that efficiently constructs genome-wide, allele-specific networks that accurately segregate homogeneous combinations of genetic factors, tests the associations of these combinations with the given phenotype, and rigorously validates the results using a series of unbiased validation methods. BlocBuster employs a correlation measure that is customized for single nucleotide polymorphisms and returns a multi-faceted collection of values that captures genetic heterogeneity. We applied BlocBuster to analyze psoriasis, discovering a combinatorial pattern with an odds ratio of 3.64 and a Bonferroni-corrected p-value of 5.01×10⁻¹⁶. This pattern was replicated in independent data, reflecting the robustness of the method. In addition to improving prediction of disease susceptibility and broadening our understanding of the pathogenesis underlying psoriasis, these results demonstrate BlocBuster's potential for discovering combinatorial genetic associations within heterogeneous genome-wide data, thereby transcending the limiting “small effects” produced by individual markers examined in isolation.
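
For readers unfamiliar with the two statistics quoted above, the following sketch shows how an odds ratio is computed from a 2x2 case-control table and how a Bonferroni correction scales a raw p-value by the number of tests. The counts and the number of tests are made-up illustrative values, not data from the psoriasis study, and the function names are hypothetical.

```python
def odds_ratio(case_with, case_without, control_with, control_without):
    """Odds ratio for carrying a pattern, from a 2x2 case-control table."""
    return (case_with * control_without) / (case_without * control_with)


def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p-value: multiply by the test count, cap at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical counts: 40 of 100 cases and 20 of 100 controls carry the pattern.
example_or = odds_ratio(40, 60, 20, 80)

# Hypothetical raw p-value corrected for 1000 tested patterns.
example_p = bonferroni(1e-6, 1000)
```

An odds ratio above 1 means the pattern is more common among cases than controls; the correction guards against declaring significance merely because many patterns were tested.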

Directory of Open Access Journals

Digital Commons@Becker

PubMed Central

University of Missouri, St. Louis

FigShare

Uncover disease genes by maximizing information flow in the phenome-interactome network.

Author: Chen Yong
Jiang Rui
Jiang Tao
Publication venue: Rowan Digital Works
Publication date: 14/06/2011

MOTIVATION: Pinpointing the genes that underlie human inherited diseases among candidate genes in susceptibility genetic regions is a primary step towards understanding the pathogenesis of diseases. Although several probabilistic models have been proposed to prioritize candidate genes using phenotype similarities and protein-protein interactions, no combinatorial approaches have been proposed in the literature. RESULTS: We propose the first combinatorial approach for prioritizing candidate genes. We first construct a phenome-interactome network by integrating the given phenotype similarity profile, protein-protein interaction network and associations between diseases and genes. Then, we introduce a computational method called MAXIF to maximize the information flow in this network for uncovering genes that underlie diseases. We demonstrate the effectiveness of this method in prioritizing candidate genes through a series of cross-validation experiments, and we show the possibility of using this method to identify diseases with which a query gene may be associated. We demonstrate the competitive performance of our method through a comparison with two existing state-of-the-art methods, and we analyze the robustness of our method with respect to the parameters involved. As an example application, we apply our method to predict driver genes in 50 copy number aberration regions of melanoma. Our method not only identifies several driver genes that have been reported in the literature, but also provides new biological insights into the modular property and transcriptional regulation scheme of these driver genes.
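
The abstract does not give the exact flow formulation used by MAXIF, so the sketch below only illustrates the underlying primitive: a textbook maximum-flow computation (Edmonds-Karp) on a small directed network. The node names, edge capacities, and the choice of a disease node as source and a query node as sink are all assumptions made for illustration.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity graph."""
    # Residual capacities, with reverse edges initialised to zero.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push flow along the path and update reverse edges.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# Toy network with hypothetical nodes and integer capacities.
network = {
    "disease": {"geneA": 3, "geneB": 2},
    "geneA": {"geneB": 1, "query": 2},
    "geneB": {"query": 3},
    "query": {},
}
flow = max_flow(network, "disease", "query")
```

In a prioritization setting, candidates could then be ranked by how much flow passes through them, although how MAXIF derives capacities and scores from phenotype similarities and protein interactions is not specified here.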

PubMed Central

Rowan University