
D-Scholarship@Pitt

Optimization of neural network architecture using genetic programming improves detection and modeling of gene-gene interactions in studies of human diseases

Author: Hahn Lance W
Moore Jason H
Parker Joel S
Ritchie Marylyn D
White Bill C
Publication venue: BioMed Central
Publication date: 01/01/2003

BACKGROUND: Appropriate definition of neural network architecture prior to data analysis is crucial for successful data mining. This can be challenging when the underlying model of the data is unknown. The goal of this study was to determine whether optimizing neural network architecture using genetic programming as a machine learning strategy would improve the ability of neural networks to model and detect nonlinear interactions among genes in studies of common human diseases. RESULTS: Using simulated data, we show that a genetic programming optimized neural network approach is able to model gene-gene interactions as well as a traditional back propagation neural network. Furthermore, the genetic programming optimized neural network is better than the traditional back propagation neural network approach in terms of predictive ability and power to detect gene-gene interactions when non-functional polymorphisms are present. CONCLUSION: This study suggests that a machine learning strategy for optimizing neural network architecture may be preferable to traditional trial-and-error approaches for the identification and characterization of gene-gene interactions in common, complex human diseases

Carolina Digital Repository

GPNN: Power Studies and Applications of a Neural Network Method for Detecting Gene-Gene Interactions in Studies of Human Disease

Author: Lee Stephen L
Mellick George
Motsinger Alison A
Ritchie Marylyn D
Publication venue: Dartmouth Digital Commons
Publication date: 25/01/2006

The identification and characterization of genes that influence the risk of common, complex multifactorial disease primarily through interactions with other genes and environmental factors remains a statistical and computational challenge in genetic epidemiology. We have previously introduced a genetic programming optimized neural network (GPNN) as a method for optimizing the architecture of a neural network to improve the identification of gene combinations associated with disease risk. The goal of this study was to evaluate the power of GPNN for identifying high-order gene-gene interactions. We were also interested in applying GPNN to a real data analysis in Parkinson's disease.

Dartmouth Digital Commons (Dartmouth College)

GenEpi: gene-based epistasis discovery using machine learning.

Author: Alzheimer’s Disease Neuroimaging Initiative
Chang Yu-Chuan
Chen Chien-Yu
Giacomini Kathleen M
Hong Ming-Yi
Hsieh Ping-Han
Oyang Yen-Jen
Tung Yi-An
Wu June-Tai
Yee Sook Wah
Publication venue: eScholarship, University of California
Publication date: 01/02/2020

Background: Genome-wide association studies (GWAS) provide a powerful means to identify associations between genetic variants and phenotypes. However, GWAS techniques for detecting epistasis, the interactions between genetic variants associated with phenotypes, are still limited. We believe that developing an efficient and effective GWAS method to detect epistasis will be key to discovering sophisticated pathogenesis, which is especially important for complex diseases such as Alzheimer's disease (AD). Results: This study presents GenEpi, a computational package that uncovers epistasis associated with phenotypes using the proposed machine learning approach. GenEpi identifies both within-gene and cross-gene epistasis through a two-stage modeling workflow. In both stages, GenEpi adopts two-element combinatorial encoding when producing features and constructs the prediction models by L1-regularized regression with stability selection. On simulated data, GenEpi outperforms other widely used methods at detecting the ground-truth epistasis. For real data, this study uses AD as an example to show the capability of GenEpi in finding disease-related variants and variant interactions that have both biological meaning and predictive power. Conclusions: The results on simulated data and AD demonstrate that GenEpi can detect epistasis associated with phenotypes effectively and efficiently. The released package can be generalized to facilitate studies of many complex diseases in the near future.
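The abstract does not spell out what "two-element combinatorial encoding" looks like. The following is a minimal sketch of one plausible reading, not GenEpi's actual implementation: each SNP genotype is one-hot encoded, and every pair of indicators from two different SNPs is combined with a logical AND. The genotype coding (0, 1, 2), the SNP names `snpA` and `snpB`, and the function names are all hypothetical.

```python
from itertools import combinations

GENOTYPES = (0, 1, 2)  # assumed coding: number of minor alleles


def one_hot(sample):
    """Map {snp: genotype} to binary indicators keyed by (snp, genotype)."""
    return {(snp, g): int(sample[snp] == g)
            for snp in sorted(sample) for g in GENOTYPES}


def pairwise_encode(sample):
    """AND together every pair of indicators that come from two different SNPs."""
    indicators = one_hot(sample)
    features = {}
    for k1, k2 in combinations(sorted(indicators), 2):
        if k1[0] != k2[0]:  # skip pairs drawn from the same SNP
            features[(k1, k2)] = indicators[k1] & indicators[k2]
    return features


# Hypothetical sample with two SNPs.
sample = {"snpA": 0, "snpB": 2}
features = pairwise_encode(sample)
active = [name for name, value in features.items() if value]
print(len(features), active)
```

Under this reading, two SNPs yield 3 x 3 = 9 pairwise features, exactly one of which is active per sample. A sparse model such as L1-regularized regression would then be fitted on these features; that step is omitted here.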

eScholarship - University of California

Spatially Uniform ReliefF (SURF) for computationally-efficient filtering of gene-gene interactions

Author: A Motsinger
AL Tyler
B McKinney
BA McKinney
Casey S Greene
CS Greene
I Kononenko
J Hardy
J Jakobsdottir
Jason H Moore
Jeff Kiralis
JH Moore
JN Hirschhorn
K Kira
L Beretta
M Robnik-Sikonja
MI McCarthy
MM Iles
Nadia M Penrod
P Kraft
RR Sokal
U Finckh
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Genome-wide association studies are becoming the de facto standard in the genetic analysis of common human diseases. Given the complexity and robustness of biological networks, such diseases are unlikely to be the result of single points of failure but instead likely arise from the joint failure of two or more interacting components. The hope in genome-wide screens is that these points of failure can be linked to single nucleotide polymorphisms (SNPs) which confer disease susceptibility. Detecting interacting variants that lead to disease in the absence of single-gene effects is difficult, however, and methods that exhaustively analyze sets of these variants for interactions are combinatorial in nature, making them computationally infeasible. Efficient algorithms which can detect interacting SNPs are needed. ReliefF is one such promising algorithm, although it has a low success rate for noisy datasets when the interaction effect is small. ReliefF has been paired with an iterative approach, Tuned ReliefF (TuRF), which improves the estimation of weights in noisy data but does not fundamentally change the underlying ReliefF algorithm. To improve the sensitivity of studies using these methods to detect small effects, we introduce Spatially Uniform ReliefF (SURF). Results: SURF's ability to detect interactions in this domain is significantly greater than that of ReliefF. Similarly, SURF in combination with the TuRF strategy significantly outperforms TuRF alone for SNP selection under an epistasis model. It is important to note that this success rate increase does not require an increase in algorithmic complexity and is achieved even with the removal of a nuisance parameter from the algorithm. Conclusion: Researchers performing genetic association studies and aiming to discover gene-gene interactions associated with increased disease susceptibility should use SURF in place of ReliefF.
For instance, SURF should be used instead of ReliefF to filter a dataset before an exhaustive MDR analysis. This change increases the ability of a study to detect gene-gene interactions. The SURF algorithm is implemented in the open source Multifactor Dimensionality Reduction (MDR) software package available from http://www.epistasis.org.
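To make the idea of a Relief-style filter with a distance threshold concrete, here is a minimal sketch. It assumes that the neighbours of an instance are all instances closer than the mean pairwise distance, and that an attribute's weight goes down when it differs between same-class neighbours and up when it differs between different-class neighbours. The Hamming distance, the normalisation by the number of instances, and the function names are illustrative assumptions, not details taken from the MDR package.

```python
from itertools import combinations


def hamming(a, b):
    """Number of attributes on which two instances differ."""
    return sum(x != y for x, y in zip(a, b))


def surf_weights(X, y):
    """Relief-style attribute weights using every neighbour within a threshold."""
    n, p = len(X), len(X[0])
    dist = [[hamming(X[i], X[j]) for j in range(n)] for i in range(n)]
    # Assumed threshold: mean distance over all pairs of instances.
    n_pairs = n * (n - 1) / 2
    threshold = sum(dist[i][j] for i, j in combinations(range(n), 2)) / n_pairs
    weights = [0.0] * p
    for i in range(n):
        for j in range(n):
            if i == j or dist[i][j] >= threshold:
                continue
            # Same class (hit) penalises a differing attribute; a miss rewards it.
            sign = -1.0 if y[i] == y[j] else 1.0
            for a in range(p):
                if X[i][a] != X[j][a]:
                    weights[a] += sign / n
    return weights


# Toy data: the class is the XOR of the first two attributes; the third is noise.
X = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [a ^ b for a, b, c in X]
weights = surf_weights(X, y)
print(weights)
```

On this toy XOR dataset neither interacting attribute has a marginal effect, yet both receive a higher weight than the noise attribute, which is the behaviour a pre-filter for an exhaustive interaction search needs.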

Dartmouth Digital Commons (Dartmouth College)

Determination of Nonlinear Genetic Architecture using Compressed Sensing

Author: Ho Chiu Man
Hsu Stephen D. H.
Publication venue
Publication date: 19/07/2015

We introduce a statistical method that can reconstruct nonlinear genetic models (i.e., including epistasis, or gene-gene interactions) from phenotype-genotype (GWAS) data. The computational and data resource requirements are similar to those necessary for reconstruction of linear genetic models (or identification of gene-trait associations), assuming a condition of generalized sparsity, which limits the total number of gene-gene interactions. An example of a sparse nonlinear model is one in which a typical locus interacts with several or even many others, but only a small subset of all possible interactions exist. It seems plausible that most genetic architectures fall in this category. Our method uses a generalization of compressed sensing (L1-penalized regression) applied to nonlinear functions of the sensing matrix. We give theoretical arguments suggesting that the method is nearly optimal in performance, and demonstrate its effectiveness on broad classes of nonlinear genetic models using both real and simulated human genomes. Comment: 20 pages, 8 figures. arXiv admin note: text overlap with arXiv:1408.342
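As a rough illustration of L1-penalized regression applied to nonlinear functions of the genotype matrix, the sketch below augments a toy genotype matrix with pairwise products and fits a lasso by coordinate descent. It is not the authors' procedure: the -1/+1 genotype coding, the full-factorial toy design, the penalty value, and all function names are assumptions chosen so the result can be checked by hand. It also does not reproduce the regime of far fewer samples than candidate terms that motivates compressed sensing.

```python
from itertools import combinations, product


def with_interactions(G):
    """Append all pairwise products of loci to each genotype row."""
    pairs = list(combinations(range(len(G[0])), 2))
    return [list(g) + [g[i] * g[j] for i, j in pairs] for g in G]


def soft_threshold(value, lam):
    """Shrink value toward zero by lam; the proximal step for the L1 penalty."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            # Correlation of column j with the residual that excludes its own fit.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            scale = sum(c * c for c in col) / n
            new = soft_threshold(rho, lam) / scale
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new
    return beta


# Toy design: 3 loci coded -1/+1 (an assumption), all 8 combinations.
G = list(product((-1, 1), repeat=3))
# Phenotype: one additive effect (locus 0) and one interaction (loci 1 and 2).
y = [2.0 * g[0] + 1.5 * g[1] * g[2] for g in G]
X = with_interactions(G)  # columns: x0, x1, x2, x0*x1, x0*x2, x1*x2
beta = lasso_cd(X, y, lam=0.1)
support = [j for j, b in enumerate(beta) if abs(b) > 1e-9]
print(support, [round(b, 3) for b in beta])
```

Because the columns of this toy design are mutually orthogonal, the lasso solution is simply each least-squares coefficient shrunk toward zero by the penalty, so only the additive term and the true interaction term remain nonzero.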

arXiv.org e-Print Archive

Exploring Genetic Interactions: from Tools Development with Massive Parallelization on GPGPU to Multi-Phenotype Studies on Dyslexia

Author: Jiang B.
Publication venue: Technische Universität München
Publication date: 18/12/2019

For over a decade, genome-wide association studies (GWASs) have provided insight into the genetic architecture of complex traits. However, the variants found by GWASs explain only a small portion of heritability. Meanwhile, as large-scale GWASs and meta-analyses of multiple phenotypes are becoming increasingly common, there is a need to develop computationally efficient models and tools for multi-locus and multi-phenotype studies. We were therefore motivated to develop tools for epistasis studies and to seek an analysis strategy that jointly analyzes multiple phenotypes. Exploiting technical and methodological progress, we developed three R packages. SimPhe was built on the Cockerham epistasis model to simulate one or more correlated phenotypes with epistatic effects. The other two packages, episcan and gpuEpiScan, simplify the calculation of EPIBLASTER and epiHSIC and were implemented for high performance, especially the package based on the Graphics Processing Unit (GPU). The two packages can be used for epistasis detection in both case-control studies and quantitative trait studies. Our packages might help drive down the cost of computation and increase innovation in epistasis studies. Moreover, we explored gene-gene interactions in developmental dyslexia, which is mainly characterized by reading problems in children. Multivariate meta-analysis was performed on a genome-wide interaction study (GWIS) for reading-related phenotypes in the dyslexia dataset, which contains nine cohorts from different locations. We identified one genome-wide significant epistatic interaction, between rs1442415 and rs8013684, associated with word reading, as well as suggestive genetic interactions which might affect reading abilities.
Except for rs1442415, which has been reported to influence educational attainment, the genetic variants involved in the suggestive interactions have shown associations with psychiatric disorders in previous GWASs, particularly with bipolar disorder. Our findings suggest making efforts to investigate not just the genetic interactions but also multiple correlated psychiatric disorders.

MPG.PuRe

Grammatical evolution decision trees for detecting gene-gene interactions

Author: AA Motsinger
AA Motsinger-Reif
Alison A Motsinger-Reif
BA Shepherd
BLG Miller
CS Greene
D Altshuler
DB Goldstein
DR Velez
E Alpaydin
E Cantu-Paz
HJ Cordell
IH Witten
J Koza
JH Moore
JN Hirschhorn
JR Quinlan
JS Aguilar-Ruiz
L Brieman
LGL Devroy
M Hall
M O'Neill
MD Ritchie
MR Nelson
Nicholas E Hardison
R Bellman
R Culverhouse
RJ Neuman
SM Dudek
Stacey J Winham
Sushamna Deodhar
TJ Hastie
W Li
X Yao
Publication venue: BioMed Central
Publication date: 01/11/2010

Background: A fundamental goal of human genetics is the discovery of polymorphisms that predict common, complex diseases. It is hypothesized that complex diseases are due to a myriad of factors including environmental exposures and complex genetic risk models, including gene-gene interactions. Such epistatic models present an important analytical challenge, requiring that methods perform not only statistical modeling, but also variable selection to generate testable genetic model hypotheses. This challenge is amplified by recent advances in genotyping technology, as the number of potential predictor variables is rapidly increasing. Methods: Decision trees are a highly successful, easily interpretable data-mining method that is typically optimized with a hierarchical model building approach, which limits their potential to identify interacting effects. To overcome this limitation, we utilize evolutionary computation, specifically grammatical evolution, to build decision trees to detect and model gene-gene interactions. In the current study, we introduce the Grammatical Evolution Decision Trees (GEDT) method and software and evaluate this approach on simulated data representing gene-gene interaction models of a range of effect sizes. We compare the performance of the method to a traditional decision tree algorithm and a random search approach and demonstrate the improved performance of the method to detect purely epistatic interactions. Results: The results of our simulations demonstrate that GEDT has high power to detect even very moderate genetic risk models. GEDT has high power to detect interactions with and without main effects. Conclusions: GEDT, while still in its initial stages of development, is a promising new approach for identifying gene-gene interactions in genetic association studies.

ATHENA: A knowledge-based hybrid backpropagation-grammatical evolution neural network algorithm for discovering epistasis among quantitative trait Loci

Author: A Bateman
A Freitas
AA Motsinger
AA Motsinger-Reif
B Maher
BC White
C Kooperberg
C Newton-Cheh
CJ Willer
CM Bishop
CR Porter
CS Carlson
CS Greene
CY Huang
D Ruano
DB Goldstein
E Boerwinkle
E Colucci-Guyon
ER Holzinger
F Sato
G Peng
H Shao
HJ Cordell
I Vastrik
I Xenarios
IG Sprinkhuizen-Kuyper
International hapmap consortium
J Koza
J Meiler
J Moore
J Ott
JE Dayhoff
JH Moore
JN Hirschhorn
KG Becker
KH Pietilainen
LA Hindorff
M Abney
M Ashburner
M Kanehisa
M O'Neil
Marylyn D Ritchie
MC Gruda
MD Ritchie
MR Nelson
N Killeen
N Penrod
P Cohen
P Gorry
P Lucek
R Bellman
R Culverhouse
R Linder
R Poli
R Shen
RD Finn
RJ Klein
S Itohara
S Kathiresan
S Kim
S Wright
SC Hamon
Scott M Dudek
SD Turner
SE Baranzini
SE Maxwell
Stephen D Turner
T Baba
TA Manolio
TL Edwards
TM Frayling
V Kurkova
WJ Gauderman
WS Bush
X He
X Yao
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Growing interest and burgeoning technology for discovering genetic mechanisms that influence disease processes have ushered in a flood of genetic association studies over the last decade, yet little heritability in highly studied complex traits has been explained by genetic variation. Non-additive gene-gene interactions, which are not often explored, are thought to be one source of this "missing" heritability. Methods: Stochastic methods employing evolutionary algorithms have demonstrated promise in being able to detect and model gene-gene and gene-environment interactions that influence human traits. Here we demonstrate modifications to a neural network algorithm in ATHENA (the Analysis Tool for Heritable and Environmental Network Associations) resulting in clear performance improvements for discovering gene-gene interactions that influence human traits. We employed an alternative tree-based crossover, backpropagation for locally fitting neural network weights, and incorporation of domain knowledge obtainable from publicly accessible biological databases for initializing the search for gene-gene interactions. We tested these modifications in silico using simulated datasets. Results: We show that the alternative tree-based crossover modification resulted in a modest increase in the sensitivity of the ATHENA algorithm for discovering gene-gene interactions. The performance increase was highly statistically significant when backpropagation was used to locally fit NN weights. We also demonstrate that using domain knowledge to initialize the search for gene-gene interactions results in a large performance increase, especially when the search space is larger than the search coverage.
Conclusions: We show that a hybrid optimization procedure, alternative crossover strategies, and incorporation of domain knowledge from publicly available biological databases can result in marked increases in sensitivity and performance of the ATHENA algorithm for detecting and modelling gene-gene interactions that influence a complex human trait.