53 research outputs found

Application of next generation sequencing to CEPH cell lines to discover variants associated with FDA approved chemotherapeutics

Author: Hariani GD
Havener T
Kwok Pui-Yan
Kwok PY
Lam EJ
McLeod HL
Motsinger-Reif AA
Wagner MJ
Publication venue: Springer Nature
Publication date: 01/01/2014

After publication of this work [1], it came to our attention that the author list in the initial version of this manuscript contains an error: the second author should be listed as Ernest T Lam rather than Ernest J Lam.

Crossref

Springer - Publisher Connector

PubMed Central

Carolina Digital Repository

eScholarship - University of California

Neural networks for modeling gene-gene interactions in association studies

Author: A Jakulin
AA Motsinger
AA Motsinger-Reif
AD Flouris
AG Heidema
B North
BA McKinney
CM Bishop
Frauke Günther
G Schwarz
H Akaike
HJ Cordell
I Ruczinski
J Liu
J Millstein
J Ott
JH Moore
JR Koza
K Bammann
K Broberg
Karin Bammann
L Breiman
L Briollais
LW Hahn
M Riedmiller
MB Lanktree
MD Ritchie
ME Sáez
MJ Wade
MR Nelson
N Risch
Nina Wawro
NR Cook
P McCullagh
PR Lucek
R Development Core Team
R Foraita
R Hecht-Nielsen
R Tibshirani
RL Milne
S Fritsch
SH Chen
SK Musani
W Branicki
W Li
WS Bush
X Tang
Y Amit
Y Qi
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Our aim is to investigate the ability of neural networks to model different two-locus disease models. We conduct a simulation study to compare neural networks with two standard methods, namely logistic regression models and multifactor dimensionality reduction. One hundred data sets are generated for each of six two-locus disease models, which are considered in a low and in a high risk scenario. Two models represent independence, one is a multiplicative model, and three models are epistatic. For each data set, six neural networks (with up to five hidden neurons) and five logistic regression models (the null model, three main effect models, and the full model) with two different codings for the genotype information are fitted. Additionally, the multifactor dimensionality reduction approach is applied. Results: The results show that neural networks are more successful in modeling the structure of the underlying disease model than logistic regression models in most of the investigated situations. In our simulation study, neither logistic regression nor multifactor dimensionality reduction is able to correctly identify biological interaction. Conclusions: Neural networks are a promising tool to handle complex data situations. However, further research is necessary concerning the interpretation of their parameters.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Evolving Random Forest for Preference Learning

Author: AA Motsinger-Reif
C Pedersen
G Tesauro
GN Yannakakis
J Doyle
J Fürnkranz
J Madsen
L Breiman
M O’Neill
WW Cohen
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Crossref

VBN

Grammatical evolution decision trees for detecting gene-gene interactions

Author: AA Motsinger
AA Motsinger-Reif
Alison A Motsinger-Reif
BA Shepherd
BLG Miller
CS Greene
D Altshuler
DB Goldstein
DR Velez
E Alpaydin
E Cantu-Paz
HJ Cordell
IH Witten
J Koza
JH Moore
JN Hirschhorn
JR Quinlan
JS Aguilar-Ruiz
L Brieman
LGL Devroy
M Hall
M O'Neill
MD Ritchie
MR Nelson
Nicholas E Hardison
R Bellman
R Culverhouse
RJ Neuman
SM Dudek
Stacey J Winham
Sushamna Deodhar
TJ Hastie
W Li
X Yao
Publication venue: BioMed Central
Publication date: 01/11/2010

Background: A fundamental goal of human genetics is the discovery of polymorphisms that predict common, complex diseases. It is hypothesized that complex diseases are due to a myriad of factors including environmental exposures and complex genetic risk models, including gene-gene interactions. Such epistatic models present an important analytical challenge, requiring that methods perform not only statistical modeling, but also variable selection to generate testable genetic model hypotheses. This challenge is amplified by recent advances in genotyping technology, as the number of potential predictor variables is rapidly increasing. Methods: Decision trees are a highly successful, easily interpretable data-mining method that is typically optimized with a hierarchical model building approach, which limits its potential to identify interacting effects. To overcome this limitation, we utilize evolutionary computation, specifically grammatical evolution, to build decision trees to detect and model gene-gene interactions. In the current study, we introduce the Grammatical Evolution Decision Trees (GEDT) method and software and evaluate this approach on simulated data representing gene-gene interaction models of a range of effect sizes. We compare the performance of the method to a traditional decision tree algorithm and a random search approach and demonstrate the improved performance of the method to detect purely epistatic interactions. Results: The results of our simulations demonstrate that GEDT has high power to detect even very moderate genetic risk models. GEDT has high power to detect interactions with and without main effects. Conclusions: GEDT, while still in its initial stages of development, is a promising new approach for identifying gene-gene interactions in genetic association studies.

Crossref

Directory of Open Access Journals

PubMed Central

Bayesian neural networks for detecting epistasis in genetic association studies

Author: A Klockner
AA Motsinger-Reif
Alison Motsinger-Reif
Andrew L Beam
B Baesens
C Andrieu
CL Koo
CS Greene
DE Rumelhart
E Png
G Hemani
GE Hinton
I Guyon
J Bergstra
J Li
J Nickolls
JH Friedman
JH Moore
Jon Doyle
K Hornik
K Oh
KL Lunetta
L Breiman
LW Hahn
N Metropolis
NO Oki
PJ Lisboa
PM Williams
R Diaz-Uriarte
R Neal
RJ Urbanowicz
RM Motsinger-Reif AA
RM Neal
T Van Gestel
TA Manolio
W Li
WK Hastings
X Jiang
Y Zhang
Publication venue: Springer Science and Business Media LLC

Crossref

An investigation of gene-gene interactions in dose-response studies with Bayesian nonparametrics

Author: A Beam
AA Motsinger-Reif
AL Beam
Alison A Motsinger-Reif
Andrew L Beam
AR Wood
B Shahbaba
C Brown
CC Brown
EJ Peters
HE Wheeler
JH Moore
Jon Doyle
K Hornik
M Girolami
M Remke
M Welsh
N Metropolis
O Bahcall
O Zuk
RM Neal
S Purcell
TF Mackay
W Huang
WK Hastings
Y Li
Ö Carlborg
Publication venue: Springer Science and Business Media LLC

Crossref

Neural networks for genetic epidemiology: past, present, and future

During the past two decades, the field of human genetics has experienced an information explosion. The completion of the human genome project and the development of high throughput SNP technologies have created a wealth of data; however, the analysis and interpretation of these data have created a research bottleneck. While technology facilitates the measurement of hundreds or thousands of genes, statistical and computational methodologies are lacking for the analysis of these data. New statistical methods and variable selection strategies must be explored for identifying disease susceptibility genes for common, complex diseases. Neural networks (NN) are a class of pattern recognition methods that have been successfully implemented for data mining and prediction in a variety of fields. The application of NN for statistical genetics studies is an active area of research. Neural networks have been applied in both linkage and association analysis for the identification of disease susceptibility genes.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Rule based classifier for the analysis of gene-gene and gene-environment interactions in genetic association studies

Author: AA Motsinger-Reif
B Gaines
C Bourgain
CM Lewis
Dirk Zeumer
E Frank
EJ Louis
HJ Cordell
I Witten
JH Moore
Jing Yuan
JN Hirschhorn
KAB Goddard
LW Hahn
Marylyn D Ritchie
MD Ritchie
ML Calle
PN Tan
R Quinlan
ST Sherry
Supriya Jayadev
T Schaap
TA Thornton-Wells
Thorsten Lehr
W Cohen
XY Lou
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Several methods have been presented for the analysis of complex interactions between genetic polymorphisms and/or environmental factors. Despite the available methods, there is still a need for alternative methods, because no single method will perform well in all scenarios. The aim of this work was to evaluate the performance of three selected rule based classifier algorithms, RIPPER, RIDOR and PART, for the analysis of genetic association studies. Methods: Overall, 42 datasets were simulated with three different case-control models, a varying number of subjects (300, 600), SNPs (500, 1500, 3000) and noise (5%, 10%, 20%). The algorithms were applied to each of the datasets with a set of algorithm-specific settings. Results were further investigated with respect to a) the Model, b) the Rules, and c) the Attribute level. Data analysis was performed using WEKA, SAS and PERL. Results: The RIPPER algorithm discovered the true case-control model at least once in more than 33% of the datasets. The RIDOR and PART algorithms performed poorly for model detection. The RIPPER, RIDOR and PART algorithms discovered the true case-control rules in more than 83%, 83% and 44% of the datasets, respectively. All three algorithms were able to detect the attributes utilized in the respective case-control models in most datasets. Conclusions: The current analyses substantiate the utility of rule based classifiers such as RIPPER, RIDOR and PART for the detection of gene-gene/gene-environment interactions in genetic association studies. These classifiers could provide a valuable new method, complementing existing approaches, in the analysis of genetic association studies. The methods provide an advantage in being able to handle both categorical and continuous variable types. Further, because the outputs of the analyses are easy to interpret, the rule based classifier approach could quickly generate testable hypotheses for additional evaluation. Since the algorithms are computationally inexpensive, they may serve as valuable tools for preselection of attributes to be used in more complex, computationally intensive approaches. Whether used in isolation or in conjunction with other tools, rule based classifiers are an important addition to the armamentarium of tools available for analyses of complex genetic association studies.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Novel human genetic variants associated with extrapulmonary tuberculosis: a pilot genome wide association study

Background: Approximately 5-10% of persons infected with M. tuberculosis develop tuberculosis, but the factors associated with disease progression are incompletely understood. Both linkage and association studies have identified human genetic variants associated with susceptibility to pulmonary tuberculosis, but few genetic studies have evaluated extrapulmonary disease. Because extrapulmonary and pulmonary tuberculosis likely have different underlying pathophysiology, identification of genetic mutations associated with extrapulmonary disease is important. Findings: We performed a pilot genome-wide association study among 24 persons with previous extrapulmonary tuberculosis and well-characterized immune defects; 24 pulmonary tuberculosis patients and 57 patients with M. tuberculosis infection served as controls. The Affymetrix GeneChip Human Mapping Xba Array was used for genotyping; after careful quality control, genotypes at 44,175 single nucleotide polymorphisms (SNPs) were available for analysis. Eigenstrat quantified population stratification within our sample; logistic regression, using results of the Eigenstrat analysis as a covariate, identified significant associations between groups. Permutation testing controlled the family-wise error rate for each comparison between groups. Four SNPs were significantly associated with extrapulmonary tuberculosis compared to controls with M. tuberculosis infection; one (rs4893980) in the gene PDE11A, one (rs10488286) in KCND2, and one (rs2026414) in PCDH15; one was on chromosome 7 but not associated with a known gene. Two additional variants were significantly associated with extrapulmonary tuberculosis compared with pulmonary tuberculosis; one (rs340708) in the gene FAM135B and one on chromosome 13 but not associated with a known gene. The function of all four genes affects cell signaling and activity, including in the brain. Conclusions: In this pilot study, we identified 6 novel variants not previously known to be associated with extrapulmonary tuberculosis, including two SNPs more common in persons with extrapulmonary than pulmonary tuberculosis. This provides some support for the hypothesis that the pathogenesis of and genetic predisposition to extrapulmonary tuberculosis differ from those of pulmonary tuberculosis. Further study of these novel SNPs, and more well-powered genome-wide studies of extrapulmonary tuberculosis, are warranted.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Application of two machine learning algorithms to genetic association studies in the presence of covariates

Author: A Bureau
AA Motsinger-Reif
Andrea S Foulkes
AS Foulkes
B Dasarathy
Bareng AS Nonyane
BAS Nonyane
C Bishop
C Tan
D Ge
D Lunn
E Atkinson
E Taioli
I Guyon
IE George
J Cohen
JH Friedman
JM Robins
L Breiman
L Cupples
M Groenendijk
MA Hernan
MJ van der Laan
MR Segal
NJS Christenfield
RV Shohet
S Kang
SJ Tannenbaum
SR Cole
T Hastie
TJ Costello
TM Huang
VN Vapnik
Publication venue: BioMed Central
Publication date: 01/01/2008

BACKGROUND: Population-based investigations aimed at uncovering genotype-trait associations often involve high-dimensional genetic polymorphism data as well as information on multiple environmental and clinical parameters. Machine learning (ML) algorithms offer a straightforward analytic approach for selecting subsets of these inputs that are most predictive of a pre-defined trait. The performance of these algorithms in the presence of covariates, however, is not well characterized. METHODS AND RESULTS: In this manuscript, we investigate two approaches: Random Forests (RFs) and Multivariate Adaptive Regression Splines (MARS). Through multiple simulation studies, the performance under several underlying models is evaluated. An application to a cohort of HIV-1 infected individuals receiving anti-retroviral therapies is also provided. CONCLUSION: Consistent with more traditional regression modeling theory, our findings highlight the importance of considering the nature of underlying gene-covariate-trait relationships before applying ML algorithms, particularly when there is potential confounding or effect mediation.

Crossref

LSHTM Research Online

ScholarWorks@UMass Amherst

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central