31 research outputs found

A principal-components-based clustering method to identify multiple variants associated with rheumatoid arthritis and arthritis-related autoantibodies

Author: AG Heidema
CM Weyand
CW Harris
FA van Gaalen
HS Lee
JH Yen
JW Gauderman
K Wang
Mary Helen Black
P Irigoyen
Richard M Watanabe
RM Plenge
SHHM Vermeulen
T Shiina
Y Kochi
Publication venue: 'Springer Science and Business Media LLC'

A random forest approach to the detection of epistatic interactions in case-control studies

Author: A Bureau
A Collins
AG Heidema
AM Glazier
BA McKinney
CT Tsai
E Lander
HC Fung
J Hoh
J Marchini
J Millstein
J Simon-Sanchez
JH Moore
JK Pritchard
L Breiman
L Kruglyak
L Tiret
MD Ritchie
MP Martin
MR Nelson
N Chatterjee
NJ Risch
R Culverhouse
R Diaz-Uriarte
R Jiang
R Jiang
RJ Klein
RO Duda
Rui Jiang
SM Williams
TM Phuong
Wanwan Tang
Wenhui Fu
X Chen
Xuebing Wu
Y Ye
Y Zhang
YM Cho
Publication venue: BioMed Central
Publication date: 01/01/2009

BACKGROUND: Despite the key roles of epistatic interactions between multiple genetic variants in the pathogenesis of complex diseases, the detection of such interactions remains a great challenge in genome-wide association studies. Although some existing multi-locus approaches have been successful on small-scale case-control data, the "combination explosion" problem prohibits their application to genome-wide analysis. It is therefore indispensable to develop new methods that can reduce the search space for epistatic interactions from an astronomical number of all possible combinations of genetic variants to a manageable set of candidates. RESULTS: We studied case-control data from the viewpoint of binary classification. More precisely, we treated single nucleotide polymorphism (SNP) markers as categorical features and used a random forest to discriminate cases from controls. On the basis of the Gini importance given by the random forest, we designed a sliding window sequential forward feature selection (SWSFS) algorithm to select a small set of candidate SNPs that minimizes the classification error, and then statistically tested up to three-way interactions of the candidates. We compared this approach with three existing methods on three simulated disease models and showed that our approach is comparable to, and sometimes more powerful than, the other methods. We applied our approach to a genome-wide case-control dataset for age-related macular degeneration (AMD) and successfully identified two SNPs that were reported to be associated with this disease. CONCLUSION: Beyond existing purely statistical approaches, we demonstrated the feasibility of incorporating machine learning methods into genome-wide case-control studies. The Gini importance offers yet another measure of the association between SNPs and complex diseases, thereby complementing existing statistical measures to facilitate the identification of epistatic interactions and the understanding of epistasis in the pathogenesis of complex diseases.
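
As a rough illustration of the quantity behind the Gini importance mentioned in the abstract above, the sketch below computes the Gini impurity decrease obtained by splitting samples on one categorical SNP. It is not the authors' SWSFS implementation: the multiway split by genotype, the function names `gini` and `gini_decrease`, and the toy data are all assumptions made for illustration. A random forest would typically accumulate such decreases over many splits and trees.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_decrease(genotypes, labels):
    """Impurity decrease from splitting the samples by genotype category.

    Simplifying assumption: one multiway split per genotype value,
    rather than the binary splits a typical random forest would use.
    """
    n = len(labels)
    groups = {}
    for g, y in zip(genotypes, labels):
        groups.setdefault(g, []).append(y)
    weighted = sum(len(ys) / n * gini(ys) for ys in groups.values())
    return gini(labels) - weighted

# Hypothetical toy data: genotype coded 0/2, status 1 = case, 0 = control.
snp_informative = [0, 0, 0, 0, 2, 2, 2, 2]
snp_noise       = [0, 2, 0, 2, 0, 2, 0, 2]
status          = [0, 0, 0, 0, 1, 1, 1, 1]

print(gini_decrease(snp_informative, status))  # 0.5
print(gini_decrease(snp_noise, status))        # 0.0
```

In this toy case the informative SNP separates cases from controls perfectly and removes all impurity, while the noise SNP removes none.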

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Examining the significance of fingerprint-based classifiers

Author: A Gerger
AG Heidema
B Adam
Brian T Luke
BT Luke
BT Luke
C Belluco
DF Ransohoff
DF Ransohoff
DG Ward
DK Ornstein
DW Ho
E Petricoin III
EF Petricoin
EF Petricoin
EF Petricoin
FM Brouwers
G Blanchard
H Zhang
Jack R Collins
JH Stone
JK Habermann
JM Luk
LA Liotta
M Ehmann
MA Gillette
MD Radmacher
NL Anderson
R Srinivasan
S Shah
SY Yang
T Sundsten
TP Conrads
W Liu
WE Grizzle
X Zhang
Y Yu
Y-Z Pan
Publication venue: BioMed Central
Publication date: 17/12/2008

Crossref

Springer - Publisher Connector

PubMed Central

Bias in random forest variable importance measures: Illustrations, sources and a solution

Author: A Bureau
A Dobra
A Liaw
Achim Zeileis
AG Heidema
AL Boulesteix
AL Boulesteix
Anne-Laure Boulesteix
C Furlanello
C Strobl
C Strobl
C Strobl
Carolin Strobl
DN Politis
EC Gunther
H Kim
I Kononenko
J Friedman
J Friedman
K Arun
KL Lunetta
L Breiman
L Breiman
L Breiman
M van der Laan
MM Ward
MP Cummings
MP Cummings
MR Segal
P Bühlmann
PJ Bickel
R Development Core Team
R Díaz-Uriarte
R Guha
T Hothorn
T Hothorn
TM Therneau
Torsten Hothorn
V Svetnik
X Huang
Y Qi
Y Shih
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Variable importance measures for random forests have been receiving increased attention as a means of variable selection in many classification tasks in bioinformatics and related scientific fields, for instance to select a subset of genetic markers relevant for the prediction of a certain disease. We show that random forest variable importance measures are a sensible means for variable selection in many applications, but are not reliable in situations where potential predictor variables vary in their scale of measurement or their number of categories. This is particularly important in genomics and computational biology, where predictors often include variables of different types, for example when predictors include both sequence data and continuous variables such as folding energy, or when amino acid sequence data show different numbers of categories. RESULTS: Simulation studies are presented illustrating that, when random forest variable importance measures are used with data of varying types, the results are misleading because suboptimal predictor variables may be artificially preferred in variable selection. The two mechanisms underlying this deficiency are biased variable selection in the individual classification trees used to build the random forest on the one hand, and effects induced by bootstrap sampling with replacement on the other hand. CONCLUSION: We propose to employ an alternative implementation of random forests that provides unbiased variable selection in the individual classification trees. When this method is applied using subsampling without replacement, the resulting variable importance measures can be used reliably for variable selection even in situations where the potential predictor variables vary in their scale of measurement or their number of categories. The usage of both random forest algorithms and their variable importance measures in the R system for statistical computing is illustrated and documented thoroughly in an application re-analyzing data from a study on RNA editing. Therefore, the suggested method can be applied straightforwardly by scientists in bioinformatics research.
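
The selection bias described in the abstract above can be sketched with a small simulation. This is a simplified stand-in, not the paper's simulation design or its R code: it assumes a single multiway Gini split per predictor, purely random labels, and hypothetical names (`mean_noise_gain`, `n_categories`). Both predictors are pure noise, so any systematic difference in impurity decrease reflects only the number of categories.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_decrease(feature, labels):
    """Impurity decrease of one multiway split on a categorical feature."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    return gini(labels) - sum(len(ys) / n * gini(ys) for ys in groups.values())

def mean_noise_gain(n_categories, n_samples=100, n_trials=200, seed=0):
    """Average impurity decrease of a predictor that is unrelated to the labels."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        labels = [rng.randint(0, 1) for _ in range(n_samples)]
        feature = [rng.randrange(n_categories) for _ in range(n_samples)]
        total += gini_decrease(feature, labels)
    return total / n_trials

gain_2 = mean_noise_gain(2)    # uninformative predictor with 2 categories
gain_10 = mean_noise_gain(10)  # uninformative predictor with 10 categories
print(gain_2, gain_10)
```

Under these assumptions the 10-category noise predictor should show a clearly larger average impurity decrease than the 2-category one, which is the kind of artificial preference the abstract warns about.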

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Open Access LMU

Elektronische Publikationen der Wirtschaftsuniversität Wien

A multifactorial analysis of obesity as CVD risk factor: Use of neural network based methods in a nutrigenetics context

Author: A Boutayeb
A Bureau
AA Motsinger
AE Duncan
AG Heidema
AJ Frint
BM Popkin
BV North
CF Sing
D Goldberg
DL McGee
DS Moore
I Arkadianos
I Valavanis
Ioannis K Valavanis
J Arifovic
J Robitaille
J Stevens
J Wakefield
J Xu
JM Ordovas
Keith A Grimaldi
Konstantina S Nikita
L Briollais
LW Hahn
MD Ritchie
MD Ritchie
MR Chernick
N Karnehed
PH Liu
PWF Wilson
R Nakamichi
R Sodjinou
S Canizales-Cuinteros
S Haykin
S Tomida
SG Mougiakakou
SG Mougiakakou
SG Mougiakakou
SM Hermann
SM Williams
Stavroula G Mougiakakou
TA Pearson
W Yu
Y Tomita
YM Cho
Z Wei
Publication venue: BioMed Central
Publication date: 01/01/2010

BACKGROUND: Obesity is a multifactorial trait and an independent risk factor for cardiovascular disease (CVD). The aim of the current work is to study the complex etiology underlying obesity and to identify genetic variations and/or nutrition-related factors that contribute to its variability. To this end, a set of more than 2300 white subjects who participated in a nutrigenetics study was used. For each subject, a total of 63 factors were measured, describing genetic variants related to CVD (24 in total), gender, and nutrition (38 in total), e.g. average daily intake of calories and cholesterol. Each subject was categorized according to body mass index (BMI) as normal (BMI ≤ 25) or overweight (BMI > 25). Two artificial neural network (ANN) based methods were designed and used to analyze the available data: i) a multi-layer feed-forward ANN combined with a parameter decreasing method (PDM-ANN), and ii) a multi-layer feed-forward ANN trained by a hybrid method (GA-ANN) that combines genetic algorithms and the popular back-propagation training algorithm. RESULTS: PDM-ANN and GA-ANN were comparatively assessed in terms of their ability to identify, among the initial 63 variables describing genetic variations, nutrition and gender, the most important factors for classifying a subject into one of the BMI-related classes: normal and overweight. The methods were designed and evaluated using training and testing sets provided by 3-fold cross-validation (3-CV) resampling. Classification accuracy, sensitivity, specificity and area under the receiver operating characteristic curve were used to evaluate the resulting predictive ANN models. The most parsimonious set of factors was obtained by the GA-ANN method and included gender, six genetic variations and 18 nutrition-related variables. The corresponding predictive model had a mean accuracy of 61.46% on the 3-CV testing sets. CONCLUSIONS: The ANN-based methods revealed factors that interactively contribute to the obesity trait and provided predictive models with promising generalization ability. In general, the results showed that ANNs and their hybrids can provide useful tools for the study of complex traits in the context of nutrigenetics.
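
The evaluation measures named in the abstract above can be sketched as follows. This is a minimal illustration, not the study's code: the helper names, the interleaved fold assignment, the choice of "overweight" as the positive class (coded 1), and the toy labels are assumptions, and the area under the ROC curve is omitted for brevity.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity from paired true/predicted labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Guard against classes that are absent from a fold.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }

def three_fold_indices(n_samples):
    """Yield (train_indices, test_indices) for 3 interleaved folds."""
    for k in range(3):
        test = [i for i in range(n_samples) if i % 3 == k]
        train = [i for i in range(n_samples) if i % 3 != k]
        yield train, test

# Hypothetical labels: 1 = overweight, 0 = normal.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)  # accuracy, sensitivity and specificity are all 0.75 here

folds = list(three_fold_indices(len(y_true)))
print([test for _, test in folds])  # [[0, 3, 6], [1, 4, 7], [2, 5]]
```

In a 3-CV setting, a model would be trained on each `train` index set and the metrics averaged over the three `test` sets.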

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

DSpace at NTUA

Bern Open Repository and Information System (BORIS)

Systematic analysis, comparison, and integration of disease based human genetic association data and mouse genetic phenotypic information

Author: A Subramanian
AG Heidema
AJ Butte
BK Lin
BT Sherman
DI Chasman
DM Evans
EA Adie
F Bresso
H Mei
HS Chai
J Ward
JH Choi
JM Hancock
John R Garner
JP Ioannidis
Kevin G Becker
KG Becker
KG Becker
KG Becker
KI Goh
Kirstin Smith
M Holden
M Liu
M Slatkin
M Yi
MA Cheh
MB Eisen
MJ Khoury
N Gharani
NR Wray
P Yue
RM Plenge
S Alex Wang
S Ray
SE Harris
SL Zheng
Supriyo De
SY Kim
V Emilsson
VA McKusick
W Huang da
WM Fitch
X Wang
X Wu
Y Guan
YH Lee
Yonqing Zhang
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

PubMed Central

Improved Bevirimat resistance prediction by combination of structural and sequence-based classifiers

Author: A Karatzoglou
A Kernytsky
A Liaw
A Sali
AG Heidema
B Gabrys
C Wong
D Heider
D Heider
D Heider
D Wolpert
Daniel Hoffmann
DK Worthylake
Dominik Heider
F Wilcoxon
H Naderi-Manesh
J Demsar
J Kyte
J Nikolaj Dybowski
J Verheyen
J Zhou
Jens Verheyen
JN Dybowski
K Salzwedel
K van Baelen
KC Chou
KM Ting
L Breiman
L Nanni
L Nanni
LI Kuncheva
M Kierczak
M Pyka
MA Wainberg
Martin Pyka
ML Calle
Mona Riemenschneider
N Beerenwinkel
N Beerenwinkel
N Morellet
N Qian
PW Keller
R Development Core Team
RJ Murray
S Draghici
S Džeroski
S Kawashima
Sascha Hauke
SY Rhee
T Fawcett
T Sing
V Svetnik
W Gronwald
Publication venue: BioMed Central
Publication date: 01/01/2011

BACKGROUND: Maturation inhibitors such as Bevirimat are a new class of antiretroviral drugs that hamper the cleavage of HIV-1 proteins into their functionally active forms. They bind to these preproteins and inhibit their cleavage by the HIV-1 protease, resulting in non-functional virus particles. Nevertheless, mutations in this region exist that lead to resistance against Bevirimat. Highly specific and accurate tools to predict resistance to maturation inhibitors can help to identify patients who might benefit from these new drugs. RESULTS: We tested several methods to improve Bevirimat resistance prediction in HIV-1. Combining structural and sequence-based information in classifier ensembles led to accurate and reliable predictions. Moreover, we were able to computationally identify the regions most crucial for Bevirimat resistance, which are in line with experimental results from other studies. CONCLUSIONS: Our analysis demonstrated the use of machine learning techniques to predict HIV-1 resistance against maturation inhibitors such as Bevirimat. New maturation inhibitors are already under development and might enlarge the arsenal of antiretroviral drugs in the future. Thus, accurate prediction tools are very useful for enabling personalized therapy.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A genetic ensemble approach for gene-gene interaction identification

Author: A Beygelzimer
A Tsymbal
AA Motsinger
AG Heidema
Albert Y Zomaya
B McKinney
Bing B Zhou
C Greene
D Arking
D Nielsen
D Quigley
D Ruta
D Ruta
D Thomas
D Velez
E Rogaeva
G Bontempi
G Brown
H Cordell
H Cordell
H Zhang
J Cleary
J Hoh
J Kittler
J Moore
JC Barrett
JH Moore
JH Moore
JL Haines
Joshua WK Ho
L Breiman
L Briollais
L Lam
LE Mechanic
LI Kuncheva
LW Hahn
M Kudo
M Ritchie
MR Nelson
P Lucek
Pengyi Yang
R Duerr
R Klein
R Somorjai
S Cantor
S Fisher
S Schmidt
SK Musani
TG Dietterich
X Chen
Y Freund
Y Tomita
Z Zhang
Publication venue: BioMed Central
Publication date: 01/01/2010

BACKGROUND: It has now become clear that gene-gene interactions and gene-environment interactions are ubiquitous and fundamental mechanisms for the development of complex diseases. Though considerable effort has been put into developing statistical models and algorithmic strategies for identifying such interactions, their accurate identification has proven very challenging. METHODS: In this paper, we propose a new approach for identifying gene-gene and gene-environment interactions underlying complex diseases. It is a hybrid algorithm that combines a genetic algorithm (GA) and an ensemble of classifiers (called a genetic ensemble). Using this approach, the original problem of SNP interaction identification is converted into a data mining problem of combinatorial feature selection. By collecting various single nucleotide polymorphism (SNP) subsets as well as environmental factors generated in multiple GA runs, patterns of gene-gene and gene-environment interactions can be extracted using a simple combinatorial ranking method. Also considered in this study is the idea of combining identification results obtained from multiple algorithms. A novel formula based on pairwise double fault is designed to quantify the degree of complementarity. CONCLUSIONS: Our simulation study demonstrates that the proposed genetic ensemble algorithm has identification power comparable to Multifactor Dimensionality Reduction (MDR) and slightly better than Polymorphism Interaction Analysis (PIA), the two most popular methods for gene-gene interaction identification. More importantly, the identification results generated by our genetic ensemble algorithm are highly complementary to those obtained by PIA and MDR. Experimental results from our simulation studies and a real-world data application also confirm the effectiveness of the proposed genetic ensemble algorithm, as well as the potential benefits of combining identification results from different algorithms.
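
The abstract above does not state its complementarity formula, so the sketch below shows only the standard pairwise double-fault measure from the classifier-ensemble literature on which such a formula could build: the fraction of samples that two classifiers both misclassify. The function name `double_fault` and the toy predictions are hypothetical, and this should not be read as the authors' novel formula.

```python
def double_fault(y_true, pred_a, pred_b):
    """Fraction of samples misclassified by both classifiers.

    Lower values mean the two classifiers rarely fail on the same
    samples, i.e. their errors are more complementary.
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all label sequences must have the same length")
    both_wrong = sum(
        1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b != t
    )
    return both_wrong / len(y_true)

# Hypothetical predictions from two classifiers on six samples.
y_true = [1, 0, 1, 1, 0, 0]
pred_a = [1, 1, 1, 0, 0, 0]  # wrong on samples 1 and 3
pred_b = [0, 0, 1, 0, 0, 1]  # wrong on samples 0, 3 and 5

df_ab = double_fault(y_true, pred_a, pred_b)
df_aa = double_fault(y_true, pred_a, pred_a)
print(df_ab)  # one shared error out of six samples
print(df_aa)  # a classifier compared with itself shares all of its errors
```

Comparing `df_ab` with `df_aa` shows why the measure is useful for combining methods: two different classifiers with few shared errors score lower than a classifier paired with itself.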

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Survival dimensionality reduction (SDR): development and clinical application of an innovative approach to detect epistasis in presence of right-censored data

Author: A Takaoka
AG Heidema
Alessandro Santaniello
BD Ripley
BZ Alizadeh
C Bansard
C Clavel
CS Greene
DR Cox
E Graf
E Kaplan
EA Stahl
EJ Toonen
FC Arnett
H Ishwaran
H Mitoma
I Rego-Pérez
J Concato
JD Cañete
JD Kalbfleisch
JH Moore
JH Moore
JH Moore
JN Hirschhorn
L De Rycke
LM Leemis
Lorenzo Beretta
LP Kronek
LW Hahn
Marieke JH Coenen
MC Simmonds
MD Ritchie
MG Netea
MJ Coenen
MJ Coenen
ML Prevoo
O Troyanskaya
P Good
P Ranganathan
Piet LCM van Riel
R Bellman
R Culverhouse
R Peto
RA Fisher
Raffaella Scorza
RR Graham
S Pavy
SW Han
TA Thornton-Wells
TL Edwards
W Bateson
W Kievit
Y Benjamini
Publication venue: BioMed Central
Publication date: 01/01/2010

BACKGROUND: Epistasis is recognized as a fundamental part of the genetic architecture of individuals. Several computational approaches have been developed to model gene-gene interactions in case-control studies; however, none of them is suitable for time-dependent analysis. Herein we introduce the Survival Dimensionality Reduction (SDR) algorithm, a non-parametric method specifically designed to detect epistasis in lifetime datasets. RESULTS: The algorithm requires no specification of either the underlying survival distribution or the underlying interaction model, and it proved satisfactorily powerful in detecting a set of causative genes in synthetic epistatic lifetime datasets with a limited number of samples and a high degree of right-censorship (up to 70%). The SDR method was then applied to a series of 386 Dutch patients with active rheumatoid arthritis who were treated with anti-TNF biological agents. Among a set of 39 candidate genes, none of which showed a detectable marginal effect on anti-TNF responses, the SDR algorithm found that the rs1801274 SNP in the Fc gamma RIIa gene and the rs10954213 SNP in the IRF5 gene interact non-linearly to predict clinical remission after anti-TNF biologicals. CONCLUSIONS: Simulation studies and application in a real-world setting support the capability of the SDR algorithm to model epistatic interactions in candidate-gene studies in the presence of right-censored data. Availability: http://sourceforge.net/projects/sdrproject/

Crossref

AIR Universita degli studi di Milano

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Radboud Repository

Neural networks for modeling gene-gene interactions in association studies

Author: A Jakulin
AA Motsinger
AA Motsinger
AA Motsinger-Reif
AA Motsinger-Reif
AA Motsinger-Reif
AA Motsinger-Reif
AD Flouris
AG Heidema
B North
BA McKinney
CM Bishop
Frauke Günther
G Schwarz
H Akaike
HJ Cordell
I Ruczinski
J Liu
J Millstein
J Ott
JH Moore
JH Moore
JR Koza
K Bammann
K Broberg
Karin Bammann
L Breiman
L Briollais
LW Hahn
M Riedmiller
MB Lanktree
MD Ritchie
MD Ritchie
ME Sáez
MJ Wade
MR Nelson
N Risch
Nina Wawro
NR Cook
P McCullagh
PR Lucek
R Development Core Team
R Foraita
R Hecht-Nielsen
R Tibshirani
RL Milne
S Fritsch
SH Chen
SK Musani
W Branicki
W Li
WS Bush
X Tang
Y Amit
Y Qi
Publication venue: BioMed Central
Publication date: 01/01/2009

BACKGROUND: Our aim is to investigate the ability of neural networks to model different two-locus disease models. We conduct a simulation study to compare neural networks with two standard methods, namely logistic regression models and multifactor dimensionality reduction. One hundred data sets are generated for each of six two-locus disease models, which are considered in a low-risk and in a high-risk scenario. Two models represent independence, one is a multiplicative model, and three models are epistatic. For each data set, six neural networks (with up to five hidden neurons) and five logistic regression models (the null model, three main effect models, and the full model) with two different codings for the genotype information are fitted. Additionally, the multifactor dimensionality reduction approach is applied. RESULTS: The results show that neural networks are more successful than logistic regression models in modeling the structure of the underlying disease model in most of the investigated situations. In our simulation study, neither logistic regression nor multifactor dimensionality reduction is able to correctly identify biological interaction. CONCLUSIONS: Neural networks are a promising tool for handling complex data situations. However, further research is necessary concerning the interpretation of their parameters.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central