197 research outputs found

Stability analysis of mixtures of mutagenetic trees

Author: A Cayley
AP Dempster
B Efron
BA Larder
CA Boucher
H Prüfer
HW Kuhn
J Edmonds
J Rahnenführer
J Yin
Jasmina Bogojeska
Jörg Rahnenführer
N Beerenwinkel
R Desper
S Rhee
T Hastie
Thomas Lengauer
TV Allen
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Mixture models of mutagenetic trees are evolutionary models that capture several pathways of ordered accumulation of genetic events observed in different subsets of patients. They were used to model HIV progression by accumulation of resistance mutations in the viral genome under drug pressure and cancer progression by accumulation of chromosomal aberrations in tumor cells. From the mixture models a genetic progression score (GPS) can be derived that estimates the genetic status of single patients according to the corresponding progression along the tree models. GPS values were shown to have predictive power for estimating drug resistance in HIV or the survival time in cancer. Still, the reliability of the exact values of such complex markers derived from graphical models can be questioned. Results: In a simulation study, we analyzed various aspects of the stability of estimated mutagenetic trees mixture models. It turned out that the induced probability distributions and the tree topologies are recovered with high precision by an EM-like learning algorithm. However, GPS values of single patients can be reliably estimated only for models with just one major model component. Conclusion: It is encouraging that the estimation process of mutagenetic trees mixture models can be performed with high confidence regarding induced probability distributions and the general shape of the tree topologies. For a model with only one major disease progression process, even genetic progression scores for single patients can be reliably estimated. However, for models with more than one relevant component, alternative measures should be introduced for estimating the stage of disease progression.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

MPG.PuRe

Binary classification of dyslipidemia from the waist-to-hip ratio and body mass index: a comparison of linear, logistic, and CART models

Author: BA Reeder
EF Cook
F Paccaud
Fred Paccaud
Insightful Corp
JA Houmard
JA Sonquist
JC Seidell
JM Chambers
L Breiman
LA Clark
MC Pouliot
Michael C Costanza
N Terrin
P Bjorntorp
RA Fisher
SAS Institute Inc
SE Bleeker
SM Shetterly
TJ Hastie
TS Han
V Wietlisbach
WN Venables
World Health Organization MONICA Project Principal Investigators
Publication venue: BioMed Central
Publication date: 01/01/2004

BACKGROUND: We sought to improve upon previously published statistical modeling strategies for binary classification of dyslipidemia for general population screening purposes based on the waist-to-hip circumference ratio and body mass index anthropometric measurements. METHODS: Study subjects were participants in WHO-MONICA population-based surveys conducted in two Swiss regions. Outcome variables were based on the total serum cholesterol to high density lipoprotein cholesterol ratio. The other potential predictor variables were gender, age, current cigarette smoking, and hypertension. The models investigated were: (i) linear regression; (ii) logistic classification; (iii) regression trees; (iv) classification trees (iii and iv are collectively known as "CART"). Binary classification performance of the region-specific models was externally validated by classifying the subjects from the other region. RESULTS: Waist-to-hip circumference ratio and body mass index remained modest predictors of dyslipidemia. Correct classification rates for all models were 60–80%, with marked gender differences. Gender-specific models provided only small gains in classification. The external validations provided assurance about the stability of the models. CONCLUSIONS: There were no striking differences between either the algebraic (i, ii) vs. non-algebraic (iii, iv), or the regression (i, iii) vs. classification (ii, iv) modeling approaches. Anticipated advantages of the CART vs. simple additive linear and logistic models were less than expected in this particular application with a relatively small set of predictor variables. CART models may be more useful when considering main effects and interactions between larger sets of predictor variables.

Crossref

Springer - Publisher Connector

Serveur académique lausannois

Directory of Open Access Journals

PubMed Central

Large-scale risk prediction applied to Genetic Analysis Workshop 17 mini-exome sequence data

Author: B Efron
BA Goldstein
BE Madsen
C Robert
Gengxin Li
H Zhong
Hongyu Zhao
Jia Kang
John Ferguson
Joon Sang Lee
L Almasy
L Breiman
Lun Li
R Diaz-Uriarte
R Tibshirani
T Hastie
Wei Zheng
Xianghua Zhang
Xiting Yan
Publication venue: BioMed Central
Publication date: 01/01/2011

We consider the application of Efron’s empirical Bayes classification method to risk prediction in a genome-wide association study using the Genetic Analysis Workshop 17 (GAW17) data. A major advantage of using this method is that the effect size distribution for the set of possible features is empirically estimated and that all subsequent parameter estimation and risk prediction is guided by this distribution. Here, we generalize Efron’s method to allow for some of the peculiarities of the GAW17 data. In particular, we introduce two ways to extend Efron’s model: a weighted empirical Bayes model and a joint covariance model that allows the model to properly incorporate the annotation information of single-nucleotide polymorphisms (SNPs). In the course of our analysis, we examine several aspects of the possible simulation model, including the identity of the most important genes, the differing effects of synonymous and nonsynonymous SNPs, and the relative roles of covariates and genes in conferring disease risk. Finally, we compare the three methods to each other and to other classifiers (random forest and neural network).

Crossref

Springer - Publisher Connector

PubMed Central

A classification model to predict synergism/antagonism of cytotoxic mixtures using protein-drug docking scores

Author: A Goldin
A Hoskuldsson
AL Boulesteix
BA Carlson
CD Lao
E Perola
F Abas
FM Muggia
G Patlakas
GL Warren
GR Zimmermann
J Lehar
JC Boik
JM Nabholtz
John C Boik
M Momma
M Tabata
N Akula
Robert A Newman
RP Araujo
RP Sheridan
T Hastie
T Safra
TG Dietterich
VT DeVita Jr.
Y Hayashi
Z Zsoldos
Publication venue: BioMed Central
Publication date: 01/01/2008

Crossref

Springer - Publisher Connector

PubMed Central

Threat-sensitive anti-predator defence in precocial wader, the northern lapwing Vanellus vanellus

Author: A Amar
A Berg
A Liker
AG Paassen van
AM Wilson
B Hönisch
BA Crawford
BG Palestis
C Listøen
CK Ghalambor
D Baines
D Clode
D Holyoak
D Ratcliffe
DH Brunton
DMB Parish
DP Barash
E Rhoades
G Orłowski
GB Grønstøl
GE Brown
GJ Fernández
GL Maclean
GS Helfman
H Akaike
H Galbraith
H Mayfield
H Schekkerman
I Byrkjedal
I Krams
I Newton
J Gromadzka
J Kis
J Klicka
JA Vickery
Jakub Szymkowiak
JE Jónsson
JM Luginbuhl
JR Krebs
JR Walters
K Fletcher
KJ Mathot
KL Evans
L Sandoval
Lechosław Kuczyński
LY Zanette
M Andersson
M Bolton
M Marquiss
M Strnad
M Weggler
M Šálek
MA MacDonald
MCO Ferrari
NA Schneider
Natalia Królikowska
OC Johansson
P Hendricks
P Kontiainen
PF Donald
PJ Ewins
R Development Core Team
R Montgomerie
R Parr
RA Laidlaw
RD Elliot
RE Green
RE Ricklefs
Rebecca Anne Laidlaw
RL Knight
RM Coleman
RM Whittam
S Cramp
S Dale
S Eggers
S Rytkönen
S Wood
SC Stearns
SL Lima
T Caro
T Krama
T Larsen
TA Sordahl
TE Martin
TH Clutton-Brock
TJ Hastie
V Ruusila
W Teunissen
Å Berg
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Birds exhibit various forms of anti-predator behaviour to avoid reproductive failure, with mobbing (observation, approach and usually harassment of a predator) being one of the most commonly observed. Here, we investigate patterns of temporal variation in the mobbing response exhibited by a precocial species, the northern lapwing (Vanellus vanellus). We test whether brood age and self-reliance, or the perceived risk posed by various predators, affect the mobbing response of lapwings. We quantified aggressive interactions between lapwings and their natural avian predators and used generalized additive models to test how timing and predator species identity are related to the mobbing response of lapwings. Lapwings varied their mobbing response within the breeding season and according to predator species. Raven Corvus corax, hooded crow Corvus cornix and harriers evoked the strongest response, while common buzzard Buteo buteo, white stork Ciconia ciconia, black-headed gull Chroicocephalus ridibundus and rook Corvus frugilegus were less frequently attacked. Lapwings increased their mobbing response against raven, common buzzard, white stork and rook throughout the breeding season, while defence against hooded crow, harriers and black-headed gull did not exhibit clear temporal patterns. Mobbing behaviour of lapwings apparently constitutes a flexible anti-predator strategy. The anti-predator response depends on predator species, which may suggest that lapwings distinguish between predator types and match mobbing response to the perceived hazard at different stages of the breeding cycle. We conclude that a single species may exhibit various patterns of temporal variation in anti-predator defence, which may correspond with various hypotheses derived from parental investment theory.

Crossref

Springer - Publisher Connector

PubMed Central

University of East Anglia digital repository

Genomic prediction in CIMMYT maize and wheat breeding programs

Author: BA McKinney
BJ Hayes
C Riedelsheimer
D Bonnett
D Gianola
D Habier
D Wang
EL Heffner
G de los Campos
HD Daetwyler
J Burgueño
J Cerón-Rojas
J Crossa
J Hickey
JM Gonzalez-Camacho
JM Hickey
K Mathews
L Ornella
ME Goddard
P Pérez
PM VanRaden
R Babu
R Bernardo
RE Lorenzana
S Dreisigacker
T Hastie
T Park
THE Meuwissen
VS Windhausen
X Zhang
Y Li
Y Zhao
YM Zhang
Publication venue: Springer Science and Business Media LLC
Publication date: 10/04/2013

Genomic selection (GS) has been implemented in animal and plant species, and is regarded as a useful tool for accelerating genetic gains. Varying levels of genomic prediction accuracy have been obtained in plants, depending on the prediction problem assessed and on several other factors, such as trait heritability, the relationship between the individuals to be predicted and those used to train the models for prediction, number of markers, sample size and genotype × environment interaction (GE). The main objective of this article is to describe the results of genomic prediction in the International Maize and Wheat Improvement Center's (CIMMYT's) maize and wheat breeding programs, from the initial assessment of the predictive ability of different models using pedigree and marker information to the present, when methods for implementing GS in practical global maize and wheat breeding programs are being studied and investigated. Results show that pedigree (population structure) accounts for a sizeable proportion of the prediction accuracy when a global population is the prediction problem to be assessed. However, when the prediction uses unrelated populations to train the prediction equations, prediction accuracy becomes negligible. When genomic prediction includes modeling GE, an increase in prediction accuracy can be achieved by borrowing information from correlated environments. Several questions on how to incorporate GS into CIMMYT's maize and wheat programs remain unanswered and subject to further investigation, for example, prediction within and between related bi-parental crosses. Further research on the quantification of breeding value components for GS in plant breeding populations is required.
J Crossa, P Pérez, J Hickey, J Burgueño, L Ornella, J Cerón-Rojas, X Zhang, S Dreisigacker, R Babu, Y Li, D Bonnett and K Mathew

Research UNE

Crossref

Adelaide Research & Scholarship

PubMed Central

Edinburgh Research Explorer

Research Online

Grammatical evolution decision trees for detecting gene-gene interactions

Author: AA Motsinger
AA Motsinger-Reif
Alison A Motsinger-Reif
BA Shepherd
BLG Miller
CS Greene
D Altshuler
DB Goldstein
DR Velez
E Alpaydin
E Cantu-Paz
HJ Cordell
IH Witten
J Koza
JH Moore
JN Hirschhorn
JR Quinlan
JS Aguilar-Ruiz
L Breiman
LGL Devroy
M Hall
M O'Neill
MD Ritchie
MR Nelson
Nicholas E Hardison
R Bellman
R Culverhouse
RJ Neuman
SM Dudek
Stacey J Winham
Sushamna Deodhar
TJ Hastie
W Li
X Yao
Publication venue: BioMed Central
Publication date: 01/11/2010

Background: A fundamental goal of human genetics is the discovery of polymorphisms that predict common, complex diseases. It is hypothesized that complex diseases are due to a myriad of factors including environmental exposures and complex genetic risk models, including gene-gene interactions. Such epistatic models present an important analytical challenge, requiring that methods perform not only statistical modeling, but also variable selection to generate testable genetic model hypotheses. This challenge is amplified by recent advances in genotyping technology, as the number of potential predictor variables is rapidly increasing. Methods: Decision trees are a highly successful, easily interpretable data-mining method that is typically optimized with a hierarchical model building approach, which limits their potential to identify interacting effects. To overcome this limitation, we utilize evolutionary computation, specifically grammatical evolution, to build decision trees to detect and model gene-gene interactions. In the current study, we introduce the Grammatical Evolution Decision Trees (GEDT) method and software and evaluate this approach on simulated data representing gene-gene interaction models of a range of effect sizes. We compare the performance of the method to a traditional decision tree algorithm and a random search approach and demonstrate the improved performance of the method to detect purely epistatic interactions. Results: The results of our simulations demonstrate that GEDT has high power to detect even very moderate genetic risk models. GEDT has high power to detect interactions with and without main effects. Conclusions: GEDT, while still in its initial stages of development, is a promising new approach for identifying gene-gene interactions in genetic association studies.

Crossref

Directory of Open Access Journals

PubMed Central

Determining Frequent Patterns of Copy Number Alterations in Cancer

Author: A Li
AB Olshen
B Efron
B Zhao
BA Weir
BS Taylor
C Franchini
Christina Leslie
CL Andersen
D Pinkel
DV Spencer
E Blaveri
E Michels
ES Venkatraman
F Bach
F Picard
F Rapaport
Franck Rapaport
G Nakajima
HF Mark
I Magnani
J Shlens
J Sturm
J Trolet
JC Marioni
Jean Peccoud
JR Pollack
K Holmstrom
K Zhang
KH Baek
L Xu
M Speicher
OM Rueda
P Neuvial
R Beroukhim
R Tibshirani
R Wiedemeyer
RC O'Hagan
SF Chin
T Hastie
T Nagasaka
VG Tusher
WR Lai
X Mao
X Zhao
Publication venue: Public Library of Science
Publication date: 01/01/2010

Cancer progression is often driven by an accumulation of genetic changes but also accompanied by increasing genomic instability. These processes lead to a complicated landscape of copy number alterations (CNAs) within individual tumors and great diversity across tumor samples. High resolution array-based comparative genomic hybridization (aCGH) is being used to profile CNAs of ever larger tumor collections, and better computational methods for processing these data sets and identifying potential driver CNAs are needed. Typical studies of aCGH data sets take a pipeline approach, starting with segmentation of profiles, calls of gains and losses, and finally determination of frequent CNAs across samples. A drawback of pipelines is that choices at each step may produce different results, and biases are propagated forward. We present a mathematically robust new method that exploits probe-level correlations in aCGH data to discover subsets of samples that display common CNAs. Our algorithm is related to recent work on maximum-margin clustering. It does not require pre-segmentation of the data and also provides grouping of recurrent CNAs into clusters. We tested our approach on a large cohort of glioblastoma aCGH samples from The Cancer Genome Atlas and recovered almost all CNAs reported in the initial study. We also found additional significant CNAs missed by the original analysis but supported by earlier studies, and we identified significant correlations between CNAs.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Cross-validated stepwise regression for identification of novel non-nucleoside reverse transcriptase inhibitor resistance associated mutations

Author: AD Sevin
B Efron
B Winters
BA Larder
Barbara Van Kerckhove
C van Rijsbergen
CF Perno
Elke Van Craenenbroeck
G Picchio
G Schwarz
Geert Verbeke
GN Nikolenko
H Akaike
H Vermeiren
Herman van Vlijmen
J Shao
J Vingerhoets
JM Whitcomb
K Hertogs
K Steegen
K Wang
Koen Van der Borght
L Breiman
L Tambuyzer
Lee Bacheler
M Van Houtte
Margriet Van Houtte
MH Kutner
P Austin
P Burman
P Zhang
Pierre Lecocq
R Paredes
RA Cohen
RW Shafer
SA Clark
SY Rhee
T Hastie
VW Byrnes
Y Verlinden
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Linear regression models are used to quantitatively predict drug resistance, the phenotype, from the HIV-1 viral genotype. As new antiretroviral drugs become available, new resistance pathways emerge and the number of resistance associated mutations continues to increase. To accurately identify which drug options are left, the main goal of the modeling has been to maximize predictivity and not interpretability. However, we originally selected linear regression as the preferred method for its transparency as opposed to other techniques such as neural networks. Here, we apply a method to lower the complexity of these phenotype prediction models using a 3-fold cross-validated selection of mutations. Results: Compared to standard stepwise regression we were able to reduce the number of mutations in the reverse transcriptase (RT) inhibitor models as well as the number of interaction terms accounting for synergistic and antagonistic effects. This reduction in complexity was most significant for the non-nucleoside reverse transcriptase inhibitor (NNRTI) models, while maintaining prediction accuracy and retaining virtually all known resistance associated mutations as first order terms in the models. Furthermore, for etravirine (ETR) a better performance was seen on two years of unseen data. By analyzing the phenotype prediction models we identified a list of forty novel NNRTI mutations, putatively associated with resistance. The resistance association of novel variants at known NNRTI resistance positions: 100, 101, 181, 190, 221 and of mutations at positions not previously linked with NNRTI resistance: 102, 139, 219, 241, 376 and 382 was confirmed by phenotyping site-directed mutants. Conclusions: We successfully identified and validated novel NNRTI resistance associated mutations by developing parsimonious resistance prediction models in which repeated cross-validation within the stepwise regression was applied. Our model selection technique is computationally feasible for large data sets and provides an approach to the continued identification of resistance-causing mutations.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Predicting global invasion risks: a management tool to prevent future introductions

Author: A Guisan
A Jiménez-Valverde
A Simon
AA Makhrov
AC Pinder
AM Prasad
B Leung
BA Bradley
CJ Vörösmarty
D Andreou
D Lin
D Pimentel
D Savini
FG Ekmekçi
GD Davies
GG Moisen
GH Copp
J Robert Britton
JH Myers
JR Britton
KD Arkush
KP Mainali
L Vilizzi
M Keith
MC Fitzpatrick
MJ Vander Zanden
MS Wisz
P Pyšek
PC Pheloung
PE Hulme
R Ihaka
RA Boria
RE Gozlan
RI Colautti
RJ Hijmans
S Villéger
SH Fernald
T Hastie
TA Crowl
TJ Stohlgren
TomP Moorhouse
V Semenchenko
W Thuiller
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Predicting regions at risk from introductions of non-native species and the subsequent invasions is a fundamental aspect of horizon scanning activities that enable the development of more effective preventative actions and planning of management measures. The Asian cyprinid fish topmouth gudgeon Pseudorasbora parva has proved highly invasive across Europe since its introduction in the 1960s. In addition to direct negative impacts on native fish populations, P. parva has potential for further damage through transmission of an emergent infectious disease, known to cause mortality in other species. To quantify its invasion risk, in regions where it has yet to be introduced, we trained 900 ecological niche models and constructed an Ensemble Model predicting suitability, then integrated a proxy for introduction likelihood. This revealed high potential for P. parva to invade regions well beyond its current invasive range. These included areas in all modelled continents, with several hotspots of climatic suitability and risk of introduction. We believe that these methods are easily adapted for a variety of other invasive species and that such risk maps could be used by policy-makers and managers in hotspots to formulate increased surveillance and early-warning systems that aim to prevent introductions and subsequent invasions.

HAL - Normandie Université

Bournemouth University Research Online

Red de Bibliotecas Virtuales de Ciencias Sociales de América Latina y El Caribe

Horizon / Pleins textes