3 research outputs found

Effect of training-sample size and classification difficulty on the accuracy of genomic predictors

Author: Alex Ishkin
B Efron
Brandon D Gallas
C Fan
C Lai
C Liedtke
Christos Hatzis
CM Perou
Daniel Booser
DW Huang
F Andre
F Peintinger
Frank W Samuelson
Gabriel N Hortobagyi
IA Wood
J Stec
JS Ross
K Fukunaga
Kenneth R Hess
KR Hess
L Ein-Dor
L Pusztai
Lajos Pusztai
Leming Shi
M Ayers
M Lecocke
M Zucknick
Marina Tsyganova
Mauro Delorenzi
MJ van de Vijver
P Wirapati
PC Boutros
R Rouzier
S Dudoit
S Paik
Tatiana Nikolskaya
TK Ho
Vicente Valero
Vlad Popovici
W Fraser Symmans
WA Yousef
WA Yousef
Weijie Chen
Weiwei Shi
WF Symmans
Yuri Nikolsky
Publication venue: BioMed Central
Publication date: 01/01/2010

Introduction: As part of the MicroArray Quality Control (MAQC)-II project, this analysis examines how the choice of univariate feature-selection methods and classification algorithms may influence the performance of genomic predictors under varying degrees of prediction difficulty represented by three clinically relevant endpoints.
Methods: We used gene-expression data from 230 breast cancers (grouped into training and independent validation sets), and we examined 40 predictors (five univariate feature-selection methods combined with eight different classifiers) for each of the three endpoints. Their classification performance was estimated on the training set by using two different resampling methods and compared with the accuracy observed in the independent validation set.
Results: A ranking of the three classification problems was obtained, and the performance of 120 models was estimated and assessed on an independent validation set. The bootstrapping estimates were closer to the validation performance than were the cross-validation estimates. The required sample size for each endpoint was estimated, and both gene-level and pathway-level analyses were performed on the obtained models.
Conclusions: We showed that genomic predictor accuracy is determined largely by an interplay between sample size and classification difficulty. Variations on univariate feature-selection methods and choice of classification algorithm have only a modest impact on predictor performance, and several statistically equally good predictors can be developed for any given classification problem.
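
The abstract compares two resampling estimates of classifier accuracy, cross-validation and bootstrapping, but does not say which variants were used. The sketch below is only an illustration of the general idea, not the study's procedure. It assumes k-fold cross-validation and an out-of-bag bootstrap, and it uses a hypothetical one-feature synthetic data set with a nearest-centroid classifier; every function name and parameter value is an assumption made for the example.

```python
import random
from statistics import mean

def make_data(n, shift, rng):
    """Synthetic two-class data (assumption): one feature, class means 0 and `shift`."""
    return [(rng.gauss((i % 2) * shift, 1.0), i % 2) for i in range(n)]

def fit_centroids(train):
    """Nearest-centroid 'training': the mean feature value of each class."""
    return {c: mean(x for x, y in train if y == c) for c in (0, 1)}

def accuracy(centroids, test):
    """Fraction of test samples whose nearest centroid matches their label."""
    hits = sum(1 for x, y in test
               if min(centroids, key=lambda c: abs(x - centroids[c])) == y)
    return hits / len(test)

def cross_validation(data, k, rng):
    """k-fold cross-validation: each fold is held out once, then scores are averaged."""
    idx = list(range(len(data)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        test = [data[i] for i in fold]
        scores.append(accuracy(fit_centroids(train), test))
    return mean(scores)

def bootstrap_oob(data, rounds, rng):
    """Out-of-bag bootstrap: train on a resample drawn with replacement,
    test on the samples that were not drawn, then average over rounds."""
    n = len(data)
    scores = []
    for _ in range(rounds):
        drawn = [rng.randrange(n) for _ in range(n)]
        chosen = set(drawn)
        train = [data[i] for i in drawn]
        test = [data[i] for i in range(n) if i not in chosen]
        # Skip degenerate resamples (no out-of-bag samples or a missing class).
        if not test or len({y for _, y in train}) < 2:
            continue
        scores.append(accuracy(fit_centroids(train), test))
    return mean(scores)

rng = random.Random(0)            # fixed seed so the run is repeatable
data = make_data(60, 2.0, rng)    # 60 samples, class means two units apart
cv_estimate = cross_validation(data, 5, rng)
boot_estimate = bootstrap_oob(data, 200, rng)
print(f"5-fold CV accuracy:     {cv_estimate:.3f}")
print(f"bootstrap OOB accuracy: {boot_estimate:.3f}")
```

Both functions return an accuracy estimate computed from the training data alone. In the study's design, such estimates would then be compared with the accuracy observed on an independent validation set.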

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Springer - Publisher Connector

Serveur académique lausannois

PubMed Central

The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models

Author: Arasappan Dhivya
Bao Wenjun
Barlogie Bart
Berthold Frank
Bitter Hans
Brennan Richard J.
Brors Benedikt
Buness Andreas
Bushel Pierre R.
Bylesjo Max
Campagne Fabien
Campbell Gregory
Catalano Jennifer G.
Chang Chang
Chen Minjun
Chen Rong
Chen Weijie
Cheng Jie
Cheng Jing
Cheng Yiyu
Chou Jeff
Chu Tzu-Ming
Cui Jian
Czika Wendy
Davison Timothy S.
Delorenzi Mauro
Demichelis Francesca
Deng Xutao
Deng Youping
Devanarayan Viswanath
Dix David J.
Dopazo Joaquin
Dorff Kevin C.
Dosymbekov Damir
Du Pan
Eils Roland
Elloumi Fathi
Fan Jianqing
Fan Shicai
Fan Xiaohui
Fang Hong
Feng Yang
Fielden Mark
Fischer Matthias
Fostel Jennifer
Fulmer-Smentek Stephanie
Furlanello Cesare
Fuscoe James C.
Gallas Brandon D.
Gatto Laurent
Ge Weigong
Ge Xijin
Goldstein Darlene R.
Gonzaludo Nina
Goodsaid Federico M.
Guo Li
Halbert Donald N.
Han Jing
Harris Stephen C.
Hatzis Christos
Herman Damir
Hess Kenneth R.
Hong Huixiao
Huan Jun
Huang Jianping
Irizarry Rafael A.
Jensen Roderick V.
Jiang Rui
Johnson Charles D.
Jones Wendell D.
Judson Richard
Juraeva Dilafruz
Jurman Giuseppe
Kahlert Yvonne
Khuder Sadik A.
Kohl Matthias
Lababidi Samir
Lambert Christophe G.
Li Jianying
Li Li
Li Li
Li Menglong
Li Quan-Zhen
Li Shao
Li Yanen
Li Zhen
Li Zhiguang
Lin Simon M.
Liu Guozhen
Liu Jie
Liu Ying
Liu Zhichao
Lobenhofer Edward K.
Lucas Anne Bergstrom
Luo Jun
Luo Wen
Madera Manuel
MAQC Consortium
Martinez-Murillo Francisco
McCall Matthew N.
Medina Ignacio
Meehan Joseph
Megherbi Dalila B.
Meng Lu
Miclaus Kelci
Moffitt Richard A.
Montaner David
Mukherjee Piali
Mulligan George J.
Neville Padraic
Nikolskaya Tatiana
Nikolsky Yuri
Ning Baitang
Oberthuer Andre
Page Grier P.
Parker Joel
Parry R. Mitchell
Paules Richard S.
Peng Xuejun
Pennello Gene A.
Perkins Roger G.
Peterson Ron L.
Phan John H.
Philip Reena
Popovici Vlad
Price Nathan D.
Puri Raj K.
Pusztai Lajos
Qian Feng
Quanz Brian
Ren Yi
Riccadonna Samantha
Roter Alan H.
Samuelson Frank W.
Scherer Andreas
Scherf Uwe
Schumacher Martin M.
Shambaugh Joseph D.
Shaughnessy John D., Jr.
Shi Leming
Shi LM
Shi Qiang
Shi Tieliu
Shi Weiwei
Shippy Richard
Si Shengzhu
Smalter Aaron
Sotiriou Christos
Soukup Mat
Staedtler Frank
Steiner Guido
Stokes Todd H.
Su Zhenqiang
Sun Qinglan
Sung Jaeyun
Symmans W. Fraser
Tan Pei-Yi
Tang Rong
Tezak Zivana
Thierry-Mieg Danielle
Thierry-Mieg Jean
Thodima Venkata
Thomas Russell S.
Thorn Brett
Tong Weida
Trygg Johan
Tsyganova Marina
Turpaz Yaron
Vega Silvia C.
Vishnuvajjala Lakshmi
Visintainer Roberto
von Frese Juergen
Walker Stephen J.
Wang Charles
Wang Eric
Wang Junwei
Wang May D.
Wang Sue Jane
Wang Wei
Wen Zhining
Westermann Frank
Willey James C.
Wolfinger Russell D.
Woods Matthew
Wu Jianping
Wu Shujian
Wu Yichao
Xiao Nianqing
Xie Qian
Xu Joshua
Xu Lei
Yang Lun
Yousef Waleed A.
Zeng Xiao
Zhang Jialu
Zhang John
Zhang Li
Zhang Liang
Zhang Min
Zhang Xuegong
Zhao Chen
Zhong Sheng
Zhou Yiming
Zhu Sheng
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2010

Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans. In total, >30,000 models were built using many combinations of analytical methods. The teams generated predictive models without knowing the biological meaning of some of the endpoints and, to mimic clinical reality, tested the models on data that had not been used for training. We found that model performance depended largely on the endpoint and team proficiency and that different approaches generated models of similar performance. The conclusions and recommendations from MAQC-II should be useful for regulatory agencies, study committees and independent investigators that evaluate methods for global gene expression analysis. © 2010 Nature America, Inc. All rights reserved.

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Serveur académique lausannois

Institutional Repository of Institute of Process Engineering, CAS (IPE-IR)

DI-fusion

Hochschulschriftenserver der Hochschule Furtwangen