
    SARS-CoV-2 sero-surveillance in Greece: evolution over time and epidemiological attributes during the pre-vaccination pandemic era

    BACKGROUND: Nation-wide SARS-CoV-2 seroprevalence surveys provide valuable insights into the course of the pandemic, including information often not captured by routine surveillance of reported cases. METHODS: A serosurvey of IgG antibodies against SARS-CoV-2 was conducted in Greece between March and December 2020. It was designed as a cross-sectional survey repeated at monthly intervals. The leftover sampling methodology was used and a geographically stratified sampling plan was applied. RESULTS: Of 55,947 serum samples collected, 705 (1.26%) were found positive for anti-SARS-CoV-2 antibodies, with the highest seroprevalence (9.09%) observed in December 2020. The highest seropositivity levels were observed in the "0-29" and "30-49" year age groups, and seroprevalence increased with age within the "0-29" age group. Highly populated metropolitan areas were characterized by elevated seroprevalence levels (11.92% in Attica, 12.76% in Thessaloniki) compared to the rest of the country (5.90%). The infection fatality rate (IFR) was estimated at 0.451% (95% CI: 0.382-0.549%) using aggregate data until December 2020, and the ratio of actual to reported cases was 9.59 (95% CI: 7.88-11.33). CONCLUSIONS: The evolution of seroprevalence estimates aligned with the course of the pandemic and varied widely by region and age group. Young and middle-aged adults appeared to be drivers of the pandemic during a severe epidemic wave under strict policy measures.

    Nearest Template Prediction: A Single-Sample-Based Flexible Class Prediction with Confidence Assessment

    Gene-expression signature-based disease classification and clinical outcome prediction has not been introduced into clinical medicine as widely as initially expected, mainly due to the lack of the extensive validation needed for clinical deployment. Obstacles include measurement variability in microarray assays, inconsistent assay platforms, and the analytical requirement for comparable pairs of training and test datasets. Furthermore, as a medical device supporting clinical decision making, a prediction needs to be made for each single patient together with a measure of its reliability. To address these issues, a flexible prediction method is needed that is less sensitive to differences in experimental and analytical conditions, is applicable to individual patients, and provides a measure of prediction confidence. The nearest template prediction (NTP) method provides a convenient way to make a class prediction, with an assessment of prediction confidence, computed from each single patient's gene-expression data using only a list of signature genes and a test dataset. We demonstrate that the method can be flexibly applied to cross-platform, cross-species, and multiclass predictions without any optimization of analysis parameters.
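    The core idea described above can be sketched in a few lines: score one patient's signature-gene vector against each class template by cosine distance, assign the nearest class, and estimate confidence from a gene-permutation null. This is a minimal illustrative sketch of the general approach, not the published NTP implementation; the function name and the permutation scheme are assumptions.

    ```python
    import numpy as np

    def nearest_template_prediction(sample, templates, n_perm=1000, rng=None):
        """Assign `sample` (1-D expression vector over signature genes) to the
        nearest class template by cosine distance, with a permutation-based
        confidence estimate. `templates` maps class name -> template vector.
        Hypothetical sketch; not the authors' published code."""
        rng = np.random.default_rng(rng)

        def cos_dist(a, b):
            return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

        # Distance from this single patient to every class template.
        dists = {cls: cos_dist(sample, t) for cls, t in templates.items()}
        best = min(dists, key=dists.get)

        # Null distribution: distances of gene-permuted versions of the same
        # sample to the winning template; small p means the observed distance
        # is unusually tight, i.e. a confident call.
        null = np.array([cos_dist(rng.permutation(sample), templates[best])
                         for _ in range(n_perm)])
        p_value = (np.sum(null <= dists[best]) + 1) / (n_perm + 1)
        return best, dists[best], p_value
    ```

    Because the prediction uses only the one sample and the gene list, no paired training dataset or cross-sample normalization is required, which is what makes this style of method attractive for single-patient use.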

    Sharing Detailed Research Data Is Associated with Increased Citation Rate

    BACKGROUND: Sharing research data benefits the general scientific community, but the benefit is less obvious for the investigator who makes his or her data available. PRINCIPAL FINDINGS: We examined the citation history of 85 cancer microarray clinical trial publications with respect to the availability of their data. The 48% of trials with publicly available microarray data received 85% of the aggregate citations. In a linear regression, publicly available data was significantly (p = 0.006) associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin. SIGNIFICANCE: This correlation between publicly available data and increased literature impact may further motivate investigators to share their detailed research data.

    A Simple but Highly Effective Approach to Evaluate the Prognostic Performance of Gene Expression Signatures

    BACKGROUND: Highly parallel analysis of gene expression has recently been used to identify gene sets or 'signatures' to improve patient diagnosis and risk stratification. Once a signature is generated, traditional statistical testing is used to evaluate its prognostic performance. However, due to the high dimensionality of microarrays, this can lead to false interpretation of these signatures. PRINCIPAL FINDINGS: A method was developed to test batches of a user-specified number of randomly chosen signatures in patient microarray datasets. The percentage of randomly generated signatures yielding prognostic value was assessed using ROC analysis, calculating the area under the curve (AUC), in six publicly available cancer patient microarray datasets. We found that a signature consisting of randomly selected genes has an average 10% chance of reaching significance when assessed in a single dataset, but this can range from 1% to ∼40% depending on the dataset in question. Increasing the number of validation datasets markedly reduces this number. CONCLUSIONS: We have shown that the use of an arbitrary cut-off value for evaluating signature significance is not suitable for this type of research; the cut-off should instead be defined for each dataset separately. Our method can be used to evaluate the performance of any derived gene signature in a dataset by comparing it to thousands of randomly generated signatures. It will be of most interest for cases where few data are available and testing in multiple datasets is limited.
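    The batch-testing idea above can be sketched directly: draw many random gene sets of a fixed size, score each patient by the signature's mean expression, compute the AUC against outcome, and report the fraction of random signatures clearing a cut-off. This is an illustrative sketch of the general procedure under assumed conventions (mean-expression scoring, a fixed AUC cut-off), not the authors' code.

    ```python
    import numpy as np

    def auc_score(scores, labels):
        """AUC via the rank-sum (Mann-Whitney) identity; no tie handling,
        which is adequate for continuous expression scores."""
        order = np.argsort(scores)
        ranks = np.empty(len(scores), dtype=float)
        ranks[order] = np.arange(1, len(scores) + 1)
        pos = labels == 1
        n_pos, n_neg = pos.sum(), (~pos).sum()
        return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

    def random_signature_rate(expr, labels, sig_size=50, n_sig=2000,
                              auc_cut=0.6, rng=None):
        """Fraction of randomly drawn gene signatures whose mean-expression
        score reaches `auc_cut` in this dataset. `expr` is genes x patients.
        Sketch of the batch-testing idea; parameters are arbitrary defaults."""
        rng = np.random.default_rng(rng)
        n_genes = expr.shape[0]
        hits = 0
        for _ in range(n_sig):
            genes = rng.choice(n_genes, size=sig_size, replace=False)
            score = expr[genes].mean(axis=0)   # one score per patient
            if auc_score(score, labels) >= auc_cut:
                hits += 1
        return hits / n_sig
    ```

    A derived signature's AUC can then be compared against this empirical null rate for the same dataset, which is exactly why a single arbitrary cut-off across datasets is misleading: the null rate itself varies from dataset to dataset.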

    Mine, Yours, Ours? Sharing Data on Human Genetic Variation

    The achievement of a robust, effective and responsible form of data sharing is currently regarded as a priority for biological and bio-medical research. Empirical evaluations of data sharing may be regarded as an indispensable first step in the identification of critical aspects and the development of strategies aimed at increasing availability of research data for the scientific community as a whole. Research concerning human genetic variation represents a potential forerunner in the establishment of widespread sharing of primary datasets. However, no specific analysis has been conducted to date to ascertain whether the sharing of primary datasets is common practice in this research field. To this end, we analyzed a total of 543 mitochondrial and Y chromosomal datasets reported in 508 papers indexed in the Pubmed database from 2008 to 2011. A substantial portion of datasets (21.9%) was found to have been withheld, while neither strong editorial policies nor high impact factor proved effective in increasing the sharing rate beyond the current figure of 80.5%. Disaggregating datasets by research field, we observed substantially lower sharing in medical than in evolutionary and forensic genetics, most evident for whole mtDNA sequences (15.0% vs 99.6%). The low rate of positive responses to e-mail requests sent to corresponding authors of withheld datasets (28.6%) suggests that sharing should be regarded as a prerequisite for final paper acceptance, while making authors deposit their results in open online databases that provide data quality control seems to be the best-practice standard. Finally, we estimated that 29.8% to 32.9% of total resources are used to generate withheld datasets, implying that an important portion of research funding does not produce shared knowledge. By making the scientific community and the public aware of this important aspect, we may help popularize a more effective culture of data sharing.

    Expanding the Understanding of Biases in Development of Clinical-Grade Molecular Signatures: A Case Study in Acute Respiratory Viral Infections

    The promise of modern personalized medicine is to use molecular and clinical information to better diagnose, manage, and treat disease, on an individual patient basis. These functions are predominantly enabled by molecular signatures, which are computational models for predicting phenotypes and other responses of interest from high-throughput assay data. Data analytics is a central component of molecular signature development and can jeopardize the entire process if conducted incorrectly. While exploratory data analysis may tolerate suboptimal protocols, clinical-grade molecular signatures are subject to vastly stricter requirements. Closing the gap between standards for exploratory versus clinically successful molecular signatures entails a thorough understanding of possible biases in the data analysis phase and developing strategies to avoid them. Using a recently introduced data-analytic protocol as a case study, we provide an in-depth examination of the poorly studied biases of data-analytic protocols related to signature multiplicity, biomarker redundancy, data preprocessing, and validation of signature reproducibility. The methodology and results presented in this work are aimed at expanding the understanding of these data-analytic biases that affect development of clinically robust molecular signatures. Several recommendations follow from the current study. First, all molecular signatures of a phenotype should be extracted to the extent possible, in order to provide comprehensive and accurate grounds for understanding disease pathogenesis. Second, redundant genes should generally be removed from final signatures to facilitate reproducibility and decrease manufacturing costs. Third, data preprocessing procedures should be designed so as not to bias biomarker selection. Finally, molecular signatures developed and applied on different phenotypes and populations of patients should be treated with great caution.

    A comparison of machine learning algorithms for chemical toxicity classification using a simulated multi-scale data model

    BACKGROUND: Bioactivity profiling using high-throughput in vitro assays can reduce the cost and time required for toxicological screening of environmental chemicals and can also reduce the need for animal testing. Several public efforts are aimed at discovering patterns or classifiers in high-dimensional bioactivity space that predict tissue, organ or whole animal toxicological endpoints. Supervised machine learning is a powerful approach to discover combinatorial relationships in complex in vitro/in vivo datasets. We present a novel model to simulate complex chemical-toxicology data sets and use this model to evaluate the relative performance of different machine learning (ML) methods. RESULTS: The classification performance of Artificial Neural Networks (ANN), K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Naïve Bayes (NB), Recursive Partitioning and Regression Trees (RPART), and Support Vector Machines (SVM), in the presence and absence of filter-based feature selection, was analyzed using K-way cross-validation testing and independent validation on simulated in vitro assay data sets with varying levels of model complexity, number of irrelevant features and measurement noise. While the prediction accuracy of all ML methods decreased as non-causal (irrelevant) features were added, some ML methods performed better than others. In the limit of using a large number of features, ANN and SVM were always in the top performing set of methods while RPART and KNN (k = 5) were always in the poorest performing set. The addition of measurement noise and irrelevant features decreased the classification accuracy of all ML methods, with LDA suffering the greatest performance degradation. LDA performance is especially sensitive to the use of feature selection; filter-based feature selection generally improved performance, most strikingly for LDA. CONCLUSION: We have developed a novel simulation model to evaluate machine learning methods for the analysis of data sets in which in vitro bioassay data is used to predict in vivo chemical toxicology. From our analysis, we can recommend that several ML methods, most notably SVM and ANN, are good candidates for use in real world applications in this area.
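    The study design above, comparing classifiers with and without filter-based feature selection on simulated data with many irrelevant features, can be sketched with scikit-learn. This is a simplified illustration of the comparison, not the authors' simulation model; the dataset parameters and the choice of k for the filter are arbitrary assumptions.

    ```python
    from sklearn.datasets import make_classification
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    # Simulated assay data: a few informative features buried among many
    # irrelevant ones, loosely mirroring the study design (numbers arbitrary).
    X, y = make_classification(n_samples=200, n_features=300, n_informative=10,
                               n_redundant=0, flip_y=0.05, random_state=0)

    models = {
        "SVM": SVC(kernel="linear"),
        "LDA": LinearDiscriminantAnalysis(),
        "KNN (k=5)": KNeighborsClassifier(n_neighbors=5),
    }

    results = {}
    for name, clf in models.items():
        # Raw: the classifier sees all 300 features.
        raw = cross_val_score(clf, X, y, cv=5).mean()
        # Filtered: univariate F-test filter placed inside the pipeline so the
        # feature selection is refit per fold and never sees the test fold.
        filtered = cross_val_score(
            make_pipeline(SelectKBest(f_classif, k=20), clf), X, y, cv=5).mean()
        results[name] = (raw, filtered)
    ```

    Putting the filter inside the pipeline is the important design choice: selecting features on the full dataset before cross-validation leaks test-fold information into the filter and inflates the estimated accuracy, one of the biases such simulation studies are designed to expose.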

    Genome-wide gene expression profiling suggests distinct radiation susceptibilities in sporadic and post-Chernobyl papillary thyroid cancers

    The incidence of papillary thyroid cancers (PTCs) increased dramatically in the vicinity of Chernobyl. The cancer-initiating role of radiation elsewhere is debated. We therefore searched for a signature distinguishing radio-induced from sporadic cancers. Using microarrays, we compared the expression profiles of PTCs from the Chernobyl Tissue Bank (CTB, n=12) and from French patients with no history of exposure to ionising radiation (n=14). We also compared the transcriptional responses of human lymphocytes to the presumed aetiological agents initiating these tumours, γ-radiation and H2O2. On a global scale, the transcriptomes of CTB and French tumours are indistinguishable, and the transcriptional responses to γ-radiation and H2O2 are similar. On a finer scale, a 118-gene signature discriminated the γ-radiation and H2O2 responses. This signature could be used to classify the tumours as CTB or French with an error of 15–27%. Similar results were obtained with an independent signature of 13 genes involved in homologous recombination. Although sporadic and radio-induced PTCs represent the same disease, they are distinguishable with molecular signatures reflecting specific responses to γ-radiation and H2O2. These signatures in PTCs could reflect the susceptibility profiles of the patients, suggesting the feasibility of a radiation susceptibility test.