91 research outputs found

    Automated segmentation of tissue images for computerized IHC analysis

    This paper presents two automated methods for the segmentation of immunohistochemical tissue images that overcome the limitations of the manual approach as well as of existing computerized techniques. The first method, based on unsupervised color clustering, automatically recognizes the target cancerous areas in the specimen and disregards the stroma; the second, independent method, based on color separation and morphological processing, segments the nuclear membranes of the cancerous cells. Extensive experimental results on real tissue images demonstrate the accuracy of our techniques compared to manual segmentations; additional experiments show that our techniques are more effective on immunohistochemical images than popular approaches based on supervised learning or active contours. The proposed procedure can be exploited in any application that requires tissue and cell exploration and in performing reliable and standardized measures of the activity of specific proteins involved in multi-factorial genetic pathologies.
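
    As a purely illustrative sketch of the first method's idea (not the authors' implementation), the snippet below segments candidate stained regions of an IHC image by unsupervised color clustering with k-means. The image path, the number of clusters and the heuristic for picking the stained cluster are all assumptions made for the example.

        import numpy as np
        from skimage import io
        from sklearn.cluster import KMeans

        # Load an RGB IHC tissue image (the path is a placeholder).
        image = io.imread("ihc_sample.png")[:, :, :3]
        h, w, _ = image.shape

        # Cluster pixels by color; three clusters is an assumption
        # (roughly: stained tissue, stroma, background).
        pixels = image.reshape(-1, 3).astype(float)
        kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(pixels)
        labels = kmeans.labels_.reshape(h, w)

        # Heuristic: treat the cluster with the lowest mean green channel as the
        # stained target area and the rest as stroma/background.
        stained = int(np.argmin(kmeans.cluster_centers_[:, 1]))
        mask = labels == stained  # binary mask of candidate target regions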

    Factors Influencing the Statistical Power of Complex Data Analysis Protocols for Molecular Signature Development from Microarray Data

    Critical to the development of molecular signatures from microarray and other high-throughput data is testing the statistical significance of the produced signature in order to ensure its statistical reproducibility. While current best practices emphasize sufficiently powered univariate tests of differential expression, little is known about the factors that affect the statistical power of complex multivariate analysis protocols for high-dimensional molecular signature development. We show that choices of specific components of the analysis (i.e., error metric, classifier, error estimator and event balancing) have large and compounding effects on statistical power. The effects are demonstrated empirically by an analysis of 7 of the largest microarray cancer outcome prediction datasets and supplementary simulations, and by contrasting them to prior analyses of the same data. The findings of the present study have two important practical implications. First, by avoiding under-powered data analysis protocols, high-throughput studies can achieve substantial economies in the sample size required to demonstrate the statistical significance of a predictive signal. The factors that affect power are identified and studied; much smaller samples than previously thought may be sufficient for exploratory studies, as long as these factors are taken into consideration when designing and executing the analysis. Second, previous highly cited claims that microarray assays may not be able to predict disease outcomes better than chance are shown by our experiments to be due to under-powered data analysis combined with inappropriate statistical tests.
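
    As a toy illustration of how a single protocol component choice (here, the error metric) can change the apparent strength of the predictive signal, the sketch below scores the same classifier with accuracy and with AUC on a small imbalanced synthetic dataset. The dataset, classifier and cross-validation setup are assumptions for the example, not the study's protocol.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        # Small, imbalanced synthetic "outcome prediction" dataset (an assumption).
        X, y = make_classification(n_samples=80, n_features=500, n_informative=10,
                                   weights=[0.8, 0.2], random_state=0)

        clf = LogisticRegression(max_iter=1000)

        # The same analysis scored with two different error metrics.
        acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
        auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")

        # With imbalanced classes, accuracy can look decent for a near-trivial model,
        # while AUC gives a different picture of whether signal was detected.
        print("accuracy: %.3f +/- %.3f" % (acc.mean(), acc.std()))
        print("AUC:      %.3f +/- %.3f" % (auc.mean(), auc.std()))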

    Predictive integration of gene functional similarity and co-expression defines treatment response of endothelial progenitor cells

    Background: Endothelial progenitor cells (EPCs) have been implicated in different processes crucial to vasculature repair, which may offer the basis for new therapeutic strategies in cardiovascular disease. Despite advances facilitated by functional genomics, there is a lack of systems-level understanding of the treatment response mechanisms of EPCs. In this research we aimed to characterize the EPC response to adenosine (Ado), a cardioprotective factor, based on the systems-level integration of gene expression data and prior functional knowledge. Specifically, we set out to identify novel biosignatures of Ado-treatment response in EPCs. Results: The predictive integration of gene expression data and standardized functional similarity information enabled us to identify new treatment response biosignatures. Gene expression data originated from Ado-treated and untreated EPC samples, and functional similarity was estimated with Gene Ontology (GO)-based similarity information. These information sources enabled us to implement and evaluate an integrated prediction approach based on the concept of k-nearest neighbours learning (kNN). The method can be executed with expert- and data-driven input queries to guide the search for biologically meaningful biosignatures. The resulting integrated kNN system identified new candidate EPC biosignatures that offer high classification performance (areas under the receiver operating characteristic curve > 0.8). We also showed that the proposed models can outperform those discovered by standard gene expression analysis. Furthermore, we report an initial independent in vitro experimental follow-up, which provides additional evidence of the potential validity of the top biosignature. Conclusion: Response to Ado treatment in EPCs can be accurately characterized with a new method based on the combination of gene co-expression data and GO-based similarity information. It also exploits the incorporation of human expert-driven queries as a strategy to guide the automated search for candidate biosignatures. The proposed biosignature improves the systems-level characterization of EPCs. The new integrative predictive modeling approach can also be applied to other phenotype characterization or biomarker discovery problems.
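
    The sketch below shows one way a kNN classifier can be driven by a distance that mixes co-expression with GO-based functional similarity, which is the general idea the abstract describes; it is not the authors' integrated kNN system. The toy expression matrix, the random stand-in for GO similarity, the mixing weight and the labels are all assumptions.

        import numpy as np
        from sklearn.neighbors import KNeighborsClassifier

        rng = np.random.default_rng(0)

        # Toy stand-ins: expression profiles for 50 genes across 12 samples, plus a
        # random symmetric matrix playing the role of GO-based gene-gene similarity.
        expr = rng.normal(size=(50, 12))
        go_sim = rng.uniform(size=(50, 50))
        go_sim = (go_sim + go_sim.T) / 2
        np.fill_diagonal(go_sim, 1.0)

        # Gene-gene co-expression similarity (absolute Pearson correlation).
        co_expr = np.abs(np.corrcoef(expr))

        # Integrated distance: weighted mix of (1 - co-expression) and (1 - GO similarity).
        alpha = 0.5  # mixing weight is an assumption
        dist = alpha * (1 - co_expr) + (1 - alpha) * (1 - go_sim)
        np.fill_diagonal(dist, 0.0)

        # kNN over genes with the precomputed integrated distance; labels are arbitrary
        # stand-ins for membership in a candidate biosignature.
        labels = rng.integers(0, 2, size=50)
        knn = KNeighborsClassifier(n_neighbors=3, metric="precomputed")
        knn.fit(dist, labels)
        print(knn.predict(dist[:5]))  # predictions for the first five genes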

    Multiclass classification of microarray data samples with a reduced number of genes

    Background: Multiclass classification of microarray data samples with a reduced number of genes is a rich and challenging problem in Bioinformatics research. The problem gets harder as the number of classes is increased. In addition, the performance of most classifiers is tightly linked to the effectiveness of mandatory gene selection methods. Critical to gene selection is the availability of estimates about the maximum number of genes that can be handled by any classification algorithm. Lack of such estimates may lead to either computationally demanding explorations of a search space with thousands of dimensions or classification models based on gene sets of unrestricted size. In the former case, unbiased but possibly overfitted classification models may arise. In the latter case, biased classification models unable to support statistically significant findings may be obtained. Results: A novel bound on the maximum number of genes that can be handled by binary classifiers in binary mediated multiclass classification algorithms of microarray data samples is presented. The bound suggests that high-dimensional binary output domains might favor the existence of accurate and sparse binary mediated multiclass classifiers for microarray data samples. Conclusions: A comprehensive experimental work shows that the bound is indeed useful to induce accurate and sparse multiclass classifiers for microarray data samples.
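
    To make the setting concrete, the sketch below builds a binary-mediated (one-vs-rest) multiclass classifier in which every binary classifier is restricted to a small number of selected genes. The cap of 20 genes, the synthetic data and the linear SVM are illustrative assumptions and do not reflect the bound derived in the paper.

        from sklearn.datasets import make_classification
        from sklearn.feature_selection import SelectKBest, f_classif
        from sklearn.model_selection import cross_val_score
        from sklearn.multiclass import OneVsRestClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import LinearSVC

        # Synthetic multiclass "microarray" data: 120 samples, 2000 genes, 4 classes.
        X, y = make_classification(n_samples=120, n_features=2000, n_informative=30,
                                   n_classes=4, n_clusters_per_class=1, random_state=0)

        # Each binary (one-vs-rest) problem selects at most 20 genes before fitting
        # a linear classifier; the cap is an arbitrary illustration.
        per_binary = make_pipeline(SelectKBest(f_classif, k=20),
                                   LinearSVC(max_iter=10000))
        multi = OneVsRestClassifier(per_binary)

        scores = cross_val_score(multi, X, y, cv=5)
        print("mean accuracy: %.3f" % scores.mean())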

    Analysis and Computational Dissection of Molecular Signature Multiplicity

    Molecular signatures are computational or mathematical models created to diagnose disease and other phenotypes and to predict clinical outcomes and response to treatment. It is widely recognized that molecular signatures constitute one of the most important translational and basic science developments enabled by recent high-throughput molecular assays. A perplexing phenomenon that characterizes high-throughput data analysis is the ubiquitous multiplicity of molecular signatures. Multiplicity is a special form of data analysis instability in which different analysis methods used on the same data, or different samples from the same population, lead to different but apparently maximally predictive signatures. This phenomenon has far-reaching implications for biological discovery and the development of next-generation patient diagnostics and personalized treatments. Currently, the causes and interpretation of signature multiplicity are unknown, and several, often contradictory, conjectures have been made to explain it. We present a formal characterization of signature multiplicity and a new efficient algorithm that offers theoretical guarantees for extracting the set of maximally predictive and non-redundant signatures, independently of the data distribution. The new algorithm identifies exactly the set of optimal signatures in controlled experiments and yields signatures with significantly better predictivity and reproducibility than previous algorithms in human microarray gene expression datasets. Our results shed light on the causes of signature multiplicity, provide computational tools for studying it empirically and introduce a framework for in silico bioequivalence of this important new class of diagnostic and personalized medicine modalities.
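
    The snippet below is a naive demonstration of signature multiplicity itself, not the algorithm proposed in the paper: it repeatedly selects a small signature, removes its genes from the candidate pool, and checks whether the next signature is still nearly as predictive. The synthetic data, the signature size of 5 and the 0.05 AUC tolerance are assumptions.

        from sklearn.datasets import make_classification
        from sklearn.feature_selection import SelectKBest, f_classif
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        # Synthetic data with redundant informative features, so several distinct
        # gene subsets can be nearly equally predictive.
        X, y = make_classification(n_samples=200, n_features=300, n_informative=5,
                                   n_redundant=10, random_state=0)

        def best_signature(pool):
            """Pick 5 genes from the pool and report their cross-validated AUC."""
            selector = SelectKBest(f_classif, k=5).fit(X[:, pool], y)
            chosen = [pool[i] for i in selector.get_support(indices=True)]
            auc = cross_val_score(LogisticRegression(max_iter=1000),
                                  X[:, chosen], y, cv=5, scoring="roc_auc").mean()
            return chosen, auc

        pool = list(range(X.shape[1]))
        signatures, reference_auc = [], None
        for _ in range(4):                        # look for up to 4 signatures
            sig, auc = best_signature(pool)
            if reference_auc is None:
                reference_auc = auc
            if auc < reference_auc - 0.05:        # stop once predictivity degrades
                break
            signatures.append((sig, round(auc, 3)))
            pool = [g for g in pool if g not in sig]  # exclude used genes and retry

        for sig, auc in signatures:
            print(sig, auc)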

    Multiplicity: an organizing principle for cancers and somatic mutations

    Background: With the advent of whole-genome analysis for profiling tumor tissue, a pressing need has emerged for principled methods of organizing the large amounts of resulting genomic information. We propose the concept of multiplicity measures on cancer and gene networks to organize the information in a clinically meaningful manner. Multiplicity applied in this context extends Fearon and Vogelstein's multi-hit genetic model of colorectal carcinoma across multiple cancers. Methods: Using the Catalogue of Somatic Mutations in Cancer (COSMIC), we construct networks of interacting cancers and genes. Multiplicity is calculated by evaluating the number of cancers and genes linked by the measurement of a somatic mutation. The Kamada-Kawai algorithm is used to find a two-dimensional minimum energy solution with multiplicity as an input similarity measure. Cancers and genes are positioned in two dimensions according to this similarity. A third dimension is added to the network by assigning a maximal multiplicity to each cancer or gene. Hierarchical clustering within this three-dimensional network is used to identify similar clusters in somatic mutation patterns across cancer types. Results: The clustering of genes in a three-dimensional network reveals a similarity in acquired mutations across different cancer types. Surprisingly, the clusters separate known causal mutations. The multiplicity clustering technique identifies a set of causal genes with an area under the ROC curve of 0.84 versus 0.57 when clustering on gene mutation rate alone. The cluster multiplicity value and number of causal genes are positively correlated via Spearman's rank order correlation (r_s(8) = 0.894, Spearman's t = 17.48, p < 0.05). A clustering analysis of cancer types segregates different types of cancer. All blood tumors cluster together, and the cluster multiplicity values differ significantly (Kruskal-Wallis, H = 16.98, df = 2, p < 0.05). Conclusion: We demonstrate the principle of multiplicity for organizing somatic mutations and cancers in clinically relevant clusters. These clusters of cancers and mutations provide representations that identify segregations of cancer and genes driving cancer progression.
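
    As a toy version of the kind of computation described, the sketch below derives a simple multiplicity-like count per gene from a binary cancer-by-gene mutation table and hierarchically clusters genes by their mutation patterns across cancer types. The random table, the distance metric and the number of clusters are assumptions; the paper's actual pipeline uses COSMIC data and a Kamada-Kawai layout.

        import numpy as np
        from scipy.cluster.hierarchy import fcluster, linkage
        from scipy.spatial.distance import pdist

        rng = np.random.default_rng(1)

        # Toy binary matrix: rows = 8 cancer types, columns = 20 genes,
        # entry 1 if a somatic mutation in that gene was recorded for that cancer.
        mutations = (rng.uniform(size=(8, 20)) < 0.4).astype(int)

        # A simple multiplicity-like measure per gene: the number of distinct
        # cancer types in which the gene is mutated.
        gene_multiplicity = mutations.sum(axis=0)

        # Distance between genes: fraction of cancer types where their mutation
        # status differs (Hamming distance over the binary profiles).
        dist = pdist(mutations.T, metric="hamming")

        # Hierarchical clustering of genes by mutation pattern across cancers.
        tree = linkage(dist, method="average")
        clusters = fcluster(tree, t=3, criterion="maxclust")
        print("gene multiplicities:", gene_multiplicity)
        print("cluster assignments:", clusters)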

    A feature selection method for classification within functional genomics experiments based on the proportional overlapping score

    Background: Microarray technology, as well as other functional genomics experiments, allows simultaneous measurement of thousands of genes within each sample. Both the prediction accuracy and the interpretability of a classifier can be enhanced by performing the classification based only on selected discriminative genes. We propose a statistical method for selecting genes based on overlapping analysis of expression data across classes. This method results in a novel measure, called the proportional overlapping score (POS), of a feature's relevance to a classification task. Results: We apply POS, along with four widely used gene selection methods, to several benchmark gene expression datasets. The classification error rates computed using the Random Forest, k Nearest Neighbor and Support Vector Machine classifiers show that POS achieves better performance. Conclusions: A novel gene selection method, POS, is proposed. POS analyzes the expression overlap across classes, taking into account the proportions of overlapping samples. It robustly defines a mask for each gene that allows it to minimize the effect of expression outliers. The constructed masks, along with a novel gene score, are exploited to produce the selected subset of genes.
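
    A minimal sketch of the overlap idea behind POS is shown below: for a single gene, it measures the proportion of samples whose expression falls inside the interval where the two class ranges overlap. This is a simplified stand-in, not the published POS definition, and the synthetic two-class data are an assumption.

        import numpy as np

        def overlap_proportion(expr, labels):
            """Proportion of samples inside the interval where the two class
            expression ranges overlap (simplified stand-in for POS)."""
            a, b = expr[labels == 0], expr[labels == 1]
            lo = max(a.min(), b.min())       # start of the overlapping interval
            hi = min(a.max(), b.max())       # end of the overlapping interval
            if hi <= lo:
                return 0.0                   # class ranges do not overlap at all
            inside = np.sum((expr >= lo) & (expr <= hi))
            return inside / expr.size

        rng = np.random.default_rng(0)
        labels = np.array([0] * 15 + [1] * 15)

        # A well-separated gene versus a heavily overlapping one.
        gene_separated = np.concatenate([rng.normal(0, 1, 15), rng.normal(5, 1, 15)])
        gene_overlapping = np.concatenate([rng.normal(0, 1, 15), rng.normal(0.5, 1, 15)])

        print(overlap_proportion(gene_separated, labels))    # near 0: good candidate
        print(overlap_proportion(gene_overlapping, labels))  # near 1: poor candidate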

    Gene selection for classification of microarray data based on the Bayes error

    Background: With DNA microarray data, selecting a compact subset of discriminative genes from thousands of genes is a critical step for accurate classification of phenotypes in, e.g., disease diagnosis. Several widely used gene selection methods select top-ranked genes according to their individual discriminative power in classifying samples into distinct categories, without considering correlations among genes. A limitation of these methods is that they may produce gene sets with some redundancy and yield an unnecessarily large number of candidate genes for classification analyses. Some recent studies show that incorporating gene-to-gene correlations into gene selection can remove redundant genes and improve classification accuracy. Results: In this study, we propose a new method, the Based Bayes error Filter (BBF), to select relevant genes and remove redundant genes in classification analyses of microarray data. The effectiveness and accuracy of this method are demonstrated through analyses of five publicly available microarray datasets. The results show that our gene selection method is capable of achieving better accuracies than previous studies, while being able to effectively select relevant genes, remove redundant genes and obtain efficient and small gene sets for sample classification purposes. Conclusion: The proposed method can effectively identify a compact set of genes with high classification accuracy. This study also indicates that application of the Bayes error is a feasible and effective way of removing redundant genes in gene selection.
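
    The sketch below illustrates the general flavour of a Bayes-error-based filter rather than the paper's BBF method: each gene receives a Gaussian-assumption Bhattacharyya upper bound on its Bayes error, genes are ranked by that bound, and genes highly correlated with already-selected ones are skipped. The synthetic data, the 0.8 correlation threshold and the cap of 10 genes are assumptions.

        import numpy as np

        def bayes_error_bound(x, y):
            """Bhattacharyya upper bound on the per-gene Bayes error, assuming
            Gaussian class-conditional densities and equal priors."""
            a, b = x[y == 0], x[y == 1]
            m0, m1 = a.mean(), b.mean()
            v0, v1 = a.var() + 1e-12, b.var() + 1e-12
            db = (0.25 * (m0 - m1) ** 2 / (v0 + v1)
                  + 0.5 * np.log((v0 + v1) / (2 * np.sqrt(v0 * v1))))
            return 0.5 * np.exp(-db)   # 0.5 * Bhattacharyya coefficient

        rng = np.random.default_rng(0)
        y = np.array([0] * 20 + [1] * 20)
        X = rng.normal(size=(40, 200))
        X[y == 1, :5] += 2.0           # make the first five genes informative

        # Rank genes by estimated Bayes error (lower = more discriminative) and
        # greedily keep genes not strongly correlated with those already kept.
        errors = np.array([bayes_error_bound(X[:, j], y) for j in range(X.shape[1])])
        selected = []
        for j in np.argsort(errors):
            if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < 0.8 for k in selected):
                selected.append(int(j))
            if len(selected) == 10:    # keep a compact set of 10 genes
                break
        print("selected genes:", selected)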

    A comparative analysis of predictive models of morbidity in intensive care unit after cardiac surgery – Part I: model planning

    Background: Different methods have recently been proposed for predicting morbidity in intensive care units (ICU). The aim of the present study was to critically review a number of approaches for developing models capable of estimating the probability of morbidity in ICU after heart surgery. The study is divided into two parts. In this first part, popular models used to estimate the probability of class membership are grouped into distinct categories according to their underlying mathematical principles. Modelling techniques and intrinsic strengths and weaknesses of each model are analysed and discussed from a theoretical point of view, in consideration of clinical applications. Methods: Models based on Bayes rule, the k-nearest neighbour algorithm, logistic regression, scoring systems and artificial neural networks are investigated. Key issues for model design are described. The mathematical treatment of some aspects of model structure is also included for readers interested in developing models, though a full understanding of mathematical relationships is not necessary if the reader is only interested in perceiving the practical meaning of model assumptions, weaknesses and strengths from a user point of view. Results: Scoring systems are very attractive due to their simplicity of use, although this may undermine their predictive capacity. Logistic regression models are trustworthy tools, although they suffer from the principal limitations of most regression procedures. Bayesian models seem to be a good compromise between complexity and predictive performance, but model recalibration is generally necessary. k-nearest neighbour may be a valid non-parametric technique, though computational cost and the need for large data storage are major weaknesses of this approach. Artificial neural networks have intrinsic advantages with respect to common statistical models, though the training process may be problematic. Conclusion: Knowledge of model assumptions and of the theoretical strengths and weaknesses of different approaches is fundamental for designing models for estimating the probability of morbidity after heart surgery. However, a rational choice also requires evaluation and comparison of the actual performance of locally developed competitive models in the clinical scenario, to obtain satisfactory agreement between local needs and model response. In the second part of this study the above predictive models will therefore be tested on real data acquired in a specialized ICU.
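
    Since the abstract discusses these model families only qualitatively, the sketch below shows how their out-of-sample performance could be compared on a single binary morbidity outcome. The synthetic "patient" data, the chosen hyperparameters and the use of cross-validated AUC are all assumptions made for the illustration.

        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score
        from sklearn.naive_bayes import GaussianNB
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.neural_network import MLPClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        # Synthetic stand-in for a perioperative dataset: 500 patients, 15 predictors,
        # imbalanced binary morbidity outcome.
        X, y = make_classification(n_samples=500, n_features=15, n_informative=8,
                                   weights=[0.85, 0.15], random_state=0)

        models = {
            "logistic regression": LogisticRegression(max_iter=1000),
            "naive Bayes (Bayes rule)": GaussianNB(),
            "k-nearest neighbour": KNeighborsClassifier(n_neighbors=15),
            "neural network": MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                                            random_state=0),
        }

        for name, model in models.items():
            pipe = make_pipeline(StandardScaler(), model)
            auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
            print("%-25s AUC = %.3f +/- %.3f" % (name, auc.mean(), auc.std()))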

    A Genome-Wide Gene Function Prediction Resource for Drosophila melanogaster

    Predicting gene functions by integrating large-scale biological data remains a challenge for systems biology. Here we present a resource for Drosophila melanogaster gene function predictions. We trained function-specific classifiers to optimize the influence of different biological datasets for each functional category. Our model predicted GO terms and KEGG pathway memberships for Drosophila melanogaster genes with high accuracy, as affirmed by cross-validation, supporting literature evidence, and large-scale RNAi screens. The resulting resource of prioritized associations between Drosophila genes and their potential functions offers a guide for experimental investigations.
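
    A minimal sketch of the "function-specific classifiers" idea is given below: one binary classifier per functional category is trained on an integrated feature matrix, and its cross-validated membership probabilities rank candidate genes for that category. The synthetic features, the random toy annotations and logistic regression as the classifier are assumptions, not the resource's actual models.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_predict

        rng = np.random.default_rng(0)
        n_genes = 300

        # Toy integrated feature matrix; in practice the columns would come from
        # different biological data sources (expression, interactions, sequence).
        features = rng.normal(size=(n_genes, 40))

        # Toy gene-to-term annotation matrix for three hypothetical functional terms.
        annotations = (rng.uniform(size=(n_genes, 3)) < 0.15).astype(int)

        # One classifier per functional category ("function-specific"), so each term
        # can weight the integrated features differently.
        for term in range(annotations.shape[1]):
            y = annotations[:, term]
            clf = LogisticRegression(max_iter=1000, class_weight="balanced")
            probs = cross_val_predict(clf, features, y, cv=5,
                                      method="predict_proba")[:, 1]
            top = np.argsort(probs)[::-1][:5]   # highest-ranked candidate genes
            print("term %d: top candidate genes %s" % (term, top.tolist()))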