
arXiv.org e-Print Archive

Stable Feature Selection for Biomarker Discovery

Author: He Zengyou
Yu Weichuan
Publication date: 01/01/2010

Feature selection techniques have long been the workhorse of biomarker discovery applications. Surprisingly, the stability of feature selection with respect to sampling variations has long been overlooked, and only recently has this issue received increasing attention. In this article, we review existing stable feature selection methods for biomarker discovery using a generic hierarchical framework. We have two objectives: (1) to provide an overview of this new yet fast-growing topic as a convenient reference; (2) to categorize existing methods under an expandable framework for future research and development.
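
As an illustration of what "stability with respect to sampling variations" can mean in practice, the sketch below resamples a labelled dataset, selects the top-k features on each resample, and averages the pairwise Jaccard overlap of the selected sets. The ranking score (absolute difference of class means), the stratified bootstrap, the Jaccard index, and all function names are assumptions made for this example; the abstract does not prescribe any of them.

```python
from itertools import combinations
import random
import statistics


def top_k_features(samples, labels, k):
    """Rank features by absolute difference of class means; return the top-k indices."""
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, y in zip(samples, labels) if y == 1]
        neg = [row[j] for row, y in zip(samples, labels) if y == 0]
        scores.append(abs(statistics.fmean(pos) - statistics.fmean(neg)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return frozenset(ranked[:k])


def jaccard(a, b):
    """Jaccard index of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def selection_stability(samples, labels, k, n_resamples=20, seed=0):
    """Average pairwise Jaccard overlap of top-k sets across bootstrap resamples.

    Resampling is done within each class (labels must be 0 or 1, both present)
    so that every resample contains both classes.
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for i, y in enumerate(labels):
        by_class[y].append(i)
    selected = []
    for _ in range(n_resamples):
        # Draw len(members) indices with replacement from each class.
        idx = [rng.choice(members) for members in by_class.values() for _ in members]
        selected.append(
            top_k_features([samples[i] for i in idx], [labels[i] for i in idx], k)
        )
    pairs = list(combinations(selected, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A value near 1 means the same features are chosen regardless of which samples happen to be drawn; a value near 0 means the selection is dominated by sampling noise.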

CiteSeerX

Hong Kong University of Science and Technology Institutional Repository

Reproducibility and Concordance of Differential DNA Methylation and Gene Expression in Cancer

Author: Guo Zheng
He Lang
He Zheng
Li Hongdong
Shen Xiaopei
Yao Chen
Publication venue: Public Library of Science
Publication date: 03/01/2012

Background: Hundreds of genes with differential DNA methylation of promoters have been identified for various cancers. However, the reproducibility of differential DNA methylation discoveries for cancer and the relationship between DNA methylation and aberrant gene expression have not been systematically analysed. Methodology/Principal Findings: Using array data for seven types of cancers, we first evaluated the effects of experimental batches on differential DNA methylation detection. Second, we compared the directions of DNA methylation changes detected from different datasets for the same cancer. Third, we evaluated the concordance between methylation and gene expression changes. Finally, we compared DNA methylation changes in different cancers. For a given cancer, the directions of methylation and expression changes detected from different datasets, excluding potential batch effects, were highly consistent. In different cancers, DNA hypermethylation was highly inversely correlated with the down-regulation of gene expression, whereas hypomethylation was only weakly correlated with the up-regulation of genes. Finally, we found that genes commonly hypomethylated in different cancers primarily performed functions associated with chronic inflammation, such as ‘keratinization’, ‘chemotaxis’ and ‘immune response’. Conclusions: Batch effects could greatly affect the discovery of DNA methylation biomarkers. For a particular cancer, both differential DNA methylation and gene expression can be reproducibly detected from different studies with no batc
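
The comparison of change directions across datasets described in this abstract can be sketched as a simple concordance fraction: among genes reported in both datasets, the share whose change has the same sign. This is an illustrative simplification, not the paper's procedure; the function name and the gene identifiers in the usage note are hypothetical.

```python
def direction_concordance(changes_a, changes_b):
    """Fraction of shared genes whose change has the same sign in both datasets.

    Each argument maps a gene identifier to a signed change
    (positive = hypermethylated or up-regulated, negative = the opposite).
    """
    shared = [gene for gene in changes_a if gene in changes_b]
    if not shared:
        raise ValueError("the two datasets share no genes")
    same = sum(
        1 for gene in shared if (changes_a[gene] > 0) == (changes_b[gene] > 0)
    )
    return same / len(shared)
```

For example, with hypothetical inputs `{"G1": 0.4, "G2": -0.3, "G3": 0.2}` and `{"G1": 0.1, "G2": -0.5, "G3": -0.2, "G4": 0.9}`, three genes are shared and two of them change in the same direction.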

CiteSeerX

Public Library of Science (PLOS)

arXiv.org e-Print Archive

FigShare

Application of Volcano Plots in Analyses of mRNA Differential Expressions with Microarrays

Author: Alvord W. G.
Auer P. L.
Chen Y.
Chen Z.
Cohen J.
Fechner G. T.
Guyon I.
Göhlmann H.
Lee J.
Li C.
Schwender H.
Smyth G. K.
Snedecor G. W.
Trevino V.
Vandesompele J.
Welsh B. L.
WENTIAN LI
Zhao C.
Publication venue: World Scientific Pub Co Pte Lt
Publication date: 28/08/2013

A volcano plot displays unstandardized signal (e.g. log-fold-change) against noise-adjusted/standardized signal (e.g. t-statistic or -log10(p-value) from the t test). We review the basic and interactive uses of the volcano plot, and its crucial role in understanding the regularized t-statistic. The joint filtering gene selection criterion based on regularized statistics has a curved discriminant line in the volcano plot, as compared to the two perpendicular lines for the "double filtering" criterion. This review attempts to provide a unifying framework for discussions on alternative measures of differential expression, improved methods for estimating variance, and visual display of a microarray analysis result. We also discuss the possibility of applying volcano plots to fields beyond microarrays. Comment: 8 figures
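
To make the two axes concrete, here is a minimal sketch that computes one gene's volcano-plot coordinates (log-fold-change and pooled-variance t-statistic) together with a regularized t-statistic, and applies the two selection rules the abstract contrasts. The regularization form (a constant `s0` added to the standard error), the cut-off values, and the function names are assumptions chosen for illustration, not the review's exact definitions.

```python
import math
import statistics


def volcano_point(group_a, group_b, s0=0.0):
    """Return (log_fc, t_stat, regularized_t) for one gene.

    Inputs are log2 expression values, so the difference of group means is the
    log-fold-change. Assumes at least two values per group and non-zero
    pooled variance.
    """
    n_a, n_b = len(group_a), len(group_b)
    log_fc = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    # s0 (hypothetical constant) damps genes whose small variance inflates t.
    return log_fc, log_fc / std_err, log_fc / (std_err + s0)


def double_filter(log_fc, t_stat, fc_cut=1.0, t_cut=2.0):
    """Two perpendicular cut-offs: one on each axis of the volcano plot."""
    return abs(log_fc) >= fc_cut and abs(t_stat) >= t_cut


def joint_filter(regularized_t, cut=2.0):
    """Single cut-off on the regularized statistic (a curve in the plot)."""
    return abs(regularized_t) >= cut
```

Because the regularized statistic depends on both the fold change and the standard error, a single threshold on it traces a curved boundary when drawn in the (log-fold-change, t-statistic) plane, whereas the double filter draws one vertical and one horizontal line.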

Springer - Publisher Connector

Multi-level reproducibility of signature hubs in human interactome for breast cancer metastasis

Author: Guo Zheng
Li Hongdong
Yao Chen
Zhang Lin
Zhou Chenggui
Zou Jinfeng
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: It has been suggested that, in the human protein-protein interaction network, changes of co-expression between highly connected proteins ("hubs") and their interaction neighbours might have important roles in cancer metastasis and be predictive disease signatures for patient outcome. However, for a given cancer, such disease signatures identified from different studies have little overlap. Results: Here, we propose a systemic approach to evaluate the reproducibility of disease signatures at multiple levels, on the basis of some statistically testable biological models. Using two datasets for breast cancer metastasis, we showed that different signature hubs identified from different studies were highly consistent in terms of significantly sharing interaction neighbours and displaying consistent co-expression changes with their overlapping neighbours, whereas the shared interaction neighbours were significantly over-represented with known cancer genes and enriched in pathways deregulated in breast cancer pathogenesis. Then, we showed that the signature hubs identified from the two datasets were highly reproducible at the protein interaction and pathway levels in three other independent datasets. Conclusions: Our results suggest a possible biological model in which different signature hubs altered in different patient cohorts could disturb the same pathways associated with cancer metastasis through their interaction neighbours.

Public Library of Science (PLOS)

Reproducible Cancer Biomarker Discovery in SELDI-TOF MS Using Different Pre-Processing Algorithms

Author: A Carvajal-Rodriguez
A Cruz-Marcelo
AC Sauve
AK Callesen
AW Bell
B Huang
BL Adam
C Li
C Mathelin
C Truntzer
Chen Yao
DF Ransohoff
DM Rissin
DM Rocke
DW Swinkels
EP Diamandis
FJ Esteva
G Kristina
Guini Hong
HJ Song
II Emanuele VA
J Frobel
J Li
J MacQueen
J Wang
JA Mead
JF Timms
Jinfeng Zou
Jing Wang
JM Hogan
JW Wong
KA Baggerly
KR Coombes
L Diao
L Ein-Dor
L Klebanov
L Pusztai
L Shi
L Sun
Lin Zhang
M De Bock
M Dijkstra
M Zhang
MA Kuzyk
ME Sanders
MK Tuck
ML Lee
P Du
PC Carvalho
PJ Rousseeuw
R Aebersold
RE Caffrey
SM Hanash
T Fortin
TC Poon
W Meuleman
WC Cho
William C.S. Cho
X Gong
X Li
X Qiu
Xinwu Guo
Y Benjamini
Y Pawitan
Y Yasui
Zheng Guo
Publication venue: Public Library of Science
Publication date: 14/10/2011

BACKGROUND: There has been much interest in differentiating diseased and normal samples using biomarkers derived from mass spectrometry (MS) studies. However, biomarker identification for specific diseases has been hindered by irreproducibility. Specifically, a peak profile extracted from a dataset for biomarker identification depends on the data pre-processing algorithm, and to date there is no widely accepted agreement on which algorithm to use. RESULTS: In this paper, we investigated the consistency of biomarker identification using differentially expressed (DE) peaks from peak profiles produced by three widely used average spectrum-dependent pre-processing algorithms based on SELDI-TOF MS data for prostate and breast cancers. Our results revealed two important factors that affect the consistency of DE peak identification using different algorithms. The first is that some DE peaks selected from one peak profile were not detected as peaks in other profiles. The second is that the statistical power of identifying DE peaks in large peak profiles with many peaks may be low due to the large number of tests and the small number of samples. Furthermore, we demonstrated that the DE peak detection power in large profiles could be improved by the stratified false discovery rate (FDR) control approach and that the reproducibility of DE peak detection could thereby be increased. CONCLUSIONS: Comparing and evaluating pre-processing algorithms in terms of reproducibility can elucidate the relationship among different algorithms and also help in selecting a pre-processing algorithm. The DE peaks selected from small peak profiles with few peaks for a dataset tend to be reproducibly detected in large peak profiles, which suggests that a suitable pre-processing algorithm should be able to produce peaks sufficient for identifying useful and reproducible biomarkers.
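
The idea behind stratified FDR control can be sketched as follows: instead of applying one multiple-testing correction to all peaks at once, the peaks are split into strata and the correction is applied within each stratum. The sketch uses the Benjamini-Hochberg step-up procedure as the per-stratum correction; how the paper defines its strata and which FDR procedure it uses are not stated in the abstract, so the stratum labels and function names here are assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the set of indices rejected by the BH step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    n_reject = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its BH threshold.
        if p_values[i] <= alpha * rank / m:
            n_reject = rank
    return set(order[:n_reject])


def stratified_fdr(p_values, strata, alpha=0.05):
    """Apply BH separately within each stratum and pool the rejections.

    strata[i] is the (hypothetical) stratum label of hypothesis i.
    """
    groups = {}
    for i, label in enumerate(strata):
        groups.setdefault(label, []).append(i)
    rejected = set()
    for members in groups.values():
        local = benjamini_hochberg([p_values[i] for i in members], alpha)
        # Map positions within the stratum back to original indices.
        rejected.update(members[j] for j in local)
    return rejected
```

When a small stratum concentrates most of the true signals, its per-stratum thresholds are less severe than the thresholds of a single pooled correction over many tests, which is one way stratification can recover detection power.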

Aberdeen University Research

Integrative analysis of the colorectal cancer proteome: potential clinical impact

Author: Alnabulsi Abdo
Murray Graeme I.
Publication venue: Informa UK Limited
Publication date: 07/09/2016

Peer reviewed. Postprint.

FigShare

Increasing consistency of disease biomarker prediction across datasets

Author: A Achiron
A Kuhn
AE Teschendorff
AR Abbas
B Zheng
C Riveros
Christos Hatzis
D Arasappan
DD Kang
E Kotelnikova
F Gilli
F Zhang
GC Tseng
H Choi
HJ Eysenck
I Borozan
I Kupershmidt
J ichi Satoh
JA Gagnon-Bartsch
JK Choi
JR Stevens
JT Leek
KS Gandhi
L Ein-Dor
M Gurevich
M Hecker
M Kapushesky
M Zhang
Maria D. Chikina
MM Goldenberg
R Bomprezzi
R Breitling
RA Irizarry
RC Axtell
S Chakraborty
S Lu
SE Bushnell
Stuart C. Sealfon
T Manoli
V Annibali
WI McDonald
Publication venue: Public Library of Science (PLoS)
Publication date: 16/04/2014

Microarray studies with human subjects often have limited sample sizes, which hampers the ability to detect reliable biomarkers associated with disease and motivates the need to aggregate data across studies. However, human gene expression measurements may be influenced by many non-random factors such as genetics, sample preparation, and tissue heterogeneity. These factors can contribute to a lack of agreement among related studies, limiting the utility of their aggregation. We show that it is feasible to carry out an automatic correction of individual datasets to reduce the effect of such 'latent variables' (without prior knowledge of the variables) in such a way that datasets addressing the same condition show better agreement once each is corrected. We build our approach on the method of surrogate variable analysis (SVA), but we demonstrate that the original algorithm is unsuitable for the analysis of human tissue samples that are mixtures of different cell types. We propose a modification to SVA that is crucial to obtaining the improvement in agreement that we observe. We develop our method on a compendium of multiple sclerosis data and verify it on an independent compendium of Parkinson's disease datasets. In both cases, we show that our method is able to improve agreement across varying study designs, platforms, and tissues. This approach has the potential for wide applicability to any field where lack of inter-study agreement has been a concern. © 2014 Chikina, Sealfon.

Public Library of Science (PLOS)