BACKGROUND: Gene expression data are a rich source of information about the transcriptional dysregulation of genes in cancer. Genes that display differential regulation in cancer are a subtype of cancer biomarkers.

RESULTS: We present an approach to mine expressed sequence tags to discover cancer biomarkers. A false discovery rate analysis suggests that the approach generates fewer than 22% false discoveries when applied to combined human and mouse whole-genome screens. With this approach, we identify the 200 genes most consistently differentially expressed in cancer (called HM200) and proceed to characterize these genes. When used for prediction in a variety of cancer classification tasks (24 independent cancer microarray datasets, 59 classifications in total), HM200 and the shorter gene list HM100 prove to be very competitive cancer biomarker sets. Indeed, when compared to 13 published cancer marker gene lists, HM200 achieves the best or second-best classification performance in 79% of the classifications considered.

CONCLUSION: These results indicate the existence of at least one general cancer marker set whose predictive value spans several tumor types and classification types. Our comparison with other marker gene lists shows that HM200 markers are mostly novel cancer markers. We also identify the previously published Pomeroy-400 list as another general cancer marker set. Strikingly, Pomeroy-400 has 27 genes in common with HM200. Our data suggest that a core set of genes is responsive to the deregulation of pathways involved in tumorigenesis in a variety of tumor types and that these genes could serve as transcriptional cancer markers in applications of clinical interest. Finally, our study suggests new strategies to select and evaluate cancer biomarkers in microarray studies.
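The "best or second best in 79% of the classifications" figure is a rank-based summary across tasks. The sketch below illustrates one way such a summary could be computed; it is not the authors' code. The function name `top2_fraction`, the competitor names `ListA` and `ListB`, the accuracy values, and the tie-handling rule (ties count in the target's favor) are all assumptions made for illustration.

```python
from typing import Dict, List


def top2_fraction(scores: Dict[str, List[float]], target: str) -> float:
    """Fraction of tasks in which `target` ranks best or second best.

    `scores` maps each gene list name to its per-task performance
    (same task order for every list). Ties count in the target's favor,
    which is an assumption of this sketch.
    """
    n_tasks = len(scores[target])
    hits = 0
    for i in range(n_tasks):
        # Number of competing gene lists that strictly beat the target on task i.
        better = sum(
            1
            for name, task_scores in scores.items()
            if name != target and task_scores[i] > scores[target][i]
        )
        # At most one list ahead means the target is best or second best.
        if better <= 1:
            hits += 1
    return hits / n_tasks


# Hypothetical accuracies for three gene lists on four classification tasks.
example_scores = {
    "HM200": [0.90, 0.80, 0.70, 0.95],
    "ListA": [0.85, 0.90, 0.80, 0.90],
    "ListB": [0.80, 0.85, 0.75, 0.85],
}

print(top2_fraction(example_scores, "HM200"))
```

With 59 classifications and 14 gene lists in total, the same function would take one score list of length 59 per gene list.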

A Aouacheria

A Cromer

AG Bader

B Vogelstein

BJ Quade

BR Zeeberg

C Cortes

CF Aliferis

CL Nutt

CM Perou

DR Rhodes

ET Munoz

Fabien Campagne

G Dennis Jr.

GP Donovan

GS Sellick

HK Lee

IB Rosenwald

JC Darnell

KF Manly

L Dyrskjot

L Skrabanek

LJ van 't Veer

Lucy Skrabanek

M Unoki

MJ Clemens

MS Boguski

R Aebersold

R Edgar

R Simon

RB Darnell

S Mukherjee

S Ramaswamy

SL Pomeroy

T Joachims

TJ MacDonald

TM Chu

VE Velculescu

W Liu

YT Chen

Z Zhang

English


Mining expressed sequence tags identifies cancer markers of clinical interest


Skrabanek Lucy

Campagne Fabien

Directory of Open Access Journals

BMC Bioinformatics

Springer - Publisher Connector

