
A Critical Evaluation of Network and Pathway-Based Classifiers for Outcome Prediction in Breast Cancer

By Christine Staiger, Sidney Cadot, Raul Kooter, Marcus Dittrich, Tobias Müller, Gunnar W. Klau and Lodewyk F. A. Wessels


Recently, several classifiers have been proposed for predicting outcome in breast cancer that combine primary tumor data, such as gene expression data, with secondary data sources, such as protein-protein interaction networks. In these approaches, new composite features are typically constructed by aggregating the expression levels of several genes, with the secondary data sources guiding this aggregation. Although many studies claim that these approaches improve classification performance over single-gene classifiers, the gain in performance is difficult to assess, mainly because different breast cancer data sets and validation procedures are used to assess performance. Here we address these issues by employing a large cohort of six breast cancer data sets as a benchmark set and by performing an unbiased evaluation of the classification accuracies of the different approaches. Contrary to previous claims, we find that composite-feature classifiers do not outperform simple single-gene classifiers. We investigate the effect of (1) the number of selected features, (2) the specific gene set from which features are selected, (3) the size of the training set and (4) the heterogeneity of the data set on the performance of composite-feature and single-gene classifiers. Strikingly, we find that randomizing the secondary data sources, which destroys all biological information in them, does not degrade the performance of composite-feature classifiers. Finally, we show that when a proper correction for gene set size is performed, the stability of single-gene sets is similar to that of composite-feature sets. Based on these results, there is currently no reason to prefer prognostic classifiers based on composite features over single-gene classifiers for predicting outcome in breast cancer.
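To make the two central ideas concrete (composite features that aggregate several genes, and a randomization that keeps the structure of the secondary data but scrambles its biology), here is a minimal Python sketch. It rests on several assumptions that do not come from the abstract. Aggregation is taken to be the per-sample mean of z-scored expression over a gene set, which is one common choice in subnetwork-based classifiers but not necessarily the exact scheme of every method evaluated. Randomization is taken to be a random permutation of gene labels, which is an assumption; the study's own randomization procedure may differ. The function names `zscore_genes`, `composite_feature` and `permute_gene_labels` are hypothetical, and any gene names used with them are toy placeholders.

```python
import random
import statistics
from typing import Dict, List, Sequence


def zscore_genes(expr: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Standardize each gene's expression across samples (mean 0, sd 1)."""
    out: Dict[str, List[float]] = {}
    for gene, values in expr.items():
        mu = statistics.fmean(values)
        sd = statistics.pstdev(values)
        # Constant genes carry no information; map them to 0 instead of dividing by 0.
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out


def composite_feature(z: Dict[str, List[float]], gene_set: Sequence[str]) -> List[float]:
    """Hypothetical composite feature: per-sample mean z-score over gene_set.

    Genes in gene_set that were not measured are ignored.
    """
    members = [g for g in gene_set if g in z]
    if not members:
        raise ValueError("gene set contains no measured genes")
    n_samples = len(z[members[0]])
    return [statistics.fmean(z[g][s] for g in members) for s in range(n_samples)]


def permute_gene_labels(
    gene_sets: List[List[str]], universe: Sequence[str], seed: int = 0
) -> List[List[str]]:
    """Assumed randomization: relabel every gene via one random permutation.

    Set sizes and overlaps between sets are preserved, but the biological
    link between each set and its genes is destroyed. Every gene in
    gene_sets must appear in universe.
    """
    rng = random.Random(seed)
    shuffled = list(universe)
    rng.shuffle(shuffled)
    mapping = dict(zip(universe, shuffled))
    return [[mapping[g] for g in gs] for gs in gene_sets]
```

Because a single permutation is applied to all sets at once, the randomized sets keep the same sizes and the same overlap pattern as the originals. A comparison between real and randomized sets therefore isolates the contribution of the biological content from that of the set structure, which is the kind of control the abstract describes.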

Topics: Research Article
Publisher: Public Library of Science
Provided by: PubMed Central
