A robust classifier of high predictive value to identify good prognosis patients in ER-negative breast cancer.
INTRODUCTION: Patients with primary operable oestrogen receptor (ER) negative (-) breast cancer account for about 30% of all cases and generally have a worse prognosis than ER-positive (+) patients. Nevertheless, a significant proportion of ER- cases have favourable outcomes and could potentially benefit from a less aggressive course of therapy. However, identification of such patients with a good prognosis remains difficult and at present is only possible through examining histopathological factors. METHODS: Building on a previously identified seven-gene prognostic immune response module for ER- breast cancer, we developed a novel statistical tool based on Mixture Discriminant Analysis in order to build a classifier that could accurately identify ER- patients with a good prognosis. RESULTS: We report the construction of a seven-gene expression classifier that accurately predicts, across a training cohort of 183 ER- tumours and six independent test cohorts (a total of 469 ER- tumours), ER- patients of good prognosis (in test sets, average predictive value = 94% [range 85 to 100%], average hazard ratio = 0.15 [range 0.07 to 0.36], p < 0.000001) independently of lymph node status and treatment. CONCLUSIONS: This seven-gene classifier could be used in a polymerase chain reaction-based clinical assay to identify ER- patients with a good prognosis, who may therefore benefit from less aggressive treatment regimens. RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are
A comparison of feature selection and classification methods in DNA methylation studies using the Illumina Infinium platform
BACKGROUND: The 27k Illumina Infinium Methylation Beadchip is a popular high-throughput technology that allows the methylation state of over 27,000 CpGs to be assayed. While feature selection and classification methods have been comprehensively explored in the context of gene expression data, relatively little is known as to how best to perform feature selection or classification in the context of Illumina Infinium methylation data. Given the rising importance of epigenomics in cancer and other complex genetic diseases, and in view of the upcoming epigenome wide association studies, it is critical to identify the statistical methods that offer improved inference in this novel context. RESULTS: Using a total of 7 large Illumina Infinium 27k Methylation data sets, encompassing over 1,000 samples from a wide range of tissues, we here provide an evaluation of popular feature selection, dimensional reduction and classification methods on DNA methylation data. Specifically, we evaluate the effects of variance filtering, supervised principal components (SPCA) and the choice of DNA methylation quantification measure on downstream statistical inference. We show that for relatively large sample sizes feature selection using test statistics is similar for M and β-values, but that in the limit of small sample sizes, M-values allow more reliable identification of true positives. We also show that the effect of variance filtering on feature selection is study-specific and dependent on the phenotype of interest and tissue type profiled. Specifically, we find that variance filtering improves the detection of true positives in studies with large effect sizes, but that it may lead to worse performance in studies with smaller yet significant effect sizes. In contrast, supervised principal components improves the statistical power, especially in studies with small effect sizes. We also demonstrate that classification using the Elastic Net and Support Vector Machine (SVM) clearly outperforms competing methods like LASSO and SPCA. Finally, in unsupervised modelling of cancer diagnosis, we find that non-negative matrix factorisation (NMF) clearly outperforms principal components analysis. CONCLUSIONS: Our results highlight the importance of tailoring the feature selection and classification methodology to the sample size and biological context of the DNA methylation study. The Elastic Net emerges as a powerful classification algorithm for large-scale DNA methylation studies, while NMF does well in the unsupervised context. The insights presented here will be useful to any study embarking on large-scale DNA methylation profiling using Illumina Infinium beadarrays.
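The abstract above contrasts β-values and M-values without defining them. As a minimal sketch, assuming the commonly used convention that the M-value is the base-2 logit of the β-value, the conversion can be written as follows. The function names and the clamping constant `eps` are illustrative choices and do not come from the abstract.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert a methylation beta-value in [0, 1] to an M-value (log2 odds).

    eps is a hypothetical clamping constant that keeps the logit finite
    when beta is exactly 0 or 1.
    """
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def m_to_beta(m):
    """Inverse transform: map an M-value back to a beta-value in (0, 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)


if __name__ == "__main__":
    # Beta-values near 0 or 1 are stretched out on the M-value scale.
    for beta in (0.05, 0.5, 0.95):
        print(beta, round(beta_to_m(beta), 3))
```

Because the logit stretches values near 0 and 1, the two scales can rank CpGs differently when a test statistic is computed from few samples, which is the regime in which the abstract reports an advantage for M-values.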
Increased signaling entropy in cancer requires the scale-free property of protein interaction networks
One of the key characteristics of cancer cells is an increased phenotypic
plasticity, driven by underlying genetic and epigenetic perturbations. However,
at a systems level it is unclear how these perturbations give rise to the
observed increased plasticity. Elucidating such systems-level principles is key
for an improved understanding of cancer. Recently, it has been shown that
signaling entropy, an overall measure of signaling pathway promiscuity, and
computable from integrating a sample's gene expression profile with a protein
interaction network, correlates with phenotypic plasticity and is increased in
cancer compared to normal tissue. Here we develop a computational framework for
studying the effects of network perturbations on signaling entropy. We
demonstrate that the increased signaling entropy of cancer is driven by two
factors: (i) the scale-free (or near scale-free) topology of the interaction
network, and (ii) a subtle positive correlation between differential gene
expression and node connectivity. Indeed, we show that if protein interaction
networks were random graphs, described by Poisson degree distributions,
cancer would generally not exhibit an increased signaling entropy. In summary,
this work exposes a deep connection between cancer, signaling entropy and
interaction network topology. Comment: 20 pages, 5 figures. In Press in Sci Rep 201
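The abstract describes signaling entropy as computable from a gene expression profile combined with a protein interaction network, but does not give the formula. The sketch below is only an illustration of that idea under stated assumptions: each node is assumed to signal to a neighbour with probability proportional to the neighbour's expression, and the local Shannon entropy of that distribution is normalised by the logarithm of the node's degree. The weighting scheme, the toy network, and the function name are all hypothetical and are not taken from the paper.

```python
import math


def local_entropies(adjacency, expression):
    """Normalised local Shannon entropy of each node's signalling distribution.

    adjacency:  dict mapping node -> list of neighbour nodes (toy format)
    expression: dict mapping node -> positive expression value
    Assumption: the probability of signalling from a node to neighbour j
    is proportional to the expression of j.
    """
    entropies = {}
    for node, neighbours in adjacency.items():
        if len(neighbours) < 2:
            # With fewer than two neighbours there is no uncertainty.
            entropies[node] = 0.0
            continue
        total = sum(expression[n] for n in neighbours)
        probs = [expression[n] / total for n in neighbours]
        h = -sum(p * math.log(p) for p in probs if p > 0)
        # Divide by the maximum possible entropy so values lie in [0, 1].
        entropies[node] = h / math.log(len(neighbours))
    return entropies


# Toy star network: hub A connected to B, C and D.
adjacency = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A"]}
uniform = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0}
skewed = {"A": 1.0, "B": 10.0, "C": 0.1, "D": 0.1}

if __name__ == "__main__":
    print(local_entropies(adjacency, uniform)["A"])
    print(local_entropies(adjacency, skewed)["A"])
```

In this toy setting the hub's entropy is maximal when its neighbours are equally expressed and drops when one neighbour dominates. Only nodes with several neighbours can contribute much entropy, which gives some intuition for why the degree distribution of the network matters in the abstract's argument.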
An immune response gene expression module identifies a good prognosis subtype in estrogen receptor negative breast cancer
A feature selection method was used in an analysis of three major microarray expression datasets to identify molecular subclasses and prognostic markers in estrogen receptor-negative breast cancer, showing that it is a heterogeneous disease with at least four main subtypes
Signalling entropy: A novel network-theoretical framework for systems analysis and interpretation of functional omic data
A key challenge in systems biology is the elucidation of the underlying principles, or fundamental laws, which determine the cellular phenotype. Understanding how these fundamental principles are altered in diseases like cancer is important for translating basic scientific knowledge into clinical advances. While significant progress is being made, with the identification of novel drug targets and treatments by means of systems biological methods, our fundamental systems level understanding of why certain treatments succeed and others fail is still lacking. We here advocate a novel methodological framework for systems analysis and interpretation of molecular omic data, which is based on statistical mechanical principles. Specifically, we propose the notion of cellular signalling entropy (or uncertainty), as a novel means of analysing and interpreting omic data, and more fundamentally, as a means of elucidating systems-level principles underlying basic biology and disease. We describe the power of signalling entropy to discriminate cells according to differentiation potential and cancer status. We further argue the case for an empirical cellular entropy-robustness correlation theorem and demonstrate its existence in cancer cell line drug sensitivity data. Specifically, we find that high signalling entropy correlates with drug resistance and further describe how entropy could be used to identify the Achilles heels of cancer cells. In summary, signalling entropy is a deep and powerful concept, based on rigorous statistical mechanical principles, which, with improved data quality and coverage, will allow a much deeper understanding of the systems biological principles underlying normal and disease physiology
Prognostic gene network modules in breast cancer hold promise
A substantial proportion of lymph node-negative patients who receive adjuvant chemotherapy do not derive any benefit from this aggressive and potentially toxic treatment. However, standard histopathological indices cannot reliably detect patients at low risk of relapse or distant metastasis. In the past few years several prognostic gene expression signatures have been developed and shown to potentially outperform histopathological factors in identifying low-risk patients in specific breast cancer subgroups with predictive values of around 90%, and therefore hold promise for clinical application. We envisage that further improvements and insights may come from integrative expression pathway analyses that dissect prognostic signatures into modules related to cancer hallmarks
Elucidating the Altered Transcriptional Programs in Breast Cancer using Independent Component Analysis
The quantity of mRNA transcripts in a cell is determined by a complex interplay of cooperative and counteracting biological processes. Independent Component Analysis (ICA) is one of a small number of unsupervised algorithms that have been applied to microarray gene expression data in an attempt to understand phenotype differences in terms of changes in the activation/inhibition patterns of biological pathways. While the ICA model has been shown to outperform other linear representations of the data such as Principal Components Analysis (PCA), a validation using explicit pathway and regulatory element information has not yet been performed. We apply a range of popular ICA algorithms to six of the largest microarray cancer datasets and use pathway-knowledge and regulatory-element databases for validation. We show that ICA outperforms PCA and clustering-based methods in that ICA components map closer to known cancer-related pathways, regulatory modules, and cancer phenotypes. Furthermore, we identify cancer signalling and oncogenic pathways and regulatory modules that play a prominent role in breast cancer and relate the differential activation patterns of these to breast cancer phenotypes. Importantly, we find novel associations linking immune response and epithelial–mesenchymal transition pathways with estrogen receptor status and histological grade, respectively. In addition, we find associations linking the activity levels of biological pathways and transcription factors (NF1 and NFAT) with clinical outcome in breast cancer. ICA provides a framework for a more biologically relevant interpretation of genomewide transcriptomic data. Adopting ICA as the analysis tool of choice will help understand the phenotype–pathway relationship and thus help elucidate the molecular taxonomy of heterogeneous cancers and of other complex genetic diseases