
Identification of a gene signature in cell cycle pathway for breast cancer prognosis using gene expression profiling data

Author: A Keith Dunker
A Potti
A Subramanian
A Urruticoechea
AH Bild
Andrew Campen
B Baldetorp
B Lloveras
C Fan
CM Perou
DW Hedley
E Baldini
FM Waldman
GS Eichler
H Kuhling
HY Chang
HY Chuang
J Massague
J Quackenbush
JD Mosley
Jiangang Liu
JK Lee
JM Bueno-de-Mesquita
JS Meyer
K Keyomarsi
KL Evans
KR Coombes
L Ein-Dor
LD Miller
LJ van't Veer
M Buyse
M Colozza
Mathew Palakal
MG Peters
MJ van de Vijver
PJ van Diest
R Clarke
R Tibshirani
S Han
S Paik
Sheng-Bin Peng
Shuguang Huang
Shuyu Li
T Sorlie
T Suzuki
V Vuaroqueaux
Xiang Ye
XJ Ma
Y Hu
Y Pawitan
Y Wang
Yuni Xia
Publication venue: BioMed Central
Publication date: 01/09/2008

Background: Numerous studies have used microarrays to identify gene signatures for predicting cancer patients' clinical outcomes and responses to chemotherapy. However, the potential impact of gene expression profiling on cancer diagnosis, prognosis, and the development of personalized treatment may not be fully exploited because of the lack of consensus gene signatures and a poor understanding of the underlying molecular mechanisms. Methods: We developed a novel approach to derive gene signatures for breast cancer prognosis in the context of known biological pathways. Using unsupervised methods, cancer patients were separated into distinct groups based on gene expression patterns in one of the following pathways: apoptosis, cell cycle, angiogenesis, metastasis, p53, DNA repair, and several receptor-mediated signaling pathways including chemokines, EGF, FGF, HIF, MAP kinase, JAK and NF-κB. The survival probabilities of the patient groups were then compared to determine whether differential gene expression in a specific pathway is correlated with differential survival. Results: Our results revealed that expression of cell cycle genes is strongly predictive of breast cancer outcomes. We further confirmed this observation by building a cell cycle gene signature model using supervised methods. Validated in multiple independent datasets, the cell cycle gene signature is a more accurate predictor of breast cancer clinical outcome than the previously identified Amsterdam 70-gene signature, which has been developed into the FDA-approved clinical test MammaPrint®. Conclusion: Taken together, the gene expression signature model we developed from well-defined pathways is not only a consistently powerful prognosticator but also mechanistically linked to cancer biology. Our approach provides an alternative to the current methodology of identifying gene expression markers for cancer prognosis and drug responses using whole-genome gene expression data.
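The pathway screen described above (cluster patients on one pathway's genes, then ask whether the clusters differ in survival) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the abstract only says "unsupervised methods", so two-cluster k-means and the log-rank test are assumed choices here, and `two_means` and `logrank` are hypothetical names.

```python
import math

import numpy as np


def two_means(X, n_iter=100):
    """Split patients (rows) into two clusters using one pathway's genes (columns)."""
    X = np.asarray(X, dtype=float)
    # Deterministic start: the first patient and the patient farthest from it.
    far = np.argmax(((X - X[0]) ** 2).sum(axis=1))
    centers = np.vstack([X[0], X[far]])
    for _ in range(n_iter):
        # Assign each patient to the nearest center (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.vstack([X[labels == k].mean(axis=0) for k in (0, 1)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def logrank(time, event, group):
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    time, event, group = (np.asarray(a) for a in (time, event, group))
    obs_minus_exp, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & (group == 0)).sum()
        died = (time == t) & (event == 1)
        d = died.sum()
        d1 = (died & (group == 0)).sum()
        # Observed minus expected deaths in group 0 at this event time.
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    # For 1 degree of freedom, P(chi2 > c) = erfc(sqrt(c / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

In a pathway screen, `two_means` would be run once per pathway on that pathway's gene columns, and pathways ranked by the resulting log-rank p-value; the supervised signature model mentioned in the Results is a separate step not shown here.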

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Gene Expression based Survival Prediction for Cancer Patients: A Topic Modeling Approach

Author: Greiner Russell
Kumar Luke
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2019

Cancer is one of the leading causes of death worldwide. Many believe that genomic data will enable us to better predict the survival time of cancer patients, which will lead to better, more personalized treatment options and patient care. As standard survival prediction models struggle with the high dimensionality of gene expression (GE) data, many projects first apply dimensionality reduction techniques to overcome this hurdle. We introduce a novel methodology, inspired by topic modeling from the natural language domain, to derive expressive features from high-dimensional GE data. There, a document is represented as a mixture over a relatively small number of topics, where each topic corresponds to a distribution over words; here, to accommodate the heterogeneity of a patient's cancer, we represent each patient (~document) as a mixture over cancer-topics, where each cancer-topic is a mixture over GE values (~words). This required some extensions to the standard LDA model, e.g., to accommodate the real-valued expression values, leading to our novel "discretized" Latent Dirichlet Allocation (dLDA) procedure. We initially focus on the METABRIC dataset, which describes breast cancer patients using r = 49,576 GE values from microarrays. Our results show that our approach provides survival estimates that are more accurate than standard models, in terms of the standard Concordance measure. We then validate this approach by running it on the Pan-kidney (KIPAN) dataset, over r = 15,529 GE values (here using the mRNAseq modality), and find that it again achieves excellent results. In both cases, we also show that the resulting model is calibrated, using the recent "D-calibrated" measure. These successes, in two different cancer types and expression modalities, demonstrate the generality and effectiveness of this approach.
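The abstract does not spell out how dLDA turns real-valued expression into "words". One plausible reading, assumed here purely for illustration, is to cut each gene's expression into quantile bins across patients and treat each (gene, bin) pair as a word, so that every patient becomes a bag of words. The function `discretize_to_tokens`, the token format, and the default of three bins are all hypothetical.

```python
import numpy as np


def discretize_to_tokens(X, n_bins=3):
    """Turn a patients-by-genes expression matrix into one bag of "words" per patient.

    Hypothetical scheme: each gene is cut into n_bins quantile bins across patients,
    and the pair (gene index, bin index) becomes a token such as "g17:2".
    """
    X = np.asarray(X, dtype=float)
    # Interior quantile levels, e.g. [1/3, 2/3] for three bins.
    levels = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    bins = np.empty(X.shape, dtype=int)
    for j in range(X.shape[1]):
        edges = np.quantile(X[:, j], levels)
        bins[:, j] = np.digitize(X[:, j], edges)
    return [[f"g{j}:{b}" for j, b in enumerate(row)] for row in bins]
```

The token lists can then be converted into a patient-by-word count matrix, the input format that standard LDA implementations (for example scikit-learn's `LatentDirichletAllocation`) expect; the per-patient topic proportions would then serve as low-dimensional features for a survival model.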

arXiv.org e-Print Archive

Directory of Open Access Journals

GliomaPredict: A Clinically Useful Tool for Assigning Glioma Patients to Specific Molecular Subtypes

Author: Li Aiguo
Bozdag Serdar
Kotliarov Yuri
Fine Howard A
Publication venue: e-Publications@Marquette
Publication date: 01/01/2010

Background: Advances in generating genome-wide gene expression data have accelerated the development of molecular-based tumor classification systems. Tools that translate such molecular classification schemas from research into clinical applications are still missing in the emerging era of personalized medicine. Results: We developed GliomaPredict as a computational tool that allows fast and reliable classification of glioma patients into one of six previously published stratified subtypes, based on sets of extensively validated classifiers derived from hundreds of glioma transcriptomic profiles. Our tool uses a principal component analysis (PCA)-based approach to generate a visual representation of the analyses, quantifies the confidence of the underlying subtype assessment, and presents results as a printable PDF file. The GliomaPredict tool is implemented as a plugin application for the widely used GenePattern framework. Conclusions: GliomaPredict provides a user-friendly, clinically applicable novel platform for instantly assigning a gene expression-based subtype to patients with gliomas, thereby aiding clinical trial design and therapeutic decision-making. Because it is implemented as a user-friendly diagnostic tool, we expect that in time GliomaPredict, and tools like it, will become routinely used in translational/clinical research and in the clinical care of patients with gliomas.
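GliomaPredict's actual classifiers are not described in this abstract; the sketch below only illustrates the general PCA-based idea it mentions. Training profiles are projected onto the leading principal components, each subtype is summarized by its centroid in that space, and a new patient is assigned to the nearest centroid with a crude confidence score. The nearest-centroid rule, the confidence formula d2 / (d1 + d2), and the function names are assumptions, not GliomaPredict's method.

```python
import numpy as np


def fit_pca_centroids(X, labels, n_components=2):
    """Fit PCA on training profiles (rows) and compute one centroid per subtype."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    comps = Vt[:n_components]
    Z = (X - mean) @ comps.T
    labels = np.asarray(labels)
    centroids = {c: Z[labels == c].mean(axis=0) for c in np.unique(labels)}
    return mean, comps, centroids


def assign_subtype(x, mean, comps, centroids):
    """Return (nearest subtype, confidence in [0.5, 1]) for one new profile."""
    z = (np.asarray(x, dtype=float) - mean) @ comps.T
    dists = sorted(((np.linalg.norm(z - c), k) for k, c in centroids.items()),
                   key=lambda pair: pair[0])
    (d1, best), (d2, _) = dists[0], dists[1]
    # Hypothetical confidence: 1.0 when the sample sits on its centroid,
    # 0.5 when it is equidistant from the two closest subtypes.
    return best, d2 / (d1 + d2)
```

The same low-dimensional coordinates `z` are what a PCA scatter plot of the analysis would show, which is one reason a PCA projection is convenient for both visualization and assignment.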

epublications@Marquette

Saint Louis University Libraries Digital Collections

"Pre-conditioning" for feature selection and regression in high-dimensional problems

Author: Bair Eric
Hastie Trevor
Paul Debashis
Tibshirani Robert
Publication venue: Institute of Mathematical Statistics
Publication date: 28/03/2007

We consider regression problems where the number of predictors greatly exceeds the number of observations. We propose a method for variable selection that first estimates the regression function, yielding a "pre-conditioned" response variable. The primary method used for this initial regression is supervised principal components. We then apply a standard procedure such as forward stepwise selection or the LASSO to the pre-conditioned response variable. In a number of simulated and real data examples, this two-step procedure outperforms forward stepwise selection or the usual LASSO (applied directly to the raw outcome). We also show that under a certain Gaussian latent variable model, application of the LASSO to the pre-conditioned response variable is consistent as the numbers of predictors and observations increase. Moreover, when the observational noise is rather large, the suggested procedure can give a more accurate estimate than the LASSO. We illustrate our method on some real problems, including survival analysis with microarray data.
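The two-step procedure can be sketched as follows for a continuous outcome. Step one is a bare-bones supervised principal components fit: keep the `n_top` predictors most correlated with y, take their first principal component, and regress y on it to obtain the pre-conditioned response. Step two runs forward stepwise selection against that response instead of the raw y. The function names and `n_top` are hypothetical, forward stepwise stands in for the LASSO option, and the survival-outcome variant is not covered.

```python
import numpy as np


def supervised_pc_fit(X, y, n_top):
    """Pre-condition y: regress it on the first PC of its n_top most correlated predictors."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    top = np.argsort(-np.abs(corr))[:n_top]
    U, s, _ = np.linalg.svd(Xc[:, top], full_matrices=False)
    pc = U[:, 0] * s[0]
    # Least-squares fit of the centered response on the single component.
    return y.mean() + pc * (pc @ yc) / (pc @ pc)


def forward_stepwise(X, target, n_steps):
    """Greedily add the predictor that most reduces the residual sum of squares."""
    Xc = X - X.mean(axis=0)
    tc = target - target.mean()
    active = []
    for _ in range(n_steps):
        best_j, best_rss = None, np.inf
        for j in range(X.shape[1]):
            if j in active:
                continue
            cols = Xc[:, active + [j]]
            beta, *_ = np.linalg.lstsq(cols, tc, rcond=None)
            rss = np.sum((tc - cols @ beta) ** 2)
            if rss < best_rss:
                best_j, best_rss = j, rss
        active.append(best_j)
    return active
```

Usage: `forward_stepwise(X, supervised_pc_fit(X, y, n_top=10), n_steps=3)` selects variables against the denoised response. Because the pre-conditioned response mostly keeps the signal shared by the top predictors, the selection step is less easily misled by noise in the raw outcome.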

arXiv.org e-Print Archive

CiteSeerX

Crossref

Prediction with Dimension Reduction of Multiple Molecular Data Sources for Patient Survival

Author: Kaplan Adam
Lock Eric F.
Publication venue: SAGE Publications
Publication date: 01/07/2017

Predictive modeling from high-dimensional genomic data is often preceded by a dimension reduction step, such as principal components analysis (PCA). However, the application of PCA is not straightforward for multi-source data, wherein multiple sources of 'omics data measure different but related biological components. In this article we use recent advances in the dimension reduction of multi-source data for predictive modeling. In particular, we apply exploratory results from Joint and Individual Variation Explained (JIVE), an extension of PCA for multi-source data, to the prediction of differing response types. We conduct simulations to illustrate the practical advantages and interpretability of our approach. As an application example, we consider predicting survival for Glioblastoma Multiforme (GBM) patients from three data sources measuring mRNA expression, miRNA expression, and DNA methylation. We also introduce a method to estimate JIVE scores for new samples that were not used in the initial dimension reduction, and study its theoretical properties; this method is implemented in the R package R.JIVE on CRAN, in the function 'jive.predict'.
Comment: 11 pages, 9 figures
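A much-simplified JIVE-style decomposition can be sketched as alternating low-rank fits: the joint structure J is a rank-`r_joint` fit to all sources stacked together (after removing each source's individual part), and each individual structure A_i is a rank-`r_indiv[i]` fit to what remains of source i, constrained to be orthogonal to the joint sample space. This is an illustrative sketch assuming fixed, user-chosen ranks and a fixed iteration count; it is not the R.JIVE implementation, and `jive_sketch` and `low_rank` are hypothetical names.

```python
import numpy as np


def low_rank(M, r):
    """Best rank-r approximation of M plus its leading right singular vectors."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r], Vt[:r]


def jive_sketch(blocks, r_joint, r_indiv, n_iter=50):
    """Simplified JIVE-style decomposition X_i ~ J_i + A_i.

    blocks: list of (features_i x samples) arrays sharing the same sample columns.
    Returns the stacked joint structure J, the individual structures A_i, and an
    orthonormal basis V (samples x r_joint) for the joint sample space.
    """
    X = np.vstack(blocks)
    edges = np.cumsum([0] + [b.shape[0] for b in blocks])
    A = [np.zeros_like(b, dtype=float) for b in blocks]
    for _ in range(n_iter):
        # Joint structure: low-rank fit to all sources after removing individual parts.
        J, Vt = low_rank(X - np.vstack(A), r_joint)
        V = Vt.T
        for i, b in enumerate(blocks):
            resid = b - J[edges[i]:edges[i + 1]]
            # Keep individual structure orthogonal to the joint sample space.
            resid = resid - resid @ V @ V.T
            A[i], _ = low_rank(resid, r_indiv[i])
    return J, A, V
```

The columns of V (joint sample scores), together with the right singular vectors of each A_i, could then serve as covariates in a survival model. Scoring new samples, which the abstract attributes to `jive.predict` in R.JIVE, is not covered by this sketch.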

arXiv.org e-Print Archive

Directory of Open Access Journals

Genetic programming for mining DNA chip data from cancer patients

Author: Buxton BF
Langdon WB
Publication date: 01/01/2004

In machine learning terms, DNA (gene) chip data are unusual in having thousands of attributes (the gene expression values) but few (<100) records (the patients). A genetic programming (GP) based method for both feature selection and generating simple models based on a few genes is demonstrated on cancer data.
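To make the idea concrete, here is a toy genetic programming loop, not the authors' system. Programs are expression trees over gene values and constants using +, - and *; a patient is predicted as class 1 when the program evaluates above zero. Fitness is training accuracy minus a small penalty per distinct gene (a hypothetical parsimony term that favors models built on a few genes). The function set, the penalty, tournament size 3, and the mutation-only variation (crossover omitted for brevity) are all assumptions.

```python
import operator
import random

# Function set for the evolved expressions (an assumption for this sketch).
FUNCS = [operator.add, operator.sub, operator.mul]


def random_tree(n_genes, depth, rng):
    """Grow a random expression tree whose leaves are genes or constants."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.8:
            return ("gene", rng.randrange(n_genes))
        return ("const", rng.uniform(-1.0, 1.0))
    return ("op", rng.choice(FUNCS),
            random_tree(n_genes, depth - 1, rng),
            random_tree(n_genes, depth - 1, rng))


def evaluate(tree, x):
    if tree[0] == "gene":
        return x[tree[1]]
    if tree[0] == "const":
        return tree[1]
    return tree[1](evaluate(tree[2], x), evaluate(tree[3], x))


def genes_used(tree):
    if tree[0] == "gene":
        return {tree[1]}
    if tree[0] == "const":
        return set()
    return genes_used(tree[2]) | genes_used(tree[3])


def accuracy(tree, X, y):
    # A patient is predicted as class 1 when the expression evaluates > 0.
    hits = sum((evaluate(tree, x) > 0) == bool(label) for x, label in zip(X, y))
    return hits / len(y)


def fitness(tree, X, y, penalty=0.01):
    # Parsimony pressure: each distinct gene in the model costs a little fitness.
    return accuracy(tree, X, y) - penalty * len(genes_used(tree))


def mutate(tree, n_genes, rng):
    """Subtree mutation: replace a randomly chosen subtree with a fresh one."""
    if tree[0] != "op" or rng.random() < 0.3:
        return random_tree(n_genes, 2, rng)
    _, func, left, right = tree
    if rng.random() < 0.5:
        return ("op", func, mutate(left, n_genes, rng), right)
    return ("op", func, left, mutate(right, n_genes, rng))


def evolve(X, y, pop_size=100, generations=30, seed=0):
    rng = random.Random(seed)
    n_genes = len(X[0])
    pop = [random_tree(n_genes, 3, rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = [(fitness(t, X, y), t) for t in pop]
        best_fit, best = max(scored, key=lambda s: s[0])
        history.append(best_fit)
        # Elitism keeps the best program; the rest are mutated tournament winners.
        children = []
        for _ in range(pop_size - 1):
            winner = max(rng.sample(scored, 3), key=lambda s: s[0])[1]
            children.append(mutate(winner, n_genes, rng))
        pop = [best] + children
    return max(pop, key=lambda t: fitness(t, X, y)), history
```

Because the best program is always carried into the next generation, the best fitness never decreases, and `genes_used(best)` reports which genes the evolved model relies on, which is how GP doubles as a feature-selection method here.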

CiteSeerX

UCL Discovery

Partial Least Squares: A Versatile Tool for the Analysis of High-Dimensional Genomic Data

Author: Boulesteix Anne-Laure
Strimmer Korbinian
Publication date: 01/01/2005

Partial Least Squares (PLS) is a highly efficient statistical regression technique that is well suited to the analysis of high-dimensional genomic data. In this paper we review the theory and applications of PLS from both methodological and biological points of view. Focusing on microarray expression data, we provide a systematic comparison of the PLS approaches currently employed, and discuss problems as different as tumor classification, identification of relevant genes, survival analysis and modeling of gene networks.
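For readers new to PLS, the following is a minimal sketch of PLS1 regression (a single response) using the classic NIPALS deflation steps, one of several PLS variants that a review like this compares. Each component takes the weight direction w proportional to X^T y, forms the score t = Xw, and deflates X and y before the next component. The function name `pls1_fit` is hypothetical and no particular package's API is implied.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """PLS1 via NIPALS; returns (coefficients, intercept) for y ~ X @ coef + intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        # Weight vector: direction of maximal covariance between X and y.
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w                      # latent score for this component
        tt = t @ t
        p = Xk.T @ t / tt               # X loadings
        qk = yk @ t / tt                # y loading
        # Deflate: remove what this component explains before the next one.
        Xk = Xk - np.outer(t, p)
        yk = yk - qk * t
        W.append(w)
        P.append(p)
        q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # Map the latent-space fit back to coefficients on the original predictors.
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, y_mean - x_mean @ coef
```

With fewer components than predictors, PLS shrinks toward the directions most covariant with the response, which is what makes it usable when genes far outnumber samples; with as many components as (linearly independent) predictors it reproduces ordinary least squares.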

Open Access LMU