576 research outputs found

    Supervised Dimension Reduction for Large-scale Omics Data with Censored Survival Outcomes Under Possible Non-proportional Hazards

    Get PDF
    The past two decades have witnessed significant advances in high-throughput ``omics technologies such as genomics, proteomics, metabolomics, transcriptomics and radiomics. These technologies have enabled simultaneous measurement of the expression levels of tens of thousands of features from individual patient samples and have generated enormous amounts of data that require analysis and interpretation. One specific area of interest has been in studying the relationship between these features and patient outcomes, such as overall and recurrence-free survival, with the goal of developing a predictive ``omics profile. Large-scale studies often suffer from the presence of a large fraction of censored observations and potential time-varying effects of features, and methods for handling them have been lacking. In this paper, we propose supervised methods for feature selection and survival prediction that simultaneously deal with both issues. Our approach utilizes continuum power regression (CPR) - a framework that includes a variety of regression methods - in conjunction with the parametric or semi-parametric accelerated failure time (AFT) model. Both CPR and AFT fall within the linear models framework and, unlike black-box models, the proposed prognostic index has a simple yet useful interpretation. We demonstrate the utility of our methods using simulated and publicly available cancer genomics data

    Iterative Bayesian Model Averaging: a method for the application of survival analysis to high-dimensional microarray data

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Microarray technology is increasingly used to identify potential biomarkers for cancer prognostics and diagnostics. Previously, we have developed the iterative Bayesian Model Averaging (BMA) algorithm for use in classification. Here, we extend the iterative BMA algorithm for application to survival analysis on high-dimensional microarray data. The main goal in applying survival analysis to microarray data is to determine a highly predictive model of patients' time to event (such as death, relapse, or metastasis) using a small number of selected genes. Our multivariate procedure combines the effectiveness of multiple contending models by calculating the weighted average of their posterior probability distributions. Our results demonstrate that our iterative BMA algorithm for survival analysis achieves high prediction accuracy while consistently selecting a small and cost-effective number of predictor genes.</p> <p>Results</p> <p>We applied the iterative BMA algorithm to two cancer datasets: breast cancer and diffuse large B-cell lymphoma (DLBCL) data. On the breast cancer data, the algorithm selected a total of 15 predictor genes across 84 contending models from the training data. The maximum likelihood estimates of the selected genes and the posterior probabilities of the selected models from the training data were used to divide patients in the test (or validation) dataset into high- and low-risk categories. Using the genes and models determined from the training data, we assigned patients from the test data into highly distinct risk groups (as indicated by a p-value of 7.26e-05 from the log-rank test). Moreover, we achieved comparable results using only the 5 top selected genes with 100% posterior probabilities. On the DLBCL data, our iterative BMA procedure selected a total of 25 genes across 3 contending models from the training data. Once again, we assigned the patients in the validation set to significantly distinct risk groups (p-value = 0.00139).</p> <p>Conclusion</p> <p>The strength of the iterative BMA algorithm for survival analysis lies in its ability to account for model uncertainty. The results from this study demonstrate that our procedure selects a small number of genes while eclipsing other methods in predictive performance, making it a highly accurate and cost-effective prognostic tool in the clinical setting.</p

    Supervised wavelet method to predict patient survival from gene expression data.

    Get PDF
    In microarray studies, the number of samples is relatively small compared to the number of genes per sample. An important aspect of microarray studies is the prediction of patient survival based on their gene expression profile. This naturally calls for the use of a dimension reduction procedure together with the survival prediction model. In this study, a new method based on combining wavelet approximation coefficients and Cox regression was presented. The proposed method was compared with supervised principal component and supervised partial least squares methods. The different fitted Cox models based on supervised wavelet approximation coefficients, the top number of supervised principal components, and partial least squares components were applied to the data. The results showed that the prediction performance of the Cox model based on supervised wavelet feature extraction was superior to the supervised principal components and partial least squares components. The results suggested the possibility of developing new tools based on wavelets for the dimensionally reduction of microarray data sets in the context of survival analysis

    A method for analyzing censored survival phenotype with gene expression data

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Survival time is an important clinical trait for many disease studies. Previous works have shown certain relationship between patients' gene expression profiles and survival time. However, due to the censoring effects of survival time and the high dimensionality of gene expression data, effective and unbiased selection of a gene expression signature to predict survival probabilities requires further study.</p> <p>Method</p> <p>We propose a method for an integrated study of survival time and gene expression. This method can be summarized as a two-step procedure: in the first step, a moderate number of genes are pre-selected using correlation or liquid association (LA). Imputation and transformation methods are employed for the correlation/LA calculation. In the second step, the dimension of the predictors is further reduced using the modified sliced inverse regression for censored data (censorSIR).</p> <p>Results</p> <p>The new method is tested via both simulated and real data. For the real data application, we employed a set of 295 breast cancer patients and found a linear combination of 22 gene expression profiles that are significantly correlated with patients' survival rate.</p> <p>Conclusion</p> <p>By an appropriate combination of feature selection and dimension reduction, we find a method of identifying gene expression signatures which is effective for survival prediction.</p

    Censored Data Regression in High-Dimension and Low-Sample Size Settings For Genomic Applications

    Get PDF
    New high-throughput technologies are generating various types of high-dimensional genomic and proteomic data and meta-data (e.g., networks and pathways) in order to obtain a systems-level understanding of various complex diseases such as human cancers and cardiovascular diseases. As the amount and complexity of the data increase and as the questions being addressed become more sophisticated, we face the great challenge of how to model such data in order to draw valid statistical and biological conclusions. One important problem in genomic research is to relate these high-throughput genomic data to various clinical outcomes, including possibly censored survival outcomes such as age at disease onset or time to cancer recurrence. We review some recently developed methods for censored data regression in the high-dimension and low-sample size setting, with emphasis on applications to genomic data. These methods include dimension reduction-based methods, regularized estimation methods such as Lasso and threshold gradient descent method, gradient descent boosting methods and nonparametric pathways-based regression models. These methods are demonstrated and compared by analysis of a data set of microarray gene expression profiles of 240 patients with diffuse large B-cell lymphoma together with follow-up survival information. Areas of further research are also presented

    Survival prediction from clinico-genomic models - a comparative study

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Survival prediction from high-dimensional genomic data is an active field in today's medical research. Most of the proposed prediction methods make use of genomic data alone without considering established clinical covariates that often are available and known to have predictive value. Recent studies suggest that combining clinical and genomic information may improve predictions, but there is a lack of systematic studies on the topic. Also, for the widely used Cox regression model, it is not obvious how to handle such combined models.</p> <p>Results</p> <p>We propose a way to combine classical clinical covariates with genomic data in a clinico-genomic prediction model based on the Cox regression model. The prediction model is obtained by a simultaneous use of both types of covariates, but applying dimension reduction only to the high-dimensional genomic variables. We describe how this can be done for seven well-known prediction methods: variable selection, unsupervised and supervised principal components regression and partial least squares regression, ridge regression, and the lasso. We further perform a systematic comparison of the performance of prediction models using clinical covariates only, genomic data only, or a combination of the two. The comparison is done using three survival data sets containing both clinical information and microarray gene expression data. Matlab code for the clinico-genomic prediction methods is available at <url>http://www.med.uio.no/imb/stat/bmms/software/clinico-genomic/</url>.</p> <p>Conclusions</p> <p>Based on our three data sets, the comparison shows that established clinical covariates will often lead to better predictions than what can be obtained from genomic data alone. In the cases where the genomic models are better than the clinical, ridge regression is used for dimension reduction. We also find that the clinico-genomic models tend to outperform the models based on only genomic data. Further, clinico-genomic models and the use of ridge regression gives for all three data sets better predictions than models based on the clinical covariates alone.</p
    • …
    corecore