19 research outputs found

    Prediction of overall survival for patients with metastatic castration-resistant prostate cancer : development of a prognostic model through a crowdsourced challenge with open clinical trial data

    Background: Improvements to prognostic models in metastatic castration-resistant prostate cancer have the potential to augment clinical trial design and guide treatment strategies. In partnership with Project Data Sphere, a not-for-profit initiative allowing data from cancer clinical trials to be shared broadly with researchers, we designed an open-data, crowdsourced, DREAM (Dialogue for Reverse Engineering Assessments and Methods) challenge to not only identify a better prognostic model for prediction of survival in patients with metastatic castration-resistant prostate cancer but also engage a community of international data scientists to study this disease. Methods: Data from the comparator arms of four phase 3 clinical trials in first-line metastatic castration-resistant prostate cancer were obtained from Project Data Sphere, comprising 476 patients treated with docetaxel and prednisone from the ASCENT2 trial, 526 patients treated with docetaxel, prednisone, and placebo in the MAINSAIL trial, 598 patients treated with docetaxel, prednisone or prednisolone, and placebo in the VENICE trial, and 470 patients treated with docetaxel and placebo in the ENTHUSE 33 trial. Datasets consisting of more than 150 clinical variables were curated centrally, including demographics, laboratory values, medical history, lesion sites, and previous treatments. Data from ASCENT2, MAINSAIL, and VENICE were released publicly to be used as training data to predict the outcome of interest, namely overall survival. Clinical data were also released for ENTHUSE 33, but data for outcome variables (overall survival and event status) were hidden from the challenge participants so that ENTHUSE 33 could be used for independent validation. Methods were evaluated using the integrated time-dependent area under the curve (iAUC). The reference model, based on eight clinical variables and a penalised Cox proportional-hazards model, was used to compare method performance.
Further validation was done using data from a fifth trial, ENTHUSE M1, in which 266 patients with metastatic castration-resistant prostate cancer were treated with placebo alone. Findings: 50 independent methods were developed to predict overall survival and were evaluated through the DREAM challenge. The top performer was based on an ensemble of penalised Cox regression models (ePCR), which uniquely identified predictive interaction effects with immune biomarkers and markers of hepatic and renal function. Overall, ePCR outperformed all other methods (iAUC 0.791; Bayes factor >5) and surpassed the reference model (iAUC 0.743; Bayes factor >20). Both the ePCR model and reference models stratified patients in the ENTHUSE 33 trial into high-risk and low-risk groups with significantly different overall survival (ePCR: hazard ratio 3.32, 95% CI 2.39-4.62, p Interpretation: Novel prognostic factors were delineated, and the assessment of 50 methods developed by independent international teams establishes a benchmark for development of methods in the future. The results of this effort show that data-sharing, when combined with a crowdsourced challenge, is a robust and powerful framework to develop new prognostic models in advanced prostate cancer. Peer reviewed
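The abstract above names penalised Cox proportional-hazards models but does not spell out the objective being optimised. The following is a minimal sketch of one common form, the Cox partial log-likelihood (Breslow-style handling of tied times) minus an elastic-net penalty. The elastic-net form, the function name, and the parameters `lam` and `alpha` are assumptions made for illustration; they are not taken from the challenge entries.

```python
import math


def penalised_cox_objective(times, events, X, beta, lam=0.1, alpha=0.5):
    """Cox partial log-likelihood minus an elastic-net penalty.

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if censored
    X      : one covariate row per patient
    beta   : coefficient vector
    lam, alpha : penalty strength and L1/L2 mix (illustrative assumptions)
    """
    # Linear predictor (log relative hazard) for each patient.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]

    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored patients contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk_sum = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        loglik += eta[i] - math.log(risk_sum)

    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    penalty = lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)
    return loglik - penalty
```

A fitting routine would maximise this value over `beta`; an ensemble in the spirit of the abstract could combine several such fits, though how the winning entry combined its models is not described here.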

    Assessment of data transformations for model-based clustering of RNA-Seq data

    Quality control, global biases, normalization, and analysis methods for RNA-Seq data are quite different from those for microarray-based studies. The assumption of normality is reasonable for microarray-based gene expression data; however, RNA-Seq data tend to follow an over-dispersed Poisson or negative binomial distribution. Little research has been done to assess how data transformations impact Gaussian model-based clustering with respect to clustering performance and accuracy in estimating the correct number of clusters in RNA-Seq data. In this article, we investigate Gaussian model-based clustering performance and accuracy in estimating the correct number of clusters by applying four data transformations (i.e., naïve, logarithmic, Blom, and variance stabilizing transformation) to simulated RNA-Seq data. To do so, an extensive simulation study was carried out in which the scenarios varied in terms of how genes were selected to be included in the clustering analyses, the size of the clusters, and the number of clusters. Following the application of the different transformations to the simulated data, Gaussian model-based clustering was carried out. To assess clustering performance for each of the data transformations, the adjusted Rand index, clustering error rate, and concordance index were used. As expected, our results showed that clustering performance improved in scenarios where data transformations were applied to make the data appear "more" Gaussian in distribution.
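Two of the transformations named in the abstract above can be illustrated briefly. This is a sketch under stated assumptions, not the authors' code: the logarithmic transform is taken here as log2(count + 1), where both the base and the pseudocount are assumptions, and the Blom transform uses the usual rank-based inverse normal formula with the inverse normal CDF applied to (r - 3/8) / (n + 1/4), averaging ranks over ties. The variance stabilizing transformation is omitted because it depends on a fitted dispersion model.

```python
import math
from statistics import NormalDist


def log_transform(counts):
    """log2(count + 1); the base and pseudocount are illustrative assumptions."""
    return [math.log2(c + 1) for c in counts]


def blom_transform(values):
    """Rank-based inverse normal (Blom) transform with average ranks for ties."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()  # standard normal
    return [nd.inv_cdf((r - 0.375) / (n + 0.25)) for r in ranks]
```

Applied to one gene's counts across samples, `blom_transform` produces scores that are symmetric around zero regardless of how skewed the raw counts are, which is the sense in which such a transform makes data look "more" Gaussian.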

    Comparison of raw serous expression counts versus simulated data.

    Log-transformed means and log-transformed variances are plotted to compare raw serous expression count data with simulated data scenarios for a single dataset. Points in red represent data simulated using Negative Binomial (NB) parameters from the top 100 genes; points in blue represent data simulated from 100 randomly selected genes, based upon the Median Absolute Deviation (MAD) of expression values taken from ovarian cancer RNA-Seq samples (N = 55 tumors).

    An ensemble-based Cox proportional hazards regression framework for predicting survival in metastatic castration-resistant prostate cancer (mCRPC) patients [version 1; referees: 1 approved, 2 approved with reservations]

    From March through August 2015, nearly 60 teams from around the world participated in the Prostate Cancer DREAM Challenge (PCDC). Participating teams were faced with the task of developing prediction models for patient survival and treatment discontinuation using baseline clinical variables collected on metastatic castrate-resistant prostate cancer (mCRPC) patients in the comparator arms of four phase III clinical trials. In total, over 2,000 mCRPC patients treated with first-line docetaxel comprised the training and testing data sets used in this challenge. In this paper we describe: (a) the sub-challenges comprising the PCDC, (b) the statistical metrics used to benchmark prediction performance, (c) our analytical approach, and finally (d) our team's overall performance in this challenge. Specifically, we discuss our curated, ad-hoc, feature selection (CAFS) strategy for identifying clinically important risk-predictors, the ensemble-based Cox proportional hazards regression framework used in our final submission, and the adaptation of our modeling framework based on the results from the intermittent leaderboard rounds. Strong predictors of patient survival were successfully identified using our model-building approach. Several of the identified predictors were new features created by our team by strategically merging collections of weak predictors. In each of the three intermittent leaderboard rounds, our prediction models scored among the top four models across all participating teams, and our final submission ranked 9th overall with an integrated area under the curve (iAUC) of 0.7711 computed in an independent test set. While the prediction performance of teams placing 2nd through 10th (iAUC: 0.7710-0.7789) was better than the current gold-standard prediction model for prostate cancer survival, the top-performing team, FIMM-UTU, significantly outperformed all other contestants with an iAUC of 0.7915.
In summary, our ensemble-based Cox regression framework with CAFS resulted in strong overall performance for predicting prostate cancer survival and represents a promising approach for future prediction problems.

    Comparison of model-based clustering evaluation criteria.

    Mean Adjusted Rand Index (ARI), Clustering Error Rate (CER), and Concordance Index (CI) are plotted for each of the four parent dataset categories for K = 2 clusters (in coral) and K = 3 clusters (in teal).
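For readers unfamiliar with the first of these criteria, the Adjusted Rand Index can be computed from the contingency table of two labelings. The sketch below uses the standard pair-counting formula; the function name and the convention of returning 1.0 for degenerate inputs are choices made for this illustration and are not taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: treated as perfect agreement

    # Pairs of items that share a cluster in both, in A only, and in B only.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / total_pairs  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # both labelings trivial (one cluster or all singletons)
    return (sum_ij - expected) / (max_index - expected)
```

The index equals 1 when the two partitions agree exactly, even if the cluster labels are permuted, and is close to 0 when agreement is no better than chance.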