3 research outputs found
Prediction of overall survival for patients with metastatic castration-resistant prostate cancer: development of a prognostic model through a crowdsourced challenge with open clinical trial data
Background: Improvements to prognostic models in metastatic castration-resistant prostate cancer have the potential to augment clinical trial design and guide treatment strategies. In partnership with Project Data Sphere, a not-for-profit initiative allowing data from cancer clinical trials to be shared broadly with researchers, we designed an open-data, crowdsourced, DREAM (Dialogue for Reverse Engineering Assessments and Methods) challenge to not only identify a better prognostic model for prediction of survival in patients with metastatic castration-resistant prostate cancer but also engage a community of international data scientists to study this disease. Methods: Data from the comparator arms of four phase 3 clinical trials in first-line metastatic castration-resistant prostate cancer were obtained from Project Data Sphere, comprising 476 patients treated with docetaxel and prednisone from the ASCENT2 trial, 526 patients treated with docetaxel, prednisone, and placebo in the MAINSAIL trial, 598 patients treated with docetaxel, prednisone or prednisolone, and placebo in the VENICE trial, and 470 patients treated with docetaxel and placebo in the ENTHUSE 33 trial. Datasets consisting of more than 150 clinical variables were curated centrally, including demographics, laboratory values, medical history, lesion sites, and previous treatments. Data from ASCENT2, MAINSAIL, and VENICE were released publicly to be used as training data to predict the outcome of interest, namely overall survival. Clinical data were also released for ENTHUSE 33, but data for outcome variables (overall survival and event status) were hidden from the challenge participants so that ENTHUSE 33 could be used for independent validation. Methods were evaluated using the integrated time-dependent area under the curve (iAUC). The reference model, based on eight clinical variables and a penalised Cox proportional-hazards model, was used to compare method performance.
Further validation was done using data from a fifth trial, ENTHUSE M1, in which 266 patients with metastatic castration-resistant prostate cancer were treated with placebo alone. Findings: 50 independent methods were developed to predict overall survival and were evaluated through the DREAM challenge. The top performer was based on an ensemble of penalised Cox regression models (ePCR), which uniquely identified predictive interaction effects with immune biomarkers and markers of hepatic and renal function. Overall, ePCR outperformed all other methods (iAUC 0.791; Bayes factor >5) and surpassed the reference model (iAUC 0.743; Bayes factor >20). Both the ePCR model and reference models stratified patients in the ENTHUSE 33 trial into high-risk and low-risk groups with significantly different overall survival (ePCR: hazard ratio 3.32, 95% CI 2.39-4.62, p Interpretation: Novel prognostic factors were delineated, and the assessment of 50 methods developed by independent international teams establishes a benchmark for development of methods in the future. The results of this effort show that data-sharing, when combined with a crowdsourced challenge, is a robust and powerful framework to develop new prognostic models in advanced prostate cancer.
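The abstract compares methods by a survival-ranking metric and reports Bayes factors between models. The following is a minimal sketch of that kind of comparison, not the challenge's scoring code. It makes two loud assumptions: Harrell's concordance index is swapped in as a simpler stand-in for the iAUC, and the Bayes factor is approximated as the ratio of bootstrap resamples won by each model. The function names, the synthetic patients, and the parameter values are all hypothetical.

```python
import random


def concordance_index(times, events, risks):
    """Harrell's C: share of comparable patient pairs that are ranked correctly.

    A pair is comparable when the patient with the shorter follow-up time had an
    observed event (events == 1). It counts as concordant when that patient also
    has the higher predicted risk; tied risks count as half.
    NOTE: this is a stand-in for the iAUC named in the abstract, not the iAUC.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def bootstrap_bayes_factor(times, events, risks_a, risks_b, n_boot=200, seed=0):
    """Assumed definition: resamples won by model A divided by resamples won by B."""
    rng = random.Random(seed)
    n = len(times)
    wins_a = wins_b = 0
    for _ in range(n_boot):
        # Resample patients with replacement and score both models on the same sample.
        idx = [rng.randrange(n) for _ in range(n)]
        t = [times[k] for k in idx]
        e = [events[k] for k in idx]
        score_a = concordance_index(t, e, [risks_a[k] for k in idx])
        score_b = concordance_index(t, e, [risks_b[k] for k in idx])
        if score_a > score_b:
            wins_a += 1
        elif score_b > score_a:
            wins_b += 1
    return wins_a / wins_b if wins_b else float("inf")


# Synthetic, purely illustrative patients: model A tracks the true risk closely,
# model B is a much noisier version of the same signal.
rng = random.Random(1)
times, events, risks_a, risks_b = [], [], [], []
for _ in range(40):
    true_risk = rng.random()
    times.append(rng.expovariate(0.5 + 2.0 * true_risk))  # higher risk, shorter time
    events.append(1 if rng.random() < 0.8 else 0)          # roughly 20% censored
    risks_a.append(true_risk + rng.gauss(0, 0.1))
    risks_b.append(true_risk + rng.gauss(0, 1.0))

print("C-index A:", round(concordance_index(times, events, risks_a), 3))
print("C-index B:", round(concordance_index(times, events, risks_b), 3))
print("Bootstrap Bayes factor (A vs B):", bootstrap_bayes_factor(times, events, risks_a, risks_b))
```

Under this assumed definition, a Bayes factor above a threshold such as 5 or 20 would mean that one model wins at least that many times more resamples than the other.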
Linking gene expression and orthology in mammals
The overall aim of biomedical research is to understand disease mechanisms and to provide a drug to eventually cure the disease. This challenging endeavour requires an early research phase that deals with identifying target genes or proteins playing an important role in the disease. At this stage one uses animal models mimicking human disease to determine differences between healthy and diseased animals. Once potential drug targets have been found, compounds are screened and promising compounds go into the preclinical phase where their efficacy and, most importantly, safety are assessed. Those having proven to be efficacious and safe proceed to toxicology where the maximum tolerable dosage is assessed in, mainly, non-rodent species. According to the Bundesministerium für Ernährung und Landwirtschaft, more than 2 million animals were used for animal testing in German laboratories in 2017. The majority of these animals were mice and rats, but dogs, cats and monkeys are also used as model organisms for testing. While it is commonly accepted that other mammalian species resemble human biology to a great extent, one has to bear in mind that there are species-specific differences. One of the aims of this thesis was to investigate how similar widely used model species are to human and to each other on a molecular level. For this purpose we assessed the relationship between protein sequence identity and gene expression correlation with an emphasis on mouse and rat. We found that the majority of genes are highly similar, both on sequence and gene expression level. There were, however, cases with low sequence identity but high expression correlation. These cases were investigated in greater detail, and the hypothesis that sequences annotated in widely used databases like Ensembl, UniProt, or RefSeq may contain errors or be incomplete was confirmed.
Therefore, we investigated whether sequence information from related species can be used to derive a target’s sequence in a species with poor annotation. The a&o-tool was developed to exploit sequence similarity between related species and short-read RNA-Seq data to refine or validate target sequences. Since long-read RNA-Seq data would greatly improve the results as entire transcripts are sequenced as a whole, we conducted a pilot study for comparing short- and long-read sequencing data. Even though PacBio’s SMRT sequencing technology still shows some issues with respect to data quality, it is a very promising approach that is going to prove valuable for sequence refinement. Another important goal of this thesis was to develop a score to assess a human target’s conservation across several model species. Publicly available data on the homology relationships between genes and RNA-Seq data form the basis for this score. Using a set of presumably highly conserved genes in human and mouse, we found that the proposed score yields reasonable results. An enrichment of Gene Ontology terms further strengthened our confidence in the conservation score.
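The comparison of protein sequence identity with expression correlation described above can be illustrated with a small sketch. It is not the thesis's pipeline. The identity definition (matches over non-gap aligned columns), the Pearson correlation across tissues, the two thresholds, and the gene names and values are all assumptions chosen for illustration.

```python
from math import sqrt


def percent_identity(aligned_a, aligned_b):
    """Percent of aligned, non-gap columns in which both residues match.

    Assumes the two sequences are already aligned to equal length, with '-'
    marking gaps. Other identity definitions use a different denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def flag_suspect_annotation(identity, correlation, max_identity=60.0, min_correlation=0.8):
    """Flag ortholog pairs with low identity but high expression correlation.

    Both thresholds are hypothetical, not values from the thesis.
    """
    return identity < max_identity and correlation > min_correlation


# Hypothetical ortholog pairs: (aligned seq 1, aligned seq 2, expression 1, expression 2),
# with expression measured across the same four tissues in both species.
orthologs = {
    "geneA": ("MKTAYIAK", "MKTAYIAK", [5.0, 1.0, 8.0, 2.0], [4.8, 1.2, 7.5, 2.1]),
    "geneB": ("MKTAYIAK", "MQSG--LR", [5.0, 1.0, 8.0, 2.0], [5.1, 0.9, 8.3, 1.8]),
}

for gene, (seq_1, seq_2, expr_1, expr_2) in orthologs.items():
    identity = percent_identity(seq_1, seq_2)
    correlation = pearson(expr_1, expr_2)
    print(gene, round(identity, 1), round(correlation, 3),
          flag_suspect_annotation(identity, correlation))
```

Pairs flagged this way would be candidates for the closer inspection the abstract describes, since a strongly correlated expression profile alongside a divergent sequence may point to an incomplete or erroneous annotation rather than true divergence.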
A DREAM challenge to build prediction models for short-term discontinuation of docetaxel in metastatic castration-resistant prostate cancer.
Background: Docetaxel has a demonstrated survival benefit for metastatic castration-resistant prostate cancer (mCRPC). However, 10-20% of patients discontinue docetaxel prematurely because of toxicity-induced adverse events, and managing risk factors for toxicity remains an ongoing challenge for health care providers and patients. Prospective identification of high-risk patients for early discontinuation has the potential to assist clinical decision-making and can improve the design of more efficient clinical trials. In partnership with Project Data Sphere (PDS), a non-profit initiative facilitating clinical trial data-sharing, we designed an open-data, crowdsourced DREAM (Dialogue for Reverse Engineering Assessments and Methods) Challenge for developing models to predict early discontinuation of docetaxel. Methods: Data from the comparator arms of four phase III clinical trials in first-line mCRPC were obtained from PDS, including 476 patients treated with docetaxel and prednisone from the ASCENT2 trial, 598 patients treated with docetaxel, prednisone/prednisolone, and placebo in the VENICE trial, 526 patients treated with docetaxel, prednisone, and placebo in the MAINSAIL trial, and 528 patients treated with docetaxel and placebo in the ENTHUSE 33 trial. Early discontinuation was defined as treatment stoppage within three months due to adverse treatment effects. Over 150 clinical features including laboratory values, medical history, lesion measures, prior treatment, and demographic variables were curated and made freely available for model building for all four trials. The ASCENT2, VENICE, and MAINSAIL trial data sets formed the training set that also included patient discontinuation status. The ENTHUSE 33 trial, with patient discontinuation status hidden, was used as an independent validation set to evaluate model performance. 
Prediction performance was assessed using area under the precision-recall curve (AUPRC) and the Bayes factor was used to compare the performance between prediction models. Results: The frequency of early discontinuation was similar between training (ASCENT2, VENICE, and MAINSAIL) and validation (ENTHUSE 33) sets, 12.3% versus 10.4% of docetaxel-treated patients, respectively. In total, 34 independent teams submitted predictions from 61 different models. AUPRC ranged from 0.088 to 0.178 across submissions with a random model performance of 0.104. Seven models with comparable AUPRC scores (Bayes factor ≤ 3) were observed to outperform all other models. A post-challenge analysis of risk predictions generated by these seven models revealed three distinct patient subgroups: patients consistently predicted to be at high risk or low risk of early discontinuation and those with discordant risk predictions. Early discontinuation events were twice as frequent in the high-risk as in the low-risk subgroup, and baseline clinical features such as presence or absence of metastatic liver lesions and prior treatment with analgesics and ACE inhibitors exhibited statistically significant differences between the high- and low-risk subgroups (adjusted P < 0.05). An ensemble-based model constructed from a post-Challenge community collaboration resulted in the best overall prediction performance (AUPRC = 0.230) and represented a marked improvement over any individual Challenge submission. Findings: Our results demonstrate that routinely collected clinical features can be used to prospectively inform clinicians of mCRPC patients' risk of discontinuing docetaxel treatment early due to adverse events and, to the best of our knowledge, this study is the first to establish performance benchmarks in this area. This work also underscores the "wisdom of crowds" approach by demonstrating that improved prediction of patient outcomes is obtainable by combining methods across an extended community.
These findings were made possible because data from separate trials were made publicly available and centrally compiled through PDS.
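The reported random-model AUPRC of 0.104 matches the 10.4% frequency of early discontinuation in the validation set, which is what one expects: an uninformative ranking has an AUPRC close to the positive-class frequency. The sketch below illustrates this with the step-wise average-precision estimate of AUPRC. It is not the Challenge's scoring code, and the cohort size and labels are synthetic assumptions.

```python
import random


def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    Patients are ranked by descending score, and precision is averaged over the
    ranks at which a positive (label == 1) occurs. Tied scores are not treated
    specially in this sketch.
    """
    order = sorted(range(len(labels)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        return float("nan")
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


# Synthetic cohort (hypothetical size): 500 patients, 52 early discontinuations (10.4%).
labels = [1] * 52 + [0] * 448
prevalence = sum(labels) / len(labels)

# Average the score of an uninformative (random) model over many repetitions.
rng = random.Random(0)
random_aps = []
for _ in range(200):
    random_scores = [rng.random() for _ in labels]
    random_aps.append(average_precision(labels, random_scores))

print("prevalence:", prevalence)
print("mean AUPRC of random scores:", round(sum(random_aps) / len(random_aps), 3))
```

Because this baseline depends on event frequency, AUPRC values such as the 0.178 and 0.230 reported above are best read relative to 0.104 rather than on an absolute 0 to 1 scale.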