Prediction Models for Cancer Risk and Prognosis using Clinical and DNA Methylation Biomarkers: Considerations in Study Design and Model Development

Abstract

The ability to accurately predict the prognosis of a given disease is of immense value to clinicians and patients. It can guide and optimize an individual treatment plan, ultimately improving the patient's quality of life and reducing the financial burden associated with unnecessary treatment. Accurate prediction of disease prognosis therefore depends on the ongoing development of prediction models. We introduce a novel curated, ad-hoc feature selection (CAFS) strategy in the context of the Prostate Cancer DREAM Challenge. By applying CAFS, we demonstrate enhanced performance in predicting overall survival differences among patients with metastatic castration-resistant prostate cancer and identify clinically important risk predictors. With ongoing advancements in the omics field, promising molecular biomarkers are being identified to facilitate disease prognosis beyond the capability of clinical information alone. The identification of such biomarkers depends on the examination of omic marks in adequately powered studies. To assist researchers in the design and planning of epigenome-wide association studies (EWAS) of DNA methylation, we present pwrEWAS, a user-friendly tool for comprehensive power estimation. The R package for pwrEWAS is publicly available on GitHub (https://github.com/stefangraw/pwrEWAS) and the web interface is available at https://biostats-shinyr.kumc.edu/pwrEWAS/. The enormous volume of omic marks requires stringent evaluation to discover combinations of complementary marks that together form predictive biomarkers. We therefore present a heuristic feature selection approach for handling such high-dimensional data. Selection Probability Optimization for Feature Selection (SPOFS) is designed to identify, from a vast pool of omic features, an optimal subset that collectively improves prediction accuracy and forms a biomarker. Such biomarkers can then be integrated into the development and improvement of prediction models.
