Successful prediction of the likely paths of tumor progression is valuable for diagnostic,
prognostic, and treatment purposes. Cancer progression models (CPMs) use cross-sectional samples to identify restrictions in the order of accumulation of driver mutations and
thus encode the paths of tumor progression. Here we analyze the performance of
four CPMs to examine whether they can be used to predict the true distribution of paths of
tumor progression and to estimate evolutionary unpredictability. Using simulations, we
show that if fitness landscapes are single-peaked (have a single fitness maximum), there is
good agreement between true and predicted distributions of paths of tumor progression
when sample sizes are large, but performance is poor with the much smaller sample sizes
that are currently common. Under multi-peaked fitness landscapes (i.e., those with multiple fitness maxima), performance is poor and improves only slightly with sample size. In all
cases, detection regime (when tumors are sampled) is a key determinant of performance.
Estimates of evolutionary unpredictability from the best-performing CPM among the four
examined tend to overestimate the true unpredictability, and the bias is affected by detection
regime; CPMs could be useful for estimating upper bounds on the true evolutionary unpredictability. Analysis of twenty-two cancer data sets shows low evolutionary unpredictability
for several of the data sets. However, most of the predictions of paths of tumor progression are
very unreliable, and unreliability increases with the number of features analyzed. Our results
indicate that CPMs could be valuable tools for predicting cancer progression but that, currently, it is doubtful that useful predictions of paths of tumor progression can be obtained from CPMs; they also
emphasize the need for methodological work that can account for the probably multi-peaked
fitness landscapes in cancer.

Work partially supported by BFU2015-67302-R (MINECO/FEDER, EU) to RDU. CV
supported by PEJD-2016-BMD-2116 from
Comunidad de Madrid to RD