We propose a constrained maximum partial likelihood estimator for dimension
reduction in integrative (e.g., pan-cancer) survival analysis with
high-dimensional covariates. We assume that for each population in the study,
the hazard function follows a distinct Cox proportional hazards model. To
borrow information across populations, we assume that all of the hazard
functions depend only on a small number of linear combinations of the
predictors. We estimate these linear combinations using an algorithm based on
"distance-to-set" penalties. This allows us to impose both low-rankness and
sparsity. We derive asymptotic results showing that our regression
coefficient estimator is more efficient than the estimator obtained by fitting
a separate proportional hazards model for each population. Numerical experiments suggest that our
method outperforms related competitors under various data generating models. We
use our method to perform a pan-cancer survival analysis relating protein
expression to survival across 18 distinct cancer types. Our approach identifies
six linear combinations, depending on only 20 proteins, which explain survival
across the cancer types. Finally, we validate our fitted model on four external
datasets and show that our estimated coefficients can lead to better prediction
than popular competitors.

Comment: Version accepted for publication by Biometric
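
To make the idea of a "distance-to-set" penalty concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the coefficients are stacked in a p x J matrix B (p predictors, J populations), that low-rankness means rank at most r, and that sparsity means at most s nonzero rows. The function names, the factor 1/2, and the choice of row-wise sparsity are illustrative assumptions. The penalty is the squared Frobenius distance from B to a constraint set, computed through the projection onto that set.

```python
import numpy as np

def project_low_rank(B, r):
    """Closest matrix (Frobenius norm) of rank at most r, via truncated SVD."""
    U, sing, Vt = np.linalg.svd(B, full_matrices=False)
    return (U[:, :r] * sing[:r]) @ Vt[:r, :]

def project_row_sparse(B, s):
    """Closest matrix with at most s nonzero rows (s >= 1): keep the s largest-norm rows."""
    row_norms = np.linalg.norm(B, axis=1)
    keep = np.argsort(row_norms)[-s:]
    P = np.zeros_like(B)
    P[keep] = B[keep]
    return P

def distance_penalty(B, projection):
    """Half the squared Frobenius distance from B to the set defined by `projection`."""
    return 0.5 * np.linalg.norm(B - projection(B)) ** 2

# Hypothetical sizes: 30 predictors, 18 populations, rank 6, 20 active predictors.
rng = np.random.default_rng(0)
B = rng.standard_normal((30, 18))
pen_rank = distance_penalty(B, lambda M: project_low_rank(M, 6))
pen_sparse = distance_penalty(B, lambda M: project_row_sparse(M, 20))
```

Each penalty is zero exactly when B already lies in the corresponding set, so adding both to a negative log partial likelihood with growing weights pushes the estimate toward a matrix that is simultaneously low-rank and row-sparse.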