Prognostic models have clinical appeal to aid therapeutic decision making. In the
UK, the Nottingham Prognostic Index (NPI) has been used, for over two decades, to
inform patient management. However, it has been commented that the NPI is not
capable of identifying a subgroup of patients with a prognosis so good that adjuvant
therapy with potentially harmful side effects can be withheld safely.
Tissue Microarray Analysis (TMA) now makes it possible to measure biological
features of frozen biopsies from breast cancer tumours. These give insight into
the biology of the tumour and hence have the potential to enhance
prognostic modelling. I therefore wished to investigate whether biomarkers can add
value to clinical predictors to provide improved prognostic stratification in terms of
Recurrence Free Survival (RFS).
However, very many biomarkers could be measured, they usually exhibit skewed
distributions, and missing values are common. The statistical issues raised are
thus the number of variables being tested, the form of the association, the imputation
of missing data, and the assessment of the stability and internal validity of the model.
Therefore, the specific aim of this study was to develop and demonstrate the
performance of statistical modelling techniques that will be useful in circumstances
where there is a surfeit of explanatory variables and missing data; in particular, to
achieve useful and parsimonious models while guarding against instability and
overfitting. I also sought to identify a subgroup of patients with a prognosis so good
that a decision can be made to avoid adjuvant therapy. I aimed to provide statistically
robust answers to a set of clinical questions and to develop strategies for such
data sets that would be useful and acceptable to clinicians.
A unique data set of 401 Estrogen Receptor positive (ER+), tamoxifen-treated breast
cancer patients with measurements of a large panel of biomarkers (72 in total) was
available. Taking a statistical approach, I applied a multi-faceted screening process to
select a limited set of potentially informative variables and to detect the appropriate
form of the association, followed by multiple imputation of missing data and
bootstrapping. In comparison with the NPI, the final joint model assigned
patients to more appropriate risk groups (14% of recurred and 4% of non-recurred
cases). The actuarial 7-year RFS rate for patients in the lowest risk quartile was 95%
(95% C.I.: 89%, 100%).
To evaluate an alternative approach, biological knowledge was incorporated into the
process of model development. Model building began with the use of biological
expertise to divide the variables into substantive biomarker sets on the basis of
their presumed role in the pathway to cancer progression. For each biomarker family, an
informative and parsimonious index was generated by combining family variables, to
be offered to the final model as an intermediate predictor. In comparison with the NPI,
this model assigned patients to more appropriate risk groups (21% of recurred and
11% of non-recurred patients). It identified a low-risk group with a 7-year RFS rate
of 98% (95% C.I.: 96%, 100%).
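
For readers unfamiliar with how an actuarial RFS rate and its confidence interval are obtained, the following is a minimal sketch of one common approach: the Kaplan-Meier product-limit estimate with a Greenwood variance and a plain interval truncated to [0, 1]. The abstract does not state which estimator or interval method was used, so this choice is an assumption, and the function name `kaplan_meier_at` and the toy data are purely illustrative, not taken from the study.

```python
import math


def kaplan_meier_at(times, events, horizon):
    """Kaplan-Meier survival estimate at `horizon` with a plain 95% interval.

    times   -- follow-up time for each patient (e.g. in years)
    events  -- 1 if recurrence was observed at that time, 0 if censored
    horizon -- time point of interest (e.g. 7 for a 7-year rate)

    Assumption: Greenwood variance with a plain (untransformed) interval,
    truncated to [0, 1]. The original analysis may have used another method.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    greenwood_sum = 0.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        recurrences = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            recurrences += data[i][1]
            leaving += 1
            i += 1
        if recurrences > 0:
            survival *= 1 - recurrences / n_at_risk
            if n_at_risk > recurrences:
                greenwood_sum += recurrences / (
                    n_at_risk * (n_at_risk - recurrences)
                )
        # Recurred and censored patients both leave the risk set.
        n_at_risk -= leaving
    std_err = survival * math.sqrt(greenwood_sum)
    lower = max(0.0, survival - 1.96 * std_err)
    upper = min(1.0, survival + 1.96 * std_err)
    return survival, lower, upper


# Hypothetical toy data: 8 patients, 2 observed recurrences.
toy_times = [2, 3, 5, 7, 8, 9, 10, 10]
toy_events = [1, 0, 1, 0, 0, 0, 0, 0]
rate, low, high = kaplan_meier_at(toy_times, toy_events, horizon=7)
print(f"7-year RFS: {rate:.3f} (95% C.I.: {low:.3f}, {high:.3f})")
```

With a plain interval of this kind, the upper limit is capped at 100% whenever the estimate is close to 1, which is consistent with the form of the intervals reported above, although it does not confirm the method used.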