
    Automated estimation of disease recurrence in head and neck cancer using routine healthcare data

    Background: Overall survival (OS) and progression-free survival (PFS) are key outcome measures for head and neck cancer because they reflect treatment efficacy and have implications for patients and health services. The UK has recently developed a series of national cancer audits that aim to estimate survival and recurrence but rely on institutions manually submitting interval data on patient status, a labour-intensive process. However, data on hospital admissions, surgery, radiotherapy and chemotherapy are already collected routinely at a national level. We have developed a technique to automate the interpretation of these routine datasets, allowing us to derive patterns of treatment in head and neck cancer patients from routinely acquired data.
    Methods: We identified 122 patients with head and neck cancer and extracted their treatment histories from hospital notes to provide a gold standard dataset. We obtained routinely collected local data on inpatient admissions and procedures, chemotherapy and radiotherapy for these patients and analysed them with a computer algorithm that identified relevant time points and then calculated OS and PFS. We validated these estimates against the gold standard dataset. The algorithm was then optimised to maximise correct identification of each time point and to minimise false identification of recurrence events.
    Results: Of the 122 patients, 82% had locally advanced disease. OS was 88% at 1 yr and 77% at 2 yrs, and PFS was 75% and 66% at 1 and 2 yrs respectively. Forty patients developed recurrent disease. Our automated method estimated OS at 87% and 77% and PFS at 87% and 78% at 1 and 2 yrs; 98% and 82% of patients showed good agreement between the automated technique and the gold standard dataset for OS and PFS respectively (defined as a ratio of gold standard to routine intervals between 0.8 and 1.2). The automated technique correctly assigned recurrence status in 101 of 122 patients (83%): 21 of the 40 patients with recurrent disease were correctly identified, while the remaining 19 were too unwell to receive further treatment and were missed. Of the 82 patients who did not develop a recurrence, 80 were correctly identified and 2 were incorrectly identified as having recurrent disease.
    Conclusions: We have demonstrated that our algorithm can automate the interpretation of routine datasets to extract survival information for this sample of patients. It currently underestimates recurrence rates because many patients are not well enough to be treated for recurrent disease. With further optimisation, this technique could be extended to a national level, providing a new approach to measuring outcomes on a larger scale than is currently possible. This could have implications for healthcare provision and policy across a range of disease types.
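The abstract does not publish the algorithm's rules, so the following is only a minimal sketch under stated assumptions. It assumes that routine treatment events (admissions, surgery, chemotherapy, radiotherapy) are reduced to dates, and that a new treatment episode starting after a gap longer than a hypothetical threshold (`RECURRENCE_GAP_DAYS`, 90 days here) marks a recurrence. PFS is the interval from diagnosis to the earlier of recurrence or death, censored at last follow-up otherwise, and the abstract's agreement rule (gold standard to routine interval ratio between 0.8 and 1.2) is applied to compare intervals. `Patient`, `first_recurrence`, `pfs_interval` and `good_agreement` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical threshold (an assumption, not from the study): a treatment event
# starting more than this many days after the previous one opens a new episode,
# which is taken as a marker of recurrence.
RECURRENCE_GAP_DAYS = 90


@dataclass
class Patient:
    diagnosis: date
    treatment_dates: list[date]  # admissions, surgery, chemotherapy, radiotherapy
    last_follow_up: date
    death: Optional[date] = None


def first_recurrence(p: Patient) -> Optional[date]:
    """Start date of the first treatment episode that follows a long gap, if any."""
    dates = sorted(p.treatment_dates)
    for prev, curr in zip(dates, dates[1:]):
        if (curr - prev).days > RECURRENCE_GAP_DAYS:
            return curr
    return None


def pfs_interval(p: Patient) -> tuple[int, bool]:
    """(days, event): time to recurrence or death, else censored at last follow-up."""
    candidates = [d for d in (first_recurrence(p), p.death) if d is not None]
    if candidates:
        return (min(candidates) - p.diagnosis).days, True
    return (p.last_follow_up - p.diagnosis).days, False


def good_agreement(gold_days: int, routine_days: int) -> bool:
    """True if the gold standard / routine interval ratio lies within 0.8-1.2."""
    if routine_days <= 0:
        raise ValueError("routine interval must be positive")
    return 0.8 <= gold_days / routine_days <= 1.2


if __name__ == "__main__":
    p = Patient(date(2020, 1, 1),
                [date(2020, 1, 15), date(2020, 3, 1), date(2020, 12, 1)],
                date(2022, 1, 1))
    print(pfs_interval(p))  # -> (335, True): recurrence inferred on 2020-12-01
```

Because recurrence is inferred only from a new treatment episode, a patient who relapses but is too unwell for further treatment generates no event in this sketch. That is the same failure mode the abstract reports for 19 of the 40 recurrences.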

    Can routinely collected data be used to inform randomised controlled trial outcomes in oncology?

    Introduction: Randomised controlled trials (RCTs) have supplemented standard data collection with routine healthcare data. However, no RCTs in the United Kingdom have been conducted solely using routine data in oncology or secondary care. This thesis was undertaken to assess methods that would enable standard RCT data to be replaced or supplemented by routine data. I present examples of routine data follow-up in two clinical settings: prostate and bladder cancer.
    Methods: Routine healthcare datasets were validated against reference patient data (for example, trial data and clinical notes) for their ability to identify trial outcomes of interest. Models were developed to identify these outcomes algorithmically from the routine data. Outcomes included toxicity (serious adverse events), disease progression, treatments and the last known follow-up interaction.
    Results: Models were developed that enabled identification of outcomes of interest from the routine data, such as sepsis admissions, and of trial non-survival endpoints, such as progression. This enabled estimation of events that had not been collected on the trial case report forms (CRFs) but subsequently became of interest. I developed a novel routine data-derived endpoint that correlated with standard trial endpoints, enabling estimation of treatment effects from routine data. I also developed a method to validate the feasibility of using routine data as the basis for oncology trial follow-up.
    Discussion: The nature of the routine data meant that models had to be developed to identify some events of interest indirectly. Although routine data quality was shown to be improving, techniques such as data querying had to be implemented to ensure integrity, accuracy and relevance. Routine data can provide a robust method of trial data collection but need to be used in combination with other data sources, such as standard trial data or clinical notes.
    Conclusion: I propose that routine data are a feasible source of trial outcomes; however, each individual outcome requires validation.
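The thesis validates each routine-data outcome against reference data (trial data or clinical notes). A minimal sketch of such a per-outcome check, assuming events are hypothetical `(patient_id, event_date)` pairs and that a routine event matches a reference event for the same patient if their dates differ by at most a hypothetical 14-day window (`MATCH_WINDOW`), computes sensitivity and positive predictive value (PPV). The matching is greedy (each routine event is consumed by the first reference event it fits), which is a simplification and not the thesis's own method.

```python
from datetime import date, timedelta

# Hypothetical tolerance (an assumption): dates of matching events for the same
# patient may differ by at most this much between the two sources.
MATCH_WINDOW = timedelta(days=14)

Event = tuple[str, date]  # (patient_id, event_date); hypothetical representation


def validate_outcome(reference: list[Event], routine: list[Event]) -> dict[str, float]:
    """Sensitivity and PPV of routine-derived events against reference events."""
    unmatched = list(routine)
    true_pos = 0
    for pid, ref_date in reference:
        for i, (r_pid, r_date) in enumerate(unmatched):
            if r_pid == pid and abs(r_date - ref_date) <= MATCH_WINDOW:
                true_pos += 1
                del unmatched[i]  # greedy: each routine event matches at most once
                break
    sensitivity = true_pos / len(reference) if reference else float("nan")
    ppv = true_pos / len(routine) if routine else float("nan")
    return {"sensitivity": sensitivity, "ppv": ppv}


if __name__ == "__main__":
    ref = [("A", date(2021, 3, 1)), ("B", date(2021, 5, 10))]
    rout = [("A", date(2021, 3, 5)), ("C", date(2021, 6, 1))]
    print(validate_outcome(ref, rout))  # -> {'sensitivity': 0.5, 'ppv': 0.5}
```

Running the function separately for each outcome (sepsis admissions, progression and so on) mirrors the conclusion that each individual outcome needs its own validation.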

    Using machine learning to understand and improve care and outcomes for patients with head and neck cancer

    Head and neck cancer (HNC) is a complex disease whose treatment modality and survival vary by anatomical site of origin. There is limited knowledge of how useful the data collected and reported in oncology information systems (OIS) during routine clinical practice are for investigating prognostic factors and predicting head and neck cancer-specific survival (HNCSS). Routinely collected structured data were extracted from an OIS across seven major hospitals in Australia for patients diagnosed with HNC between 2000 and 2017 and treated with definitive radiotherapy. Deaths were obtained from the National Death Index via record linkage, and HNCSS was measured from the date of diagnosis until death from HNC. Open-source machine learning and nomogram models were used to predict HNCSS and to perform multivariable analysis identifying prognostic factors. Descriptive and survival analyses were used to identify inter-hospital variation in data collection, primary radiotherapy treatment, and survival. A random sample of clinical radiation oncology documents from an OIS was anonymised using a customised open-source tool (Microsoft Presidio) to evaluate the use of unstructured information for medical research. Not all user-defined fields were routinely completed, and not all hospitals relied solely on the OIS; one hospital collected disease information in a parallel database. However, structured information collected in a standardised way with minimal missing data during routine clinical practice in an OIS can be used to predict two-year HNCSS with high performance. Inter-hospital variation was detected in data completeness, primary radiotherapy dose, and five-year HNCSS. Missing data in the OIS reduced the number of predictors available for prognostic analysis and prevented exploratory analysis of differences in survival between hospitals.
Lastly, applying the anonymisation tool to unstructured clinical information sourced from an OIS demonstrated safe and secure use for some fields but showed a need to improve the detection and removal of person names. Data mining techniques for unstructured data, or strategies to improve structured data collection, should be explored so that prediction models can be developed with more complete data covering more patients and variables, followed by external validation to confirm model performance.
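HNCSS is defined above as time from diagnosis to death from HNC. The abstract does not say which estimator the survival analysis used, so this is a minimal sketch assuming the conventional cause-specific approach: a Kaplan-Meier estimator in which death from HNC is the event and patients alive at last contact or dead from other causes are censored. It illustrates the descriptive survival step only, not the machine learning or nomogram models. The function names and the toy data are hypothetical.

```python
def kaplan_meier(times: list[float], events: list[bool]) -> list[tuple[float, float]]:
    """Kaplan-Meier estimate as (time, survival) pairs at each distinct event time.

    events[i] is True if patient i died of HNC at times[i]; False means censored
    (alive at last contact, or died of another cause).
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: list[tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Everyone observed at t (events and censorings) leaves the risk set after t.
        at_risk -= removed
    return curve


def survival_at(curve: list[tuple[float, float]], t: float) -> float:
    """Step-function lookup: estimated survival at time t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


if __name__ == "__main__":
    years = [0.5, 1.0, 1.5, 2.5, 3.0, 3.0]
    died_of_hnc = [True, False, True, False, True, False]
    curve = kaplan_meier(years, died_of_hnc)
    print(round(survival_at(curve, 2.0), 3))  # two-year HNCSS -> 0.625
```

Censoring deaths from other causes gives a cause-specific estimate. Where competing causes of death are common, a cumulative incidence approach is often reported alongside it, but the abstract does not indicate either way.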