Methodology for taking a computer-aided breast cancer screening system from the laboratory to the marketplace

Abstract

Breast cancer is one of the most common causes of death in women, and yet is one of the more 'curable' cancers if caught early. Since its inception in 1987, the Breast Screening Programme has been the principal tool in the National Health Service's fight to reduce the number of cancer-related deaths in the UK. Breast screening using mammography is widely viewed as the most effective way of detecting early breast cancer, and women in the UK over the age of 50 are invited to a screening session every three years. However, national shortages of clinical staff willing to enter and remain in this field mean that the NHS Breast Screening Programme is severely understaffed. This thesis discusses one way in which technology can assist the screening programme: the use of a computer-aided cancer detection system. We present the design and analysis of a sequence of experiments used to develop and evaluate such a system. PROMAM (PROmpting for MAMmography) involved the scanning and digitising of mammograms, and the subsequent analysis of the digital image by a series of algorithms. Initial evaluation ensured that the algorithms were performing satisfactorily at a technical level before they were introduced into a clinical setting. Two large experiments with the algorithms were designed and evaluated: (1) an experiment offering radiologists three levels of algorithm prompting and, as a control, an unprompted level, on samples of mammographic films, with the outcomes being their recall rate and subjective views at each prompting level; and (2) a pre-clinical experiment, conducted under semi-clinical conditions, in which two readers saw a batch of films seeded with higher than normal numbers of cancers, with readers allocated randomly to prompted and unprompted views of films.
The first experiment was designed using a Graeco-Latin Square, with three 'nuisance' variables and the treatment factor of prompting level (no prompts, and low, medium and high levels of prompting). Four radiologists read at each level of prompting once, on different sets of films. One of the more interesting results was that the recall rate did not increase as the prompting rate rose, contrary to prior expectations. Most of the differences seen between the prompting rates could be explained as radiologist differences. Once these were taken into account, the level of prompting had little effect. Additionally, although the time taken to read a set of films increased as the prompting rate increased (as would be expected), the increase was only 26% from the unprompted set to the set with the highest number of prompts. Observational data suggested that the lowest level of prompting was not maintaining the interest of the radiologists, thus leading them to neglect the prompts.
The following experiment moved the system a step closer to a true clinical demonstration of the efficacy of PROMAM, being conducted under semi-clinical conditions. Using a method of minimisation, the numbers of cancers each radiologist viewed as first reader, second reader, prompted or unprompted were balanced. Preliminary exploratory analysis indicated that the recall rate declined with the introduction of the prompting system, but more detailed analysis indicated that much of this difference was due to a radiologist effect. Although cancer detection was slightly lower with the prompting system, examination of the 11 cancers missed by the prompted radiologist showed that six of these had been correctly prompted by the algorithms. This demonstrated scope to improve the cancer detection rate by nearly 5%. These experiments determined the 'production' version of the prompting system.
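The Graeco-Latin Square layout used for the first experiment can be illustrated with a minimal sketch. This is not the allocation actually used in the thesis: the choice of radiologist and reading session as the row and column factors, the film-set labels, and all function names are assumptions made purely for illustration. The sketch builds two orthogonal 4 x 4 Latin squares from arithmetic in the finite field GF(4) and uses one for the prompting level and the other for the film set.

```python
from itertools import product

# Multiplication by the element alpha in GF(4), with elements encoded as 0..3
# and field addition implemented as bitwise XOR.
MUL_ALPHA = [0, 2, 3, 1]

# Treatment levels named in the abstract; the film-set labels are hypothetical.
PROMPT_LEVELS = ["none", "low", "medium", "high"]
FILM_SETS = ["set A", "set B", "set C", "set D"]


def graeco_latin_square_4():
    """Return two mutually orthogonal Latin squares of order 4."""
    latin = [[i ^ j for j in range(4)] for i in range(4)]
    greek = [[MUL_ALPHA[i] ^ j for j in range(4)] for i in range(4)]
    return latin, greek


def is_latin(square):
    """Check that every symbol occurs once in each row and each column."""
    symbols = set(range(len(square)))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return rows_ok and cols_ok


def is_orthogonal(a, b):
    """Check that superimposing the squares yields every ordered pair once."""
    n = len(a)
    pairs = {(a[i][j], b[i][j]) for i, j in product(range(n), repeat=2)}
    return len(pairs) == n * n


def build_schedule():
    """Map (radiologist, session) to (prompting level, film set).

    Rows as radiologists and columns as sessions are assumed roles,
    not details taken from the thesis.
    """
    latin, greek = graeco_latin_square_4()
    return {
        (r, s): (PROMPT_LEVELS[latin[r][s]], FILM_SETS[greek[r][s]])
        for r, s in product(range(4), repeat=2)
    }


if __name__ == "__main__":
    schedule = build_schedule()
    for r in range(4):
        cells = [schedule[(r, s)] for s in range(4)]
        print(f"radiologist {r + 1}: {cells}")
```

In a layout of this kind each radiologist reads once at every prompting level and once on every film set, and every prompting level is paired with every film set exactly once, which is what allows radiologist differences to be separated from the effect of prompting.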
A design to evaluate the system in a sample of 100,000 women in six centres was produced, but due to circumstances beyond the project team's control, it was not possible to take this work to the stage of a full 'trial' of the system. The design concept can, however, apply to the evaluation of any similar prompting system. The recommended design is therefore presented, together with an analysis of data from a simulated application of this design. This simulation has allowed recommendations to be made on the most appropriate ways to analyse the extensive and complicated dataset that will be obtained. In particular, it identified technical problems that can arise from the application of one candidate analytical method, and an explanation for the failure was obtained.
It is quite clear from the evidence presented in this thesis that there is much scope for improvement in the cancer detection rate by the use of a prompting system, without a corresponding loss in specificity. With the shortage of radiologists and radiographers, and the increasing demand placed on the Breast Screening Programme, technology could play a beneficial role in screening for breast cancer in the coming years.
