The practical application of deep learning methods in the medical domain
has many challenges. Pathologies are diverse and very few examples may
be available for rare cases. Where data is collected it may lie in multiple
institutions and cannot be pooled for practical and ethical reasons. Deep
learning is powerful for image segmentation problems but ultimately its output
must be interpretable at the patient level. Although clearly not an exhaustive
list, these are the three problems tackled in this thesis.
To address the rarity of pathology I investigate novelty detection algorithms
to find outliers from normal anatomy. The problem is structured as first finding
a low-dimension embedding and then detecting outliers in that embedding
space. I evaluate for speed and accuracy several unsupervised embedding and
outlier detection methods. Data consist of Magnetic Resonance Imaging (MRI)
for interstitial lung disease for which healthy and pathological patches are
available; only the healthy patches are used in model training.
I then explore the clinical interpretability of a model output. I take related
work by the Canon team — a model providing voxel-level detection of acute
ischemic stroke signs — and deliver the Alberta Stroke Programme Early CT
Score (ASPECTS, a measure of stroke severity). The data are acute head
computed tomography volumes of suspected stroke patients. I convert from
the voxel level to the brain region level and then to the patient level through a
series of rules. Due to the real world clinical complexity of the problem, there
are at each level — voxel, region and patient — multiple sources of “truth”; I
evaluate my results appropriately against these truths.
Finally, federated learning is used to train a model on data that are divided
between multiple institutions. I introduce a novel evolution of this algorithm
— dubbed “soft federated learning” — that avoids the central coordinating
authority, and takes into account domain shift (covariate shift) and dataset
size. I first demonstrate the key properties of these two algorithms on a series
of MNIST (handwritten digits) toy problems. Then I apply the methods to the
BraTS medical dataset, which contains MRI brain glioma scans from multiple
institutions, to compare these algorithms in a realistic setting