Inst för medicinsk epidemiologi och biostatistik / Dept of Medical Epidemiology and Biostatistics
Abstract
In recent years, there have been rapid advancements in the field of computational
pathology. This has been enabled through the adoption of digital pathology
workflows that generate digital images of histopathological slides, the publication
of large data sets of these images and improvements in computing infrastructure.
Objectives in computational pathology can be subdivided into two categories:
first, the automation of routine workflows that would otherwise be performed by
pathologists, and second, the addition of novel capabilities. This thesis focuses on
the development, application, and evaluation of methods in this second category,
specifically the prediction of gene expression from pathology images and the
registration of pathology images to one another.
In Study I, we developed a computationally efficient cluster-based technique to
perform transcriptome-wide predictions of gene expression in prostate cancer
from H&E-stained whole-slide images (WSIs). The suggested method
outperforms several baseline methods and is non-inferior to single-gene CNN
predictions, while reducing the computational cost by a factor of approximately
300. We included 15,586 transcripts that encode proteins in the analysis and
predicted their expression from the WSIs with different modelling approaches. In
cross-validation, 6,618 of these predictions were significantly associated with
the RNA-seq expression estimates with FDR-adjusted p-values < 0.001. Upon
validation of these 6,618 expression predictions in a held-out test set, the
association could be confirmed for 5,419 (81.9%). Furthermore, we demonstrated
that it is feasible to predict the prognostic cell-cycle progression score with a
Spearman correlation to the RNA-seq score of 0.527 [0.357, 0.665].
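The FDR adjustment mentioned above can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg procedure, which this abstract does not name, and takes one raw p-value per transcript as input; the function name benjamini_hochberg is hypothetical and not taken from the thesis.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (assumed procedure)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices of p-values, ascending
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: each adjusted value is the minimum of itself
    # and all adjusted values at larger ranks.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)  # back to the input order
    return adjusted
```

Under this assumption, a transcript would count as significantly associated when its adjusted p-value is below the 0.001 threshold, for example via benjamini_hochberg(raw_p) < 0.001.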
The objective of Study II is to investigate attention layers in the context of
multiple-instance learning for regression tasks, exemplified by a simulation study
and gene expression prediction. We find that for gene expression prediction, the
compared methods are indistinguishable in performance, which
indicates that attention mechanisms may not be superior to weakly supervised
learning in this context.
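To make the comparison concrete, the following minimal sketch contrasts mean pooling of tile-level features with an attention-weighted pooling. The attention form (a softmax over tanh-projected scores) and the parameter names V and w are assumptions chosen for illustration; the abstract does not specify the architecture used in Study II.

```python
import numpy as np

def mean_pool(instances):
    """Average all instance (tile) feature vectors of one bag (slide)."""
    return instances.mean(axis=0)

def attention_pool(instances, V, w):
    """Attention-weighted average of instance features (assumed form).

    instances: (n, d) tile features; V: (d, k) and w: (k,) are
    hypothetical learnable attention parameters.
    """
    scores = np.tanh(instances @ V) @ w        # one score per instance
    scores = scores - scores.max()             # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ instances, weights
```

With all attention parameters set to zero, the weights are uniform and the attention pooling reduces to the mean, which is one way to see that mean pooling is a special case of this attention form.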
Study III describes the results of the ACROBAT 2022 WSI registration challenge,
which we organised in conjunction with the MICCAI 2022 conference. Participating
teams were ranked on the median 90th percentile of distances between
registered and annotated target landmarks. Median 90th percentiles for eight
teams that were eligible for ranking in the test set consisting of 303 WSI pairs
ranged from 60.1 µm to 15,938.0 µm. The best-performing method therefore achieves a
score slightly below the median 90th percentile of distances between the first and
second annotator, which is 67.0 µm.
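The ranking metric described above can be sketched as follows. This is an illustration under stated assumptions rather than the challenge's evaluation code: distances are taken to be Euclidean and expressed in µm, the percentile uses NumPy's default linear interpolation, and the function names pair_score and challenge_score are hypothetical.

```python
import numpy as np

def pair_score(registered, target, percentile=90.0):
    """90th percentile of landmark distances for one WSI pair."""
    registered = np.asarray(registered, dtype=float)
    target = np.asarray(target, dtype=float)
    # Assumed: Euclidean distance between corresponding landmarks, in µm.
    distances = np.linalg.norm(registered - target, axis=1)
    return float(np.percentile(distances, percentile))

def challenge_score(pairs):
    """Median over WSI pairs of the per-pair 90th percentile."""
    return float(np.median([pair_score(r, t) for r, t in pairs]))
```

Here pairs is an iterable of (registered, target) landmark arrays, one entry per WSI pair. Taking the median across pairs makes the score robust to a few failed registrations, while the per-pair 90th percentile still penalises large errors within a pair.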
Study IV describes the data set that we published to facilitate the ACROBAT
challenge. The data set is publicly available through the Swedish National Data
Service (SND) and consists of 4,212 WSIs from 1,153 breast cancer patients.
Study V is an example of the application of WSI registration for computational
pathology. In this study, we investigate whether invasive cancer annotations can
be registered from H&E to KI67 WSIs and subsequently used to train cancer detection
models. To this end, we compare the performance of models optimised with
registered annotations to the performance of models that were optimised with
annotations generated for the KI67 WSIs. The data set consists of 272 female
breast cancer cases, including an internal test set of 54 cases. We find that in this
test set, the two models are indistinguishable in
performance, while there are small differences in model calibration.