Terabyte-scale Deep Multiple Instance Learning for Classification and Localization in Pathology
In the field of computational pathology, the use of decision support systems
powered by state-of-the-art deep learning solutions has been hampered by the
lack of large labeled datasets. Until recently, studies relied on datasets on
the order of a few hundred slides, which are not enough to train a model that
can work at scale in the clinic. Here, we have gathered a dataset consisting of
12,160 slides, two orders of magnitude larger than previous datasets in
pathology and equivalent to 25 times the pixel count of the entire ImageNet
dataset. Given the size of our dataset, it is possible for us to train a deep
learning model under the Multiple Instance Learning (MIL) assumption, where only
the overall slide diagnosis is necessary for training, avoiding all the
expensive pixel-wise annotations that are usually part of supervised learning
approaches. We test our framework on a complex task, that of prostate cancer
diagnosis on needle biopsies. We performed a thorough evaluation of the
performance of our MIL pipeline under several conditions, achieving an AUC of
0.98 on a held-out test set of 1,824 slides. These results open the way for
training accurate diagnosis prediction models at scale, laying the foundation
for decision support system deployment in the clinic.
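The MIL formulation the abstract relies on is compact enough to sketch. Below is a minimal PyTorch illustration of the standard max-pooling MIL setup for slide classification; the encoder, feature dimension, and tile shape are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MILSlideClassifier(nn.Module):
    """Max-pooling MIL: a slide is called positive if its most suspicious
    tile is positive, so only the slide-level diagnosis is needed."""

    def __init__(self, tile_encoder: nn.Module, feat_dim: int = 512):
        super().__init__()
        self.tile_encoder = tile_encoder         # any CNN mapping a tile to a feature vector
        self.tile_head = nn.Linear(feat_dim, 1)  # per-tile tumor logit

    def forward(self, tiles: torch.Tensor) -> torch.Tensor:
        # tiles: (n_tiles, 3, 224, 224) -- every tile from one slide (one "bag")
        feats = self.tile_encoder(tiles)         # (n_tiles, feat_dim)
        tile_logits = self.tile_head(feats)      # (n_tiles, 1)
        # The slide logit is the max over tiles; the slide label backpropagates
        # through the top-scoring tile, which also localizes the evidence.
        return tile_logits.max(dim=0).values     # shape (1,)
```

Training then pairs this slide logit with the slide diagnosis through a standard binary cross-entropy loss, with no pixel-wise annotations involved.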
Self-Supervised Similarity Learning for Digital Pathology
Using features extracted from networks pretrained on ImageNet is a common
practice in applications of deep learning for digital pathology. However, it
presents the downside of missing domain-specific image information. In digital
pathology, supervised training data is expensive and difficult to collect. We
propose a self-supervised method for feature extraction by similarity learning
on whole slide images (WSI) that is simple to implement and allows creation of
robust and compact image descriptors. We train a siamese network, exploiting
image spatial continuity and assuming spatially adjacent tiles in the image are
more similar to each other than distant tiles. Our network outputs feature
vectors of length 128, which allows for dramatically lower storage and
faster processing than networks pretrained on ImageNet. We apply the method on
digital pathology WSIs from the Camelyon16 training set, and assess and compare
our method by measuring tumor-tile image retrieval and the descriptor
pair-distance ratio for distant versus near tiles in the Camelyon16 test set. We show that our
method yields better retrieval results than existing ImageNet-based and
generic self-supervised feature extraction methods. To the best of our
knowledge, this is also the first published method for self-supervised learning
tailored for digital pathology.
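The abstract does not spell out the loss, but the assumption that adjacent tiles are more similar than distant ones maps naturally onto a triplet formulation. A minimal sketch, assuming a triplet margin loss and a generic backbone (both my assumptions):

```python
import torch.nn as nn
import torch.nn.functional as F

class TileEmbedder(nn.Module):
    """Maps tiles to 128-d unit-norm descriptors, matching the descriptor
    length quoted in the abstract."""

    def __init__(self, backbone: nn.Module, feat_dim: int = 512):
        super().__init__()
        self.backbone = backbone               # shared ("siamese") weights
        self.proj = nn.Linear(feat_dim, 128)

    def forward(self, x):
        return F.normalize(self.proj(self.backbone(x)), dim=-1)

def spatial_triplet_loss(embed, anchor, adjacent, distant, margin=0.5):
    # A tile's spatial neighbour on the WSI should embed closer to it
    # than a tile sampled far away on the same slide.
    return F.triplet_margin_loss(embed(anchor), embed(adjacent),
                                 embed(distant), margin=margin)
```

Because the positive and negative pairs come from tile geometry alone, no manual labels are required at any point.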
Weakly supervised training of pixel resolution segmentation models on whole slide images
We present a novel approach to train pixel resolution segmentation models on
whole slide images in a weakly supervised setup. The model is trained to
classify patches extracted from slides. This means the training is performed on
noisily labeled data. We solve the problem with two complementary
strategies. First, the patches are sampled online using the model's knowledge
by focusing on regions where the model's confidence is higher. Second, we
propose an extension of the KL divergence that is robust to noisy labels. Our
preliminary experiments on the CAMELYON16 dataset show promising results. The
model can successfully segment tumor areas with strong morphological
consistency.
Comment: Performance update.
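The abstract describes both strategies only at a high level; the following PyTorch sketch is one plausible reading of the confidence-focused online sampling (the exact rule and the robust KL variant are not given):

```python
import torch

def sample_confident_patches(patch_logits: torch.Tensor, k: int) -> torch.Tensor:
    """Draw k patch indices, favouring patches the current model is confident
    about (predicted probability far from 0.5). Concentrating updates on such
    regions limits the influence of noisily labeled patches elsewhere."""
    probs = torch.sigmoid(patch_logits)          # (n_patches,)
    confidence = (probs - 0.5).abs() * 2 + 1e-6  # in (0, 1], never zero
    return torch.multinomial(confidence, k, replacement=False)
```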
Monte-Carlo Sampling applied to Multiple Instance Learning for Histological Image Classification
We propose a patch sampling strategy based on a sequential Monte-Carlo method
for high resolution image classification in the context of Multiple Instance
Learning. When compared with grid sampling and uniform sampling techniques, it
achieves higher generalization performance. We validate the strategy on two
artificial datasets and two histological datasets for breast cancer and sun
exposure classification.
Comment: Accepted at the 4th International Workshop on Deep Learning for Medical
Image Analysis (DLMIA), MICCAI 2018; published in Deep Learning in Medical Image
Analysis and Multimodal Learning for Clinical Decision Support, Springer
International Publishing, 2018.
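Sequential Monte-Carlo here amounts to a particle filter over patch locations. A toy sketch, assuming a precomputed 2-D relevance map (for example derived from the classifier's current predictions; the map and all parameters are illustrative):

```python
import torch

def smc_patch_positions(relevance: torch.Tensor, n_particles: int = 64,
                        n_steps: int = 5, jitter: float = 8.0) -> torch.Tensor:
    """Particle filter over (row, col) patch centres on an (H, W) map.

    Particles are perturbed, reweighted by the relevance under them, and
    resampled, so they gradually concentrate on informative regions."""
    H, W = relevance.shape
    pos = torch.rand(n_particles, 2) * torch.tensor([H - 1.0, W - 1.0])
    for _ in range(n_steps):
        pos = pos + torch.randn_like(pos) * jitter        # propose moves
        pos[:, 0] = pos[:, 0].clamp(0, H - 1)
        pos[:, 1] = pos[:, 1].clamp(0, W - 1)
        weights = relevance[pos[:, 0].long(), pos[:, 1].long()] + 1e-8
        keep = torch.multinomial(weights, n_particles, replacement=True)
        pos = pos[keep]                                   # resample step
    return pos                                            # patch centres
```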
Coupling weak and strong supervision for classification of prostate cancer histopathology images
Automated grading of prostate cancer histopathology images is a challenging
task, with one key challenge being the scarcity of annotations down to the
level of regions of interest (strong labels), as typically the prostate cancer
Gleason score is known only for entire tissue slides (weak labels). In this
study, we focus on automated Gleason score assignment of prostate cancer
whole-slide images on the basis of a large weakly-labeled dataset and a smaller
strongly-labeled one. We efficiently leverage information from both label
sources by jointly training a classifier on the two datasets and by introducing
a gradient update scheme that assigns a different relative importance to each
training example, as a means of self-controlling the weak supervision signal.
Our approach achieves superior performance when compared with standard Gleason
scoring methods.
Comment: Accepted at the Medical Imaging meets NIPS Workshop, NIPS 2018.
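In spirit, the joint training reduces to a weighted combination of the two supervision signals. A deliberately simplified sketch (the paper derives the per-example weights from the gradient updates themselves; here they are simply an input):

```python
import torch

def joint_loss(strong_losses: torch.Tensor,
               weak_losses: torch.Tensor,
               weak_weights: torch.Tensor) -> torch.Tensor:
    """Combine per-example losses from a strongly labeled batch (region-level
    annotations) and a weakly labeled batch (slide-level Gleason scores).

    weak_weights in [0, 1] down-weight weak examples whose signal is
    currently distrusted, self-controlling the weak supervision instead of
    letting the much larger weak dataset dominate training."""
    return strong_losses.mean() + (weak_weights * weak_losses).mean()
```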
Segmenting Potentially Cancerous Areas in Prostate Biopsies using Semi-Automatically Annotated Data
Gleason grading specified in ISUP 2014 is the clinical standard in staging
prostate cancer and the most important part of the treatment decision. However,
the grading is subjective and suffers from high intra- and inter-user
variability. To improve the consistency and objectivity in the grading, we
introduced glandular tissue WithOut Basal cells (WOB) as the ground truth. The
presence of basal cells is the most accepted biomarker for benign glandular
tissue and the absence of basal cells is a strong indicator of acinar prostatic
adenocarcinoma, the most common form of prostate cancer. Glandular tissue can
objectively be assessed as WOB or not WOB by using specific immunostaining for
glandular tissue (Cytokeratin 8/18) and for basal cells (Cytokeratin 5/6 +
p63). Moreover, WOB allowed us to develop a semi-automated data-generation
pipeline to speed up the tremendously time-consuming and expensive process of
annotating whole slide images by pathologists. We generated 295 prostatectomy
images exhaustively annotated with WOB. Then we used our Deep Learning
Framework, which achieved the best reported score in the Camelyon17
Challenge, to train networks for segmenting WOB in needle biopsies. Evaluation
of the model on 63 needle biopsies showed promising results, which were improved
further by fine-tuning the model on 118 biopsies annotated with WOB, achieving
an F1-score of 0.80 and a Precision-Recall AUC of 0.89 at the pixel level. Then we
compared the model's predictions against 17 biopsies annotated
independently by 3 pathologists using only H&E staining. The comparison
demonstrated that the model performed on a par with the pathologists. Finally,
the model detected and accurately outlined existing WOB areas in two biopsies
incorrectly annotated as entirely WOB-free by three pathologists, and in
one biopsy by two pathologists.
Comment: Accepted as an oral presentation at Medical Imaging with Deep Learning
(MIDL) 2019, July, London, England.
Similar Image Search for Histopathology: SMILY
The increasing availability of large institutional and public histopathology
image datasets is making it possible to search these datasets for diagnosis,
research, and education. Though these datasets typically have associated
metadata such as diagnosis or clinical notes, even carefully curated datasets
rarely contain annotations of the location of regions of interest on each
image. Because pathology images are extremely large (up to 100,000 pixels in
each dimension), further laborious visual search of each image may be needed to
find the feature of interest. In this paper, we introduce a deep learning based
reverse image search tool for histopathology images: Similar Medical Images
Like Yours (SMILY). We assessed SMILY's ability to retrieve search results in
two ways: using pathologist-provided annotations, and via prospective studies
where pathologists evaluated the quality of SMILY search results. As a negative
control in the second evaluation, pathologists were blinded to whether search
results were retrieved by SMILY or randomly. In both types of assessments,
SMILY was able to retrieve search results with similar histologic features,
organ site, and prostate cancer Gleason grade compared with the original query.
SMILY may be a useful general-purpose tool in the pathologist's arsenal, to
improve the efficiency of searching large archives of histopathology images,
without the need to develop and implement specific tools for each application.
Comment: 23 pages with 6 figures and 3 tables, plus 6 pages of supplemental
material. Improved figure resolution, edited metadata.
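Mechanically, a reverse image search of this kind reduces to embedding every database patch once and ranking by similarity at query time. A minimal NumPy sketch (the embedding function and cosine ranking are generic assumptions, not SMILY's published internals):

```python
import numpy as np

def build_index(embed, patches: np.ndarray) -> np.ndarray:
    """embed maps (n, H, W, 3) patches to (n, d) features; rows are
    L2-normalized so a dot product equals cosine similarity."""
    feats = embed(patches)
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)

def search(index: np.ndarray, query_feat: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k database patches most similar to the query."""
    q = query_feat / np.linalg.norm(query_feat)
    return np.argsort(index @ q)[::-1][:k]
```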
Extracting 2D weak labels from volume labels using multiple instance learning in CT hemorrhage detection
Multiple instance learning (MIL) is a supervised learning methodology that
aims to allow models to learn instance class labels from bag class labels,
where a bag is defined to contain multiple instances. MIL is gaining traction
for learning from weak labels but has not been widely applied to 3D medical
imaging. MIL is well-suited to clinical CT acquisitions since (1) the highly
anisotropic voxels hinder application of traditional 3D networks and (2)
patch-based networks have limited ability to learn whole volume labels. In this
work, we apply MIL with a deep convolutional neural network to identify whether
clinical CT head image volumes possess one or more large hemorrhages (>
20 cm³), resulting in a learned 2D model without the need for 2D slice
annotations. Individual image volumes are considered separate bags, and the
slices in each volume are instances. Such a framework sets the stage for
incorporating information obtained in clinical reports to help train a 2D
segmentation approach. Within this context, we evaluate the data requirements
to enable generalization of MIL by varying the amount of training data. Our
results show that a training size of at least 400 patient image volumes was
needed to achieve accurate per-slice hemorrhage detection. Over a five-fold
cross-validation, the leading model, which made use of the maximum number of
training volumes, had an average true positive rate of 98.10%, an average true
negative rate of 99.36%, and an average precision of 0.9698. The models have
been made available along with source code to enable continued exploration and
adaptation of MIL in CT neuroimaging.
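The bag construction mirrors the pathology examples above, with axial slices taking the role of tiles. A short sketch of the volume-as-bag formulation (the slice model and shapes are illustrative):

```python
import torch
import torch.nn as nn

def volume_logit(slice_model: nn.Module, volume: torch.Tensor) -> torch.Tensor:
    # volume: (n_slices, 1, H, W) -- one head CT, each axial slice an instance
    slice_logits = slice_model(volume).squeeze(-1)   # (n_slices,)
    # The volume ("bag") is positive if any slice shows a large hemorrhage,
    # so training against the volume-level report label still yields a 2D
    # per-slice detector as a by-product.
    return slice_logits.max()
```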
Certainty Pooling for Multiple Instance Learning
Multiple Instance Learning is a form of weakly supervised learning in which
the data is arranged in sets of instances called bags, with one label assigned
per bag. The bag-level class prediction is derived from the multiple instances
through application of a permutation invariant pooling operator on instance
predictions or embeddings. We present a novel pooling operator called
Certainty Pooling, which incorporates the model's certainty into bag
predictions, resulting in a more robust and explainable model. We compare our
proposed method with other pooling operators in controlled experiments with low
evidence ratio bags based on MNIST, as well as on a real-life histopathology
dataset, Camelyon16. Our method outperforms other methods in both bag-level
and instance-level prediction, especially when only small training sets are
available. We discuss the rationale behind our approach and the reasons for its
superiority for these types of datasets.
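The abstract does not define the operator precisely, so the PyTorch sketch below shows just one way to realize certainty-aware pooling, using inverse Monte-Carlo dropout variance as the certainty signal (that choice is my assumption):

```python
import torch
import torch.nn as nn

def certainty_pooled_prediction(model: nn.Module, instances: torch.Tensor,
                                n_passes: int = 20) -> torch.Tensor:
    """Bag prediction dominated by the instance the model is most certain
    about; certainty = inverse variance across stochastic dropout passes."""
    model.train()                # keep dropout layers active at inference
    with torch.no_grad():
        samples = torch.stack([model(instances).squeeze(-1)
                               for _ in range(n_passes)])  # (n_passes, n_instances)
    certainty = 1.0 / (samples.var(dim=0) + 1e-6)          # (n_instances,)
    mean_pred = samples.mean(dim=0)
    return mean_pred[certainty.argmax()]
```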
Magnifying Networks for Images with Billions of Pixels
The shift towards end-to-end deep learning has brought unprecedented advances
in many areas of computer vision. However, there are cases where the input
images are excessively large, rendering end-to-end approaches infeasible. In this
paper, we introduce a new network, the Magnifying Network (MagNet), which can
be trained end-to-end independently of the input image size. MagNets combine
convolutional neural networks with differentiable spatial transformers in a
novel way to navigate and successfully learn from images with billions of
pixels. Drawing inspiration from the magnifying nature of an ordinary
brightfield microscope, a MagNet processes a downsampled version of an image,
and without supervision learns how to identify areas that may carry value to
the task at hand, upsamples them, and recursively repeats this process on each
of the extracted patches. Our results on the publicly available Camelyon16 and
Camelyon17 datasets first corroborate the effectiveness of MagNets and the
proposed optimization framework and, second, demonstrate the advantage of
MagNets' built-in transparency, an attribute of utmost importance for critical
processes such as medical diagnosis.
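To make the recursion concrete, here is a heavily simplified sketch. The real MagNet uses differentiable spatial transformers so that crop locations are learned end-to-end; this illustration substitutes hard crops and an assumed locator interface.

```python
import torch
import torch.nn.functional as F

def magnify(image: torch.Tensor, locator, classifier, depth: int = 2,
            view: int = 512) -> list:
    """Recursively zoom into a gigapixel image.

    image: (1, 3, H, W). The locator inspects a downsampled view and yields
    normalized (row, col, size) boxes; each box is cut from the full-
    resolution image and processed the same way, one level deeper."""
    small = F.interpolate(image, size=(view, view),
                          mode="bilinear", align_corners=False)
    outputs = [classifier(small)]
    if depth > 0:
        _, _, H, W = image.shape
        for r, c, s in locator(small):        # boxes with coords in [0, 1)
            y0, x0 = int(r * H), int(c * W)
            h, w = max(1, int(s * H)), max(1, int(s * W))
            crop = image[:, :, y0:y0 + h, x0:x0 + w]
            outputs += magnify(crop, locator, classifier, depth - 1, view)
    return outputs
```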