KimiaNet: Training a Deep Network for Histopathology using High-Cellularity
With the recent progress in deep learning, one of the common approaches to representing images is extracting deep features. A primitive way to do this is to use off-the-shelf models. However, these features can be improved through fine-tuning, or even by training a network from scratch on domain-specific images. This desirable task is hindered by the lack of annotated or labeled images in the field of histopathology.
In this thesis, a new network, namely KimiaNet, is proposed that uses an existing dense topology but is tailored to generate informative and discriminative deep features from histopathology images for image representation. The model adopts the existing DenseNet-121 architecture but is trained on more than 240,000 image patches of 1000 × 1000 pixels acquired at 20× magnification.
Considering the high cost of histopathology image annotation, which makes manual labeling impractical at a large scale, a high-cellularity mosaic approach is suggested as a weak, or soft, labeling method. The patches used for training KimiaNet are extracted from 7,126 whole slide images of formalin-fixed paraffin-embedded (FFPE) biopsy samples, spanning 30 cancer sub-types and publicly available through The Cancer Genome Atlas (TCGA) repository.
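The exact cellularity criterion behind the mosaics is not spelled out in this abstract. As a rough illustration of the weak-labeling idea, the sketch below (the function names and the dark-pixel threshold are assumptions) ranks candidate patches by a simple nuclei-darkness proxy and keeps the top scorers, which would then inherit the slide-level diagnosis as a soft label:

```python
import numpy as np

def cellularity_score(patch, nuclei_threshold=0.4):
    """Proxy for cellularity: fraction of dark pixels in a grayscale
    patch with values in [0, 1] (hematoxylin-stained nuclei appear dark).
    The threshold is an illustrative assumption, not the thesis's value."""
    return float(np.mean(patch < nuclei_threshold))

def select_high_cellularity(patches, top_k=3):
    """Keep the indices of the top_k patches with the highest cellularity
    proxy; these would receive the slide-level (weak) cancer-type label."""
    scores = [cellularity_score(p) for p in patches]
    order = np.argsort(scores)[::-1]  # highest score first
    return [int(i) for i in order[:top_k]]
```

In practice the score would be computed on stain-deconvolved or color-thresholded tissue rather than raw grayscale, but the selection logic is the same.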
The quality of the features generated by KimiaNet is tested via two types of image search: (i) given a query slide, searching among all slides and finding those with a tissue type similar to the query's, and (ii) searching among slides within the query slide's tumor type and finding slides with the same cancer sub-type as the query's. Compared to the pre-trained DenseNet-121 and its fine-tuned versions, KimiaNet achieved predominantly the best results in both search modes.
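Both search modes boil down to nearest-neighbor retrieval in the deep-feature space. A minimal sketch, assuming each slide is pooled into a single vector (the element-wise median of its patch features is one plausible pooling choice, not necessarily the thesis's) and ranked by cosine similarity:

```python
import numpy as np

def slide_embedding(patch_features):
    # Represent a slide by the element-wise median of its patch feature
    # vectors (pooling choice is an assumption for illustration).
    return np.median(patch_features, axis=0)

def search(query, slide_embeddings):
    # Rank slides by cosine similarity to the query embedding.
    E = np.asarray(slide_embeddings, dtype=float)
    q = query / np.linalg.norm(query)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    sims = E @ q
    return np.argsort(sims)[::-1]  # index of best match first
```

Mode (i) would run `search` over every slide in the archive; mode (ii) would restrict the candidate set to slides sharing the query's tumor type before ranking.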
To build an intuition for how training from scratch affects the expressiveness of the deep features, the deep features of randomly selected patches from each cancer sub-type are extracted using both KimiaNet and pre-trained DenseNet-121 and visualized after reducing their dimensionality with t-distributed Stochastic Neighbor Embedding (t-SNE). The visualization shows that for KimiaNet the instances of each class can easily be distinguished from the others, while for pre-trained DenseNet the instances of almost all classes are mixed together. This comparison further confirms how discriminative training on domain-specific images has made the features.
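The visualization step can be sketched with scikit-learn's `TSNE`; the helper name and the perplexity value here are assumptions, and the resulting 2-D coordinates would normally be scatter-plotted colored by cancer sub-type:

```python
import numpy as np
from sklearn.manifold import TSNE

def visualize_features(features, perplexity=5, seed=0):
    """Reduce deep features (n_patches x n_dims) to 2-D with t-SNE so
    class separation can be inspected in a scatter plot. Perplexity must
    be smaller than the number of samples."""
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=seed)
    return tsne.fit_transform(np.asarray(features, dtype=float))
```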
Also, four simpler networks, made up of repetitions of convolutional, batch-normalization, and Rectified Linear Unit (ReLU) layers (CBR networks), are implemented and compared against KimiaNet to check whether the network design could be further simplified. The experiments demonstrate that KimiaNet features are by far better than those of the CBR networks, which validates DenseNet-121 as a good choice for KimiaNet's architecture.
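A CBR block is simply convolution → batch normalization → ReLU. A self-contained numpy sketch of one single-channel block follows; the normalization here is computed over a single feature map as a stand-in for batch statistics, and the kernel size and repetition counts of the actual CBR networks are not given in this abstract:

```python
import numpy as np

def conv2d(x, w):
    # Valid 2-D cross-correlation, single input and output channel.
    H, W = x.shape
    kh, kw = w.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def batch_norm(x, eps=1e-5):
    # Normalize to zero mean, unit variance (stand-in for batch stats).
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def relu(x):
    return np.maximum(x, 0.0)

def cbr_block(x, w):
    # One Convolution-BatchNorm-ReLU block.
    return relu(batch_norm(conv2d(x, w)))
```

The four CBR networks in the thesis would stack several such blocks (with multiple channels and pooling) before a classification head; this sketch only shows the building block itself.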
Evolutionary Computation in Action: Feature Selection for Deep Embedding Spaces of Gigapixel Pathology Images
One of the main obstacles to adopting digital pathology is the challenge of efficiently processing hyperdimensional digitized biopsy samples, called whole slide images (WSIs). Exploiting deep learning and introducing compact WSI representations are urgently needed to accelerate image analysis and to facilitate the visualization and interpretability of pathology results in a post-pandemic world. In this paper, we introduce a new evolutionary approach for WSI representation based on large-scale multi-objective optimization (LSMOP) of deep embeddings. We start with patch-based sampling to feed KimiaNet, a histopathology-specialized deep network, and to extract a multitude of feature vectors. Coarse multi-objective feature selection uses the reduced search space strategy, guided by classification accuracy and the number of features. In the second stage, the frequent features histogram (FFH), a novel WSI representation, is constructed by multiple runs of coarse LSMOP. Fine evolutionary feature selection is then applied to find a compact (short-length) feature vector based on the FFH, contributing to a more robust deep-learning approach to digital pathology supported by the stochastic power of evolutionary algorithms. We validate the proposed schemes using The Cancer Genome Atlas (TCGA) images in terms of WSI representation, classification accuracy, and feature quality. Furthermore, a novel decision space for multi-criteria decision making in the LSMOP field is introduced. Finally, a patch-level visualization approach is proposed to increase the interpretability of deep features. The proposed evolutionary algorithm finds a very compact feature vector to represent a WSI (almost 14,000 times smaller than the original feature vectors) with 8% higher accuracy compared to the codes provided by state-of-the-art methods.
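The paper's pipeline (coarse LSMOP, FFH construction, fine selection) is considerably more elaborate than can be shown here, but the core idea, evolving binary feature masks under competing accuracy and compactness objectives, can be illustrated with a toy genetic algorithm that scalarizes the two objectives. All names, the class-separation accuracy proxy, and the penalty weight below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(mask, X0, X1, penalty=0.01):
    # Accuracy proxy: mean class-mean separation over the selected
    # features, minus a penalty that favors shorter feature vectors.
    if mask.sum() == 0:
        return -np.inf
    sep = np.abs(X0[:, mask].mean(0) - X1[:, mask].mean(0)).mean()
    return sep - penalty * mask.sum()

def evolve(X0, X1, pop_size=20, generations=30, p_mut=0.05):
    # Evolve binary feature masks: keep the fittest half as parents,
    # produce children by bit-flip mutation, repeat.
    d = X0.shape[1]
    pop = rng.random((pop_size, d)) < 0.5
    for _ in range(generations):
        scores = np.array([fitness(m, X0, X1) for m in pop])
        parents = pop[np.argsort(scores)[::-1][:pop_size // 2]]
        children = parents.copy()
        children ^= rng.random(children.shape) < p_mut
        pop = np.vstack([parents, children])
    scores = np.array([fitness(m, X0, X1) for m in pop])
    return pop[int(np.argmax(scores))]
```

The real method replaces this scalarized score with true multi-objective (Pareto) selection and aggregates masks from many runs into the FFH before the fine selection stage.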
Biased data, biased AI: deep networks predict the acquisition site of TCGA images
Background: Deep learning models applied to healthcare applications, including digital pathology, have been increasing in scope and importance in recent years. Many of these models have been trained on The Cancer Genome Atlas (TCGA) atlas of digital images, or use it as a validation source. One crucial factor that seems to have been widely ignored is the internal bias originating from the institutions that contributed WSIs to the TCGA dataset, and its effect on models trained on this dataset.
Methods: 8,579 paraffin-embedded, hematoxylin and eosin stained digital slides were selected from the TCGA dataset. More than 140 medical institutions (acquisition sites) contributed to this dataset. Two deep neural networks (DenseNet-121 and KimiaNet) were used to extract deep features at 20× magnification. DenseNet was pre-trained on non-medical objects; KimiaNet has the same structure but was trained for cancer-type classification on TCGA images. The extracted deep features were later used to detect each slide's acquisition site, and also for slide representation in image search.
Results: DenseNet's deep features could distinguish acquisition sites with 70% accuracy, whereas KimiaNet's deep features could reveal acquisition sites with more than 86% accuracy. These findings suggest that there are acquisition-site-specific patterns that can be picked up by deep neural networks. It was also shown that these medically irrelevant patterns can interfere with other applications of deep learning in digital pathology, namely image search.
Summary: This study shows that there are acquisition-site-specific patterns that can be used to identify tissue acquisition sites without any explicit training. Furthermore, it was observed that a model trained for cancer sub-type classification exploited such medically irrelevant patterns to classify cancer types.
Digital scanner configuration and noise, tissue stain variation and artifacts, and source-site patient demographics are among the factors that likely account for the observed bias. Researchers should therefore be cautious of such bias when using histopathology datasets to develop and train deep networks.
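The mechanism behind this bias can be illustrated synthetically: if every acquisition site adds its own systematic offset (stain, scanner) on top of otherwise identical feature distributions, even a classifier as simple as nearest-centroid recovers the site with high accuracy. All data below are synthetic; the real study used deep features extracted from 8,579 TCGA slides:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "deep features": each acquisition site contributes its own
# small systematic offset (a stand-in for stain/scanner signatures)
# added to site-independent tissue signal.
n_sites, per_site, dim = 4, 50, 32
site_offsets = rng.normal(scale=1.5, size=(n_sites, dim))
X = np.vstack([rng.normal(size=(per_site, dim)) + site_offsets[s]
               for s in range(n_sites)])
y = np.repeat(np.arange(n_sites), per_site)

# Nearest-centroid "site detector": nothing is trained beyond averaging
# the features of each site's slides.
centroids = np.stack([X[y == s].mean(0) for s in range(n_sites)])
pred = np.argmin(((X[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
accuracy = float((pred == y).mean())
```

In high-dimensional feature spaces, even small per-site offsets separate cleanly, which is why both networks in the study could reveal acquisition sites well above chance.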