Grounding deep models of visual data
Deep models are state-of-the-art for many computer vision tasks including object classification, action recognition, and captioning. As Artificial Intelligence systems that utilize deep models become ubiquitous, it is also becoming crucial to explain why they make certain decisions: grounding model decisions. In this thesis, we study: 1) Improving Model Classification. We show that by utilizing web action images along with videos in training for action recognition, significant performance boosts of convolutional models can be achieved. Without explicit grounding, labeled web action images tend to contain discriminative action poses, which highlight discriminative portions of a video’s temporal progression. 2) Spatial Grounding. We visualize spatial evidence of deep model predictions using a discriminative top-down attention mechanism, called Excitation Backprop. We show how such visualizations are equally informative for correct and incorrect model predictions, and highlight the shift of focus when different training strategies are adopted. 3) Spatial Grounding for Improving Model Classification at Training Time. We propose a guided dropout regularizer for deep networks based on the evidence of a network prediction. This approach penalizes the neurons that are most relevant to the model's prediction. By dropping such high-saliency neurons, the network is forced to learn alternative paths in order to maintain loss minimization. We demonstrate better generalization ability, increased utilization of network neurons, and higher resilience to network compression. 4) Spatial Grounding for Improving Model Classification at Test Time. We propose Guided Zoom, an approach that utilizes spatial grounding to make more informed predictions at test time. Guided Zoom compares the evidence used to make a preliminary decision with the evidence of correctly classified training examples to ensure evidence-prediction consistency, and otherwise refines the prediction. We demonstrate accuracy gains for fine-grained classification. 5) Spatiotemporal Grounding. We devise a formulation that simultaneously grounds evidence in space and time, in a single pass, using top-down saliency. We visualize the spatiotemporal cues that contribute to a deep recurrent neural network’s classification/captioning output. Based on these spatiotemporal cues, we are able to localize segments within a video that correspond to a specific action, or a phrase from a caption, without explicitly optimizing/training for these tasks.
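The guided-dropout idea in item 3 can be sketched in a few lines: rank neurons by a saliency signal and zero out the most salient ones, forcing alternative paths. This is an illustrative toy, not the thesis implementation; the helper name, the saliency proxy (|activation × gradient|), and the drop fraction are all assumptions.

```python
import numpy as np

def guided_dropout(activations, saliency, drop_frac=0.1):
    """Zero out the drop_frac most salient neurons so the network must
    learn alternative paths to keep minimizing the loss (hypothetical
    helper; the thesis applies this inside deep networks during training)."""
    out = activations.copy()
    k = max(1, int(drop_frac * out.size))
    top = np.argsort(saliency.ravel())[-k:]   # indices of k highest-saliency neurons
    out.ravel()[top] = 0.0                    # drop them, unlike random dropout
    return out

rng = np.random.default_rng(0)
acts = rng.random(100)                          # stand-in hidden activations
sal = np.abs(acts * rng.standard_normal(100))   # |activation * gradient| saliency proxy
dropped = guided_dropout(acts, sal, drop_frac=0.1)
```

The contrast with ordinary dropout is the selection rule: neurons are chosen by prediction evidence rather than uniformly at random.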
RadFormer: Transformers with Global-Local Attention for Interpretable and Accurate Gallbladder Cancer Detection
We propose a novel deep neural network architecture to learn interpretable representations for medical image analysis. Our architecture generates global attention for a region of interest, and then learns bag-of-words-style deep feature embeddings with local attention. The global and local feature maps are combined using a contemporary transformer architecture for highly accurate Gallbladder Cancer (GBC) detection from ultrasound (USG) images. Our experiments indicate that the detection accuracy of our model exceeds that of human radiologists, supporting its use as a second reader for GBC diagnosis. The bag-of-words embeddings allow our model to be probed for interpretable explanations of GBC detection that are consistent with those reported in the medical literature. We show that the proposed model not only helps explain the decisions of neural network models but also aids in the discovery of new visual features relevant to the diagnosis of GBC. Source code and models will be available at https://github.com/sbasu276/RadFormer
Comment: To appear in Elsevier Medical Image Analysis
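The global-local fusion step can be pictured as self-attention over one global ROI token stacked with the local bag-of-words tokens. The sketch below is a minimal stand-in, not RadFormer's actual transformer block; the function names, shapes, and single-head design are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse(global_feat, local_feats):
    """Mix one global token with n local tokens via scaled dot-product
    self-attention (illustrative single-head sketch)."""
    tokens = np.vstack([global_feat[None, :], local_feats])  # (1 + n, d)
    d = tokens.shape[1]
    attn = softmax(tokens @ tokens.T / np.sqrt(d))  # token-to-token weights
    return attn @ tokens                            # attention-mixed features

g = np.ones(8)            # stand-in global ROI feature
locals_ = np.eye(8)[:4]   # four stand-in local bag-of-words embeddings
fused = fuse(g, locals_)
```

Each output row is a convex combination of all tokens, so the global context informs every local embedding and vice versa.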
MultiPathGAN: Structure Preserving Stain Normalization using Unsupervised Multi-domain Adversarial Network with Perception Loss
Histopathology relies on the analysis of microscopic tissue images to diagnose disease. A crucial part of tissue preparation is staining, whereby a dye is used to make the salient tissue components more distinguishable. However, differences in laboratory protocols and scanning devices result in significant confounding appearance variation in the corresponding images. This variation increases both human error and inter-rater variability, and hinders the performance of automatic or semi-automatic methods. In the present paper we introduce an unsupervised adversarial network to translate (and hence normalize) whole slide images across multiple data acquisition domains. Our key contributions are: (i) an adversarial architecture which learns across multiple domains with a single generator-discriminator network using an information flow branch which optimizes for perceptual loss, and (ii) the inclusion of an additional feature extraction network during training which guides the transformation network to keep all the structural features in the tissue image intact. We (i) demonstrate the effectiveness of the proposed method on H&E slides from 120 cases of kidney cancer, and (ii) show the benefits of the approach on more general problems, such as flexible illumination-based natural image enhancement and light source adaptation.
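The structure-preserving term described above amounts to comparing feature maps of the source slide and its stain-normalized translation under a fixed extractor, so that structure rather than color drives the penalty. A minimal sketch, assuming the extractor's features are already given as arrays (the function name is an assumption):

```python
import numpy as np

def perceptual_loss(feats_src, feats_trans):
    """Mean squared distance between fixed-extractor feature maps of the
    source image and its translation; zero iff structure is unchanged
    in feature space."""
    return float(np.mean((feats_src - feats_trans) ** 2))

feats = np.random.default_rng(1).random((16, 16))  # stand-in feature map
identical = perceptual_loss(feats, feats)          # structure preserved
shifted = perceptual_loss(feats, feats + 0.5)      # structure altered
```

In the paper's setting this loss would be added to the adversarial objective, penalizing translations that change tissue structure while leaving stain appearance free to vary.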
Deep Interpretability Methods for Neuroimaging
Brain dynamics are highly complex and yet hold the key to understanding brain function and dysfunction. The dynamics captured by resting-state functional magnetic resonance imaging data are noisy, high-dimensional, and not readily interpretable. The typical approach of reducing this data to low-dimensional features and focusing on the most predictive features comes with strong assumptions and can miss essential aspects of the underlying dynamics. In contrast, introspection of discriminatively trained deep learning models may uncover disorder-relevant elements of the signal at the level of individual time points and spatial locations. Nevertheless, the difficulty of reliable training on high-dimensional but small-sample datasets and the unclear relevance of the resulting predictive markers prevent the widespread use of deep learning in functional neuroimaging. In this dissertation, we address these challenges by proposing a deep learning framework to learn from high-dimensional dynamical data while maintaining stable, ecologically valid interpretations. The developed model is pre-trainable and alleviates the need to collect an enormous number of neuroimaging samples to achieve optimal training.
We also provide a quantitative validation module, Retain and Retrain (RAR), that can objectively verify the higher predictability of the dynamics learned by the model. Results demonstrate that the proposed framework enables learning fMRI dynamics directly from small data and capturing compact, stable interpretations of features predictive of function and dysfunction. We also comprehensively review the deep interpretability literature in the neuroimaging domain. Our analysis reveals the ongoing trend of interpretability practices in neuroimaging studies and identifies the gaps that should be addressed for effective human-machine collaboration in this domain.
This dissertation also proposes a post hoc interpretability method, Geometrically Guided Integrated Gradients (GGIG), that leverages geometric properties of the functional space as learned by a deep learning model. With extensive experiments and quantitative validation on the MNIST and ImageNet datasets, we demonstrate that GGIG outperforms integrated gradients (IG), a popular interpretability method in the literature. As GGIG is able to identify the contours of the discriminative regions in the input space, it may be useful in various medical imaging tasks where fine-grained localization as an explanation is beneficial.
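For reference, the baseline that GGIG is compared against, Integrated Gradients, attributes a prediction by integrating gradients along a straight path from a baseline input to the actual input. Below is a minimal Riemann-sum sketch on a toy differentiable function (the function, baseline, and step count are illustrative; GGIG itself is not reproduced here):

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=200):
    """IG_i = (x_i - x'_i) * integral over alpha in [0,1] of
    dF(x' + alpha (x - x')) / dx_i, approximated by a midpoint sum."""
    alphas = (np.arange(steps) + 0.5) / steps       # midpoint rule
    total = np.zeros_like(x)
    for a in alphas:
        total += grad_f(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

f = lambda x: float(np.sum(x ** 2))   # toy model: F(x) = sum(x_i^2)
grad_f = lambda x: 2 * x              # its analytic gradient
x = np.array([1.0, -2.0, 3.0])
base = np.zeros_like(x)
ig = integrated_gradients(grad_f, x, base)
```

For this quadratic toy, the attributions are x_i^2 and satisfy IG's completeness axiom: they sum to F(x) - F(baseline).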