Interpreting Deep Visual Representations via Network Dissection
The success of recent deep convolutional neural networks (CNNs) depends on
learning hidden representations that can summarize the important factors of
variation behind the data. However, CNNs are often criticized as being black boxes
that lack interpretability, since they have millions of unexplained model
parameters. In this work, we describe Network Dissection, a method that
interprets networks by providing labels for the units of their deep visual
representations. The proposed method quantifies the interpretability of CNN
representations by evaluating the alignment between individual hidden units and
a set of visual semantic concepts. By identifying the best alignments, units
are given human interpretable labels across a range of objects, parts, scenes,
textures, materials, and colors. The method reveals that deep representations
are more transparent and interpretable than expected: we find that
representations are significantly more interpretable than they would be under a
random, equivalently powerful basis. We apply the method to interpret and
compare the latent representations of various network architectures trained to
solve different supervised and self-supervised training tasks. We then examine
factors affecting network interpretability, such as the number of training
iterations, regularization, different initializations, and the network depth
and width. Finally, we show that the interpreted units can be used
to provide explicit explanations of a prediction given by a CNN for an image.
Our results highlight that interpretability is an important property of deep
neural networks that provides new insights into their hierarchical structure.
Comment: *B. Zhou and D. Bau contributed equally to this work. 15 pages, 27 figures
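The unit/concept alignment described above can be illustrated with a short sketch. The snippet below is not the authors' released code; it only illustrates the core idea of scoring each hidden unit against binary concept masks, where the thresholding quantile, the IoU score, and the function names are assumptions made for illustration.

    # Hedged sketch of labelling one hidden unit by its best-aligned visual concept.
    # Assumed setup: activation maps already upsampled to mask resolution, a fixed
    # per-unit activation quantile as threshold, and IoU as the alignment score.
    import numpy as np

    def dissect_unit(act_maps, concept_masks, quantile=0.995):
        """act_maps: (N, H, W) activations of one unit over N images.
        concept_masks: dict mapping concept name -> (N, H, W) binary masks."""
        thresh = np.quantile(act_maps, quantile)   # per-unit activation threshold
        unit_mask = act_maps > thresh              # binarised activation map
        scores = {}
        for name, mask in concept_masks.items():
            inter = np.logical_and(unit_mask, mask).sum()
            union = np.logical_or(unit_mask, mask).sum()
            scores[name] = inter / union if union else 0.0
        best = max(scores, key=scores.get)         # human-interpretable label
        return best, scores[best]

In practice the activation maps would first be upsampled to the resolution of the segmentation masks; that step is omitted here.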
Multi-Source Neural Variational Inference
Learning from multiple sources of information is an important problem in
machine-learning research. The key challenges are learning representations and
formulating inference methods that take into account the complementarity and
redundancy of various information sources. In this paper we formulate a
variational autoencoder based multi-source learning framework in which each
encoder is conditioned on a different information source. This allows us to
relate the sources via the shared latent variables by computing divergence
measures between the individual sources' posterior approximations. We explore a
variety of options to learn these encoders and to integrate the beliefs they
compute into a consistent posterior approximation. We visualise learned beliefs
on a toy dataset and evaluate our methods for learning shared representations
and structured output prediction, showing trade-offs of learning separate
encoders for each information source. Furthermore, we demonstrate how conflict
detection and redundancy can increase robustness of inference in a multi-source
setting.
Comment: AAAI 2019, Association for the Advancement of Artificial Intelligence (AAAI) 2019
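As a concrete illustration of fusing per-source beliefs, the sketch below combines diagonal-Gaussian posteriors from several encoders with a product-of-experts rule and computes pairwise KL divergences between them. Both choices are only examples of the "variety of options" mentioned above, not necessarily the paper's formulation; the function names and the PyTorch framing are assumptions.

    # Hedged sketch: one possible way to integrate per-source Gaussian posteriors.
    import torch

    def fuse_product_of_experts(mus, logvars):
        """mus, logvars: lists of (B, D) per-source Gaussian posterior parameters."""
        precisions = [torch.exp(-lv) for lv in logvars]
        prec_sum = torch.stack(precisions).sum(0)
        mu_joint = torch.stack([m * p for m, p in zip(mus, precisions)]).sum(0) / prec_sum
        logvar_joint = -torch.log(prec_sum)
        return mu_joint, logvar_joint

    def pairwise_kl(mus, logvars):
        """Sum of KL divergences between all ordered pairs of source posteriors,
        usable as a redundancy / conflict signal between sources."""
        kl = 0.0
        for i in range(len(mus)):
            for j in range(len(mus)):
                if i == j:
                    continue
                kl = kl + 0.5 * (
                    logvars[j] - logvars[i]
                    + (torch.exp(logvars[i]) + (mus[i] - mus[j]) ** 2) / torch.exp(logvars[j])
                    - 1.0
                ).sum(-1).mean()
        return kl

A large pairwise divergence signals that two sources disagree, which is one way to realise the conflict detection the abstract refers to.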
Hybrid image representation methods for automatic image annotation: a survey
In most automatic image annotation systems, images are represented with low-level features using either global methods or local methods. Global methods treat the entire image as a single unit. Local methods divide an image either into blocks, adopting fixed-size sub-image blocks as sub-units, or into regions, using segmented regions as sub-units. In contrast to typical automatic image annotation methods that use either global or local features exclusively, several recent methods combine the two kinds of information, on the premise that combining the two levels of features is beneficial for annotating images. In this paper, we provide a survey of automatic image annotation techniques from the perspective of feature extraction and, to complement existing surveys in the literature, we focus on emerging hybrid methods that combine both global and local features for image representation.
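As a toy illustration of the global/local distinction drawn above (not taken from any surveyed system), the sketch below concatenates a global colour histogram with block-wise local statistics; the grid size, histogram bins, and function name are arbitrary choices made for illustration.

    # Hedged sketch of a hybrid image descriptor: global histogram + local block means.
    import numpy as np

    def hybrid_descriptor(image, grid=4, bins=16):
        """image: (H, W, 3) uint8 array."""
        # Global part: colour histogram over the entire image.
        global_hist = np.concatenate(
            [np.histogram(image[..., c], bins=bins, range=(0, 256), density=True)[0]
             for c in range(3)])
        # Local part: mean colour of each block in a grid x grid partition.
        H, W, _ = image.shape
        blocks = []
        for i in range(grid):
            for j in range(grid):
                patch = image[i * H // grid:(i + 1) * H // grid,
                              j * W // grid:(j + 1) * W // grid]
                blocks.append(patch.reshape(-1, 3).mean(0) / 255.0)
        return np.concatenate([global_hist, np.concatenate(blocks)])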
3D Anisotropic Hybrid Network: Transferring Convolutional Features from 2D Images to 3D Anisotropic Volumes
While deep convolutional neural networks (CNNs) have been successfully applied
for 2D image analysis, it is still challenging to apply them to 3D anisotropic
volumes, especially when the within-slice resolution is much higher than the
between-slice resolution and when the amount of 3D volumes is relatively small.
On one hand, directly training a CNN with 3D convolution kernels suffers from
the lack of data and tends to generalize poorly, while insufficient GPU memory
limits the model size and representational power. On the other hand, applying a
2D CNN with generalizable features to individual 2D slices ignores between-slice
information. Coupling a 2D network with an LSTM to handle the between-slice
information is also suboptimal because LSTMs are difficult to train. To overcome
the above challenges, we propose a 3D Anisotropic Hybrid Network (AH-Net) that
transfers convolutional features learned from 2D images to 3D anisotropic
volumes. Such a transfer inherits the desired strong generalization capability
for within-slice information while naturally exploiting between-slice
information for more effective modelling. The focal loss is further utilized
for more effective end-to-end learning. We experiment with the proposed 3D
AH-Net on two different medical image analysis tasks, namely lesion detection
from a Digital Breast Tomosynthesis volume and liver and liver tumor
segmentation from a Computed Tomography volume, and obtain state-of-the-art
results.
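A minimal sketch of the within-slice/between-slice decomposition suggested by the abstract is given below; the specific layer shapes, the weight-copying helper, and the block structure are assumptions for illustration and do not reproduce the actual AH-Net architecture.

    # Hedged sketch: within-slice kernels that could reuse pretrained 2D weights,
    # plus a lightweight cross-slice kernel for anisotropic 3D volumes.
    import torch
    import torch.nn as nn

    class AnisotropicBlock(nn.Module):
        def __init__(self, in_ch, out_ch, k=3):
            super().__init__()
            # (1, k, k): convolution acting only inside each slice; its k x k
            # weights could be copied from a pretrained 2D convolution.
            self.within_slice = nn.Conv3d(in_ch, out_ch, kernel_size=(1, k, k),
                                          padding=(0, k // 2, k // 2))
            # (k, 1, 1): cheap convolution mixing information across slices.
            self.between_slice = nn.Conv3d(out_ch, out_ch, kernel_size=(k, 1, 1),
                                           padding=(k // 2, 0, 0))
            self.act = nn.ReLU(inplace=True)

        def forward(self, x):  # x: (B, C, D, H, W) anisotropic volume
            x = self.act(self.within_slice(x))
            return self.act(self.between_slice(x))

    def load_2d_weights(block, conv2d):
        """Copy a pretrained 2D kernel (out, in, k, k) into the within-slice 3D kernel."""
        with torch.no_grad():
            block.within_slice.weight.copy_(conv2d.weight.unsqueeze(2))
            if conv2d.bias is not None and block.within_slice.bias is not None:
                block.within_slice.bias.copy_(conv2d.bias)

The focal loss mentioned in the abstract would be applied on top of such a backbone during training; it is omitted here.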
- …