Concept Saliency Maps to Visualize Relevant Features in Deep Generative Models
Evaluating, explaining, and visualizing high-level concepts in generative
models, such as variational autoencoders (VAEs), is challenging in part due to
a lack of known prediction classes that are required to generate saliency maps
in supervised learning. While saliency maps may help identify relevant features
(e.g., pixels) in the input for classification tasks of deep neural networks,
similar frameworks are understudied in unsupervised learning. Therefore, we
introduce a new method of obtaining saliency maps for latent representations of
known or novel high-level concepts, often called concept vectors in generative
models. Concept scores, analogous to class scores in classification tasks, are
defined as dot products between concept vectors and encoded input data, which
can be readily used to compute the gradients. The resulting concept saliency
maps are shown to highlight input features deemed important for high-level
concepts. Our method is applied to the latent space of a VAE trained on the
CelebA dataset, in which known attributes such as "smiles" and "hats" are used
to elucidate relevant facial features. Furthermore, our application to spatial
transcriptomic (ST) data of a mouse olfactory bulb demonstrates the potential
of latent representations of morphological layers and molecular features in
advancing our understanding of complex biological systems. By extending the
popular method of saliency maps to generative models, the proposed concept
saliency maps help improve interpretability of latent variable models in deep
learning.
Code to reproduce and to implement concept saliency maps:
https://github.com/lenbrocki/concept-saliency-maps
Comment: 18th IEEE International Conference on Machine Learning and
Applications (ICMLA)
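The core idea above — a concept score defined as the dot product between a concept vector and the encoded input, differentiated with respect to the input — can be illustrated with a minimal sketch. A toy linear encoder stands in for the VAE encoder (all names and shapes here are hypothetical), which also lets the analytic gradient be checked against finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "encoder" z = W @ x, a hypothetical stand-in for a VAE encoder.
n_pixels, n_latent = 16, 4
W = rng.normal(size=(n_latent, n_pixels))

# Concept vector in latent space (e.g., obtained as the mean difference
# between encodings of images with and without an attribute).
concept = rng.normal(size=n_latent)

def concept_score(x):
    """Dot product between the concept vector and the encoded input."""
    return concept @ (W @ x)

def concept_saliency(x):
    """Gradient of the concept score w.r.t. the input pixels.
    For this linear encoder it is W.T @ concept, independent of x."""
    return W.T @ concept

x = rng.normal(size=n_pixels)
sal = concept_saliency(x)

# Verify the analytic gradient with finite differences.
eps = 1e-6
fd = np.array([(concept_score(x + eps * np.eye(n_pixels)[i]) - concept_score(x)) / eps
               for i in range(n_pixels)])
print(np.allclose(sal, fd, atol=1e-4))  # True
```

With a real (nonlinear) encoder the gradient would depend on `x` and be obtained via automatic differentiation; the saliency map is this gradient reshaped to the input's spatial dimensions.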
Delving into Inter-Image Invariance for Unsupervised Visual Representations
Contrastive learning has recently shown immense potential in unsupervised
visual representation learning. Existing studies in this track mainly focus on
intra-image invariance learning. The learning typically uses rich intra-image
transformations to construct positive pairs and then maximizes agreement using
a contrastive loss. The merits of inter-image invariance, conversely, remain
much less explored. One major obstacle to exploiting inter-image invariance is
that it is unclear how to reliably construct inter-image positive pairs, and
further how to derive effective supervision from them, since no pair
annotations are available. In this work, we present a rigorous and comprehensive
study on inter-image invariance learning from three main constituting
components: pseudo-label maintenance, sampling strategy, and decision boundary
design. Through carefully-designed comparisons and analysis, we propose a
unified and generic framework that supports the integration of unsupervised
intra- and inter-image invariance learning. With all the obtained recipes, our
final model, namely InterCLR, shows consistent improvements over
state-of-the-art intra-image invariance learning methods on multiple standard
benchmarks. Code will be released at
https://github.com/open-mmlab/OpenSelfSup
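The three components named above (pseudo-label maintenance, sampling strategy, decision boundary design) can be sketched in a minimal inter-image contrastive step. This is not InterCLR's actual implementation, only a hedged illustration of the idea: pseudo-labels (e.g., from offline clustering) supply positives from *other* images, and an InfoNCE-style loss contrasts them against samples from different clusters:

```python
import numpy as np

rng = np.random.default_rng(0)

def l2norm(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Toy L2-normalized embeddings and pseudo-labels (hypothetically obtained
# from offline clustering, e.g. k-means over embeddings).
emb = l2norm(rng.normal(size=(8, 5)))
pseudo = np.array([0, 0, 1, 1, 0, 1, 0, 1])

def inter_image_infonce(i, tau=0.1):
    """InfoNCE-style loss for anchor i: one positive sampled from the same
    pseudo-label cluster, other-cluster samples used as negatives."""
    sims = emb @ emb[i] / tau
    pos_idx = [j for j in range(len(emb)) if j != i and pseudo[j] == pseudo[i]]
    neg_idx = [j for j in range(len(emb)) if pseudo[j] != pseudo[i]]
    p = pos_idx[0]  # simplest sampling strategy: first same-cluster sample
    logits = np.concatenate(([sims[p]], sims[neg_idx]))
    # Softmax cross-entropy with the positive as the target class.
    return -(sims[p] - np.log(np.exp(logits).sum()))

loss = inter_image_infonce(0)
print(loss > 0)  # True: the loss is a positive scalar
```

The "decision boundary design" question then amounts to how sharply same-cluster and different-cluster samples are separated, which in this sketch is governed by the temperature `tau` and the sampling rule for `p`.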
Visual attribute discovery and analyses from Web data
Visual attributes are important for describing and understanding an object's appearance. For an object classification or recognition task, an algorithm needs to infer the visual attributes of an object to compare, categorize, or recognize objects. In a zero-shot learning scenario, the algorithm depends on visual attributes to describe an unknown object, since no training samples are available. Because different object categories usually share common attributes (e.g., many animals have four legs, a tail, and fur), explicitly modeling attributes not only allows previously learnt attributes to be transferred to a novel category but also reduces the number of training samples needed for the new category, which matters when training data are limited. Even though larger numbers of visual attributes help an algorithm better describe an image, they also require a larger set of training data. In the supervised scenario, data collection can be both costly and time-consuming. To mitigate data collection costs, this dissertation exploits weakly-supervised data from the Web in order to construct computational methodologies for the discovery of visual attributes, as well as an analysis across time and domains.

This dissertation first presents an automatic approach to learning hundreds of visual attributes from the open-world vocabulary on the Web using a convolutional neural network. The proposed method seeks to understand visual attributes in terms of perception inside deep neural networks. By analyzing neural activations, the system can identify the degree to which an attribute is visually perceptible and can localize the visual attributes in an image. Moreover, the approach exploits the layered structure of the deep model to determine the semantic depth of the attributes.
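The activation-based localization described above can be illustrated with a CAM-style sketch (in the spirit of class activation mapping, not necessarily the dissertation's exact method): channels of a convolutional feature map are weighted by an attribute classifier's weights and summed, giving a spatial map of where the attribute fires. All shapes and values here are hypothetical stand-ins for a trained CNN:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy convolutional feature maps (channels x height x width) and
# attribute-classifier weights, stand-ins for a trained network.
C, H, W = 6, 4, 4
feats = rng.random((C, H, W))
attr_weights = rng.normal(size=C)

# CAM-style localization: weight each channel by its contribution to the
# attribute score and sum over channels, yielding a (H, W) spatial map.
loc_map = np.tensordot(attr_weights, feats, axes=1)

# A crude "perceptibility" proxy: spatial entropy of the softmax-normalized
# map (a peaked, low-entropy map suggests a localized, perceptible attribute).
p = np.exp(loc_map) / np.exp(loc_map).sum()
entropy = -(p * np.log(p)).sum()
print(loc_map.shape, float(entropy))
```

The entropy is bounded by log(H*W), so comparing it across attributes gives a simple, scale-free way to rank how localized each attribute's evidence is.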
Beyond visual attribute discovery, this dissertation explores how visual styles (i.e., attributes that correspond to multiple visual concepts) change across time; these are referred to as visual trends. To this end, the dissertation introduces several deep neural networks for estimating when objects were made, together with analyses of the neural activations and their degree of entropy to gain insight into the deep networks. To utilize the historical-object dating frameworks in real-world applications, they are applied to analyze the influence of vintage fashion on runway collections, as well as the influence of fashion on runway collections and on street fashion.

Finally, this dissertation introduces an approach to recognizing and transferring visual attributes across domains in a realistic manner. Given two input images from two different domains, 1) a shopping image and 2) a scene image, the dissertation proposes a generative adversarial network for transferring the product pixels from the shopping image to the scene image such that 1) the output image looks realistic and 2) the visual attributes of the product are preserved.

In summary, this dissertation utilizes weakly-supervised data from the Web for visual attribute discovery and for analysis across time and domains. Beyond the novel computational methodology for each problem, it demonstrates that the proposed approaches can be applied to many real-world applications, such as dating historical objects, visual trend prediction and analysis, cross-domain image label transfer, and cross-domain pixel transfer for home decoration, among others.

Doctor of Philosophy
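The cross-domain pixel-transfer objective described in the dissertation combines two terms: an adversarial loss pushing the composite image toward realism, and a term preserving the product's visual attributes. A minimal sketch of such a composite generator objective, with hypothetical stand-in values for the discriminator score and attribute embeddings (the weighting `lam` is assumed, not taken from the source):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: discriminator realism score on the generated
# composite, and attribute embeddings of the source product and the output.
d_fake = 0.3                                     # discriminator output in (0, 1)
attr_src = rng.normal(size=8)                    # attribute embedding of product
attr_out = attr_src + 0.1 * rng.normal(size=8)   # embedding after pixel transfer

# Generator objective: look realistic (fool the discriminator) while
# preserving the product's attributes.
adv_loss = -np.log(d_fake)                       # non-saturating GAN loss
attr_loss = np.mean((attr_out - attr_src) ** 2)  # attribute-preservation term
lam = 10.0                                       # assumed trade-off weight
total = adv_loss + lam * attr_loss
print(total > 0)  # True: both terms are non-negative
```

Tuning `lam` trades realism against attribute fidelity: too small and the product's attributes drift, too large and the composite stops fooling the discriminator.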