C²VAE: Gaussian Copula-based VAE Differing Disentangled from Coupled Representations with Contrastive Posterior
We present a self-supervised variational autoencoder (VAE) to jointly learn
disentangled and dependent hidden factors and then enhance disentangled
representation learning by a self-supervised classifier to eliminate coupled
representations in a contrastive manner. To this end, a Contrastive Copula VAE
(C²VAE) is introduced that neither relies on prior knowledge about the data in
its probabilistic principle nor imposes strong modeling assumptions on the
posterior in its neural architecture. C²VAE simultaneously factorizes the
posterior (evidence lower bound, ELBO) with total correlation (TC)-driven
decomposition for learning factorized disentangled representations and extracts
the dependencies between hidden features by a neural Gaussian copula for copula
coupled representations. Then, a self-supervised contrastive classifier
differentiates the disentangled representations from the coupled
representations, where a contrastive loss regularizes this contrastive
classification together with the TC loss for eliminating entangled factors and
strengthening disentangled representations. C²VAE demonstrates a strong
effect in enhancing disentangled representation learning. C²VAE further
contributes to improved optimization addressing the TC-based VAE instability
and the trade-off between reconstruction and representation.
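For concreteness, the TC-driven decomposition referenced above presumably follows the standard β-TCVAE split of the aggregated KL term, in which the total correlation appears as a separately weightable penalty (the abstract does not spell out C²VAE's exact objective):

\mathbb{E}_{p(x)}\!\left[\mathrm{KL}\big(q(z \mid x)\,\|\,p(z)\big)\right]
  = \underbrace{I_q(x; z)}_{\text{index-code MI}}
  + \underbrace{\mathrm{KL}\Big(q(z)\,\Big\|\,\prod\nolimits_j q(z_j)\Big)}_{\text{total correlation (TC)}}
  + \underbrace{\sum\nolimits_j \mathrm{KL}\big(q(z_j)\,\|\,p(z_j)\big)}_{\text{dimension-wise KL}}

Penalizing the TC term pushes the aggregate posterior q(z) toward a factorized form, i.e. disentangled factors; the neural Gaussian copula then captures the residual cross-dimension dependencies that such a penalty would otherwise discard.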
Identifying Interpretable Visual Features in Artificial and Biological Neural Systems
Single neurons in neural networks are often interpretable in that they
represent individual, intuitively meaningful features. However, many neurons
exhibit mixed selectivity, i.e., they represent multiple unrelated
features. A recent hypothesis proposes that features in deep networks may be
represented in superposition, i.e., on non-orthogonal axes by
multiple neurons, since the number of possible interpretable features in
natural data is generally larger than the number of neurons in a given network.
Accordingly, we should be able to find meaningful directions in activation
space that are not aligned with individual neurons. Here, we propose (1) an
automated method for quantifying visual interpretability that is validated
against a large database of human psychophysics judgments of neuron
interpretability, and (2) an approach for finding meaningful directions in
network activation space. We leverage these methods to discover directions in
convolutional neural networks that are more intuitively meaningful than
individual neurons, as we confirm and investigate in a series of analyses.
Moreover, we apply the same method to three recent datasets of visual neural
responses in the brain and find that our conclusions largely transfer to real
neural data, suggesting that superposition might be deployed by the brain. This
also provides a link with disentanglement and raises fundamental questions
about robust, efficient and factorized representations in both artificial and
biological neural systems.
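The abstract does not detail the direction-finding procedure, but the unit-of-analysis shift it describes is easy to illustrate: treat an arbitrary unit vector in activation space, rather than a single neuron, as the thing whose response profile gets scored for interpretability. A minimal Python sketch, with random data and PCA as a stand-in for the learned directions:

import numpy as np

rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 256))  # placeholder (stimuli x neurons) activations

# A single neuron corresponds to a standard-basis direction in activation space.
neuron_dir = np.zeros(256)
neuron_dir[17] = 1.0

# A candidate meaningful direction need not be axis-aligned; the top principal
# component serves here as a stand-in for whatever direction search is used.
centered = acts - acts.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
direction = vt[0]

# Responses are projections onto each unit vector; interpretability would then
# be scored from the stimuli that drive each direction most and least strongly
# (by human psychophysics judgments or an automated proxy, as in the paper).
neuron_resp = acts @ neuron_dir
direction_resp = acts @ direction
top_stimuli = np.argsort(direction_resp)[-9:]  # strongest drivers of the direction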
Concept Distillation: Leveraging Human-Centered Explanations for Model Improvement
Humans use abstract concepts for understanding instead of hard features.
Recent interpretability research has focused on human-centered concept
explanations of neural networks. Concept Activation Vectors (CAVs) estimate a
model's sensitivity and possible biases to a given concept. In this paper, we
extend CAVs from post-hoc analysis to ante-hoc training in order to reduce
model bias through fine-tuning using an additional Concept Loss. In past work,
concepts were defined on the final layer of the network; we generalize them to
intermediate layers using class prototypes. This facilitates class learning in
the last convolution layer, which is known to be most informative. We also
introduce Concept Distillation to create richer concepts using a pre-trained
knowledgeable model as the teacher. Our method can sensitize or desensitize a
model towards concepts. We show applications of concept-sensitive training to
debias several classification problems. We also use concepts to induce prior
knowledge into intrinsic image decomposition (IID), a reconstruction problem.
Concept-sensitive training can
improve model interpretability, reduce biases, and induce prior knowledge.
Please visit https://avani17101.github.io/Concept-Distilllation/ for code and
more details.
Comment: NeurIPS 2023
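While the abstract leaves the Concept Loss unspecified, the CAV machinery it extends (Kim et al., 2018) is standard: a CAV is the normal to a linear boundary separating activations of concept examples from random examples, and concept sensitivity is a directional derivative along it. A minimal sketch with hypothetical data:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
concept_acts = rng.normal(loc=0.5, size=(100, 128))  # layer activations: concept set
random_acts = rng.normal(loc=0.0, size=(100, 128))   # layer activations: random set

X = np.vstack([concept_acts, random_acts])
y = np.array([1] * 100 + [0] * 100)

# The CAV is the (normalized) normal of a linear probe separating the two sets.
probe = LogisticRegression(max_iter=1000).fit(X, y)
cav = probe.coef_[0] / np.linalg.norm(probe.coef_[0])

# Sensitivity of an output h to the concept is the directional derivative of h
# along the CAV. An ante-hoc Concept Loss could penalize this alignment (to
# desensitize the model) or reward it (to sensitize); the paper's exact
# formulation is not given in the abstract.
grad_h = rng.normal(size=128)  # placeholder for dh/d(activations) at one input
sensitivity = float(grad_h @ cav)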
Hierarchically Organized Latent Modules for Exploratory Search in Morphogenetic Systems
Self-organization of complex morphological patterns from local interactions
is a fascinating phenomenon in many natural and artificial systems. In the
artificial world, typical examples of such morphogenetic systems are cellular
automata. Yet, their mechanisms are often very hard to grasp and so far
scientific discoveries of novel patterns have primarily been relying on manual
tuning and ad hoc exploratory search. The problem of automated diversity-driven
discovery in these systems was recently introduced [26, 62], highlighting that
two key ingredients are autonomous exploration and unsupervised representation
learning to describe "relevant" degrees of variation in the patterns. In this
paper, we motivate the need for what we call Meta-diversity search, arguing
that there is not a unique ground truth interesting diversity as it strongly
depends on the final observer and its motives. Using a continuous game-of-life
system for experiments, we provide empirical evidence that relying on
monolithic architectures for the behavioral embedding design tends to bias the
final discoveries (both for hand-defined and unsupervisedly learned features),
which are unlikely to be aligned with the interests of a final end-user. To
address these issues, we introduce a novel dynamic and modular architecture
that enables unsupervised learning of a hierarchy of diverse representations.
Combined with intrinsically motivated goal exploration algorithms, we show that
this system forms a discovery assistant that can efficiently adapt its
diversity search towards preferences of a user using only a very small amount
of user feedback.
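The intrinsically motivated goal exploration loop that the assistant builds on can be sketched generically: sample a goal in the behavioral embedding space, mutate the known parameters whose outcome lies closest to it, and record the result. The paper's contribution is to replace the single monolithic encoder below with a learned hierarchy of modular representations steered by user feedback; everything in this sketch is a placeholder:

import numpy as np

rng = np.random.default_rng(0)

def run_system(params):
    # Placeholder for rolling out the continuous game-of-life system and
    # returning its final pattern.
    return rng.normal(size=32)

def encode(pattern):
    # Placeholder behavioral embedding (learned, unsupervised, in the paper).
    return pattern[:8]

# Bootstrap the history with random parameter samples.
history = [(p, encode(run_system(p))) for p in rng.uniform(size=(10, 16))]

for _ in range(100):
    goal = rng.normal(size=8)  # sample a goal in embedding space
    # Select the known parameters whose outcome is closest to the goal ...
    dists = [np.linalg.norm(emb - goal) for _, emb in history]
    base, _ = history[int(np.argmin(dists))]
    # ... and mutate them, hoping to land nearer the goal.
    candidate = base + 0.05 * rng.normal(size=16)
    history.append((candidate, encode(run_system(candidate))))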
Explainability in Deep Reinforcement Learning
A large set of the explainable Artificial Intelligence (XAI) literature is
emerging on feature relevance techniques to explain a deep neural network (DNN)
output, or on explaining models that ingest image source data. However, assessing
how XAI techniques can help understand models beyond classification tasks, e.g.
for reinforcement learning (RL), has not been extensively studied. We review
recent works in the direction to attain Explainable Reinforcement Learning
(XRL), a relatively new subfield of Explainable Artificial Intelligence,
intended to be used in general public applications, with diverse audiences,
requiring ethical, responsible and trustable algorithms. In critical situations
where it is essential to justify and explain the agent's behaviour, better
explainability and interpretability of RL models could help gain scientific
insight on the inner workings of what is still considered a black box. We
evaluate mainly studies directly linking explainability to RL, and split these
into two categories according to the way the explanations are generated:
transparent algorithms and post-hoc explainability. We also review the most
prominent XAI works from the lenses of how they could potentially enlighten the
further deployment of the latest advances in RL, in the demanding present and
future of everyday problems.
Comment: Article accepted at Knowledge-Based Systems