357 research outputs found
Do-Operation Guided Causal Representation Learning with Reduced Supervision Strength
Causal representation learning has been proposed to encode relationships
between factors present in high-dimensional data. However, existing
methods rely on large amounts of labeled data and ignore the
fact that samples generated by the same causal mechanism follow the same causal
relationships. In this paper, we seek to explore such information by leveraging
do-operation to reduce supervision strength. We propose a framework that
implements do-operation by swapping latent cause and effect factors encoded
from a pair of inputs. Moreover, we also identify the inadequacy of existing
causal representation metrics empirically and theoretically and introduce new
metrics for better evaluation. Experiments conducted on both synthetic and real
datasets demonstrate the superiority of our method over
state-of-the-art methods.
Comment: NeurIPS 2022 Workshop CML4Impact Workshop Camera Ready
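The swap described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `swap_latent_factor`, the latent vectors, and the choice of which coordinates hold the "cause" factor are all hypothetical, and a real system would obtain the latents from a trained encoder and pass the swapped codes to a decoder.

```python
import numpy as np

def swap_latent_factor(z_a, z_b, idx):
    """Return copies of z_a and z_b with the coordinates in idx exchanged.

    Exchanging a block of latent coordinates between a pair of encodings
    mimics an intervention that sets one sample's factor to the value
    observed in the other sample.
    """
    z_a_new, z_b_new = z_a.copy(), z_b.copy()
    z_a_new[idx], z_b_new[idx] = z_b[idx], z_a[idx]
    return z_a_new, z_b_new

# Hypothetical latent codes for a pair of inputs (assumption: in practice
# these would come from an encoder network).
z_a = np.array([0.1, 0.2, 0.3, 0.4])
z_b = np.array([1.0, 2.0, 3.0, 4.0])

# Assumption of this sketch: the first two coordinates encode the "cause".
cause_idx = [0, 1]

z_a_do, z_b_do = swap_latent_factor(z_a, z_b, cause_idx)
print(z_a_do)
print(z_b_do)
```

After the swap, each code keeps its own remaining coordinates and carries the other sample's cause coordinates; the original vectors are left unchanged.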
F?D: On understanding the role of deep feature spaces on face generation evaluation
Perceptual metrics, like the Fréchet Inception Distance (FID), are widely
used to assess the similarity between synthetically generated and ground truth
(real) images. The key idea behind these metrics is to compute errors in a deep
feature space that captures perceptually and semantically rich image features.
Despite their popularity, the effect that different deep features and their
design choices have on a perceptual metric has not been well studied. In this
work, we perform a causal analysis linking differences in semantic attributes
and distortions between face image distributions to Fréchet distances (FD)
using several popular deep feature spaces. A key component of our analysis is
the creation of synthetic counterfactual faces using deep face generators. Our
experiments show that the FD is heavily influenced by its feature space's
training dataset and objective function. For example, FD using features
extracted from ImageNet-trained models heavily emphasizes hats over regions like
the eyes and mouth. Moreover, FD using features from a face gender classifier
emphasizes hair length more than distances in an identity (recognition) feature
space. Finally, we evaluate several popular face generation models across
feature spaces and find that StyleGAN2 consistently ranks higher than other
face generators, except with respect to identity (recognition) features. This
suggests the need for considering multiple feature spaces when evaluating
generative models and using feature spaces that are tuned to nuances of the
domain of interest.
Comment: Code and dataset to be released soon
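For reference, the Fréchet distance between two feature sets is commonly computed by fitting a Gaussian to each set and evaluating ||mu1 - mu2||^2 + tr(S1) + tr(S2) - 2 tr((S1 S2)^(1/2)). The sketch below assumes that standard Gaussian form; the function name `frechet_distance` and the random feature arrays are hypothetical stand-ins for features extracted from whichever deep feature space is being studied, and the paper's own pipeline may differ in detail.

```python
import numpy as np

def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input is an array of shape (num_samples, feature_dim).
    """
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_gen, rowvar=False)
    # tr((S1 S2)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of S1 S2, which are real and non-negative when S1 and S2
    # are positive semi-definite. Clipping guards against tiny negative
    # values caused by floating-point error.
    eigvals = np.linalg.eigvals(s1 @ s2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu1 - mu2) ** 2)
    return float(mean_term + np.trace(s1) + np.trace(s2) - 2.0 * tr_sqrt)

# Hypothetical features: random vectors standing in for deep features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 4))

d_same = frechet_distance(feats, feats)          # identical sets
d_shift = frechet_distance(feats, feats + 2.0)   # same covariance, shifted mean
print(d_same, d_shift)
```

Because only the mean and covariance of the features enter the formula, swapping the feature extractor changes which image differences the distance is sensitive to, which is the effect the abstract analyzes.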
Unifying (Machine) Vision via Counterfactual World Modeling
Leading approaches in machine vision employ different architectures for
different tasks, trained on costly task-specific labeled datasets. This
complexity has held back progress in areas, such as robotics, where robust
task-general perception remains a bottleneck. In contrast, "foundation models"
of natural language have shown how large pre-trained neural networks can
provide zero-shot solutions to a broad spectrum of apparently distinct tasks.
Here we introduce Counterfactual World Modeling (CWM), a framework for
constructing a visual foundation model: a unified, unsupervised network that
can be prompted to perform a wide variety of visual computations. CWM has two
key components, which resolve the core issues that have hindered application of
the foundation model concept to vision. The first is structured masking, a
generalization of masked prediction methods that encourages a prediction model
to capture the low-dimensional structure in visual data. The model thereby
factors the key physical components of a scene and exposes an interface to them
via small sets of visual tokens. This in turn enables CWM's second main idea --
counterfactual prompting -- the observation that many apparently distinct
visual representations can be computed, in a zero-shot manner, by comparing the
prediction model's output on real inputs versus slightly modified
("counterfactual") inputs. We show that CWM generates high-quality readouts on
real-world images and videos for a diversity of tasks, including estimation of
keypoints, optical flow, occlusions, object segments, and relative depth. Taken
together, our results show that CWM is a promising path to unifying the
manifold strands of machine vision in a conceptually simple foundation
SCADI: Self-supervised Causal Disentanglement in Latent Variable Models
Causal disentanglement has great potential for capturing complex situations.
However, there is a lack of practical and efficient approaches. It is already
known that most unsupervised disentangling methods are unable to produce
identifiable results without additional information, often leading to randomly
disentangled output. As a result, most existing models for disentangling are
weakly supervised, providing information about intrinsic factors, which incurs
excessive costs. We therefore propose a novel model, SCADI (SElf-supervised
CAusal DIsentanglement), that enables the model to discover semantic factors
and learn their causal relationships without any supervision. This model
combines a masked structural causal model (SCM) with a pseudo-label generator
for causal disentanglement, aiming to provide a new direction for
self-supervised causal disentanglement models.
Comment: 12 pages, 12 figures
Disentangled Representation Learning
Disentangled Representation Learning (DRL) aims to learn a model capable of
identifying and disentangling the underlying factors hidden in the observable
data in representation form. Separating the underlying factors of
variation into variables with semantic meaning helps in learning explainable
representations of data, imitating the way humans build a meaningful
understanding when observing an object or relation. As a general learning strategy,
DRL has demonstrated its power in improving model explainability,
controllability, robustness, and generalization capacity in a wide range
of scenarios such as computer vision, natural language processing, and data mining.
In this article, we comprehensively review DRL from various aspects
including motivations, definitions, methodologies, evaluations, applications
and model designs. We discuss works on DRL based on two well-recognized
definitions, i.e., Intuitive Definition and Group Theory Definition. We further
categorize the methodologies for DRL into four groups, i.e., Traditional
Statistical Approaches, Variational Auto-encoder Based Approaches, Generative
Adversarial Networks Based Approaches, Hierarchical Approaches and Other
Approaches. We also analyze principles to design different DRL models that may
benefit different tasks in practical applications. Finally, we point out
challenges in DRL as well as potential research directions deserving future
investigations. We believe this work may provide insights for promoting the DRL
research in the community.
Comment: 22 pages, 9 figures
Decomposing Counterfactual Explanations for Consequential Decision Making
The goal of algorithmic recourse is to reverse unfavorable decisions (e.g.,
from loan denial to approval) under automated decision making by suggesting
actionable feature changes (e.g., reduce the number of credit cards). To
generate low-cost recourse, the majority of methods work under the assumption
that the features are independently manipulable (IMF). To address the feature
dependency issue, the recourse problem is usually studied through the causal
recourse paradigm. However, it is well known that strong assumptions, as
encoded in causal models and structural equations, hinder the applicability of
these methods in complex domains where causal dependency structures are
ambiguous. In this work, we develop DEAR (DisEntangling Algorithmic
Recourse), a novel and practical recourse framework that bridges the gap
between the IMF and the strong causal assumptions. DEAR generates
recourses by disentangling the latent representation of co-varying features
from a subset of promising recourse features to capture the main practical
recourse desiderata. Our experiments on real-world data corroborate our
theoretically motivated recourse model and highlight our framework's ability to
provide reliable, low-cost recourse in the presence of feature dependencies
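To make the independently manipulable features (IMF) assumption concrete, the sketch below computes the smallest L2 change that flips a linear classifier's decision when every feature can be moved freely. It illustrates only the IMF baseline, not DEAR; the function name `imf_recourse_linear`, the weights, and the example input are hypothetical.

```python
import numpy as np

def imf_recourse_linear(x, w, b, margin=1e-3):
    """Minimal-L2 recourse for a linear classifier sign(w @ x + b).

    Assumes every feature can be changed independently (IMF). Returns a
    point whose score equals `margin`, i.e. just on the favorable side.
    """
    score = float(w @ x + b)
    if score >= margin:
        return x.copy()  # already favorable, nothing to change
    # Move along w, the direction that raises the score fastest per unit
    # of L2 cost; the step size closes the gap between score and margin.
    delta = (margin - score) / float(w @ w) * w
    return x + delta

# Hypothetical classifier and a rejected applicant.
w = np.array([1.0, -2.0])
b = -1.0
x = np.array([0.0, 1.0])          # score = -3.0, unfavorable

x_cf = imf_recourse_linear(x, w, b)
print(x_cf, float(w @ x_cf + b))
```

The suggested change moves both features at once with no regard for how they co-vary in practice, which is the limitation that motivates recourse methods that account for feature dependencies.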