Causal normalizing flows: from theory to practice
In this work, we deepen the study of normalizing flows for causal reasoning.
Specifically, we first leverage recent results on non-linear ICA to show that
causal models are identifiable from observational data given a causal ordering,
and thus can be recovered using autoregressive normalizing flows (NFs). Second,
we analyze different design and learning choices for causal normalizing flows
to capture the underlying causal data-generating process. Third, we describe
how to implement the do-operator in causal NFs, and thus, how to answer
interventional and counterfactual questions. Finally, in our experiments, we
validate our design and training choices through a comprehensive ablation
study; compare causal NFs to other approaches for approximating causal models;
and empirically demonstrate that causal NFs can be used to address real-world
problems, where the presence of mixed discrete-continuous data and partial
knowledge on the causal graph is the norm. The code for this work can be found
at https://github.com/psanch21/causal-flows.
Comment: 32 pages, 15 figures. Accepted as an Oral presentation at NeurIPS 202
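The do-operator and the abduction-action-prediction recipe described above can be illustrated on a toy two-variable structural causal model. The sketch below is our own minimal example (the model x1 = u1, x2 = 0.5 * x1 + u2, and all function names are hypothetical), not the paper's implementation: the autoregressive transform T maps exogenous noise to observations, its inverse performs abduction, and an intervention clamps a variable while the change propagates through the forward pass.

```python
import numpy as np

# Toy autoregressive transform for a 2-variable SCM (hypothetical example):
#   x1 = u1,  x2 = 0.5 * x1 + u2
# T maps exogenous noise u to observations x; T_inv recovers the noise.

def T(u):
    x1 = u[0]
    x2 = 0.5 * x1 + u[1]
    return np.array([x1, x2])

def T_inv(x):
    # Triangular (autoregressive) structure makes inversion sequential.
    return np.array([x[0], x[1] - 0.5 * x[0]])

def T_do(u, index, value):
    """Forward pass with the do-operator: clamp variable `index` to `value`
    and let the change propagate to its causal descendants."""
    x1 = value if index == 0 else u[0]
    x2 = value if index == 1 else 0.5 * x1 + u[1]
    return np.array([x1, x2])

def counterfactual(x_obs, index, value):
    """Three-step counterfactual: abduction, action, prediction."""
    u = T_inv(x_obs)              # abduction: recover the noise behind x_obs
    return T_do(u, index, value)  # action + prediction under do(x_index = value)
```

For instance, observing x = (1.0, 0.7) implies u = (1.0, 0.2), so the counterfactual under do(x1 = 2.0) is (2.0, 1.2): the intervention on x1 propagates to x2 while the recovered noise is kept fixed.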
Learnable Graph Convolutional Attention Networks
Existing Graph Neural Networks (GNNs) compute the message exchange between
nodes by either aggregating uniformly (convolving) the features of all the
neighboring nodes, or by applying a non-uniform score (attending) to the
features. Recent works have shown the strengths and weaknesses of the resulting
GNN architectures, respectively, GCNs and GATs. In this work, we aim at
exploiting the strengths of both approaches to their full extent. To this end,
we first introduce the graph convolutional attention layer (CAT), which relies
on convolutions to compute the attention scores. Unfortunately, as in the case
of GCNs and GATs, we show that there exists no clear winner between the three
(neither theoretically nor in practice) as their performance directly depends
on the nature of the data (i.e., of the graph and features). This result brings
us to the main contribution of our work, the learnable graph convolutional
attention network (L-CAT): a GNN architecture that automatically interpolates
between GCN, GAT and CAT in each layer, by adding only two scalar parameters.
Our results demonstrate that L-CAT is able to efficiently combine different GNN
layers along the network, outperforming competing methods in a wide range of
datasets, and resulting in a more robust model that reduces the need for
cross-validation.
Comment: Under review. 31 pages, 6 figures
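To make the two-scalar interpolation concrete, here is a minimal numpy sketch of a single-head scoring step. The parameterization (lam1 mixing raw versus convolved features into the attention input, lam2 scaling the scores so that lam2 = 0 recovers uniform GCN-style weights) is our own illustrative choice, not necessarily the paper's exact formulation; the attention vector `a` and all names are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def l_cat_scores(H, A, i, lam1, lam2, a):
    """Attention weights over the neighbors of node i.
    H: node features (n, d); A: adjacency with self-loops (n, n);
    a: attention vector (2d,); lam1, lam2: the two learnable scalars."""
    neigh = np.where(A[i] > 0)[0]
    # GCN-style message: uniform average over each node's neighborhood
    conv = A @ H / A.sum(1, keepdims=True)
    # lam1 interpolates the attention input: 0 -> raw features (GAT-like),
    # 1 -> convolved features (CAT-like)
    Z = (1.0 - lam1) * H + lam1 * conv
    raw = np.array([a @ np.concatenate([Z[i], Z[j]]) for j in neigh])
    # lam2 scales the scores: 0 -> uniform attention, i.e. a GCN-like layer
    return neigh, softmax(lam2 * raw)
```

Because lam1 and lam2 are plain scalars, the layer can learn per-layer which regime (convolution, attention, or convolved attention) suits the data, which is the mechanism the abstract credits for L-CAT's robustness.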
Preliminary Results on Different Text Processing Tasks Using Encoder-Decoder Networks and the Causal Feature Extractor
Deep learning methods are gaining popularity in different application domains, and especially in natural language processing. It is commonly believed that, given a large enough dataset and an adequate network architecture, almost any processing problem can be solved. A frequent and widely used topology is the encoder-decoder architecture, where the input data is transformed into an intermediate code by an encoder, and a decoder then takes this code to produce the output. Different types of networks can be used in the encoder and the decoder depending on the problem of interest, such as convolutional neural networks (CNNs) or long short-term memory networks (LSTMs). For the encoder, this paper uses a recently proposed method called the Causal Feature Extractor (CFE). It is based on causal convolutions (i.e., convolutions that depend on only one direction of the input), dilation (i.e., increasing the aperture size of the convolutions), and bidirectionality (i.e., independent networks in both directions). Preliminary results are presented on three different tasks and compared with state-of-the-art methods: bilingual translation, LaTeX decompilation, and audio transcription. The proposed method achieves promising results, showing its versatility in working with text, audio, and images. Moreover, it has a shorter training time, requiring less time per iteration, and makes good use of attention mechanisms based on attention matrices.
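The three ingredients named in the abstract (causal convolutions, dilation, bidirectionality) can be sketched in a few lines of numpy. This is a generic illustration of the operations, not the authors' implementation; the function names are ours.

```python
import numpy as np

def causal_dilated_conv(x, w, dilation=1):
    """1-D causal convolution: output[t] depends only on x[t], x[t-d],
    x[t-2d], ... (left-padded with zeros, so output length == input length)."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([sum(w[j] * xp[t + pad - j * dilation] for j in range(k))
                     for t in range(len(x))])

def bidirectional_cfe(x, w_fwd, w_bwd, dilation=1):
    """Bidirectionality: two independent causal filters, one over the input
    and one over its reversal, stacked as separate channels."""
    fwd = causal_dilated_conv(x, w_fwd, dilation)
    bwd = causal_dilated_conv(x[::-1], w_bwd, dilation)[::-1]
    return np.stack([fwd, bwd])
```

With filter w = (1, 1), each output is the sum of the current sample and the one `dilation` steps back, so increasing the dilation widens the receptive field without adding parameters.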
Mitigating Modality Collapse in Multimodal VAEs via Impartial Optimization
A number of variational autoencoders (VAEs) have recently emerged with the
aim of modeling multimodal data, e.g., to jointly model images and their
corresponding captions. Still, multimodal VAEs tend to focus solely on a subset
of the modalities, e.g., by fitting the image while neglecting the caption. We
refer to this limitation as modality collapse. In this work, we argue that this
effect is a consequence of conflicting gradients during multimodal VAE
training. We show how to detect the sub-graphs in the computational graphs
where gradients conflict (impartiality blocks), as well as how to leverage
existing gradient-conflict solutions from multitask learning to mitigate
modality collapse, that is, to ensure impartial optimization across modalities.
We apply our training framework to several multimodal VAE models, losses and
datasets from the literature, and empirically show that our framework
significantly improves the reconstruction performance, conditional generation,
and coherence of the latent space across modalities.
Comment: Accepted as a Spotlight paper at ICML 2022. 27 pages, 10 figures
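The gradient-conflict diagnosis and its multitask-learning remedy can be sketched on flattened per-modality gradients. Detecting a conflict via a negative inner product and projecting it away follows the PCGrad idea, one of the existing multitask solutions the abstract refers to; this generic sketch is ours and omits the paper's impartiality-block machinery.

```python
import numpy as np

def conflicts(g1, g2):
    """Two per-modality gradients conflict when they point in opposing
    directions, i.e. their inner product (cosine similarity) is negative."""
    return float(g1 @ g2) < 0.0

def project_conflict(g1, g2):
    """PCGrad-style fix: if g1 conflicts with g2, remove from g1 its
    component along g2, so the update no longer harms the other modality."""
    if conflicts(g1, g2):
        g1 = g1 - (g1 @ g2) / (g2 @ g2) * g2
    return g1
```

For example, g1 = (1, -1) conflicts with g2 = (0, 1); projecting yields (1, 0), which is orthogonal to g2, so stepping along it leaves the second modality's loss locally unchanged.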