Disentanglement of Latent Representations via Sparse Causal Interventions
The process of generating data such as images is controlled by independent
and unknown factors of variation. The retrieval of these variables has been
studied extensively in the disentanglement, causal representation learning, and
independent component analysis fields. Recently, approaches that merge these
domains have shown great success. Rather than directly representing the
factors of variation, disentanglement can be framed as finding the
interventions on an image that each change a single factor. Following
this assumption, we introduce a new method for disentanglement inspired by
causal dynamics that combines causality theory with vector-quantized
variational autoencoders. Our model considers the quantized vectors as causal
variables and links them in a causal graph. It performs causal interventions on
the graph and generates atomic transitions affecting a unique factor of
variation in the image. We also introduce a new task of action retrieval that
consists of finding the action responsible for the transition between two
images. We test our method on standard synthetic and real-world disentanglement
datasets. We show that it can effectively disentangle the factors of variation
and perform precise interventions on high-level semantic attributes of an image
without affecting its quality, even with imbalanced data distributions.
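The quantized-latents-as-causal-variables idea lends itself to a short illustration. The following is a minimal sketch (PyTorch), not the authors' implementation: the codebook size, the number of latent slots, and the intervene helper are assumptions, meant only to show what an atomic intervention on a single quantized variable looks like.

import torch

def quantize(z_e, codebook):
    """Map encoder outputs z_e (B, N, D) to their nearest codebook entries.
    The resulting discrete codes play the role of causal variables."""
    dists = torch.cdist(z_e, codebook.unsqueeze(0).expand(z_e.size(0), -1, -1))
    codes = dists.argmin(dim=-1)        # (B, N) discrete codes
    return codebook[codes], codes       # (B, N, D) quantized latents

def intervene(codes, slot, new_code):
    """Atomic intervention: overwrite one causal variable (one latent slot),
    leaving every other quantized latent untouched."""
    edited = codes.clone()
    edited[:, slot] = new_code
    return edited

# Hypothetical usage: flip a single high-level factor of an image.
codebook = torch.randn(512, 64)         # K=512 codebook vectors of dim 64
z_e = torch.randn(8, 16, 64)            # batch of 8 images, 16 latent slots
z_q, codes = quantize(z_e, codebook)
codes_after = intervene(codes, slot=3, new_code=42)
# decoding codebook[codes_after] would render the image with one factor changed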
A Sequential Set Generation Method for Predicting Set-Valued Outputs
Consider a general machine learning setting where the output is a set of
labels or sequences. This output set is unordered and its size varies with the
input. Whereas multi-label classification methods seem a natural first resort,
they are not readily applicable to set-valued outputs because of the growth
rate of the output space, and because conventional sequence generation does
not reflect the order-free nature of sets. In this paper, we propose a unified
framework--sequential set generation (SSG)--that can handle output sets of
labels and sequences. SSG is a meta-algorithm that leverages any probabilistic
learning method for label or sequence prediction, but employs a proper
regularization such that a new label or sequence is generated repeatedly until
the full set is produced. Though SSG is sequential in nature, it does not
penalize the order in which set elements appear, and it can be applied
to a variety of set output problems, such as a set of classification labels or
sequences. We perform experiments with both benchmark and synthetic data sets
and demonstrate SSG's strong performance over baseline methods.
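To make the sequential-but-order-free idea concrete, here is a rough sketch in plain Python. It is an assumed rendering of the outer loop, not the paper's code: predict_next stands in for any probabilistic label predictor, and the stop symbol is illustrative. Elements accumulate into a set, so the order of appearance and any duplicates are ignored.

STOP = "<stop>"

def generate_set(predict_next, x, max_size=20):
    """Repeatedly generate elements until the predictor emits STOP.
    predict_next(x, generated) may be any model conditioned on the
    input and on the elements produced so far."""
    generated = set()
    for _ in range(max_size):
        y = predict_next(x, generated)
        if y == STOP:
            break
        generated.add(y)    # set semantics: ordering and repeats vanish
    return generated

# Hypothetical usage with a toy predictor that emits x's small prime factors:
def toy_predict(x, generated):
    remaining = [p for p in (2, 3, 5) if x % p == 0 and p not in generated]
    return remaining[0] if remaining else STOP

print(generate_set(toy_predict, 30))    # {2, 3, 5}, in any order

In the same spirit, an order-invariant training loss would credit the predictor for producing any not-yet-generated target element, rather than the element at a fixed position.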
Teaching Smaller Language Models To Generalise To Unseen Compositional Questions
We equip a smaller Language Model to generalise to answering challenging
compositional questions that have not been seen in training. To do so we
propose a combination of multitask supervised pretraining on up to 93 tasks
designed to instill diverse reasoning abilities, and a dense retrieval system
that aims to retrieve a set of evidential paragraph fragments. Recent progress
in question-answering has been achieved either through prompting methods
against very large pretrained Language Models in zero or few-shot fashion, or
by fine-tuning smaller models, sometimes in conjunction with information
retrieval. We focus on the less explored question of the extent to which
zero-shot generalisation can be enabled in smaller models with retrieval
against a corpus within which sufficient information to answer a particular
question may not exist. We establish strong baselines in this setting for
diverse evaluation datasets (StrategyQA, CommonsenseQA, IIRC, DROP, Musique and
ARC-DA), and show that performance can be significantly improved by adding
retrieval-augmented training datasets which are designed to expose our models
to a variety of heuristic reasoning strategies such as weighing partial
evidence or ignoring an irrelevant context.
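The dense retrieval component can be pictured with a short NumPy sketch. This is an assumed setup, not the paper's system: the dual encoder that produces the embeddings is abstracted away, and the corpus size and vector dimension are arbitrary.

import numpy as np

def top_k_fragments(query_vec, fragment_vecs, k=5):
    """Return indices of the k paragraph fragments whose embeddings have
    the highest inner-product similarity with the encoded question."""
    scores = fragment_vecs @ query_vec      # (num_fragments,)
    return np.argsort(-scores)[:k]

# Hypothetical usage with random stand-in embeddings:
rng = np.random.default_rng(0)
fragment_vecs = rng.normal(size=(10_000, 768))  # precomputed fragment vectors
query_vec = rng.normal(size=768)                # encoded question
hits = top_k_fragments(query_vec, fragment_vecs)
# the top-k fragments would be concatenated into the model's input context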
Answering Unseen Questions With Smaller Language Models Using Rationale Generation and Dense Retrieval
When provided with sufficient explanatory context, smaller Language Models
have been shown to exhibit strong reasoning ability on challenging short-answer
question-answering tasks where the questions are unseen in training. We
evaluate two methods for further improvement in this setting. Both methods
focus on combining rationales generated by a larger Language Model with longer
contexts created from a multi-hop dense retrieval system. The first method
involves training a Rationale Ranking model to score both
generated rationales and retrieved contexts with respect to relevance and
truthfulness. We then use the scores to derive combined contexts from both
knowledge sources using a number of combinatory strategies. For the second
method, we train a smaller Reasoning model using
retrieval-augmented training datasets such that it becomes proficient at
utilising relevant information from longer text sequences that may be only
partially evidential and frequently contain many irrelevant sentences.
Generally, we find that both methods are effective, but the second method is
more straightforward to apply and produces the strongest results in
the unseen setting on which we focus. Our single best Reasoning model using
only 440 million parameters materially improves upon strong comparable prior
baselines for unseen evaluation datasets (StrategyQA 58.9 → 61.7 acc.,
CommonsenseQA 63.6 → 72.7 acc., ARC-DA 31.6 → 52.1 F1, IIRC 25.5 → 27.3 F1)
and a version utilising our prior
knowledge of each type of question in selecting a context combination strategy
does even better. Our proposed models also generally outperform direct prompts
against much larger models (BLOOM 175B and StableVicuna 13B) in both few-shot
chain-of-thought and few-shot answer-only settings.
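One of the combinatory strategies mentioned above can be sketched in a few lines of plain Python. This is a hedged illustration only: score_fn stands in for the Rationale Ranking model's relevance/truthfulness score, and the threshold rule is an assumption, not the paper's actual strategy set.

def combine_contexts(rationale, retrieved, score_fn, threshold=0.5):
    """Keep each knowledge source (LLM-generated rationale, retrieved
    context) only if the ranker trusts it; fall back to the higher-scoring
    source when neither clears the threshold."""
    parts = [t for t in (rationale, retrieved) if score_fn(t) >= threshold]
    if not parts:
        parts = [max((rationale, retrieved), key=score_fn)]
    return "\n".join(parts)

# Hypothetical usage with a stub scorer:
context = combine_contexts(
    "Rationale: koalas sleep up to 20 hours a day.",
    "Retrieved: koalas are marsupials native to Australia.",
    score_fn=lambda t: 0.9 if "koala" in t.lower() else 0.1,
)
# `context` is then prepended to the question for the Reasoning model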