From explanation to synthesis: Compositional program induction for learning from demonstration
Hybrid systems are a compact and natural mechanism with which to address
problems in robotics. This work introduces an approach to learning hybrid
systems from demonstrations, with an emphasis on extracting models that are
explicitly verifiable and easily interpreted by robot operators. We fit a
sequence of controllers using sequential importance sampling under a generative
switching proportional controller task model. Here, we parameterise controllers
using a proportional gain and a visually verifiable joint angle goal. Inference
under this model is challenging, but we address this by introducing an
attribution prior extracted from a neural end-to-end visuomotor control model.
Given the sequence of controllers comprising a task, we simplify the trace
using grammar parsing strategies, taking advantage of the sequence
compositionality, before grounding the controllers by training perception
networks to predict goals given images. Using this approach, we are
successfully able to induce a program for a visuomotor reaching task involving
loops and conditionals from a single demonstration and a neural end-to-end
model. In addition, we are able to discover the program used for a tower
building task. We argue that computer program-like control systems are more
interpretable than alternative end-to-end learning approaches, and that hybrid
systems inherently allow for better generalisation across task configurations.
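The switching proportional controller model described above can be illustrated with a minimal sketch. Each controller is parameterised by a gain and a joint-angle goal, and control hands over to the next controller once the current goal is reached; the class and function names, the tolerance, and the first-order plant are all illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class PController:
    # Hypothetical parameterisation: one proportional gain and one
    # joint-angle goal, as in the switching proportional controller model.
    gain: float
    goal: float  # target joint angle (radians)

    def command(self, angle: float) -> float:
        # Proportional control: command is proportional to the error.
        return self.gain * (self.goal - angle)

def run_switching(controllers, angle, tol=1e-2, dt=0.1, max_steps=1000):
    """Execute a sequence of controllers, switching when each goal is reached."""
    trajectory = [angle]
    for ctrl in controllers:
        for _ in range(max_steps):
            if abs(ctrl.goal - angle) < tol:
                break  # goal reached: hand over to the next controller
            angle += dt * ctrl.command(angle)  # simple first-order plant
            trajectory.append(angle)
    return trajectory

# Reach 0.5 rad, then return to 0.0 rad.
traj = run_switching([PController(2.0, 0.5), PController(2.0, 0.0)], angle=0.0)
```

A trace like `traj` is what the grammar-parsing step would then simplify by exploiting the compositionality of the controller sequence.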
PPL-MCTS: Constrained Textual Generation Through Discriminator-Guided MCTS Decoding
Large language models (LMs) based on Transformers can generate plausible
long texts. In this paper, we explore how this generation can be further
controlled at decoding time to satisfy certain constraints (e.g. being
non-toxic, conveying certain emotions, using a specific writing style, etc.)
without fine-tuning the LM. Precisely, we formalize constrained generation as a
tree exploration process guided by a discriminator that indicates how well the
associated sequence respects the constraint. This approach, in addition to
being easier and cheaper to train than fine-tuning the LM, allows the
constraint to be applied more finely and dynamically. We propose several original methods to
search this generation tree, notably the Monte Carlo Tree Search (MCTS) which
provides theoretical guarantees on the search efficiency, but also simpler
methods based on re-ranking a pool of diverse sequences using the discriminator
scores. These methods are evaluated, with automatic and human-based metrics, on
two types of constraints and languages: review polarity and emotion control in
French and English. We show that discriminator-guided MCTS decoding achieves
state-of-the-art results without having to tune the language model, in both
tasks and languages. We also demonstrate that other proposed decoding methods
based on re-ranking can be highly effective when diversity among the generated
candidates is encouraged.
Comment: 15 pages, 5 tables, 7 figures, accepted to NAACL 202
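The simpler re-ranking idea from the abstract can be sketched as follows: sample a diverse pool of sequences from the LM, then combine each sequence's LM score with the discriminator's probability that it satisfies the constraint. The function names and the toy stand-in models are hypothetical; this is not the paper's code.

```python
import math

def rerank(candidates, lm_logprob, disc_prob, alpha=1.0):
    """Pick the candidate maximising LM log-probability plus a weighted
    log-probability of constraint satisfaction from the discriminator.
    `lm_logprob` and `disc_prob` are placeholders for real model calls."""
    scored = [(lm_logprob(c) + alpha * math.log(disc_prob(c)), c) for c in candidates]
    return max(scored)[1]

# Toy stand-ins (hypothetical): the LM prefers short sequences, and the
# constraint is "the text contains the word 'good'".
pool = ["a good day", "a bad day", "good", "terrible news"]
best = rerank(pool,
              lm_logprob=lambda s: -0.1 * len(s),
              disc_prob=lambda s: 0.95 if "good" in s else 0.05)
```

The MCTS variant replaces this one-shot scoring with a guided tree search over partial sequences, but the discriminator plays the same role in both.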
Inductive Bias and Modular Design for Sample-Efficient Neural Language Learning
Most of the world's languages suffer from the paucity of annotated data. This curbs the effectiveness of supervised learning, the most widespread approach to modelling language. Instead, an alternative paradigm could take inspiration from the propensity of children to acquire language from limited stimuli, in order to enable machines to learn any new language from a few examples. The abstract mechanisms underpinning this ability include 1) a set of in-born inductive biases and 2) the deep entrenchment of language in other perceptual and cognitive faculties, combined with the ability to transfer and recombine knowledge across these domains. The main contribution of my thesis is giving concrete form to both these intuitions.
Firstly, I argue that endowing a neural network with the correct inductive biases is equivalent to constructing a prior distribution over its weights and its architecture (including connectivity patterns and non-linear activations). This prior is inferred by "reverse-engineering" a representative set of observed languages and harnessing typological features documented by linguists. Thus, I provide a unified framework for cross-lingual transfer and architecture search by recasting them as hierarchical Bayesian neural models.
Secondly, the skills relevant to different language varieties and different tasks in natural language processing are deeply intertwined. Hence, the neural weights modelling the data for each of their combinations can be imagined as lying in a structured space. I introduce a Bayesian generative model of this space, which is factorised into latent variables representing each language and each task. By virtue of this modular design, predictions can generalise to unseen combinations by extrapolating from the data of observed combinations.
The proposed models are empirically validated on a spectrum of language-related tasks (character-level language modelling, part-of-speech tagging, named entity recognition, and common-sense reasoning) and a typologically diverse sample of about a hundred languages. Compared to a series of competitive baselines, they achieve better performance in new languages in zero-shot and few-shot learning settings. In general, they hold promise to extend state-of-the-art language technology to under-resourced languages by means of sample efficiency and robustness to cross-lingual variation.
ERC (Consolidator Grant 648909) Lexical
Google Research Faculty Award 201
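The factorised design in the abstract above can be sketched in a few lines: one latent vector per language and per task, and a shared generator that composes weights for any (language, task) pair. The dimensionality, the generator form, and the variable names are illustrative assumptions, not the thesis's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # latent dimensionality (illustrative)

# Hypothetical latent variables: one vector per language and one per task.
z_lang = {"en": rng.normal(size=d), "sw": rng.normal(size=d)}
z_task = {"pos": rng.normal(size=d), "ner": rng.normal(size=d)}

# A shared generator maps (language, task) latents to task-specific weights.
G = rng.normal(size=(d, 2 * d))

def weights(lang, task):
    """Compose weights for a (language, task) pair from its factorised latents."""
    return G @ np.concatenate([z_lang[lang], z_task[task]])

# Zero-shot generalisation: weights for ("sw", "ner") can be composed even if
# that combination was never observed, provided "sw" and "ner" each appeared
# in some other observed combination.
w_unseen = weights("sw", "ner")
```

The Bayesian treatment in the thesis places priors over these latents and infers them jointly; the point here is only the modularity that lets unseen combinations be extrapolated from observed ones.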
Grounding Aleatoric Uncertainty in Unsupervised Environment Design
Adaptive curricula in reinforcement learning (RL) have proven effective for
producing policies robust to discrepancies between the train and test
environment. Recently, the Unsupervised Environment Design (UED) framework
generalized RL curricula to generating sequences of entire environments,
leading to new methods with robust minimax regret properties. Problematically,
in partially-observable or stochastic settings, optimal policies may depend on
the ground-truth distribution over aleatoric parameters of the environment in
the intended deployment setting, while curriculum learning necessarily shifts
the training distribution. We formalize this phenomenon as curriculum-induced
covariate shift (CICS), and describe how its occurrence in aleatoric parameters
can lead to suboptimal policies. Directly sampling these parameters from the
ground-truth distribution avoids the issue, but thwarts curriculum learning. We
propose SAMPLR, a minimax regret UED method that optimizes the ground-truth
utility function, even when the underlying training data is biased due to CICS.
We prove, and validate on challenging domains, that our approach preserves
optimality under the ground-truth distribution, while promoting robustness
across the full range of environment settings.
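Curriculum-induced covariate shift can be made concrete with a toy example (of my own construction, not from the paper): an aleatoric parameter such as a biased coin, whose curriculum distribution differs from the ground truth seen at deployment.

```python
P_TRUE = 0.7        # ground-truth probability that the coin lands heads
P_CURRICULUM = 0.5  # a curriculum that over-samples the rare outcome

def expected_return(call_heads_prob, p_heads):
    """Expected reward for a policy that calls heads with some probability
    and is rewarded 1 for a correct call."""
    return call_heads_prob * p_heads + (1 - call_heads_prob) * (1 - p_heads)

# Under the curriculum distribution every calling strategy scores 0.5, so a
# policy trained on curriculum levels may hedge at 50/50. Under the ground
# truth, always calling heads is strictly better.
biased = expected_return(0.5, P_TRUE)    # policy shaped by the biased curriculum
grounded = expected_return(1.0, P_TRUE)  # policy optimising ground-truth utility
```

Here `grounded > biased`: the fix SAMPLR aims at is precisely to evaluate utility under the ground-truth distribution over aleatoric parameters even while the curriculum biases the training levels.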