Scalable Recollections for Continual Lifelong Learning
Given the recent success of Deep Learning applied to a variety of single
tasks, it is natural to consider more human-realistic settings. Perhaps the
most difficult of these settings is that of continual lifelong learning, where
the model must learn online over a continuous stream of non-stationary data. A
successful continual lifelong learning system must have three key capabilities:
it must learn and adapt over time, it must not forget what it has learned, and
it must be efficient in both training time and memory. Recent techniques have
focused their efforts primarily on the first two capabilities while questions
of efficiency remain largely unexplored. In this paper, we consider the problem
of efficient and effective storage of experiences over very large time-frames.
In particular, we consider the case where typical experiences are O(n) bits and
memories are limited to O(k) bits for k << n. We present a novel scalable
architecture and training algorithm in this challenging domain and provide an
extensive evaluation of its performance. Our results show that we can achieve
considerable gains on top of state-of-the-art methods such as GEM.
Comment: AAAI 201
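The storage constraint described above (experiences of O(n) bits, memories of O(k) bits with k << n) can be illustrated with simple arithmetic. The following is a minimal sketch only: the function name, the image size, the code size, and the budget are hypothetical values chosen for illustration, not figures from the paper.

```python
def recollections_that_fit(budget_bits: int, bits_per_item: int) -> int:
    """Return how many stored items fit in a fixed memory budget."""
    if bits_per_item <= 0:
        raise ValueError("bits_per_item must be positive")
    return budget_bits // bits_per_item


# Hypothetical sizes, assumed purely for illustration.
n_bits = 28 * 28 * 8      # a raw experience: one 28x28 image at 8 bits per pixel
k_bits = 64               # a compressed recollection, with k << n
budget_bits = 1_000_000   # total memory available for stored experiences

raw_capacity = recollections_that_fit(budget_bits, n_bits)
compressed_capacity = recollections_that_fit(budget_bits, k_bits)
print(raw_capacity, compressed_capacity)
```

Under these assumed sizes, the same budget holds roughly a hundred times more compressed recollections than raw experiences, which is the regime the abstract targets.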
Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation
We introduce the multiresolution recurrent neural network, which extends the
sequence-to-sequence framework to model natural language generation as two
parallel discrete stochastic processes: a sequence of high-level coarse tokens,
and a sequence of natural language tokens. There are many ways to estimate or
learn the high-level coarse tokens, but we argue that a simple extraction
procedure is sufficient to capture a wealth of high-level discourse semantics.
Such a procedure allows training the multiresolution recurrent neural network by
maximizing the exact joint log-likelihood over both sequences. In contrast to
the standard log-likelihood objective w.r.t. natural language tokens (word
perplexity), optimizing the joint log-likelihood biases the model towards
modeling high-level abstractions. We apply the proposed model to the task of
dialogue response generation in two challenging domains: the Ubuntu technical
support domain, and Twitter conversations. On Ubuntu, the model outperforms
competing approaches by a substantial margin, achieving state-of-the-art
results according to both automatic evaluation metrics and a human evaluation
study. On Twitter, the model appears to generate more relevant and on-topic
responses according to automatic evaluation metrics. Finally, our experiments
demonstrate that the proposed model is more adept at overcoming the sparsity of
natural language and is better able to capture long-term structure.
Comment: 21 pages, 2 figures, 10 table
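The joint objective mentioned above can be sketched numerically. This is a minimal illustration, assuming the joint probability factorizes as the coarse-token sequence followed by the natural language tokens conditioned on it. The function names are hypothetical, and the per-token probabilities are made-up inputs that a trained model would normally supply.

```python
import math


def sequence_log_likelihood(token_probs):
    """Sum of log-probabilities of each token given its context."""
    return sum(math.log(p) for p in token_probs)


def joint_log_likelihood(coarse_probs, word_probs):
    """log P(coarse, words) = log P(coarse) + log P(words | coarse).

    coarse_probs: assumed per-token probabilities of the coarse sequence.
    word_probs:   assumed per-token probabilities of the word sequence,
                  each already conditioned on the coarse sequence.
    """
    return sequence_log_likelihood(coarse_probs) + sequence_log_likelihood(word_probs)


# Made-up probabilities for two coarse tokens and one word token.
print(joint_log_likelihood([0.5, 0.5], [0.25]))
```

Because the coarse sequence contributes its own term, a model that predicts words well but coarse tokens poorly is still penalized, which is one way to read the claim that the joint objective biases the model towards high-level abstractions.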
Compositional Program Generation for Systematic Generalization
Compositional generalization is a key ability of humans that enables us to
learn new concepts from only a handful of examples. Machine learning models,
including the now ubiquitous transformers, struggle to generalize in this way,
and typically require thousands of examples of a concept during training in
order to generalize meaningfully. This difference in ability between humans and
artificial neural architectures motivates this study on a neuro-symbolic
architecture called the Compositional Program Generator (CPG). CPG has three
key features (modularity, type abstraction, and recursive composition) that
enable it to generalize both systematically to new concepts in a few-shot
manner and productively by length on various sequence-to-sequence
language tasks. For each input, CPG uses a grammar of the input domain and a
parser to generate a type hierarchy in which each grammar rule is assigned its
own unique semantic module, a probabilistic copy or substitution program.
Instances with the same hierarchy are processed with the same composed program,
while those with different hierarchies may be processed with different
programs. CPG learns parameters for the semantic modules and is able to learn
the semantics for new types incrementally. Given a context-free grammar of the
input language and a dictionary mapping each word in the source language to its
interpretation in the output language, CPG can achieve perfect generalization
on the SCAN and COGS benchmarks, in both standard and extreme few-shot
settings.
Comment: 7 pages of text with 1 page of reference
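The idea of assigning each grammar rule its own copy or substitution module, and composing those modules recursively, can be illustrated with a toy interpreter for a few SCAN-style commands. This is a hand-written sketch, not the CPG implementation: the rules here are fixed and deterministic rather than learned and probabilistic, and the names `LEXICON`, `REPEAT`, and `interpret` are hypothetical.

```python
# Dictionary mapping source words to their output interpretation
# (the substitution step for primitive words).
LEXICON = {
    "jump": ["I_JUMP"],
    "walk": ["I_WALK"],
    "run": ["I_RUN"],
    "look": ["I_LOOK"],
}

# Repetition modifiers: each copies its argument a fixed number of times.
REPEAT = {"twice": 2, "thrice": 3}


def interpret(tokens):
    """Recursively compose one module per grammar rule."""
    if not tokens:
        raise ValueError("empty command")
    # Conjunction rules bind most loosely, so they are applied first.
    for conj in ("and", "after"):
        if conj in tokens:
            i = tokens.index(conj)
            left = interpret(tokens[:i])
            right = interpret(tokens[i + 1:])
            # "x and y" keeps the order; "x after y" performs y first.
            return left + right if conj == "and" else right + left
    # Repetition rule: copy the interpretation of the sub-phrase.
    if tokens[-1] in REPEAT:
        return interpret(tokens[:-1]) * REPEAT[tokens[-1]]
    # Primitive rule: substitute the word using the dictionary.
    if len(tokens) == 1 and tokens[0] in LEXICON:
        return list(LEXICON[tokens[0]])
    raise ValueError(f"cannot parse: {' '.join(tokens)}")


print(interpret("jump twice and walk".split()))

# A new primitive needs only a dictionary entry; the composition rules
# for "twice", "and", and "after" apply to it unchanged.
LEXICON["dax"] = ["I_DAX"]
print(interpret("walk after dax thrice".split()))
```

Because the modules for "twice", "and", and "after" are defined over types rather than specific words, adding one dictionary entry is enough for a new primitive to combine with every existing rule, which is the kind of systematic generalization the abstract describes.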