613 research outputs found
On the Realization of Compositionality in Neural Networks
We present a detailed comparison of two types of sequence to sequence models
trained to conduct a compositional task. The models are architecturally
identical at inference time, but differ in the way that they are trained: our
baseline model is trained with a task-success signal only, while the other
model receives additional supervision on its attention mechanism (Attentive
Guidance), which has been shown to be an effective method for encouraging more
compositional solutions (Hupkes et al., 2019). We first confirm that the models
with attentive guidance indeed infer more compositional solutions than the
baseline, by training them on the lookup table task presented by Liška et
al. (2019). We then do an in-depth analysis of the structural differences
between the two model types, focusing in particular on the organisation of the
parameter space and the hidden layer activations and find noticeable
differences in both these aspects. Guided networks focus more on the components
of the input rather than the sequence as a whole and develop small functional
groups of neurons with specific purposes that use their gates more selectively.
Results from parameter heat maps, component swapping and graph analysis also
indicate that guided networks exhibit a more modular structure with a small
number of specialized, strongly connected neurons.
Comment: To appear at BlackboxNLP 2019, AC
Transcoding compositionally: using attention to find more generalizable solutions
While sequence-to-sequence models have shown remarkable generalization power
across several natural language tasks, the solutions they construct are argued
to be less compositional than human-like generalization. In this paper, we
present seq2attn, a new architecture that is specifically designed to exploit
attention to find compositional patterns in the input. In seq2attn, the two
standard components of an encoder-decoder model are connected via a transcoder
that modulates the information flow between them. We show that seq2attn can
successfully generalize, without requiring any additional supervision, on two
tasks which are specifically constructed to challenge the compositional skills
of neural networks. The solutions found by the model are highly interpretable,
allowing easy analysis of both the types of solutions that are found and
potential causes for mistakes. We exploit this opportunity to introduce a new
paradigm to test compositionality that studies the extent to which a model
overgeneralizes when confronted with exceptions. We show that seq2attn exhibits
such overgeneralization to a larger degree than a standard sequence-to-sequence
model.
Comment: to appear at BlackboxNLP 2019, AC
Layer-wise Representation Fusion for Compositional Generalization
Despite successes across a broad range of applications, the solutions that
sequence-to-sequence models construct are argued to be less compositional than
human-like generalization. There is mounting evidence that one of the reasons
hindering compositional generalization is that the representations of the
uppermost encoder and decoder layers are entangled. In other words, the
syntactic and semantic representations of sequences are twisted inappropriately. However,
most previous studies mainly concentrate on enhancing token-level semantic
information to alleviate the representation entanglement problem, rather than
composing and using the syntactic and semantic representations of sequences
appropriately as humans do. In addition, we explain why the entanglement
problem exists from the perspective of recent studies about training deeper
Transformers: it is mainly owing to the "shallow" residual connections and their
simple, one-step operations, which fail to fuse previous layers' information
effectively. Starting from this finding and inspired by humans' strategies, we
propose FuSion (Fusing Syntactic and Semantic Representations), an extension to
sequence-to-sequence models to learn to fuse previous layers' information back
into the encoding and decoding process appropriately through introducing a
fuse-attention module at each encoder and decoder layer. FuSion
achieves competitive and even state-of-the-art results on two
realistic benchmarks, which empirically demonstrates the effectiveness of our
proposal.
Comment: work in progress. arXiv admin note: substantial text overlap with
arXiv:2305.1216
Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio
Despite rapid advancement in the field of Constrained Natural Language
Generation, little time has been spent on exploring the potential of language
models which have had their vocabularies lexically, semantically, and/or
phonetically constrained. We find that most language models generate compelling
text even under significant constraints. We present a simple and universally
applicable technique for modifying the output of a language model by
compositionally applying filter functions to the language model's vocabulary
before a unit of text is generated. This approach is plug-and-play and requires
no modification to the model. To showcase the value of this technique, we
present an easy-to-use AI writing assistant called Constrained Text Generation
Studio (CTGS). CTGS allows users to generate or choose from text with any
combination of a wide variety of constraints, such as banning a particular
letter, forcing the generated words to have a certain number of syllables,
and/or forcing the words to be partial anagrams of another word. We introduce a
novel dataset of prose that omits the letter e. We show that our method results
in strictly superior performance compared to fine-tuning alone on this dataset.
We also present a Hugging Face Space web app that demonstrates this technique, called
Gadsby. The code is available to the public here:
https://github.com/Hellisotherpeople/Constrained-Text-Generation-Studio
Comment: Published in the proceedings of the 2nd Workshop on When Creative AI
Meets Conversational AI (CAI2), COLING 2022, 6 pages, System Demonstration
Pape
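
The filter-composition idea described in the abstract above can be illustrated with a minimal sketch. This is not the CTGS implementation: the names `ban_letter`, `max_length`, `compose`, `mask_scores` and `pick_next`, the toy vocabulary, and the scores are all hypothetical, and a real system would mask the logits of an actual language model rather than a hand-written list.

```python
from typing import Callable, List, Sequence

# A token filter returns True when a token is allowed (hypothetical type alias).
TokenFilter = Callable[[str], bool]


def ban_letter(letter: str) -> TokenFilter:
    """Allow only tokens that do not contain the given letter."""
    letter = letter.lower()
    return lambda token: letter not in token.lower()


def max_length(n: int) -> TokenFilter:
    """Allow only tokens of at most n characters."""
    return lambda token: len(token) <= n


def compose(*filters: TokenFilter) -> TokenFilter:
    """Combine filters: a token is allowed only if every filter allows it."""
    return lambda token: all(f(token) for f in filters)


def mask_scores(vocab: Sequence[str], scores: Sequence[float],
                keep: TokenFilter) -> List[float]:
    """Set the score of every disallowed token to -inf before a token is chosen."""
    return [s if keep(tok) else float("-inf") for tok, s in zip(vocab, scores)]


def pick_next(vocab: Sequence[str], scores: Sequence[float],
              keep: TokenFilter) -> str:
    """Greedy choice over the masked scores; the model itself is untouched."""
    masked = mask_scores(vocab, scores, keep)
    best = max(range(len(vocab)), key=lambda i: masked[i])
    if masked[best] == float("-inf"):
        raise ValueError("no token satisfies the constraints")
    return vocab[best]


# Toy vocabulary and scores standing in for a model's next-token scores.
vocab = ["the", "a", "cat", "dog", "bird", "elephant"]
scores = [2.0, 1.0, 0.5, 1.5, 0.2, 3.0]

keep = compose(ban_letter("e"), max_length(3))
print(pick_next(vocab, scores, keep))  # prints "dog"
```

Because the constraints act only on the scores, filters can be added or removed without retraining, which is the plug-and-play property the abstract claims.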
Building Machines That Learn and Think Like People
Recent progress in artificial intelligence (AI) has renewed interest in
building systems that learn and think like people. Many advances have come from
using deep neural networks trained end-to-end in tasks such as object
recognition, video games, and board games, achieving performance that equals or
even beats humans in some respects. Despite their biological inspiration and
performance achievements, these systems differ from human intelligence in
crucial ways. We review progress in cognitive science suggesting that truly
human-like learning and thinking machines will have to reach beyond current
engineering trends in both what they learn, and how they learn it.
Specifically, we argue that these machines should (a) build causal models of
the world that support explanation and understanding, rather than merely
solving pattern recognition problems; (b) ground learning in intuitive theories
of physics and psychology, to support and enrich the knowledge that is learned;
and (c) harness compositionality and learning-to-learn to rapidly acquire and
generalize knowledge to new tasks and situations. We suggest concrete
challenges and promising routes towards these goals that can combine the
strengths of recent neural network advances with more structured cognitive
models.
Comment: In press at Behavioral and Brain Sciences. Open call for commentary
proposals (until Nov. 22, 2016).
https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/information/calls-for-commentary/open-calls-for-commentar