Tensor Product Generation Networks for Deep NLP Modeling
We present a new approach to the design of deep networks for natural language
processing (NLP), based on the general technique of Tensor Product
Representations (TPRs) for encoding and processing symbol structures in
distributed neural networks. A network architecture --- the Tensor Product
Generation Network (TPGN) --- is proposed which is capable in principle of
carrying out TPR computation, but which uses unconstrained deep learning to
design its internal representations. Instantiated in a model for image-caption
generation, TPGN outperforms LSTM baselines when evaluated on the COCO dataset.
The TPR-capable structure enables interpretation of internal representations
and operations, which prove to contain considerable grammatical content. Our
caption-generation model can be interpreted as generating sequences of
grammatical categories and retrieving words by their categories from a plan
encoded as a distributed representation.
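As a rough illustration of the TPR machinery underlying TPGN, the sketch below binds toy filler vectors (words) to role vectors (sequence positions) with outer products, superposes the bindings into a single matrix, and unbinds a position using dual role vectors. The vocabulary, dimensions, and unbinding step are illustrative assumptions, not the paper's trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
d_filler, d_role = 8, 8

# Random filler (word) and role (position-in-sequence) vectors.
fillers = {w: rng.standard_normal(d_filler) for w in ["a", "dog", "runs"]}
roles = {r: rng.standard_normal(d_role) for r in ["pos1", "pos2", "pos3"]}

# Bind each filler to its role with an outer product and superpose the bindings.
T = sum(np.outer(fillers[w], roles[r])
        for w, r in [("a", "pos1"), ("dog", "pos2"), ("runs", "pos3")])

# Unbind a position with its dual role vector (a column of the pseudoinverse of
# the role matrix); this recovers the filler bound to that role.
R = np.stack([roles[r] for r in ["pos1", "pos2", "pos3"]])   # 3 x d_role
dual_roles = np.linalg.pinv(R)                               # d_role x 3
recovered = T @ dual_roles[:, 1]                             # query role "pos2"
print(np.allclose(recovered, fillers["dog"]))                # True
```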
Binding and Normalization of Binary Sparse Distributed Representations by Context-Dependent Thinning
Distributed representations have often been criticized as inappropriate for encoding data with complex structure. However, Plate's Holographic Reduced Representations and Kanerva's Binary Spatter Codes are recent schemes that allow on-the-fly encoding of nested compositional structures by real-valued or dense binary vectors of fixed dimensionality.
In this paper we consider Context-Dependent Thinning procedures, which were developed for representing complex hierarchical items in the architecture of Associative-Projective Neural Networks. These procedures bind items represented by sparse binary codevectors (with a low probability of 1s). Such an encoding is biologically plausible and allows a high storage capacity in the distributed associative memory where the codevectors may be stored.
In contrast to known binding procedures, Context-Dependent Thinning preserves the same low density (or sparseness) of the bound codevector for a varying number of component codevectors. Moreover, a bound codevector is not only similar to other bound codevectors with similar components (as in other schemes), but also to the component codevectors themselves. This allows the similarity of structures to be estimated simply from the overlap of their codevectors, without retrieving the components, and it also makes retrieval of the component codevectors straightforward.
Examples of algorithmic and neural-network implementations of the thinning procedures are considered. We also present representation examples for various types of nested structured data (propositions using role-filler and predicate-argument representation schemes, trees, directed acyclic graphs) using sparse codevectors of fixed dimension. Such representations may provide a fruitful alternative to the symbolic representations of traditional AI, as well as to localist and microfeature-based connectionist representations.
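The sketch below is a hedged illustration of the additive Context-Dependent Thinning idea: component codevectors are superposed by OR and then thinned by conjoining the superposition with permuted copies of itself until roughly the original density is restored. The dimensionality, density, and the use of a cyclic shift as the fixed permutation are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 10_000, 0.01                       # dimensionality and density of 1s

def sparse_code():
    return (rng.random(N) < p).astype(np.uint8)

def cdt(components, target_ones, max_steps=100):
    z = np.bitwise_or.reduce(components)          # superposition (densifies)
    out = np.zeros(N, dtype=np.uint8)
    for s in range(max_steps):                    # conjoin z with permuted
        out |= z & np.roll(z, 7 * s + 3)          # copies of itself ...
        if out.sum() >= target_ones:              # ... until the target density
            break
    return out

a, b, c = sparse_code(), sparse_code(), sparse_code()
bound = cdt([a, b, c], target_ones=int(p * N))

# The bound vector keeps roughly the component density, is a subset of the
# superposition, and therefore overlaps each component codevector.
print(bound.sum(), (bound & a).sum(), (bound & b).sum(), (bound & c).sum())
```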
Geometric representations for minimalist grammars
We reformulate minimalist grammars as partial functions on term algebras for
strings and trees. Using filler/role bindings and tensor product
representations, we construct homomorphisms for these data structures into
geometric vector spaces. We prove that the structure-building functions as well
as simple processors for minimalist languages can be realized by piecewise
linear operators in representation space. We also propose harmony, i.e. the
distance of an intermediate processing step from the final well-formed state in
representation space, as a measure of processing complexity. Finally, we
illustrate our findings by means of two particular arithmetic and fractal
representations.
Comment: 43 pages, 4 figures
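A minimal sketch of the harmony idea, assuming a toy filler/role tensor product encoding: intermediate processing states and the final well-formed state live in the same representation space, and harmony is taken as the negative distance to the target state. The vocabulary, roles, and dimensions are illustrative, not the paper's minimalist-grammar construction.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 16
fillers = {w: rng.standard_normal(d) for w in ["the", "dog", "barks"]}
roles = {r: rng.standard_normal(d) for r in ["r0", "r1", "r2"]}

def encode(bindings):
    """Tensor product encoding of a (partial) structure: sum of outer products."""
    return sum(np.outer(fillers[f], roles[r]) for f, r in bindings)

target = encode([("the", "r0"), ("dog", "r1"), ("barks", "r2")])  # well-formed
step1 = encode([("the", "r0")])                                   # early state
step2 = encode([("the", "r0"), ("dog", "r1")])                    # later state

def harmony(state):
    return -np.linalg.norm(state - target)    # larger = closer to well-formed

print(harmony(step1) < harmony(step2))        # processing increases harmony
```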
Vector Symbolic Architectures answer Jackendoff's challenges for cognitive neuroscience
Jackendoff (2002) posed four challenges that linguistic combinatoriality and
rules of language present to theories of brain function. The essence of these
problems is the question of how to neurally instantiate the rapid construction
and transformation of the compositional structures that are typically taken to
be the domain of symbolic processing. He contended that typical connectionist
approaches fail to meet these challenges and that the dialogue between
linguistic theory and cognitive neuroscience will be relatively unproductive
until the importance of these problems is widely recognised and the challenges
answered by some technical innovation in connectionist modelling. This paper
claims that a little-known family of connectionist models (Vector Symbolic
Architectures) is able to meet Jackendoff's challenges.
Comment: This is a slightly updated version of the paper presented at the Joint International Conference on Cognitive Science, 13-17 July 2003, University of New South Wales, Sydney, Australia. 6 pages
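To make the term concrete, here is a hedged sketch of one Vector Symbolic Architecture, a Kanerva-style Binary Spatter Code: roles and fillers are random binary hypervectors, role/filler pairs are bound by XOR and bundled by majority vote, and a filler is recovered by unbinding plus clean-up against an item memory. The vocabulary and dimensionality are illustrative choices, not a claim about the paper's specific models.

```python
import numpy as np

rng = np.random.default_rng(3)
D = 10_000

def hv():                                    # random dense binary hypervector
    return rng.integers(0, 2, D, dtype=np.uint8)

def bundle(vs):                              # componentwise majority vote
    return (np.sum(vs, axis=0) * 2 > len(vs)).astype(np.uint8)

items = {n: hv() for n in ["agent", "patient", "verb", "dog", "cat", "chase"]}

# "The dog chases the cat": bind each role to its filler by XOR, then bundle.
sentence = bundle([items["agent"] ^ items["dog"],
                   items["patient"] ^ items["cat"],
                   items["verb"] ^ items["chase"]])

# Query: who is the agent? Unbind with the role vector, then clean up by
# finding the nearest stored item in Hamming distance.
noisy = sentence ^ items["agent"]
best = min(items, key=lambda n: np.count_nonzero(noisy ^ items[n]))
print(best)                                  # expected: "dog"
```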
MIMONets: Multiple-Input-Multiple-Output Neural Networks Exploiting Computation in Superposition
With the advent of deep learning, progressively larger neural networks have
been designed to solve complex tasks. We take advantage of these capacity-rich
models to lower the cost of inference by exploiting computation in
superposition. To reduce the computational burden per input, we propose
Multiple-Input-Multiple-Output Neural Networks (MIMONets) capable of handling
many inputs at once. MIMONets augment various deep neural network architectures
with variable binding mechanisms to represent an arbitrary number of inputs in
a compositional data structure via fixed-width distributed representations.
Accordingly, MIMONets adapt nonlinear neural transformations to process the
data structure holistically, leading to a speedup nearly proportional to the
number of superposed input items in the data structure. After processing in
superposition, an unbinding mechanism recovers each transformed input of
interest. MIMONets also provide a dynamic trade-off between accuracy and
throughput by an instantaneous on-demand switching between a set of
accuracy-throughput operating points, yet within a single set of fixed
parameters. We apply the concept of MIMONets to both CNN and Transformer
architectures resulting in MIMOConv and MIMOFormer, respectively. Empirical
evaluations show that MIMOConv achieves about a 2-4x speedup at an accuracy
delta within [+0.68, -3.18]% compared to WideResNet CNNs on CIFAR10 and
CIFAR100. Similarly, MIMOFormer can handle 2-4 inputs at once while maintaining
a high average accuracy within a [-1.07, -3.43]% delta on the Long Range Arena
benchmark. Finally, we provide mathematical bounds on the interference between
superposition channels in MIMOFormer. Our code is available at
https://github.com/IBM/multiple-input-multiple-output-nets.
Comment: accepted at NeurIPS 2023
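The sketch below illustrates computation in superposition for the linear case only, not the MIMONets architecture itself: inputs are bound to channel keys by circular convolution, superposed into one fixed-width vector, processed by a single shared transform (a fixed convolution kernel standing in for the trained network), and recovered by approximate unbinding, with crosstalk from the other channels showing up as noise. All names, sizes, and the key/unbinding scheme are illustrative assumptions.

```python
import numpy as np
from numpy.fft import fft, ifft

rng = np.random.default_rng(4)
D, C = 4096, 3                                   # vector width, channel count

def cconv(a, b):                                 # circular convolution (binding)
    return np.real(ifft(fft(a) * fft(b)))

def approx_inv(a):                               # Plate-style approximate inverse
    return np.concatenate(([a[0]], a[:0:-1]))

keys = [rng.standard_normal(D) / np.sqrt(D) for _ in range(C)]
xs   = [rng.standard_normal(D) / np.sqrt(D) for _ in range(C)]
w    = rng.standard_normal(D) / np.sqrt(D)       # toy shared "network" kernel

s = sum(cconv(k, x) for k, x in zip(keys, xs))   # bind and superpose inputs
y = cconv(w, s)                                  # one pass processes all channels

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

recovered = cconv(approx_inv(keys[1]), y)        # unbind channel 1
targets = [cconv(w, x) for x in xs]              # per-channel reference outputs
print([round(cos(recovered, t), 2) for t in targets])  # channel 1 stands out
```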