Question-Answering with Grammatically-Interpretable Representations
We introduce an architecture, the Tensor Product Recurrent Network (TPRN). In
our application of TPRN, internal representations learned by end-to-end
optimization in a deep neural network performing a textual question-answering
(QA) task can be interpreted using basic concepts from linguistic theory. No
performance penalty need be paid for this increased interpretability: the
proposed model performs comparably to a state-of-the-art system on the SQuAD QA
task. The internal representation which is interpreted is a Tensor Product
Representation: for each input word, the model selects a symbol to encode the
word, and a role in which to place the symbol, and binds the two together. The
selection is via soft attention. The overall interpretation is built from
interpretations of the symbols, as recruited by the trained model, and
interpretations of the roles as used by the model. We find support for our
initial hypothesis that symbols can be interpreted as lexical-semantic word
meanings, while roles can be interpreted as approximations of grammatical roles
(or categories) such as subject, wh-word, determiner, etc. Fine-grained
analysis reveals specific correspondences between the learned roles and parts
of speech as assigned by a standard tagger (Toutanova et al. 2003), and finds
several discrepancies in the model's favor. In this sense, the model learns
significant aspects of grammar, after having been exposed solely to
linguistically unannotated text, questions, and answers: no prior linguistic
knowledge is given to the model. What is given is the means to build
representations using symbols and roles, with an inductive bias favoring use of
these in an approximately discrete manner.
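As a concrete illustration of the binding step described above, here is a minimal numeric sketch (not the authors' code): a word's symbol and role are soft-selected by attention and bound by an outer product into a Tensor Product Representation. All dimensions, weights, and names below are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Illustrative toy dimensions (assumptions, not the paper's settings).
n_symbols, n_roles = 5, 4
d_symbol, d_role, d_state = 8, 6, 10

rng = np.random.default_rng(0)
symbol_embeddings = rng.normal(size=(n_symbols, d_symbol))  # stand-ins for learned symbol (filler) vectors
role_embeddings = rng.normal(size=(n_roles, d_role))        # stand-ins for learned role vectors
W_sym = rng.normal(size=(n_symbols, d_state))               # hypothetical attention parameters
W_role = rng.normal(size=(n_roles, d_state))

def bind_word(word_state):
    """Soft-select a symbol and a role for one word, then bind them by outer product."""
    a_sym = softmax(W_sym @ word_state)    # soft attention over symbols
    a_role = softmax(W_role @ word_state)  # soft attention over roles
    symbol = a_sym @ symbol_embeddings     # attended (blended) symbol vector
    role = a_role @ role_embeddings        # attended (blended) role vector
    return np.outer(symbol, role)          # the word's d_symbol x d_role binding

# A sentence-level TPR is the sum of its words' bindings.
sentence_tpr = sum(bind_word(rng.normal(size=d_state)) for _ in range(3))
print(sentence_tpr.shape)  # (8, 6)
```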
Learning Semantic Representations for the Phrase Translation Model
This paper presents a novel semantic-based phrase translation model. A pair of source and target phrases is projected into continuous-valued vector representations in a low-dimensional latent semantic space, and their translation score is computed as the distance between the two vectors in this new
space. The projection is performed by a multi-layer neural network whose
weights are learned on parallel training data. The learning aims to directly optimize the quality of end-to-end machine translation results.
Experimental evaluation has been performed on two Europarl translation tasks,
English-French and German-English. The results show that the new semantic-based
phrase translation model significantly improves the performance of a
state-of-the-art phrase-based statistical machine translation system, leading to a gain of 0.7-1.0 BLEU points.
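A minimal sketch of the scoring scheme described above, under stated assumptions: bag-of-words phrase inputs, a two-layer tanh projection network, and negative Euclidean distance as the translation score. The paper's actual features, layer sizes, and distance function may differ.

```python
import numpy as np

def phrase_embedding(x, W1, b1, W2, b2):
    """Project a phrase feature vector into the latent semantic space
    with a small two-layer tanh network (weights would be learned on parallel data)."""
    return np.tanh(W2 @ np.tanh(W1 @ x + b1) + b2)

def translation_score(src_vec, tgt_vec):
    """Score a phrase pair by negative distance in the shared latent space;
    closer pairs score higher. Euclidean distance is an assumption here."""
    return -np.linalg.norm(src_vec - tgt_vec)

# Toy example with random (untrained) weights and assumed dimensions.
vocab, hidden, latent = 50, 20, 10
rng = np.random.default_rng(0)
W1s, b1s = rng.normal(size=(hidden, vocab)), np.zeros(hidden)
W2s, b2s = rng.normal(size=(latent, hidden)), np.zeros(latent)
W1t, b1t = rng.normal(size=(hidden, vocab)), np.zeros(hidden)
W2t, b2t = rng.normal(size=(latent, hidden)), np.zeros(latent)

src = phrase_embedding(rng.random(vocab), W1s, b1s, W2s, b2s)  # source-phrase bag of words
tgt = phrase_embedding(rng.random(vocab), W1t, b1t, W2t, b2t)  # target-phrase bag of words
print(translation_score(src, tgt))
```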
Attentive Tensor Product Learning
This paper proposes a new architecture - Attentive Tensor Product Learning
(ATPL) - to represent grammatical structures in deep learning models. ATPL bridges the gap between deep learning and explicit linguistic structure by exploiting Tensor Product Representations (TPR), a structured neural-symbolic model developed in cognitive science that aims to integrate deep learning with explicit language structures and rules. The key ideas of ATPL are: 1) unsupervised learning of role-unbinding vectors of words via a TPR-based deep neural network; 2) employing
attention modules to compute TPR; and 3) integration of TPR with typical deep
learning architectures including Long Short-Term Memory (LSTM) and Feedforward
Neural Network (FFNN). The novelty of our approach lies in its ability to
extract the grammatical structure of a sentence by using role-unbinding
vectors, which are obtained in an unsupervised manner. This ATPL approach is
applied to 1) image captioning, 2) part of speech (POS) tagging, and 3)
constituency parsing of a sentence. Experimental results demonstrate the
effectiveness of the proposed approach.
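The role-unbinding operation at the core of ATPL can be illustrated with a small numeric sketch: given a TPR built from filler (word) and role vectors, an unbinding vector recovers the filler bound to a given role. The dual-basis construction below is purely illustrative; ATPL learns its unbinding vectors in an unsupervised manner inside the network.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d_filler, d_role = 4, 6, 4

fillers = rng.normal(size=(n, d_filler))   # word (filler) vectors, assumed given
roles = rng.normal(size=(n, d_role))       # role vectors, e.g. grammatical slots

# Bind: the TPR is the sum of outer products filler_i (x) role_i.
T = sum(np.outer(fillers[i], roles[i]) for i in range(n))

# Unbinding vectors: here the dual basis of the role vectors, so that
# roles @ unbind.T is the identity. ATPL learns its unbinding vectors
# without supervision; the pseudoinverse is used purely for illustration.
unbind = np.linalg.pinv(roles).T

# Unbinding role j recovers the filler that was bound to it.
j = 2
recovered = T @ unbind[j]
print(np.allclose(recovered, fillers[j]))  # True
```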
Tensor Product Generation Networks for Deep NLP Modeling
We present a new approach to the design of deep networks for natural language
processing (NLP), based on the general technique of Tensor Product
Representations (TPRs) for encoding and processing symbol structures in
distributed neural networks. A network architecture, the Tensor Product Generation Network (TPGN), is proposed that is capable in principle of carrying out TPR computation but uses unconstrained deep learning to
design its internal representations. Instantiated in a model for image-caption
generation, TPGN outperforms LSTM baselines when evaluated on the COCO dataset.
The TPR-capable structure enables interpretation of internal representations
and operations, which prove to contain considerable grammatical content. Our
caption-generation model can be interpreted as generating sequences of
grammatical categories and retrieving words by their categories from a plan
encoded as a distributed representation.
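As a rough illustration of this "plan" interpretation, the sketch below (an assumption-laden toy, not TPGN itself) encodes a three-word caption as a single distributed representation of word fillers bound to grammatical-category roles, then regenerates the words by unbinding one category per step.

```python
import numpy as np

rng = np.random.default_rng(2)
d_word, d_cat = 8, 5

vocab = ["a", "dog", "runs"]                              # assumed toy caption, in order
word_vectors = rng.normal(size=(len(vocab), d_word))      # filler vectors for words
category_vectors = rng.normal(size=(len(vocab), d_cat))   # role vectors for grammatical categories

# The "plan" encodes the whole caption as one distributed representation:
# the sum over positions of word-filler (x) category-role outer products.
plan = sum(np.outer(word_vectors[t], category_vectors[t]) for t in range(len(vocab)))

# Dual (unbinding) vectors for the categories; in TPGN a recurrent controller
# would produce an unbinding vector at each generation step.
unbind = np.linalg.pinv(category_vectors).T

caption = []
for t in range(len(vocab)):
    filler = plan @ unbind[t]                              # retrieve the word bound to category t
    idx = int(np.argmin(np.linalg.norm(word_vectors - filler, axis=1)))  # nearest word vector
    caption.append(vocab[idx])
print(" ".join(caption))  # prints: a dog runs
```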
- …