Memory-Augmented Neural Networks for Machine Translation
Memory-augmented neural networks (MANNs) have been shown to outperform other
recurrent neural network architectures on a series of artificial sequence
learning tasks, yet they have had limited application to real-world tasks. We
evaluate direct application of Neural Turing Machines (NTM) and Differentiable
Neural Computers (DNC) to machine translation. We further propose and evaluate
two models which extend the attentional encoder-decoder with capabilities
inspired by memory augmented neural networks. We evaluate our proposed models
on IWSLT Vietnamese to English and ACL Romanian to English datasets. Our
proposed models and the memory augmented neural networks perform similarly to
the attentional encoder-decoder on the Vietnamese to English translation task
while scoring 0.3-1.9 BLEU points lower on the Romanian to English task.
Interestingly, our analysis shows that despite being equipped with additional
flexibility and being randomly initialized, the memory-augmented neural networks
learn an algorithm for machine translation that is almost identical to the
attentional encoder-decoder.
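The finding above, that a randomly initialized memory-augmented model converges to essentially the same behaviour as the attentional encoder-decoder, is easier to see when the shared operation is written out. Below is a minimal PyTorch sketch of the content-based read that both an attentional decoder and an NTM/DNC-style read head perform; the function name and shapes are illustrative, not taken from the paper.

```python
# Minimal sketch of the content-based "read" shared by an attentional decoder
# and an NTM/DNC-style read head: score a query against a bank of vectors,
# softmax, and return the weighted sum. Names and shapes are illustrative.
import torch

def content_read(query, memory):
    # query:  (batch, d)     decoder state or read-head key
    # memory: (batch, T, d)  encoder states or external memory slots
    scores = torch.einsum("bd,btd->bt", query, memory)   # dot-product scores
    weights = torch.softmax(scores, dim=-1)               # attention / address weights
    return torch.einsum("bt,btd->bd", weights, memory)    # read vector

# usage
q = torch.randn(2, 8)
M = torch.randn(2, 5, 8)
r = content_read(q, M)   # (2, 8)
```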
Learning to Remember Rare Events
Despite recent advances, memory-augmented deep neural networks are still
limited when it comes to life-long and one-shot learning, especially in
remembering rare events. We present a large-scale life-long memory module for
use in deep learning. The module exploits fast nearest-neighbor algorithms for
efficiency and thus scales to large memory sizes. Except for the
nearest-neighbor query, the module is fully differentiable and trained
end-to-end with no extra supervision. It operates in a life-long manner, i.e.,
without the need to reset it during training.
Our memory module can be easily added to any part of a supervised neural
network. To show its versatility we add it to a number of networks, from simple
convolutional ones tested on image classification to deep sequence-to-sequence
and recurrent-convolutional models. In all cases, the enhanced network gains
the ability to remember and do life-long one-shot learning. Our module
remembers training examples shown many thousands of steps in the past and it
can successfully generalize from them. We set new state-of-the-art for one-shot
learning on the Omniglot dataset and demonstrate, for the first time, life-long
one-shot learning in recurrent neural networks on a large-scale machine
translation task.
Comment: Conference paper accepted for ICLR'17
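A hedged sketch of the kind of key-value memory the abstract describes: keys are unit-normalized so a dot product gives cosine similarity, and only the top-k nearest-neighbor selection is non-differentiable. The class name, sizes, and bookkeeping are assumptions for illustration, not the authors' implementation.

```python
# Key-value memory with a nearest-neighbor query, in the spirit of the module
# described above (not the authors' code). Keys are unit-normalized so the
# dot product is cosine similarity.
import torch
import torch.nn.functional as F

class KNNMemory:
    def __init__(self, size, dim):
        self.keys = F.normalize(torch.randn(size, dim), dim=-1)  # memory keys
        self.values = torch.zeros(size, dtype=torch.long)        # stored labels
        self.age = torch.zeros(size)   # slot ages for a replacement policy (unused in this sketch)

    def query(self, q, k=4):
        q = F.normalize(q, dim=-1)                 # (batch, dim)
        sims = q @ self.keys.t()                   # cosine similarities (batch, size)
        top_sims, top_idx = sims.topk(k, dim=-1)   # non-differentiable nearest-neighbor lookup
        return top_sims, self.values[top_idx]      # similarities remain differentiable w.r.t. q

mem = KNNMemory(size=1024, dim=32)
sims, labels = mem.query(torch.randn(2, 32))
```

Because the similarity to the returned keys is still a differentiable function of the query, the network producing the queries can be trained end-to-end through the lookup, which matches the abstract's claim that everything except the nearest-neighbor query is differentiable.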
Learning to Remember Translation History with a Continuous Cache
Existing neural machine translation (NMT) models generally translate
sentences in isolation, missing the opportunity to take advantage of
document-level information. In this work, we propose to augment NMT models with
a very light-weight cache-like memory network, which stores recent hidden
representations as translation history. The probability distribution over
generated words is updated online depending on the translation history
retrieved from the memory, endowing NMT models with the capability to
dynamically adapt over time. Experiments on multiple domains with different
topics and styles show the effectiveness of the proposed approach with
negligible impact on the computational cost.
Comment: Accepted by TACL 2018
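A minimal sketch, assuming a key-value cache of recent decoder states and a fixed interpolation weight, of how a cache distribution over previously generated words can be mixed online with the NMT word distribution. The function and the mixing scheme are illustrative, not necessarily the paper's exact formulation.

```python
# Continuous cache over recent decoder states: match the current state against
# the stored translation history and interpolate the resulting word
# distribution with the NMT model's distribution.
import torch

def cache_probs(p_model, query, cache_keys, cache_words, vocab_size, lam=0.2):
    # p_model:     (vocab,)  probabilities from the NMT decoder
    # query:       (d,)      current decoder hidden state
    # cache_keys:  (n, d)    stored hidden states from the translation history
    # cache_words: (n,)      long tensor of word ids generated at those steps
    weights = torch.softmax(cache_keys @ query, dim=0)            # match history
    p_cache = torch.zeros(vocab_size).scatter_add_(0, cache_words, weights)
    return (1 - lam) * p_model + lam * p_cache                    # online interpolation
```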
Attention Augmented Convolutional Networks
Convolutional networks have been the paradigm of choice in many computer
vision applications. The convolution operation however has a significant
weakness in that it only operates on a local neighborhood, thus missing global
information. Self-attention, on the other hand, has emerged as a recent advance
to capture long range interactions, but has mostly been applied to sequence
modeling and generative modeling tasks. In this paper, we consider the use of
self-attention for discriminative visual tasks as an alternative to
convolutions. We introduce a novel two-dimensional relative self-attention
mechanism that proves competitive in replacing convolutions as a stand-alone
computational primitive for image classification. We find in control
experiments that the best results are obtained when combining both convolutions
and self-attention. We therefore propose to augment convolutional operators
with this self-attention mechanism by concatenating convolutional feature maps
with a set of feature maps produced via self-attention. Extensive experiments
show that Attention Augmentation leads to consistent improvements in image
classification on ImageNet and object detection on COCO across many different
models and scales, including ResNets and a state-of-the-art mobile constrained
network, while keeping the number of parameters similar. In particular, our
method achieves a top-1 accuracy improvement on ImageNet classification
over a ResNet50 baseline and outperforms other attention mechanisms for images
such as Squeeze-and-Excitation. It also achieves an improvement of 1.4 mAP in
COCO object detection on top of a RetinaNet baseline.
Comment: ICCV 2019
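A single-head sketch of the augmentation idea: compute an ordinary convolution and a self-attention map over all spatial positions, then concatenate the two sets of feature maps. For brevity the relative-position logits of the paper's two-dimensional relative self-attention are omitted, and the channel split is an assumption.

```python
# Attention-augmented convolution, simplified to one attention head and
# plain (non-relative) self-attention over flattened spatial positions.
import torch
import torch.nn as nn

class AttnAugConv(nn.Module):
    def __init__(self, c_in, c_conv, c_attn):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_conv, kernel_size=3, padding=1)
        self.qkv = nn.Conv2d(c_in, 3 * c_attn, kernel_size=1)  # 1x1 q/k/v projections
        self.c_attn = c_attn

    def forward(self, x):                       # x: (B, C, H, W)
        B, _, H, W = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)   # each (B, c_attn, H, W)
        q = q.flatten(2).transpose(1, 2)        # (B, HW, c_attn)
        k = k.flatten(2)                        # (B, c_attn, HW)
        v = v.flatten(2).transpose(1, 2)        # (B, HW, c_attn)
        attn = torch.softmax(q @ k / self.c_attn ** 0.5, dim=-1)       # (B, HW, HW)
        out = (attn @ v).transpose(1, 2).reshape(B, self.c_attn, H, W)
        return torch.cat([self.conv(x), out], dim=1)  # concat conv + attention maps
```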
Integrating Transformer and Paraphrase Rules for Sentence Simplification
Sentence simplification aims to reduce the complexity of a sentence while
retaining its original meaning. Current models for sentence simplification
adopted ideas from machine translation studies and implicitly learned
simplification mapping rules from normal-simple sentence pairs. In this paper,
we explore a novel model based on a multi-layer and multi-head attention
architecture and we propose two innovative approaches to integrate the Simple
PPDB (A Paraphrase Database for Simplification), an external paraphrase
knowledge base for simplification that covers a wide range of real-world
simplification rules. The experiments show that the integration provides two
major benefits: (1) the integrated model outperforms multiple state-of-the-art
baseline models for sentence simplification in the literature; (2) through
analysis of the rule utilization, the model learns to select more accurate
simplification rules. The code and models used in the paper are available at
https://github.com/Sanqiang/text_simplification.
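As one hypothetical illustration of combining an external paraphrase table with a neural decoder (not the paper's two specific integration approaches), rules whose complex side appears in the input could add a bonus to the logits of their simple-side words:

```python
# Hypothetical rule-bonus decoding step: bias the next-word scores toward
# the simple side of paraphrase rules that match the input sentence.
import torch

def apply_rule_bonus(logits, source_tokens, rules, word2id, bonus=2.0):
    # logits:        (vocab,) raw decoder scores for the next word (torch tensor)
    # source_tokens: list of words in the complex input sentence
    # rules:         dict mapping a complex word to its simpler paraphrase
    # word2id:       vocabulary lookup
    for word in source_tokens:
        simple = rules.get(word)
        if simple in word2id:
            logits[word2id[simple]] += bonus   # encourage the simplification
    return logits
```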
Implementing Neural Turing Machines
Neural Turing Machines (NTMs) are an instance of Memory Augmented Neural
Networks, a new class of recurrent neural networks which decouple computation
from memory by introducing an external memory unit. NTMs have demonstrated
superior performance over Long Short-Term Memory Cells in several sequence
learning tasks. A number of open source implementations of NTMs exist but are
unstable during training and/or fail to replicate the reported performance of
NTMs. This paper presents the details of our successful implementation of an
NTM. Our implementation learns to solve three sequential learning tasks from
the original NTM paper. We find that the choice of memory contents
initialization scheme is crucial to successfully implementing an NTM. Networks
with memory contents initialized to small constant values converge on average
two times faster than the next best memory contents initialization scheme.
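A small sketch of the initialization choice highlighted above, contrasting a small-constant memory initialization with two alternatives; the shapes and the constant are illustrative assumptions rather than the paper's exact values.

```python
# External-memory initialization schemes for an NTM-style memory matrix.
import torch

def init_memory(n_slots, slot_dim, scheme="constant"):
    if scheme == "constant":
        return torch.full((n_slots, slot_dim), 1e-6)       # small constant values
    if scheme == "random":
        return torch.randn(n_slots, slot_dim) * 0.1        # random initialization
    if scheme == "learned":
        return torch.nn.Parameter(torch.zeros(n_slots, slot_dim))  # trained initial memory
    raise ValueError(scheme)
```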
Flexible and Creative Chinese Poetry Generation Using Neural Memory
It has been shown that Chinese poems can be successfully generated by
sequence-to-sequence neural models, particularly with the attention mechanism.
A potential problem of this approach, however, is that neural models can only
learn abstract rules, while poem generation is a highly creative process that
involves not only rules but also innovations for which pure statistical models
are not appropriate in principle. This work proposes a memory-augmented neural
model for Chinese poem generation, where the neural model and the augmented
memory work together to balance the requirements of linguistic accordance and
aesthetic innovation, leading to innovative generations that are still
rule-compliant. In addition, it is found that the memory mechanism provides
interesting flexibility that can be used to generate poems with different
styles.
Exploring the Use of Attention within a Neural Machine Translation Decoder States to Translate Idioms
Idioms pose problems to almost all Machine Translation systems. This type of
language is very frequent in day-to-day language use and cannot be simply
ignored. The recent interest in memory-augmented models in the field of
Language Modelling has helped systems achieve good results by bridging
long-distance dependencies. In this paper we explore the use of such techniques
in a Neural Machine Translation system to help with the translation of
idiomatic language.
Linguistic Knowledge as Memory for Recurrent Neural Networks
Training recurrent neural networks to model long term dependencies is
difficult. Hence, we propose to use external linguistic knowledge as an
explicit signal to inform the model which memories it should utilize.
Specifically, external knowledge is used to augment a sequence with typed edges
between arbitrarily distant elements, and the resulting graph is decomposed
into directed acyclic subgraphs. We introduce a model that encodes such graphs
as explicit memory in recurrent neural networks, and use it to model
coreference relations in text. We apply our model to several text comprehension
tasks and achieve new state-of-the-art results on all considered benchmarks,
including CNN, bAbi, and LAMBADA. On the bAbi QA tasks, our model solves 15 out
of the 20 tasks with only 1000 training examples per task. Analysis of the
learned representations further demonstrates the ability of our model to encode
fine-grained entity information across a document.
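A hedged, single-edge-type sketch of the idea of treating an external coreference link as explicit memory: at each token the recurrent cell receives, besides the previous hidden state, the state stored at that token's annotated antecedent. The paper's typed edges and DAG decomposition are simplified away here.

```python
# Coreference link as explicit RNN memory: the GRU cell at step t also reads
# the hidden state saved at the antecedent position (or zeros if none).
import torch
import torch.nn as nn

class CorefMemoryRNN(nn.Module):
    def __init__(self, d_in, d_hid):
        super().__init__()
        self.cell = nn.GRUCell(d_in + d_hid, d_hid)

    def forward(self, embeddings, antecedents):
        # embeddings:  (T, d_in) token embeddings
        # antecedents: list of length T; antecedents[t] is the index of the
        #              coreference antecedent of token t, or -1 if there is none
        T, d_hid = embeddings.size(0), self.cell.hidden_size
        h, states = torch.zeros(1, d_hid), []
        for t in range(T):
            mem = states[antecedents[t]] if antecedents[t] >= 0 else torch.zeros(1, d_hid)
            h = self.cell(torch.cat([embeddings[t:t + 1], mem], dim=-1), h)
            states.append(h)
        return torch.stack(states, dim=0)   # (T, 1, d_hid)
```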
Advances in Natural Language Question Answering: A Review
Question Answering has recently received high attention from artificial
intelligence communities due to the advancements in learning technologies.
Early question answering models used rule-based approaches and moved to the
statistical approach to address the vast amount of available information. However,
statistical approaches have been shown to underperform in handling the dynamic
nature and variation of language, whereas learning-based models have shown the
capability of handling them. Many deep learning methods have been introduced for
question answering, and most have been shown to achieve better results than
classical machine learning and statistical methods, as the dynamic nature of
language is well served by the nonlinear learning in deep models. This has
created prominent success and a spike in work on question answering. This paper
discusses the successes and challenges in question answering and the systems and
techniques used to address these challenges.
Comment: arXiv admin note: text overlap with arXiv:1609.04667 by other authors