Recurrent and Contextual Models for Visual Question Answering
We propose a series of recurrent and contextual neural network models for
multiple choice visual question answering on the Visual7W dataset. Motivated by
divergent trends in model complexities in the literature, we explore the
balance between model expressiveness and simplicity by studying incrementally
more complex architectures. We start with LSTM-encoding of input questions and
answers; build on this with context generation by LSTM-encodings of neural
image and question representations and attention over images; and evaluate the
diversity and predictive power of our models and the ensemble thereof. All
models are evaluated against a simple baseline inspired by the current
state-of-the-art, consisting of a simple concatenation of bag-of-words
and CNN representations for the text and images, respectively. Generally, we
observe marked variation in image-reasoning performance between our models not
obvious from their overall performance, as well as evidence of dataset bias.
Our standalone models achieve accuracies up to , while the ensemble of
all models achieves the best accuracy of , within of the
current state-of-the-art for Visual7W.
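A minimal sketch of the baseline described above, assuming pre-extracted CNN image features and a bag-of-words encoding of each question-answer pair; all dimensions and layer choices are illustrative, not taken from the paper:

```python
import torch
import torch.nn as nn

class ConcatBaseline(nn.Module):
    """Sketch of the described baseline: concatenate a bag-of-words text
    encoding with CNN image features and score each answer choice."""
    def __init__(self, vocab_size=10000, img_dim=4096, hidden=512):
        super().__init__()
        self.text_proj = nn.Linear(vocab_size, hidden)   # BoW -> dense
        self.img_proj = nn.Linear(img_dim, hidden)       # CNN features -> dense
        self.scorer = nn.Linear(2 * hidden, 1)           # score one QA pair

    def forward(self, bow_qa, img_feat):
        # bow_qa: (batch, choices, vocab_size), BoW of question+answer text
        # img_feat: (batch, img_dim), pre-extracted CNN features
        t = torch.relu(self.text_proj(bow_qa))                     # (B, C, H)
        v = torch.relu(self.img_proj(img_feat)).unsqueeze(1)       # (B, 1, H)
        v = v.expand(-1, t.size(1), -1)                            # (B, C, H)
        return self.scorer(torch.cat([t, v], dim=-1)).squeeze(-1)  # (B, C)
```

At test time the predicted answer is simply the argmax over the per-choice scores.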
Hierarchical Attention: What Really Counts in Various NLP Tasks
Attention mechanisms in sequence to sequence models have shown great ability
and wonderful performance in various natural language processing (NLP) tasks,
such as sentence embedding, text generation, machine translation, machine
reading comprehension, etc. Unfortunately, existing attention mechanisms only
learn either high-level or low-level features. In this paper, we argue that the
lack of a hierarchical mechanism is a bottleneck in improving the performance of
attention mechanisms, and propose a novel Hierarchical Attention Mechanism
(Ham) based on the weighted sum of different layers of a multi-level attention.
Ham achieves a state-of-the-art BLEU score of 0.26 on Chinese poem generation
task and an average improvement of nearly 6.5% over existing machine reading
comprehension models such as BiDAF and Match-LSTM. Furthermore, our
experiments and theorems reveal that Ham has greater generalization and
representation ability than existing attention mechanisms.
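A sketch of the core idea as the abstract states it, a learned softmax-weighted sum over the outputs of stacked attention layers; the base attention here is standard multi-head attention, and the paper's exact layer definitions may differ:

```python
import torch
import torch.nn as nn

class HierarchicalAttention(nn.Module):
    """Stack several attention layers and combine their outputs with a
    learned softmax-weighted sum (dim must be divisible by num_heads)."""
    def __init__(self, dim, num_levels=3):
        super().__init__()
        self.levels = nn.ModuleList(
            [nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
             for _ in range(num_levels)])
        self.level_logits = nn.Parameter(torch.zeros(num_levels))

    def forward(self, query, context):
        outs = []
        x = query
        for attn in self.levels:
            x, _ = attn(x, context, context)  # each level attends over context
            outs.append(x)                    # keep every level's output
        w = torch.softmax(self.level_logits, dim=0)  # learned level weights
        return sum(wi * oi for wi, oi in zip(w, outs))
```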
Neural GPUs Learn Algorithms
Learning an algorithm from examples is a fundamental problem that has been
widely studied. Recently it has been addressed using neural networks, in
particular by Neural Turing Machines (NTMs). These are fully differentiable
computers that use backpropagation to learn their own programming. Despite
their appeal, NTMs have a weakness that is caused by their sequential nature:
they are not parallel and are hard to train due to their large depth when
unfolded.
We present a neural network architecture to address this problem: the Neural
GPU. It is based on a type of convolutional gated recurrent unit and, like the
NTM, is computationally universal. Unlike the NTM, the Neural GPU is highly
parallel which makes it easier to train and efficient to run.
An essential property of algorithms is their ability to handle inputs of
arbitrary size. We show that the Neural GPU can be trained on short instances
of an algorithmic task and successfully generalize to long instances. We
verified it on a number of tasks including long addition and long
multiplication of numbers represented in binary. We train the Neural GPU on
numbers with up to 20 bits and observe no errors whatsoever while testing it,
even on much longer numbers.
To achieve these results we introduce a technique for training deep recurrent
networks: parameter sharing relaxation. We also found a small amount of dropout
and gradient noise to have a large positive effect on learning and
generalization.
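The convolutional gated recurrent unit the abstract refers to can be sketched as a GRU update whose linear maps are replaced by convolutions over the whole state, so every position is rewritten in parallel at each step; this 1-D version is a simplification (the Neural GPU uses a wider multi-dimensional state):

```python
import torch
import torch.nn as nn

class CGRU(nn.Module):
    """Convolutional gated recurrent unit: a GRU cell whose affine maps
    are convolutions applied across the full state in parallel."""
    def __init__(self, channels, kernel=3):
        super().__init__()
        pad = kernel // 2
        self.update = nn.Conv1d(channels, channels, kernel, padding=pad)
        self.reset = nn.Conv1d(channels, channels, kernel, padding=pad)
        self.cand = nn.Conv1d(channels, channels, kernel, padding=pad)

    def forward(self, s):
        # s: (batch, channels, width) -- the entire state "tape" is updated
        # at once, which is what makes the architecture parallel.
        u = torch.sigmoid(self.update(s))   # update gate
        r = torch.sigmoid(self.reset(s))    # reset gate
        c = torch.tanh(self.cand(r * s))    # candidate state
        return u * s + (1.0 - u) * c        # gated combination
```

Running the cell for a number of steps proportional to the input length gives the model its computational universality, while each step remains a cheap parallel convolution.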
OrderNet: Ordering by Example
In this paper we introduce a new neural architecture for sorting unordered
sequences where the correct sequence order is not easily defined but must
rather be inferred from training data. We refer to this architecture as
OrderNet and describe how it was constructed to be naturally permutation
equivariant while still allowing for rich interactions of elements of the input
set. We evaluate the capabilities of our architecture by training it to
approximate solutions for the Traveling Salesman Problem and find that it
outperforms previously studied supervised techniques in its ability to
generalize to longer sequences than it was trained on. We further demonstrate
this capability by reconstructing the order of sentences with scrambled word
order.
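The abstract does not give OrderNet's layers, but self-attention is one standard way to obtain permutation equivariance together with rich pairwise interaction between set elements, so a generic sketch of such a scorer might look as follows (architecture and dimensions are our assumptions, not the paper's):

```python
import torch
import torch.nn as nn

class EquivariantScorer(nn.Module):
    """Permuting the input elements permutes the outputs the same way,
    so the layer is permutation equivariant; attention still lets every
    element interact with every other. One score per element."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, x):
        # x: (batch, n, dim) -- an unordered set of element embeddings
        h, _ = self.attn(x, x, x)
        return self.score(h).squeeze(-1)  # (batch, n)

# Predicted order = the indices that sort the scores:
#   order = scores.argsort(dim=-1)
```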
Sequential Context Encoding for Duplicate Removal
Duplicate removal is a critical step for producing a reasonable set of
predictions in prevalent proposal-based object detection frameworks. Albeit
simple and effective, most previous algorithms utilize a greedy process without
making sufficient use of properties of input data. In this work, we design a
new two-stage framework to effectively select the appropriate proposal
candidate for each object. The first stage suppresses most of the easy negative
object proposals, while the second stage selects true positives in the reduced
proposal set. These two stages share the same network structure, i.e., an
encoder and a decoder formed as recurrent neural networks (RNN) with global
attention and context gate. The encoder scans proposal candidates in a
sequential manner to capture the global context information, which is then fed
to the decoder to extract optimal proposals. In our extensive experiments, the
proposed method outperforms other alternatives by a large margin.
Comment: Accepted in NIPS 201
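A minimal sketch of the global-attention step described above, with the decoder state attending over all encoded proposal representations and a context gate mixing the attended context back in; the gating form and dimensions are assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class GlobalAttention(nn.Module):
    """Dot-product attention of a decoder state over all encoder states,
    followed by a sigmoid context gate."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, dec_state, enc_states):
        # dec_state: (batch, dim); enc_states: (batch, n_proposals, dim)
        scores = torch.bmm(enc_states, dec_state.unsqueeze(-1)).squeeze(-1)
        alpha = torch.softmax(scores, dim=-1)              # attention weights
        ctx = torch.bmm(alpha.unsqueeze(1), enc_states).squeeze(1)
        g = torch.sigmoid(self.gate(torch.cat([dec_state, ctx], -1)))
        return g * ctx + (1.0 - g) * dec_state             # gated mixture
```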
A Generalized Framework of Sequence Generation with Application to Undirected Sequence Models
Undirected neural sequence models such as BERT (Devlin et al., 2019) have
received renewed interest due to their success on discriminative natural
language understanding tasks such as question-answering and natural language
inference. The problem of generating sequences directly from these models has
received relatively little attention, in part because generating from
undirected models departs significantly from conventional monotonic generation
in directed sequence models. We investigate this problem by proposing a
generalized model of sequence generation that unifies decoding in directed and
undirected models. The proposed framework models the process of generation
rather than the resulting sequence, and under this framework, we derive various
neural sequence models as special cases, such as autoregressive,
semi-autoregressive, and refinement-based non-autoregressive models. This
unification enables us to adapt decoding algorithms originally developed for
directed sequence models to undirected sequence models. We demonstrate this by
evaluating various handcrafted and learned decoding strategies on a BERT-like
machine translation model (Lample & Conneau, 2019). The proposed approach
achieves constant-time translation results on par with linear-time translation
results from the same undirected sequence model, while both are competitive
with the state-of-the-art on WMT'14 English-German translation.
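One concrete member of the family of decoding strategies such a framework covers is iterative masked refinement, sketched below; the `model` interface (token ids in, per-position logits out) and the linear unmasking schedule are our assumptions:

```python
import torch

def masked_refinement_decode(model, mask_id, length, steps, device="cpu"):
    """Start from an all-mask canvas and repeatedly re-predict the least
    confident positions, unmasking more of the sequence each iteration.
    Total cost is `steps` model calls, independent of sequence length."""
    seq = torch.full((1, length), mask_id, dtype=torch.long, device=device)
    for step in range(steps):
        logits = model(seq)                         # (1, L, vocab)
        probs, tokens = logits.softmax(-1).max(-1)  # confidence + argmax
        # Re-mask the least confident positions, fewer every iteration.
        n_mask = int(length * (1 - (step + 1) / steps))
        worst = probs.argsort(dim=-1)[:, :n_mask]
        seq = tokens
        seq.scatter_(1, worst, mask_id)
    return seq
```

With a fixed number of refinement steps this is the constant-time regime the abstract compares against linear-time (one-position-at-a-time) decoding from the same undirected model.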
Analysing Mathematical Reasoning Abilities of Neural Models
Mathematical reasoning, a core ability within human intelligence, presents
some unique challenges as a domain: we do not come to understand and solve
mathematical problems primarily on the back of experience and evidence, but on
the basis of inferring, learning, and exploiting laws, axioms, and symbol
manipulation rules. In this paper, we present a new challenge for the
evaluation (and eventually the design) of neural architectures and similar
systems: a task suite of mathematics problems involving sequential questions
and answers in a free-form textual input/output format. The
structured nature of the mathematics domain, covering arithmetic, algebra,
probability and calculus, enables the construction of training and test splits
designed to clearly illuminate the capabilities and failure-modes of different
architectures, as well as evaluate their ability to compose and relate
knowledge and learned processes. Having described the data generation process
and its potential future expansions, we conduct a comprehensive analysis of
models from two broad classes of the most powerful sequence-to-sequence
architectures and find notable differences in their ability to resolve
mathematical problems and generalize their knowledge.
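As a toy illustration of the free-form textual question/answer format (the actual suite spans arithmetic, algebra, probability, and calculus; this arithmetic stub is ours, not the paper's generator):

```python
import random

def arithmetic_question(rng=random):
    """Generate one free-form textual question/answer pair."""
    a, b = rng.randint(2, 99), rng.randint(2, 99)
    op, fn = rng.choice([("+", lambda x, y: x + y),
                         ("-", lambda x, y: x - y),
                         ("*", lambda x, y: x * y)])
    return f"What is {a} {op} {b}?", str(fn(a, b))

# Example pair: ("What is 14 * 73?", "1022") -- a model reads the question
# string and must emit the answer string, character by character.
```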
Dynamic Past and Future for Neural Machine Translation
Previous studies have shown that neural machine translation (NMT) models can
benefit from explicitly modeling translated (Past) and untranslated (Future)
source contents. This work separates source words into groups of translated
and untranslated contents through parts-to-wholes
assignment. The assignment is learned through a novel variant of
routing-by-agreement mechanism (Sabour et al., 2017), namely Guided Dynamic
Routing, where the translating status at each decoding step guides the routing
process to assign each source word to its associated group
(i.e., translated or untranslated content) represented by a capsule, enabling
translation to be made from holistic context. Experiments show that our
approach achieves substantial improvements over both RNMT and Transformer by
producing more adequate translations. Extensive analysis demonstrates that our
method is highly interpretable, which is able to recognize the translated and
untranslated contents as expected.
Comment: Camera-ready version. Accepted to EMNLP 2019 as a long paper.
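Schematically, guided dynamic routing can be sketched as agreement-based soft assignment of source words to two capsules (translated vs. untranslated), with the current decoder state injected into the routing; the exact parameterization in the paper differs from this schematic:

```python
import torch

def guided_dynamic_routing(src, dec_state, iters=3):
    """Assign each source word to one of two capsules by iterative
    agreement, conditioned on the current decoding status."""
    # src: (n_words, dim); dec_state: (dim,)
    guided = src + dec_state                 # decoding status guides routing
    logits = torch.zeros(src.size(0), 2)     # per-word assignment logits
    for _ in range(iters):
        c = torch.softmax(logits, dim=-1)    # (n, 2) soft assignments
        capsules = c.t() @ guided            # (2, dim) group representations
        capsules = torch.nn.functional.normalize(capsules, dim=-1)
        logits = logits + guided @ capsules.t()  # agreement update
    return c, capsules  # assignments + Past/Future capsule vectors
```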
Dynamic Computational Time for Visual Attention
We propose a dynamic computational time model to accelerate the average
processing time for recurrent visual attention (RAM). Rather than attention
with a fixed number of steps for each input image, the model learns to decide
when to stop on the fly. To achieve this, we add an additional continue/stop
action per time step to RAM and use reinforcement learning to learn both the
optimal attention policy and stopping policy. The modification is simple but
could dramatically save the average computational time while keeping the same
recognition performance as RAM. Experimental results on the CUB-200-2011 and
Stanford Cars datasets demonstrate that the dynamic computational time model
works effectively for fine-grained image recognition. The source code of this
paper can be obtained from https://github.com/baidu-research/DT-RA
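The added continue/stop decision can be sketched as an extra two-way policy head on the recurrent state, sampled at each glimpse and trained with a policy gradient; the details below are illustrative rather than the paper's exact formulation:

```python
import torch
import torch.nn as nn

class StopPolicy(nn.Module):
    """Two-way continue/stop head over the recurrent attention state,
    sampled during training so REINFORCE can optimize when to halt."""
    def __init__(self, hidden):
        super().__init__()
        self.head = nn.Linear(hidden, 2)  # logits for [continue, stop]

    def forward(self, h):
        # h: (batch, hidden) -- the recurrent state after a glimpse
        probs = torch.softmax(self.head(h), dim=-1)
        action = torch.multinomial(probs, 1)           # sample an action
        log_prob = torch.log(probs.gather(1, action))  # for policy gradient
        return action.squeeze(-1).bool(), log_prob     # True means "stop"
```

The reward trades off classification accuracy against the number of steps taken, so the policy learns to stop early on easy images.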
Labeled Memory Networks for Online Model Adaptation
Augmenting a neural network with memory that can grow without growing the
number of trained parameters is a recent powerful concept with many exciting
applications. We propose a design of memory augmented neural networks (MANNs)
called Labeled Memory Networks (LMNs) suited for tasks requiring online
adaptation in classification models. LMNs organize the memory with classes as
the primary key. The memory acts as a second boosted stage following a regular
neural network thereby allowing the memory and the primary network to play
complementary roles. Unlike existing MANNs that write to memory for every
instance and use LRU-based memory replacement, LMNs write only for instances
with non-zero loss and use label-based memory replacement. We demonstrate
significant accuracy gains on various tasks including word-modelling and
few-shot learning. In this paper, we establish their potential in online
adapting a batch trained neural network to domain-relevant labeled data at
deployment time. We show that LMNs are better than other MANNs designed for
meta-learning. We also found them to be more accurate and faster than
state-of-the-art methods of retuning model parameters for adapting to
domain-specific labeled data.
Comment: Accepted at AAAI 2018, 8 pages.
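A sketch of a label-keyed memory with the two write rules the abstract highlights, writing only on non-zero loss and replacing within the same label's slots rather than by LRU; the slot count and the read rule here are illustrative:

```python
import numpy as np

class LabeledMemory:
    """Memory organized with class labels as the primary key."""
    def __init__(self, slots_per_class=16):
        self.slots_per_class = slots_per_class
        self.store = {}  # label -> list of feature vectors

    def write(self, feature, label, loss):
        if loss <= 0.0:
            return  # confidently correct instances are not stored
        slots = self.store.setdefault(label, [])
        if len(slots) >= self.slots_per_class:
            slots.pop(0)  # label-based replacement within this class only
        slots.append(np.asarray(feature))

    def read(self, query):
        # Score each class by its best-matching stored instance.
        scores = {lbl: max(float(v @ query) for v in vs)
                  for lbl, vs in self.store.items() if vs}
        return max(scores, key=scores.get) if scores else None
```

Because slots never cross class boundaries, a flood of instances from one domain-dominant class cannot evict the memories of the others, which is the failure mode of LRU replacement that the design avoids.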