Selective Attention for Context-aware Neural Machine Translation
Despite the progress made in sentence-level NMT, current systems still fall
short of achieving fluent, high-quality translation of a full document. Recent
work in context-aware NMT considers only a few previous sentences as context
and may not scale to entire documents. To this end, we propose a novel and
scalable top-down approach to hierarchical attention for context-aware NMT
which uses sparse attention to selectively focus on relevant sentences in the
document context and then attends to key words in those sentences. We also
propose single-level attention approaches based on sentence- or word-level
information in the context. The document-level context representation, produced
from these attention modules, is integrated into the encoder or decoder of the
Transformer model depending on whether we use monolingual or bilingual context.
Our experiments and evaluation on English-German datasets in different document
MT settings show that our selective attention approach not only significantly
outperforms context-agnostic baselines but also surpasses context-aware
baselines in most cases.
Comment: Accepted at NAACL-HLT 201
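The top-down selection described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes sparsemax as the sparse attention function at both the sentence and word level, mean-pooled word vectors as sentence keys, and scaled dot-product scores. The names `sparsemax`, `dot`, `mean` and `hierarchical_sparse_attention` are illustrative only.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sparsemax(z: List[float]) -> List[float]:
    """Euclidean projection of scores onto the probability simplex.

    Unlike softmax, scores below the threshold tau get exactly zero weight.
    """
    zs = sorted(z, reverse=True)
    cumsum = 0.0
    k_z, support_sum = 0, 0.0
    for k, zk in enumerate(zs, start=1):
        cumsum += zk
        if 1 + k * zk > cumsum:  # zk is still inside the support
            k_z, support_sum = k, cumsum
    tau = (support_sum - 1) / k_z
    return [max(zi - tau, 0.0) for zi in z]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def hierarchical_sparse_attention(
    query: Vector, document: List[List[Vector]]
) -> Tuple[List[float], Vector]:
    """Top-down attention: select sentences first, then words inside them.

    `document` is a list of sentences, each a list of word vectors.
    Returns the sentence weights and the document context vector.
    """
    d = len(query)
    scale = 1.0 / math.sqrt(d)
    # Assumption: a sentence's key is the mean of its word vectors.
    sent_keys = [mean(words) for words in document]
    alpha = sparsemax([scale * dot(query, s) for s in sent_keys])
    context = [0.0] * d
    for a, words in zip(alpha, document):
        if a == 0.0:
            continue  # pruned sentence: its words are never scored
        beta = sparsemax([scale * dot(query, w) for w in words])
        for b, w in zip(beta, words):
            for i in range(d):
                context[i] += a * b * w[i]
    return alpha, context
```

For instance, with query [1.0, 0.0] and three sentences whose keys point along, across and against the query, the third sentence receives a weight of exactly 0.0 and is skipped. This exact zeroing is what lets the sketch avoid word-level work on irrelevant sentences, which is where a softmax-based hierarchy would still spend computation.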
Labeled Memory Networks for Online Model Adaptation
Augmenting a neural network with memory that can grow without growing the
number of trained parameters is a powerful recent concept with many exciting
applications. We propose a design of memory-augmented neural networks (MANNs)
called Labeled Memory Networks (LMNs) suited for tasks requiring online
adaptation in classification models. LMNs organize the memory with classes as
the primary key. The memory acts as a second boosted stage following a regular
neural network, thereby allowing the memory and the primary network to play
complementary roles. Unlike existing MANNs that write to memory for every
instance and use LRU-based memory replacement, LMNs write only for instances
with non-zero loss and use label-based memory replacement. We demonstrate
significant accuracy gains on various tasks, including word-modelling and
few-shot learning. In this paper, we establish their potential for the online
adaptation of a batch-trained neural network to domain-relevant labeled data at
deployment time. We show that LMNs are better than other MANNs designed for
meta-learning. We also find them to be more accurate and faster than
state-of-the-art methods of retuning model parameters for adapting to
domain-specific labeled data.
Comment: Accepted at AAAI 2018, 8 pages
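A rough sketch of such a label-keyed memory stage is shown below, under stated assumptions. The class name `LabeledMemory`, the cosine-similarity lookup, the additive boosting of the network's logits, the hinge loss used as the "non-zero loss" criterion, and FIFO eviction within each label's slots are all choices made for this illustration, not details taken from the paper.

```python
import math
from collections import defaultdict, deque
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


class LabeledMemory:
    """Hypothetical label-keyed memory acting as a second stage after a network."""

    def __init__(self, slots_per_label: int = 2, margin: float = 1.0, weight: float = 1.0):
        self.margin = margin
        self.weight = weight
        # Class label is the primary key; each label owns a bounded FIFO of
        # embeddings, so eviction only ever removes entries of the same label.
        self.memory: Dict[str, deque] = defaultdict(lambda: deque(maxlen=slots_per_label))

    def scores(self, embedding: Vector, base_logits: Dict[str, float]) -> Dict[str, float]:
        """Boost the network's logits with each label's best memory match."""
        out = dict(base_logits)
        for label, slots in self.memory.items():
            if slots:
                best = max(cosine(embedding, m) for m in slots)
                out[label] = out.get(label, 0.0) + self.weight * best
        return out

    def hinge_loss(self, scores: Dict[str, float], label: str) -> float:
        others = [s for lab, s in scores.items() if lab != label]
        if not others:
            return 0.0
        return max(0.0, self.margin - (scores.get(label, 0.0) - max(others)))

    def update(self, embedding: Vector, base_logits: Dict[str, float], label: str) -> float:
        """Write to memory only when the combined prediction has non-zero loss."""
        loss = self.hinge_loss(self.scores(embedding, base_logits), label)
        if loss > 0.0:
            self.memory[label].append(list(embedding))
        return loss
```

In this sketch an example that the boosted prediction already classifies with enough margin triggers no write at all, and because eviction only touches the slots of the label being written, a burst of examples from one class cannot flush the stored examples of other classes.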
The Case for Learned Index Structures
Indexes are models: a B-Tree-Index can be seen as a model to map a key to the
position of a record within a sorted array, a Hash-Index as a model to map a
key to a position of a record within an unsorted array, and a BitMap-Index as a
model to indicate if a data record exists or not. In this exploratory research
paper, we start from this premise and posit that all existing index structures
can be replaced with other types of models, including deep-learning models,
which we term learned indexes. The key idea is that a model can learn the sort
order or structure of lookup keys and use this signal to effectively predict
the position or existence of records. We theoretically analyze under which
conditions learned indexes outperform traditional index structures and describe
the main challenges in designing learned index structures. Our initial results
show that, by using neural nets, we are able to outperform cache-optimized
B-Trees by up to 70% in speed while saving an order of magnitude in memory on
several real-world data sets. More importantly, though, we believe that the idea
of replacing core components of a data management system with learned models
has far-reaching implications for future system designs and that this work
provides just a glimpse of what might be possible.
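To make the "indexes are models" premise concrete, here is a deliberately simple sketch: a single least-squares line over a sorted array stands in for the neural nets the abstract mentions, and the worst-case prediction error over the stored keys bounds a local binary search, so lookups of stored keys remain exact. The class `LinearLearnedIndex` is hypothetical and is not the design described in the paper.

```python
import bisect
from typing import List, Optional


class LinearLearnedIndex:
    """Hypothetical learned index: a fitted line predicts a key's slot in a sorted array."""

    def __init__(self, keys: List[float]):
        if not keys:
            raise ValueError("need at least one key")
        self.keys = sorted(keys)
        n = len(self.keys)
        # Least-squares fit of position ~ slope * key + intercept.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case error over stored keys bounds the search window.
        self.max_err = max(abs(self._predict(k) - p) for p, k in enumerate(self.keys))

    def _predict(self, key: float) -> int:
        return round(self.slope * key + self.intercept)

    def lookup(self, key: float) -> Optional[int]:
        """Return the position of `key`, or None if it is not stored."""
        n = len(self.keys)
        guess = self._predict(key)
        lo = min(max(0, guess - self.max_err), n)
        hi = max(lo, min(n, guess + self.max_err + 1))
        # Binary search only inside [guess - max_err, guess + max_err].
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < n and self.keys[i] == key:
            return i
        return None
```

For example, building the index over [2, 3, 5, 7, 11, 13, 17, 19, 23, 29] and calling `lookup(13)` returns 5, while `lookup(4)` returns None. The better the model fits the key distribution, the smaller `max_err` and the narrower the search window, which is the trade-off the abstract's speed and memory comparison against B-Trees turns on.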