Access to external knowledge is essential for
many natural language processing tasks, such
as question answering and dialogue. Existing methods often rely on a parametric model
that stores knowledge in its parameters, or use
a retrieval-augmented model that has access
to an external knowledge source. Parametric
and retrieval-augmented models have complementary strengths in terms of computational
efficiency and predictive accuracy. To combine the strengths of both approaches, we propose the Efficient Memory-Augmented Transformer (EMAT), which encodes external knowledge into a key-value memory and exploits fast maximum inner product search for memory querying. We also introduce pre-training
tasks that allow EMAT to encode informative key-value representations, and to learn an
implicit strategy to integrate multiple memory slots into the transformer. Experiments
on various knowledge-intensive tasks, such as question answering and dialogue, show that simply augmenting parametric models (T5-base) with our method produces more accurate results (e.g., 25.8 → 44.3 EM on NQ) while retaining high throughput (e.g., 1,000 queries/s on NQ). Compared to retrieval-augmented models, EMAT runs substantially
faster across the board and produces more accurate results on WoW and ELI5.
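To make the memory-querying step concrete, the following is a minimal sketch of key-value memory lookup via maximum inner product search (MIPS). It is not the authors' implementation: the memory contents, dimensions, and names (mips_lookup, MEMORY_KEYS, MEMORY_VALUES, top_k) are illustrative assumptions, and a production system would typically use an approximate MIPS index (e.g., FAISS) rather than the exact search shown here.

# Minimal sketch (assumed, not the paper's code) of key-value memory
# lookup via exact maximum inner product search (MIPS).
import numpy as np

rng = np.random.default_rng(0)
d, n_slots = 128, 10_000  # embedding dim and number of memory slots (assumed)

# Key-value memory: keys index the external knowledge, values carry
# the representations that get integrated into the transformer.
MEMORY_KEYS = rng.standard_normal((n_slots, d)).astype(np.float32)
MEMORY_VALUES = rng.standard_normal((n_slots, d)).astype(np.float32)

def mips_lookup(query: np.ndarray, top_k: int = 4):
    """Return the top_k memory values whose keys have the largest
    inner product with the query, together with their scores."""
    scores = MEMORY_KEYS @ query                   # (n_slots,) inner products
    top = np.argpartition(-scores, top_k)[:top_k]  # unordered top-k slot ids
    top = top[np.argsort(-scores[top])]            # sort descending by score
    return MEMORY_VALUES[top], scores[top]

query = rng.standard_normal(d).astype(np.float32)  # stand-in query encoding
values, scores = mips_lookup(query)
print(values.shape, scores)                        # (4, 128) and top-4 scores

Because the lookup reduces to a single matrix-vector product followed by a top-k selection, it avoids the document re-encoding cost that makes retrieval-augmented readers slow, which is consistent with the throughput advantage the abstract reports.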