Search CORE

60,494 research outputs found

Fast Neural Machine Translation Implementation

Author: Dwojak Tomasz
Heafield Kenneth
Hoang Hieu
Krislauks Rihards
Torregrosa Daniel
Publication venue
Publication date: 01/01/2018
Field of study

This paper describes the submissions to the efficiency track for GPUs at the Workshop for Neural Machine Translation and Generation by members of the University of Edinburgh, Adam Mickiewicz University, Tilde and University of Alicante. We focus on efficient implementation of the recurrent deep-learning model as implemented in Amun, the fast inference engine for neural machine translation. We improve the performance with an efficient mini-batching algorithm, and by fusing the softmax operation with the k-best extraction algorithm. Submissions using Amun were first, second and third fastest in the GPU efficiency track

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

Fast machine translation on parallel and massively parallel hardware

Author: Bogoychev Nikolay Veselinov
Publication venue: The University of Edinburgh
Publication date: 01/07/2019
Field of study

Parallel systems have been widely adopted in the field of machine translation, because the raw computational power they offer is well suited to this computationally intensive task. However programming for parallel hardware is not trivial as it requires redesign of the existing algorithms. In my thesis I design efficient algorithms for machine translation on parallel hardware. I identify memory accesses as the biggest bottleneck to processing speed and propose novel algorithms that minimize them. I present three distinct case studies in which minimizing memory access substantially improves speed: Starting with statistical machine translation, I design a phrase table that makes decoding ten times faster on a multi-threaded CPU. Next, I design a GPU-based n-gram language model that is twice as fast per £ as a highly optimized CPU implementation. Turning to neural machine translation, I design new stochastic gradient descent techniques that make end-to-end training twice as fast. The work in this thesis has been incorporated in two popular machine translation toolkits: Moses and Marian

Edinburgh Research Archive

FastWave: Accelerating Autoregressive Convolutional Neural Networks on FPGA

Author: Hussain Shehzeen
Javaheripi Mojan
Kastner Ryan
Koushanfar Farinaz
Neekhara Paarth
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 09/02/2020
Field of study

Autoregressive convolutional neural networks (CNNs) have been widely exploited for sequence generation tasks such as audio synthesis, language modeling and neural machine translation. WaveNet is a deep autoregressive CNN composed of several stacked layers of dilated convolution that is used for sequence generation. While WaveNet produces state-of-the art audio generation results, the naive inference implementation is quite slow; it takes a few minutes to generate just one second of audio on a high-end GPU. In this work, we develop the first accelerator platform~\textit{FastWave} for autoregressive convolutional neural networks, and address the associated design challenges. We design the Fast-Wavenet inference model in Vivado HLS and perform a wide range of optimizations including fixed-point implementation, array partitioning and pipelining. Our model uses a fully parameterized parallel architecture for fast matrix-vector multiplication that enables per-layer customized latency fine-tuning for further throughput improvement. Our experiments comparatively assess the trade-off between throughput and resource utilization for various optimizations. Our best WaveNet design on the Xilinx XCVU13P FPGA that uses only on-chip memory, achieves 66 faster generation speed compared to CPU implementation and 11 faster generation speed than GPU implementation.Comment: Published as a conference paper at ICCAD 201

arXiv.org e-Print Archive

Crossref

RETURNN as a Generic Flexible Neural Toolkit with Application to Translation and Speech Recognition

Author: Alkhouli Tamer
Ney Hermann
Zeyer Albert
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2018
Field of study

We compare the fast training and decoding speed of RETURNN of attention models for translation, due to fast CUDA LSTM kernels, and a fast pure TensorFlow beam search decoder. We show that a layer-wise pretraining scheme for recurrent attention models gives over 1% BLEU improvement absolute and it allows to train deeper recurrent encoder networks. Promising preliminary results on max. expected BLEU training are presented. We are able to train state-of-the-art models for translation and end-to-end models for speech recognition and show results on WMT 2017 and Switchboard. The flexibility of RETURNN allows a fast research feedback loop to experiment with alternative architectures, and its generality allows to use it on a wide range of applications.Comment: accepted as demo paper on ACL 201

arXiv.org e-Print Archive

Crossref

Publikationsserver der RWTH Aachen University

Simple Recurrent Units for Highly Parallelizable Recurrence

Author: Artzi Yoav
Dai Hui
Lei Tao
Wang Sida I.
Zhang Yu
Publication venue
Publication date: 01/01/2018
Field of study

Common recurrent neural architectures scale poorly due to the intrinsic difficulty in parallelizing their state computations. In this work, we propose the Simple Recurrent Unit (SRU), a light recurrent unit that balances model capacity and scalability. SRU is designed to provide expressive recurrence, enable highly parallelized implementation, and comes with careful initialization to facilitate training of deep models. We demonstrate the effectiveness of SRU on multiple NLP tasks. SRU achieves 5--9x speed-up over cuDNN-optimized LSTM on classification and question answering datasets, and delivers stronger results than LSTM and convolutional models. We also obtain an average of 0.7 BLEU improvement over the Transformer model on translation by incorporating SRU into the architecture.Comment: EMNL

arXiv.org e-Print Archive

Crossref