Exploring the power of GPUs for training Polyglot language models
One of the major current research trends is the evolution of heterogeneous
parallel computing. GP-GPU computing is now widely used, and many
applications have been designed to exploit the massive parallelism that
GP-GPUs offer. While GPUs have long been used in computer vision for image
processing, little has been done to investigate whether the massive
parallelism of GP-GPUs can be exploited effectively for Natural Language
Processing (NLP) tasks. In this work, we investigate and explore the power of
GP-GPUs for the task of learning language models. More specifically, we study
the performance of training Polyglot language models using deep belief neural
networks. We evaluate the performance of training the model on the GPU and
present optimizations that boost performance there. One of the key
optimizations we propose increases the performance of a function involved in
calculating and updating the gradient by approximately 50 times on the GPU
for sufficiently large batch sizes. We show that with these optimizations,
the GP-GPU's performance on the task increases by a factor of approximately
3-4. The optimizations we made are generic Theano optimizations and hence can
potentially boost the performance of other models that rely on these
operations. We also show that with these optimizations, the GPU's performance
on this task becomes comparable to that of the CPU. We conclude with a
thorough evaluation of the applicability of GP-GPUs to this task and
highlight the factors limiting the performance of training a Polyglot model
on the GPU.
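
As a rough illustration of the kind of operation the abstract refers to, the
following is a minimal Theano sketch of a compiled function that computes a
gradient and applies the parameter update in a single call. All names,
dimensions, and the simple softmax model itself are hypothetical stand-ins
for illustration only; this is not the authors' Polyglot or deep belief
network code.

    import numpy as np
    import theano
    import theano.tensor as T

    # Hypothetical sizes; the abstract does not specify model dimensions.
    vocab_size, embed_dim, batch_size = 10000, 64, 512

    # Shared parameters live on the GPU when Theano runs with device=gpu.
    W = theano.shared(
        np.random.randn(embed_dim, vocab_size).astype(theano.config.floatX),
        name="W")
    b = theano.shared(
        np.zeros(vocab_size, dtype=theano.config.floatX), name="b")

    x = T.matrix("x")    # a batch of input representations
    y = T.ivector("y")   # target word indices

    # Softmax output layer and negative log-likelihood loss.
    p_y = T.nnet.softmax(T.dot(x, W) + b)
    loss = -T.mean(T.log(p_y)[T.arange(y.shape[0]), y])

    # Symbolic gradients plus in-place shared-variable updates: Theano
    # compiles the gradient computation and the update into one function,
    # the kind of fused calculate-and-update step the abstract's
    # optimization targets.
    lr = np.asarray(0.1, dtype=theano.config.floatX)
    grads = T.grad(loss, [W, b])
    updates = [(p, p - lr * g) for p, g in zip([W, b], grads)]

    train_step = theano.function([x, y], loss, updates=updates)

    # One training step on a random batch.
    xb = np.random.randn(batch_size, embed_dim).astype(theano.config.floatX)
    yb = np.random.randint(0, vocab_size, size=batch_size).astype("int32")
    print(train_step(xb, yb))

Because the compiled function processes the whole batch on the device in one
call, per-example overhead shrinks as the batch grows, which is consistent
with the abstract's observation that the speedup appears only for
sufficiently large batch sizes.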