Predefined Sparseness in Recurrent Sequence Models
Inducing sparseness while training neural networks has been shown to yield
models with a lower memory footprint but similar effectiveness to dense models.
However, sparseness is typically induced starting from a dense model, and thus
this advantage does not hold during training. We propose techniques to enforce
sparseness upfront in recurrent sequence models for NLP applications, to also
benefit training. First, in language modeling, we show how to increase hidden
state sizes in recurrent layers without increasing the number of parameters,
leading to more expressive models. Second, for sequence labeling, we show that
word embeddings with predefined sparseness lead to similar performance as dense
embeddings, at a fraction of the number of trainable parameters.
Comment: the SIGNLL Conference on Computational Natural Language Learning
(CoNLL), 2018
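As a rough illustration of the first idea, consider an Elman-style cell whose
recurrent matrix is masked before training begins. This is a minimal sketch
only; the random Bernoulli mask, cell type, and density value are assumptions
for illustration, not the paper's exact construction:

import torch
import torch.nn as nn
import torch.nn.functional as F

class PredefinedSparseRNNCell(nn.Module):
    def __init__(self, input_size, hidden_size, density=0.25):
        super().__init__()
        self.w_ih = nn.Linear(input_size, hidden_size)
        self.w_hh = nn.Parameter(torch.randn(hidden_size, hidden_size) * 0.01)
        # Sampled once up front; masked entries get zero gradient and never
        # change. A real implementation would store only the nonzeros.
        self.register_buffer(
            "mask", (torch.rand(hidden_size, hidden_size) < density).float())

    def forward(self, x, h):
        # Only a `density` fraction of the recurrent weights is ever live,
        # so hidden_size can grow without growing the number of effective
        # trainable parameters.
        return torch.tanh(self.w_ih(x) + F.linear(h, self.w_hh * self.mask))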
Run-Time Efficient RNN Compression for Inference on Edge Devices
Recurrent neural networks can be large and compute-intensive, yet many
applications that benefit from RNNs run on small devices with very limited
compute and storage capabilities while still having run-time constraints. As a
result, there is a need for compression techniques that can achieve significant
compression without negatively impacting inference run-time and task accuracy.
This paper explores a new compressed RNN cell implementation called Hybrid
Matrix Decomposition (HMD) that achieves this dual objective. This scheme
divides the weight matrix into two parts - an unconstrained upper half and a
lower half composed of rank-1 blocks. This results in output features where the
upper sub-vector has "richer" features while the lower sub-vector has
"constrained" features. HMD can compress RNNs by a factor of 2-4x while having
a faster run-time than pruning (Zhu & Gupta, 2017) and retaining more model
accuracy than matrix factorization (Grachev et al., 2017). We evaluate this
technique on 5 benchmarks spanning 3 different applications, illustrating its
generality in the domain of edge computing.
Comment: Published at the 4th Workshop on Energy Efficient Machine Learning
and Cognitive Computing for Embedded Applications, co-located with the
International Symposium on Computer Architecture (ISCA 2019), Phoenix,
Arizona (https://www.emc2-workshop.com/isca-19)
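The split described above can be made concrete with a small sketch (the block
count and sizes below are assumptions for illustration; the paper's exact
blocking may differ). For a rank-1 block u v^T, the matvec collapses to a dot
product followed by a scale, which is what makes the lower half cheap:

import numpy as np

def hmd_matvec(w_upper, rank1_blocks, x):
    # w_upper: dense (m/2, n) matrix, the unconstrained upper half.
    # rank1_blocks: (u, v) pairs; each defines a rank-1 block u v^T of
    # shape (len(u), n), stacked vertically as the constrained lower half.
    upper = w_upper @ x                        # "richer" features
    lower = np.concatenate([(v @ x) * u for u, v in rank1_blocks])
    return np.concatenate([upper, lower])

# Toy usage: an 8x6 weight matrix whose lower half is two rank-1 blocks.
x = np.random.randn(6)
w_upper = np.random.randn(4, 6)
blocks = [(np.random.randn(2), np.random.randn(6)) for _ in range(2)]
assert hmd_matvec(w_upper, blocks, x).shape == (8,)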
Artificial Intelligence for Sign Language Recognition and Translation
In a world where people are more connected, the barriers between deaf people
and hearing people are more visible than ever. A neural sign language
translation system would break
many of these barriers. However, there are still many tasks to be solved before full automatic
sign language translation is possible. Sign Language Translation is a difficult multimodal
machine translation problem with no clear one-to-one mapping to any spoken language.
In this paper I give a review of sign language and its challenges regarding neural machine
translation. I evaluate the state-of-the-art Sign Language Translation approach, and apply
a modified version of the Evolved Transformer to the existing Sign Language Transformer.
I show that the Evolved Transformer encoder produces better results than the
Transformer encoder, at lower dimensions.
Towards GPU Utilization Prediction for Cloud Deep Learning
Understanding the GPU utilization of Deep Learning (DL) workloads is important for enhancing resource-efficiency and cost-benefit decision making for DL frameworks in the cloud. Current approaches to determining DL workload GPU utilization rely on online profiling within isolated GPU devices, and must be performed for every unique DL workload submission, resulting in resource under-utilization and reduced service availability. In this paper, we propose a prediction engine that proactively determines the GPU utilization of heterogeneous DL workloads without the need for in-depth or isolated online profiling. We demonstrate that it is possible to predict a DL workload's GPU utilization by extracting information from its model computation graph. Our experiments show that the prediction engine achieves an RMSLE of 0.154 and can be exploited by DL schedulers to achieve up to a 61.5% improvement in GPU cluster utilization.
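For reference, the RMSLE figure quoted above is the root mean squared
logarithmic error between predicted and measured utilization; a minimal
sketch follows (the percent-scale toy values are hypothetical):

import numpy as np

def rmsle(y_pred, y_true):
    # log1p keeps the metric defined when a workload uses 0% GPU.
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

print(rmsle(np.array([55.0, 70.0]), np.array([50.0, 75.0])))  # ~0.08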