Predefined Sparseness in Recurrent Sequence Models
Inducing sparseness while training neural networks has been shown to yield
models with a lower memory footprint but similar effectiveness to dense models.
However, sparseness is typically induced starting from a dense model, and thus
this advantage does not hold during training. We propose techniques to enforce
sparseness upfront in recurrent sequence models for NLP applications, to also
benefit training. First, in language modeling, we show how to increase hidden
state sizes in recurrent layers without increasing the number of parameters,
leading to more expressive models. Second, for sequence labeling, we show that
word embeddings with predefined sparseness lead to similar performance as dense
embeddings, at a fraction of the number of trainable parameters.
Comment: the SIGNLL Conference on Computational Natural Language Learning
(CoNLL), 2018
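As a rough illustration of the first idea, consider an Elman-style cell whose
recurrent matrix is masked before training begins. This is a minimal sketch
only; the random Bernoulli mask, cell type, and density value are assumptions
for illustration, not the paper's exact construction:

import torch
import torch.nn as nn
import torch.nn.functional as F

class PredefinedSparseRNNCell(nn.Module):
    def __init__(self, input_size, hidden_size, density=0.25):
        super().__init__()
        self.w_ih = nn.Linear(input_size, hidden_size)
        self.w_hh = nn.Parameter(torch.randn(hidden_size, hidden_size) * 0.01)
        # Sampled once up front; masked entries get zero gradient and never
        # change. A real implementation would store only the nonzeros.
        self.register_buffer(
            "mask", (torch.rand(hidden_size, hidden_size) < density).float())

    def forward(self, x, h):
        # Only a `density` fraction of the recurrent weights is ever live,
        # so hidden_size can grow without growing the number of effective
        # trainable parameters.
        return torch.tanh(self.w_ih(x) + F.linear(h, self.w_hh * self.mask))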
Run-Time Efficient RNN Compression for Inference on Edge Devices
Recurrent neural networks can be large and compute-intensive, yet many
applications that benefit from RNNs run on small devices with very limited
compute and storage capabilities while still having run-time constraints. As a
result, there is a need for compression techniques that can achieve significant
compression without negatively impacting inference run-time and task accuracy.
This paper explores a new compressed RNN cell implementation called Hybrid
Matrix Decomposition (HMD) that achieves this dual objective. This scheme
divides the weight matrix into two parts - an unconstrained upper half and a
lower half composed of rank-1 blocks. This results in output features where the
upper sub-vector has "richer" features while the lower sub-vector has
"constrained" features. HMD can compress RNNs by a factor of 2-4x while having
a faster run-time than pruning (Zhu & Gupta, 2017) and retaining more model
accuracy than matrix factorization (Grachev et al., 2017). We evaluate this
technique on 5 benchmarks spanning 3 different applications, illustrating its
generality in the domain of edge computing.
Comment: Published at the 4th Workshop on Energy Efficient Machine Learning
and Cognitive Computing for Embedded Applications, co-located with the
International Symposium on Computer Architecture (ISCA 2019), Phoenix,
Arizona (https://www.emc2-workshop.com/isca-19)
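The split described above can be made concrete with a small sketch (the block
count and sizes below are assumptions for illustration; the paper's exact
blocking may differ). For a rank-1 block u v^T, the matvec collapses to a dot
product followed by a scale, which is what makes the lower half cheap:

import numpy as np

def hmd_matvec(w_upper, rank1_blocks, x):
    # w_upper: dense (m/2, n) matrix, the unconstrained upper half.
    # rank1_blocks: (u, v) pairs; each defines a rank-1 block u v^T of
    # shape (len(u), n), stacked vertically as the constrained lower half.
    upper = w_upper @ x                        # "richer" features
    lower = np.concatenate([(v @ x) * u for u, v in rank1_blocks])
    return np.concatenate([upper, lower])

# Toy usage: an 8x6 weight matrix whose lower half is two rank-1 blocks.
x = np.random.randn(6)
w_upper = np.random.randn(4, 6)
blocks = [(np.random.randn(2), np.random.randn(6)) for _ in range(2)]
assert hmd_matvec(w_upper, blocks, x).shape == (8,)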
Artificial Intelligence for Sign Language Recognition and Translation
In a world where people are more connected, the barriers between deaf people
and hearing people are more visible than ever. A neural sign language
translation system would break
many of these barriers. However, there are still many tasks to be solved before full automatic
sign language translation is possible. Sign Language Translation is a difficult multimodal
machine translation problem with no clear one-to-one mapping to any spoken language.
In this paper I give a review of sign language and its challenges regarding neural machine
translation. I evaluate the state-of-the-art Sign Language Translation approach, and apply
a modified version of the Evolved Transformer to the existing Sign Language Transformer.
I show that the Evolved Transformer encoder produces better results than the
Transformer encoder, at lower dimensions.
Towards GPU Utilization Prediction for Cloud Deep Learning
Understanding the GPU utilization of Deep Learning (DL) workloads is important for enhancing resource-efficiency and cost-benefit decision making for DL frameworks in the cloud. Current approaches to determining DL workload GPU utilization rely on online profiling within isolated GPU devices, and must be performed for every unique DL workload submission, resulting in resource under-utilization and reduced service availability. In this paper, we propose a prediction engine that proactively determines the GPU utilization of heterogeneous DL workloads without the need for in-depth or isolated online profiling. We demonstrate that it is possible to predict a DL workload's GPU utilization by extracting information from its model computation graph. Our experiments show that the prediction engine achieves an RMSLE of 0.154 and can be exploited by DL schedulers to achieve up to a 61.5% improvement in GPU cluster utilization.
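For reference, the RMSLE figure quoted above is the root mean squared
logarithmic error between predicted and measured utilization; a minimal
sketch follows (the percent-scale toy values are hypothetical):

import numpy as np

def rmsle(y_pred, y_true):
    # log1p keeps the metric defined when a workload uses 0% GPU.
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

print(rmsle(np.array([55.0, 70.0]), np.array([50.0, 75.0])))  # ~0.08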