Exploring Challenges of Deploying BERT-based NLP Models in
  Resource-Constrained Embedded Devices

Babar, Mohammad Fakhruddin; Hasan, Monowar; Hassan, Md Mahadi; Santu, Shubhra Kanti Karmaker; Sarkar, Souvika

Authors: Mohammad Fakhruddin Babar
Monowar Hasan
Md Mahadi Hassan
Shubhra Kanti Karmaker Santu
Souvika Sarkar
Publication date: 31 May 2023

Abstract

BERT-based neural architectures have established themselves as popular state-of-the-art baselines for many downstream NLP tasks. However, these architectures are data-hungry and consume a lot of memory and energy, which often hinders their deployment in real-time, resource-constrained applications. Existing lighter versions of BERT (e.g., DistilBERT and TinyBERT) often cannot perform well on complex NLP tasks. More importantly, from a designer's perspective, it is unclear which BERT-based architecture is the "right" one for a given NLP task, that is, the one that strikes the optimal trade-off between the resources available and the minimum accuracy desired by the end user. System engineers have to spend a lot of time conducting trial-and-error experiments to find a suitable answer to this question. This paper presents an exploratory study of BERT-based models under different resource constraints and accuracy budgets to derive empirical observations about these resource/accuracy trade-offs. Our findings can help designers make informed choices among alternative BERT-based architectures for embedded systems, thus saving significant development time and effort.

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2304.11520

Last time updated on 26/04/2023