BERT-based neural architectures have established themselves as popular
state-of-the-art baselines for many downstream NLP tasks. However, these
architectures are data-hungry and consume a lot of memory and energy, often
hindering their deployment in many real-time, resource-constrained
applications. Existing lighter versions of BERT (e.g., DistilBERT and TinyBERT)
often cannot perform well on complex NLP tasks. More importantly, from a
designer's perspective, it is unclear which BERT-based architecture is the
"right" one for a given NLP task, i.e., which one strikes the best trade-off
between the resources available and the minimum accuracy desired by the end
user. System engineers have to spend a lot of time conducting trial-and-error
experiments to find a suitable answer to this question. This paper presents an
exploratory study of BERT-based models under different resource constraints and
accuracy budgets to derive empirical observations about these resource/accuracy
trade-offs. Our findings can help designers make informed choices among
alternative BERT-based architectures for embedded systems, thus saving
significant development time and effort.