DeepMon: Mobile GPU-based deep learning framework for continuous vision applications
© 2017 ACM. The rapid emergence of head-mounted devices such as the Microsoft HoloLens enables a wide variety of continuous vision applications. Such applications often adopt deep learning algorithms such as CNNs and RNNs to extract rich contextual information from first-person-view video streams. Despite their high accuracy, the use of deep learning algorithms on mobile devices raises critical challenges, namely high processing latency and power consumption. In this paper, we propose DeepMon, a mobile deep learning inference system that runs a variety of deep learning inferences purely on a mobile device in a fast and energy-efficient manner. To this end, we designed a suite of optimization techniques to efficiently offload convolutional layers to mobile GPUs and accelerate the processing; note that the convolutional layers are the common performance bottleneck of many deep learning models. Our experimental results show that DeepMon can classify an image with the VGG-VeryDeep-16 deep learning model in 644 ms on a Samsung Galaxy S7, taking an important step towards continuous vision without imposing any privacy concerns or networking cost.
Exploring the Performance and Efficiency of Transformer Models for NLP on Mobile Devices
Deep learning (DL) is characterised by its dynamic nature, with new deep
neural network (DNN) architectures and approaches emerging every few years,
driving the field's advancement. At the same time, the ever-increasing use of
mobile devices (MDs) has resulted in a surge of DNN-based mobile applications.
Although traditional architectures, like CNNs and RNNs, have been successfully
integrated into MDs, this is not the case for Transformers, a relatively new
model family that has achieved new levels of accuracy across AI tasks, but
poses significant computational challenges. In this work, we aim to make steps
towards bridging this gap by examining the current state of Transformers'
on-device execution. To this end, we construct a benchmark of representative
models and thoroughly evaluate their performance across MDs with different
computational capabilities. Our experimental results show that Transformers are
not accelerator-friendly and indicate the need for software and hardware
optimisations to achieve efficient deployment.
Comment: Accepted at the 3rd IEEE International Workshop on Distributed
Intelligent Systems (DistInSys), 202
Cloud-based or On-device: An Empirical Study of Mobile Deep Inference
Modern mobile applications are benefiting significantly from the advancement
in deep learning, e.g., implementing real-time image recognition and
conversational systems. Given a trained deep learning model, applications
usually need to perform a series of matrix operations based on the input data,
in order to infer possible output values. Because of computational complexity
and size constraints, these trained models are often hosted in the cloud. To
utilize these cloud-based models, mobile apps will have to send input data over
the network. While cloud-based deep learning can provide reasonable response
time for mobile apps, it restricts the use case scenarios, e.g., mobile apps
need to have network access. With mobile-specific deep learning optimizations,
it is now possible to employ on-device inference. However, because mobile
hardware resources, such as GPU and memory size, can be very limited compared to their
desktop counterparts, it is important to understand the feasibility of this new
on-device deep learning inference architecture. In this paper, we empirically
evaluate the inference performance of three Convolutional Neural Networks
(CNNs) using a benchmark Android application we developed. Our measurement and
analysis suggest that on-device inference can cost up to two orders of
magnitude greater response time and energy when compared to cloud-based
inference, and that loading the model and computing probabilities are two performance
bottlenecks for on-device deep inference.
Comment: Accepted at The IEEE International Conference on Cloud Engineering
(IC2E) conference 201