15,952 research outputs found
GPU-based Image Analysis on Mobile Devices
With the rapid advances in mobile technology many mobile devices are capable
of capturing high quality images and video with their embedded camera. This
paper investigates techniques for real-time processing of the resulting images,
particularly on-device utilizing a graphical processing unit. Issues and
limitations of image processing on mobile devices are discussed, and the
performance of graphical processing units on a range of devices measured
through a programmable shader implementation of Canny edge detection.Comment: Proceedings of Image and Vision Computing New Zealand 201
Cloud-based or On-device: An Empirical Study of Mobile Deep Inference
Modern mobile applications are benefiting significantly from the advancement
in deep learning, e.g., implementing real-time image recognition and
conversational system. Given a trained deep learning model, applications
usually need to perform a series of matrix operations based on the input data,
in order to infer possible output values. Because of computational complexity
and size constraints, these trained models are often hosted in the cloud. To
utilize these cloud-based models, mobile apps will have to send input data over
the network. While cloud-based deep learning can provide reasonable response
time for mobile apps, it restricts the use case scenarios, e.g. mobile apps
need to have network access. With mobile specific deep learning optimizations,
it is now possible to employ on-device inference. However, because mobile
hardware, such as GPU and memory size, can be very limited when compared to its
desktop counterpart, it is important to understand the feasibility of this new
on-device deep learning inference architecture. In this paper, we empirically
evaluate the inference performance of three Convolutional Neural Networks
(CNNs) using a benchmark Android application we developed. Our measurement and
analysis suggest that on-device inference can cost up to two orders of
magnitude greater response time and energy when compared to cloud-based
inference, and that loading model and computing probability are two performance
bottlenecks for on-device deep inferences.Comment: Accepted at The IEEE International Conference on Cloud Engineering
(IC2E) conference 201
EmBench: Quantifying Performance Variations of Deep Neural Networks across Modern Commodity Devices
In recent years, advances in deep learning have resulted in unprecedented
leaps in diverse tasks spanning from speech and object recognition to context
awareness and health monitoring. As a result, an increasing number of
AI-enabled applications are being developed targeting ubiquitous and mobile
devices. While deep neural networks (DNNs) are getting bigger and more complex,
they also impose a heavy computational and energy burden on the host devices,
which has led to the integration of various specialized processors in commodity
devices. Given the broad range of competing DNN architectures and the
heterogeneity of the target hardware, there is an emerging need to understand
the compatibility between DNN-platform pairs and the expected performance
benefits on each platform. This work attempts to demystify this landscape by
systematically evaluating a collection of state-of-the-art DNNs on a wide
variety of commodity devices. In this respect, we identify potential
bottlenecks in each architecture and provide important guidelines that can
assist the community in the co-design of more efficient DNNs and accelerators.Comment: Accepted at MobiSys 2019: 3rd International Workshop on Embedded and
Mobile Deep Learning (EMDL), 201
Modeling the Resource Requirements of Convolutional Neural Networks on Mobile Devices
Convolutional Neural Networks (CNNs) have revolutionized the research in
computer vision, due to their ability to capture complex patterns, resulting in
high inference accuracies. However, the increasingly complex nature of these
neural networks means that they are particularly suited for server computers
with powerful GPUs. We envision that deep learning applications will be
eventually and widely deployed on mobile devices, e.g., smartphones,
self-driving cars, and drones. Therefore, in this paper, we aim to understand
the resource requirements (time, memory) of CNNs on mobile devices. First, by
deploying several popular CNNs on mobile CPUs and GPUs, we measure and analyze
the performance and resource usage for every layer of the CNNs. Our findings
point out the potential ways of optimizing the performance on mobile devices.
Second, we model the resource requirements of the different CNN computations.
Finally, based on the measurement, pro ling, and modeling, we build and
evaluate our modeling tool, Augur, which takes a CNN configuration (descriptor)
as the input and estimates the compute time and resource usage of the CNN, to
give insights about whether and how e ciently a CNN can be run on a given
mobile platform. In doing so Augur tackles several challenges: (i) how to
overcome pro ling and measurement overhead; (ii) how to capture the variance in
different mobile platforms with different processors, memory, and cache sizes;
and (iii) how to account for the variance in the number, type and size of
layers of the different CNN configurations
Cloud Chaser: Real Time Deep Learning Computer Vision on Low Computing Power Devices
Internet of Things(IoT) devices, mobile phones, and robotic systems are often
denied the power of deep learning algorithms due to their limited computing
power. However, to provide time-critical services such as emergency response,
home assistance, surveillance, etc, these devices often need real-time analysis
of their camera data. This paper strives to offer a viable approach to
integrate high-performance deep learning-based computer vision algorithms with
low-resource and low-power devices by leveraging the computing power of the
cloud. By offloading the computation work to the cloud, no dedicated hardware
is needed to enable deep neural networks on existing low computing power
devices. A Raspberry Pi based robot, Cloud Chaser, is built to demonstrate the
power of using cloud computing to perform real-time vision tasks. Furthermore,
to reduce latency and improve real-time performance, compression algorithms are
proposed and evaluated for streaming real-time video frames to the cloud.Comment: Accepted to The 11th International Conference on Machine Vision (ICMV
2018). Project site: https://zhengyiluo.github.io/projects/cloudchaser
- …