Search CORE

725 research outputs found

Edge-Cloud Polarization and Collaboration: A Comprehensive Survey for AI

Author: Chu Yunfei
Ji Luo
Jia Kunyang
Kuang Kun
Ma Jianxin
Shen Tao
Tan Ziqi
Wang Feng
Wu Anpeng
Wu Chao
Wu Fei
Yang Hongxia
Yao Jiangchao
Yao Yang
Zhang Fengda
Zhang Jianwei
Zhang Shengyu
Zhou Jingren
Publication venue
Publication date: 23/05/2022
Field of study

Influenced by the great success of deep learning via cloud computing and the rapid development of edge chips, research in artificial intelligence (AI) has shifted to both of the computing paradigms, i.e., cloud computing and edge computing. In recent years, we have witnessed significant progress in developing more advanced AI models on cloud servers that surpass traditional deep learning models owing to model innovations (e.g., Transformers, Pretrained families), explosion of training data and soaring computing capabilities. However, edge computing, especially edge and cloud collaborative computing, are still in its infancy to announce their success due to the resource-constrained IoT scenarios with very limited algorithms deployed. In this survey, we conduct a systematic review for both cloud and edge AI. Specifically, we are the first to set up the collaborative learning mechanism for cloud and edge modeling with a thorough review of the architectures that enable such mechanism. We also discuss potentials and practical experiences of some on-going advanced edge AI topics including pretraining models, graph neural networks and reinforcement learning. Finally, we discuss the promising directions and challenges in this field.Comment: 20 pages, Transactions on Knowledge and Data Engineerin

arXiv.org e-Print Archive

Running deep learning applications on resource constrained devices

Author: Vedula Naveen
Publication venue
Publication date: 25/08/2021
Field of study

The high accuracy of Deep Neural Networks (DNN) come at the expense of high computational cost and memory requirements. During inference, the data is often collected on the edge device which are resource-constrained. The existing solutions for edge deployment include i) executing the entire DNN on the edge (EDGE-ONLY), ii) sending the input from edge to cloud where the DNN is processed (CLOUD-ONLY), and iii) splitting the DNN to execute partially on the edge and partially on the cloud (SPLIT). The choice of deployment between EDGE-ONLY, CLOUD-ONLY and SPLIT is determined by several operating constraints such as device resources and network speed, and application constraints such as latency and accuracy. The EDGE-ONLY approach requires compact DNN with low compute and memory requirements. Thus, the emerging class of DNNs employ low-rank convolutions (LRCONVs) which reduce one or more dimensions compared to the spatial convolutions (CONV). Prior research in hardware accelerators has largely focused on CONVs. The LRCONVs such as depthwise and pointwise convolutions exhibit lower arithmetic intensity and lower data reuse. Thus, LRCONVs result in low hardware utilization and high latency. In our first work, we systematically explore the design space of Cross-layer dataflows to exploit data reuse across layers for emerging DNNs in EDGE-ONLY scenarios. We develop novel fine-grain cross-layer dataflows for LRCONVs that support partial loop dimension completion. Our tool, X-Layer decouples the nested loops in a pipeline and combines them to create a common outer dataflow and several inner dataflows. The CLOUD-ONLY approach can suffer from high latency due to the high transmission cost of large input data from the edge to the cloud. This could be a problem, especially for latency-critical applications. Thankfully, the SPLIT approach reduces latency compared to the CLOUD-ONLY approach. However, existing solutions only split the DNN in floating-point precision. Executing floating-point precision on the edge device can occupy large memory and reduce the potential options for SPLIT solutions. In our second work, we expand and explore the search space of SPLIT solutions by jointly applying mixed-precision post-training quantization and DNN graph split. Our work, Auto-Split finds a balance in the trade-off among the model accuracy, edge device capacity, transmission cost, and the overall latency

Simon Fraser University Institutional Repository

Quantized Deep Transfer Learning - Gearbox Fault Diagnosis on Edge Devices

Author: Hetarth Chopra
Jino Rohit
Nishtha Hooda
Prashant Singh Rana
Somnath Sarangi
Subrata Mukherjee
Vikash Kumar
Publication venue
Publication date: 22/09/2023
Field of study

This study has designed and implemented a deep transfer learning (DTL) model-based framework that takes an input time series of gearbox vibration patterns, which are accelerometer readings. It classifies the gear’s damage type from a predefined catalog. Industrial gearboxes are often operated even after damage because damage detection is formidable. It causes a lot of wear and tear, which leads to more repair costs. With this proposed DTL model-based framework, at an early stage, gearbox damage can be detected so that gears can be replaced immediately with less repair cost. The proposed methodology involves training a convolutional neural network (CNN) model using a transfer learning technique on a predefined dataset of eight types of gearbox conditions. Then, using quantization, the size of the CNN model is reduced, leading to easy inference on edge and embedded devices. An accuracy of 99.49 % using transfer learning of the VGG16 model is achieved, pre-trained on the Imagenet dataset. Other models and architectures were also tested, but VGG16 emerged as the winner. The methodology also addresses the problem of deployment on edge/embedded devices, as in most cases, accurate models are too heavy to be used in the industry due to memory and computation power constraints in embedded devices. This is done with the help of quantization, enabling the proposed model to be deployed on devices like the Raspberry Pi, leading to inference on the go without the need for the internet and cloud computing. Consequently, the current methodology achieved a 4x reduction in model size with the help of INT8 Quantization

ZENODO

SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud

Author: Abadi Martín
Almeida Mario
Guo Chuan
Han Song
Hazelwood K.
He K
Hsieh Kevin
Hu C.
Huang Gao
Jacob B.
Kaya Yigitcan
Kouris A.
Kouris A.
Kouris A.
Kozyrakis C.
Lane N. D.
Laskaridis Stefanos
Lee Royson
Li E.
Li Hao
Li Hongshan
Liu Yizhi
Migacz Szymon
Nair Vinod
Nikolić Miloš
Norman
Oakes Edward
Raghu Maithra
Rhu M.
Simonyan K.
Smolyanskiy N.
Stock Pierre
Szegedy C.
Szegedy Christian
Teerapittayanon S.
Wang Liang
Wu C.
Zhang Linfeng
Zhou Aojun
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 24/08/2020
Field of study

Despite the soaring use of convolutional neural networks (CNNs) in mobile applications, uniformly sustaining high-performance inference on mobile has been elusive due to the excessive computational demands of modern CNNs and the increasing diversity of deployed devices. A popular alternative comprises offloading CNN processing to powerful cloud-based servers. Nevertheless, by relying on the cloud to produce outputs, emerging mission-critical and high-mobility applications, such as drone obstacle avoidance or interactive applications, can suffer from the dynamic connectivity conditions and the uncertain availability of the cloud. In this paper, we propose SPINN, a distributed inference system that employs synergistic device-cloud computation together with a progressive inference method to deliver fast and robust CNN inference across diverse settings. The proposed system introduces a novel scheduler that co-optimises the early-exit policy and the CNN splitting at run time, in order to adapt to dynamic conditions and meet user-defined service-level requirements. Quantitative evaluation illustrates that SPINN outperforms its state-of-the-art collaborative inference counterparts by up to 2x in achieved throughput under varying network conditions, reduces the server cost by up to 6.8x and improves accuracy by 20.7% under latency constraints, while providing robust operation under uncertain connectivity conditions and significant energy savings compared to cloud-centric execution.Comment: Accepted at the 26th Annual International Conference on Mobile Computing and Networking (MobiCom), 202

arXiv.org e-Print Archive

Crossref