SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud

Abadi Martín; Almeida Mario; Guo Chuan; Han Song; Hazelwood K.; He K; Hsieh Kevin; Hu C.; Huang Gao; Jacob B.; Kaya Yigitcan; Kouris A.; Kouris A.; Kouris A.; Kozyrakis C.; Lane N. D.; Laskaridis Stefanos; Lee Royson; Li E.; Li Hao; Li Hongshan; Liu Yizhi; Migacz Szymon; Nair Vinod; Nikolić Miloš; Norman; Oakes Edward; Raghu Maithra; Rhu M.; Simonyan K.; Smolyanskiy N.; Stock Pierre; Szegedy C.; Szegedy Christian; Teerapittayanon S.; Wang Liang; Wu C.; Zhang Linfeng; Zhou Aojun

Publication date: 24 August 2020
Publisher: Association for Computing Machinery (ACM)
Abstract

Despite the soaring use of convolutional neural networks (CNNs) in mobile applications, uniformly sustaining high-performance inference on mobile has been elusive due to the excessive computational demands of modern CNNs and the increasing diversity of deployed devices. A popular alternative comprises offloading CNN processing to powerful cloud-based servers. Nevertheless, by relying on the cloud to produce outputs, emerging mission-critical and high-mobility applications, such as drone obstacle avoidance or interactive applications, can suffer from the dynamic connectivity conditions and the uncertain availability of the cloud. In this paper, we propose SPINN, a distributed inference system that employs synergistic device-cloud computation together with a progressive inference method to deliver fast and robust CNN inference across diverse settings. The proposed system introduces a novel scheduler that co-optimises the early-exit policy and the CNN splitting at run time, in order to adapt to dynamic conditions and meet user-defined service-level requirements. Quantitative evaluation illustrates that SPINN outperforms its state-of-the-art collaborative inference counterparts by up to 2x in achieved throughput under varying network conditions, reduces the server cost by up to 6.8x and improves accuracy by 20.7% under latency constraints, while providing robust operation under uncertain connectivity conditions and significant energy savings compared to cloud-centric execution.

Comment: Accepted at the 26th Annual International Conference on Mobile Computing and Networking (MobiCom), 202
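
The idea of jointly choosing a split point and an early-exit threshold can be illustrated with a toy example. The sketch below is not SPINN's scheduler: it is a minimal brute-force search under assumed inputs. Every name (`expected_latency_ms`, `choose_config`, `PROFILE`) and every number is hypothetical, and the latency model (per-layer timings, a profiled on-device exit fraction, a single transfer at the split) is a simplifying assumption made for illustration only.

```python
# Toy joint selection of (split point, early-exit threshold).
# All names and numbers are hypothetical; this is not the paper's algorithm.

def expected_latency_ms(split, p_exit, device_ms, server_ms,
                        transfer_kb, bandwidth_kbps):
    """Expected per-sample latency when layers [0, split) run on the device.

    p_exit is the (assumed, offline-profiled) fraction of samples that
    exit early on the device and therefore never reach the server.
    """
    on_device = sum(device_ms[:split])
    if split == len(device_ms):  # fully on-device: no transfer, no server work
        return on_device
    # kilobytes -> kilobits -> seconds -> milliseconds
    transfer = transfer_kb[split] * 8.0 / bandwidth_kbps * 1000.0
    remote = transfer + sum(server_ms[split:])
    # Only samples that did not exit early pay the remote cost.
    return on_device + (1.0 - p_exit) * remote


def choose_config(profile, device_ms, server_ms, transfer_kb,
                  bandwidth_kbps, min_accuracy):
    """Return (latency_ms, split, threshold) with the lowest expected latency
    among configurations meeting the accuracy floor, or None if none does."""
    best = None
    for (split, threshold), (accuracy, p_exit) in profile.items():
        if accuracy < min_accuracy:
            continue
        latency = expected_latency_ms(split, p_exit, device_ms, server_ms,
                                      transfer_kb, bandwidth_kbps)
        if best is None or latency < best[0]:
            best = (latency, split, threshold)
    return best


# Hypothetical three-layer network.
DEVICE_MS = [10.0, 20.0, 30.0]     # per-layer latency on the device
SERVER_MS = [1.0, 2.0, 3.0]        # per-layer latency on the server
TRANSFER_KB = [100.0, 50.0, 25.0]  # size of the input to each layer
# (split, threshold) -> (accuracy, fraction of samples exiting on device)
PROFILE = {
    (1, 0.8): (0.90, 0.3),
    (2, 0.8): (0.90, 0.6),
    (3, 0.8): (0.90, 1.0),  # split == 3 means everything runs on the device
    (1, 0.5): (0.80, 0.7),
}

if __name__ == "__main__":
    for bandwidth in (8000.0, 800.0):
        print(bandwidth, choose_config(PROFILE, DEVICE_MS, SERVER_MS,
                                       TRANSFER_KB, bandwidth, 0.85))
```

With these made-up numbers, the search picks a later split as bandwidth drops, which mirrors the abstract's point that the choice has to be revisited at run time as network conditions change.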
