Alongside the rapid evolution of deep neural networks, hardware systems are also advancing quickly. As a promising solution that offers high scalability and low manufacturing cost, multi-accelerator systems are widely deployed in data centers, cloud platforms, and SoCs. This raises a challenging problem for multi-accelerator systems: selecting a proper combination of accelerators from the available designs and searching for efficient DNN mapping strategies. To this end, we propose MARS, a novel mapping framework that performs computation-aware accelerator selection and applies communication-aware sharding strategies to maximize parallelism. Experimental results show that MARS achieves a 32.2% latency reduction on average for typical DNN workloads compared to the baseline, and a 59.4% latency reduction on heterogeneous models compared to the corresponding state-of-the-art method.

Comment: Accepted by 60th DA