DNN workloads can be scheduled onto DNN accelerators in many different ways:
from layer-by-layer scheduling to cross-layer depth-first scheduling (a.k.a.
layer fusion, or cascaded execution). This results in a very broad scheduling
space, with each schedule leading to varying hardware (HW) costs in terms of
energy and latency. To rapidly explore this vast space for a wide variety of
hardware architectures, analytical cost models are crucial to estimate
scheduling effects at the HW level. However, state-of-the-art cost models lack
support for exploring the complete depth-first scheduling space: they focus,
for instance, only on activations while ignoring weights, or model only
DRAM accesses while overlooking on-chip data movements. These limitations
prevent researchers from systematically and accurately understanding the
depth-first scheduling space.
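To make the two scheduling styles concrete, the following minimal Python
sketch contrasts layer-by-layer execution with depth-first execution over row
tiles. It is illustrative only: the function names and tile size are
hypothetical, not DeFiNES's API, and an elementwise stand-in layer is used to
sidestep the halo overlap a real convolution's receptive field would
introduce:

    import numpy as np

    def layer(x):
        # Stand-in for a DNN layer; an elementwise op avoids the halo
        # (overlap) handling a real convolution would need at tile edges.
        return x + 1

    def layer_by_layer(fmap, layers):
        # Each layer processes the full feature map before the next layer
        # starts, so every intermediate map is fully materialized.
        for f in layers:
            fmap = f(fmap)
        return fmap

    def depth_first(fmap, layers, tile_rows=4):
        # The feature map is cut into tiles, and each tile is pushed through
        # all layers before the next tile begins, so only tile-sized
        # intermediates need to live in on-chip memory at any time.
        tiles = []
        for r in range(0, fmap.shape[0], tile_rows):
            t = fmap[r:r + tile_rows]
            for f in layers:
                t = f(t)
            tiles.append(t)
        return np.concatenate(tiles, axis=0)

    fmap = np.arange(64.0).reshape(8, 8)
    stack = [layer, layer, layer]
    assert np.array_equal(layer_by_layer(fmap, stack),
                          depth_first(fmap, stack))

Both orderings compute the same result; what changes is which data must be
resident in which memory at which time, and hence the HW cost.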
To fill these gaps, this work first formalizes the depth-first design space
and then proposes DeFiNES, a unified modeling framework for layer-by-layer
and depth-first scheduling. DeFiNES analytically estimates the hardware cost
of possible schedules in terms of both energy and latency, taking data
accesses at every memory level into account. It does so for each schedule and
HW architecture under study by optimally choosing the active part of the
memory hierarchy per unique combination of operand, layer, and feature map
tile, and it accounts for both the data computation and the data copy phases.
The analytical cost model is validated against measured data from DepFiN, a
taped-out depth-first DNN accelerator, showing good modeling accuracy at the
end-to-end neural network level. A comparison with generalized
state-of-the-art approaches shows that DeFiNES finds solutions up to 10X
better.
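As a rough illustration of the kind of memory-level-aware cost estimate
described above, the sketch below sums, over every (operand, memory level)
pair, the access count times a per-access energy. All names and numbers here
are hypothetical placeholders, not DeFiNES's calibrated model, which derives
the access counts from the schedule and a description of the HW's memory
hierarchy:

    # Illustrative memory-level-aware energy estimate; all per-access
    # energies and access counts are made-up placeholders.
    MEM_ENERGY_PJ = {"reg": 0.1, "sram": 1.0, "dram": 100.0}  # pJ / access

    def schedule_energy_pj(access_counts):
        """access_counts maps (operand, memory level) -> number of accesses,
        covering both the data computation and the data copy phases."""
        return sum(n * MEM_ENERGY_PJ[level]
                   for (_op, level), n in access_counts.items())

    # Toy comparison: depth-first keeps intermediate activations in SRAM,
    # trading expensive DRAM accesses for cheap on-chip ones.
    lbl = {("act", "dram"): 1_000_000, ("weight", "dram"): 200_000}
    df = {("act", "sram"): 1_000_000, ("act", "dram"): 50_000,
          ("weight", "dram"): 200_000}
    print(f"layer-by-layer: {schedule_energy_pj(lbl):.0f} pJ")
    print(f"depth-first:    {schedule_energy_pj(df):.0f} pJ")

The toy numbers reflect why depth-first scheduling can pay off: tile-sized
intermediates stay in cheap on-chip memory instead of round-tripping through
DRAM.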