Deep neural networks use skip connections to improve training convergence.
However, these skip connections are costly in hardware, requiring extra buffers
and increasing on- and off-chip memory utilization and bandwidth requirements.
In this paper, we show that skip connections can be optimized for hardware when
tackled with a hardware-software codesign approach. We argue that while a
network's skip connections are needed for the network to learn, they can later
be removed or shortened to provide a more hardware-efficient implementation
with minimal to no accuracy loss. We introduce Tailor, a codesign tool whose
hardware-aware training algorithm gradually removes or shortens a fully trained
network's skip connections to lower their hardware cost. Tailor improves
resource utilization by up to 34% for BRAMs, 13% for FFs, and 16% for LUTs in
on-chip, dataflow-style architectures. Tailor increases performance by 30% and
reduces memory bandwidth by 45% for a 2D processing element array architecture.
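Here is a minimal Python sketch of what removing skip connections gradually from a trained network could look like. It is illustrative only and is not Tailor's actual training algorithm. It assumes a chain of residual blocks that drop their skip connections one block at a time, starting from the first block, every `epochs_per_removal` fine-tuning epochs. The names `active_skips`, `block_forward`, and `network_forward`, and the toy block function, are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def active_skips(num_blocks: int, epoch: int, epochs_per_removal: int) -> List[bool]:
    """Hypothetical schedule: which blocks still keep their skip connection.

    Assumes one skip connection is removed every `epochs_per_removal`
    fine-tuning epochs, starting with block 0 and moving deeper.
    """
    if epochs_per_removal <= 0:
        raise ValueError("epochs_per_removal must be positive")
    removed = min(num_blocks, epoch // epochs_per_removal)
    return [i >= removed for i in range(num_blocks)]


def block_forward(x: Vector, f: Callable[[Vector], Vector], keep_skip: bool) -> Vector:
    """Residual block: f(x) + x while the skip exists, plain f(x) once removed."""
    y = f(x)
    if keep_skip:
        # The identity path is what forces hardware to hold x until f(x) is ready.
        return [a + b for a, b in zip(y, x)]
    return y


def network_forward(
    x: Vector, blocks: List[Callable[[Vector], Vector]], skips: List[bool]
) -> Vector:
    """Run the blocks in sequence, honoring each block's skip flag."""
    for f, keep in zip(blocks, skips):
        x = block_forward(x, f, keep)
    return x


if __name__ == "__main__":
    double = lambda v: [2.0 * t for t in v]  # toy stand-in for a conv layer
    blocks = [double, double, double]
    for epoch in range(0, 8, 2):
        skips = active_skips(len(blocks), epoch, epochs_per_removal=2)
        print(epoch, skips, network_forward([1.0], blocks, skips))
```

In a real training loop, each removal step would be followed by fine-tuning so the remaining layers can compensate for the missing identity path. That part is omitted here, and so is the alternative of shortening skip connections instead of removing them.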