The explosive growth of diverse big data and advances in AI
technologies have catalyzed a new class of applications: multi-modal DNNs.
Multi-modal DNNs are capable of interpreting and reasoning about information
from multiple modalities, making them more applicable to real-world AI
scenarios. In recent research, multi-modal DNNs have outperformed their best
uni-modal counterparts in a wide range of applications, from traditional
multimedia to emerging autonomous systems. However, despite their importance
and superiority, very limited research attention has been devoted to
understanding the characteristics of multi-modal DNNs and their implications
for current computing software/hardware platforms.
To facilitate research and advance the understanding of these multi-modal DNN
workloads, we first present MMbench, an open-source benchmark suite consisting
of a set of real-world multi-modal DNN workloads with relevant performance
metrics for evaluation. Then we use MMbench to conduct an in-depth analysis of
the characteristics of multi-modal DNNs. We study their implications for
application and programming frameworks, operating and scheduling systems, and
execution hardware. Finally, we conduct a case study and extend our
benchmark to edge devices. We hope that our work can provide guidance for
future software/hardware design and optimization to underpin multi-modal DNNs
on both cloud and edge computing platforms.