Edge TPUs are a family of accelerators for low-power edge devices and are
widely used in various Google products such as Coral and Pixel devices. In this
paper, we first discuss the major microarchitectural details of Edge TPUs.
Then, we extensively evaluate three classes of Edge TPUs, covering different
computing ecosystems, that are either currently deployed in Google products or
are in the product pipeline, across 423K unique convolutional neural networks.
Building upon this extensive study, we discuss critical and interpretable
microarchitectural insights about the studied classes of Edge TPUs. In particular, we
discuss how Edge TPU accelerators perform across convolutional neural networks
with different structures. Finally, we present our ongoing efforts in
developing high-accuracy learned models that estimate major accelerator
performance metrics, such as latency and energy consumption.
These learned models enable significantly faster (on the order of milliseconds)
evaluations of accelerators as an alternative to time-consuming cycle-accurate
simulators and establish an exciting opportunity for rapid hardware/software
co-design.

Comment: 11 pages, 15 figures, submitted to ISCA 202