Benchmarking TinyML Systems: Challenges and Direction by Banbury, Colby R. et al.
BENCHMARKING TINYML SYSTEMS: CHALLENGES AND DIRECTION
Colby R. Banbury 1 Vijay Janapa Reddi 1 Max Lam 1 William Fu 1 Amin Fazel 2 Jeremy Holleman 3 4
Xinyuan Huang 5 Robert Hurtado 6 David Kanter 7 Anton Lokhmotov 8 David Patterson 9 10 Danilo Pau 11
Jae-sun Seo 12 Jeff Sieracki 13 Urmish Thakker 14 Marian Verhelst 15 16 Poonam Yadav 17
ABSTRACT
Recent advancements in ultra-low-power machine learning (TinyML) hardware promise to unlock an entirely new
class of smart applications. However, continued progress is limited by the lack of a widely accepted benchmark
for these systems. Benchmarking allows us to measure and thereby systematically compare, evaluate, and improve
the performance of systems. In this position paper, we present the current landscape of TinyML and discuss the
challenges and direction towards developing a fair and useful hardware benchmark for TinyML workloads. Our
viewpoints reflect the collective thoughts of the TinyMLPerf working group that is comprised of 30 organizations.
1 INTRODUCTION
Machine learning (ML) inference on the edge is an increas-
ingly attractive prospect due to its potential for increasing
energy efficiency (Fedorov et al., 2019), privacy, responsive-
ness (Zhang et al., 2017), and autonomy of edge devices.
Thus far, the field of edge ML has predominantly focused on
mobile inference, which has led to numerous advancements
in machine learning models, such as exploiting pruning,
sparsity, and quantization. But in recent years, there have
been major strides in expanding the scope of edge systems.
Interest is brewing in both academia (Fedorov et al., 2019;
Zhang et al., 2017) and industry (Flamand et al., 2018; War-
den, 2018a) towards expanding the scope of edge ML to
microcontroller-class devices.
The goal of “TinyML” (tinyML Foundation, 2019) is to
bring ML inference to ultra-low-power devices, typically
under a milliwatt, and thereby break the traditional power
barrier preventing widely distributed machine intelligence. By
performing inference on-device, and near-sensor, TinyML
enables greater responsiveness and privacy while avoiding
the energy cost associated with wireless communication,
which at this scale is far higher than that of compute
(Warden, 2018b). Furthermore, the efficiency of TinyML enables
a class of smart, battery-powered, always-on applications
that can revolutionize the real-time collection and
processing of data. This emerging field, which is the culmination
of many innovations, is poised to further accelerate its
growth in the coming years.
1Harvard University 2Samsung Semiconductor, Inc. 3Syntiant
4University of North Carolina, Charlotte 5Cisco Systems
6California State Polytechnic University, Pomona 7Real World
Insights 8dividiti 9University of California, Berkeley 10Google
11STMicroelectronics, Italy 12Arizona State University 13Reality
AI 14Arm ML Research Lab 15KU Leuven 16Interuniversity
Microelectronics Centre (IMEC) 17University of York. Correspondence
to: Colby R. Banbury <cbanbury@g.harvard.edu>.
Proceedings of the 3rd MLSys Conference, Austin, TX, USA,
2020. Copyright 2020 by the author(s).
To unlock the full potential of the field, hardware-software
co-design is required. Specifically, TinyML models must
be small enough to fit within the tight constraints of
MCU-class devices (e.g., a few hundred kB of memory and limited
onboard compute horsepower on the order of MHz processor
clock speed), thus limiting the size of the input and the
number of layers (Zhang et al., 2017) or necessitating the
use of lightweight, non-neural-network-based techniques
(Kumar et al., 2017). TinyML tools are broadly defined as
anything that enables the design, mapping, and deployment
of TinyML algorithms including aggressive quantization
techniques (Wang et al., 2019), memory aware neural archi-
tecture searches (Fedorov et al., 2019), frameworks (Ten-
sorFlow), and efficient inference libraries (Lai et al., 2018;
Garofalo et al., 2019). Efforts in TinyML hardware in-
clude improving inference on the next generation of general-
purpose MCUs (arm; Flamand et al., 2018), developing
hardware specialized for low power inference, and creating
novel architectures intended only as inference engines for
specific tasks (Moons et al., 2018).
The complexity and dynamism of the field obscure the
measurement of progress and make design decisions
intractable. In order to enable continued innovation, a
fair and reliable method of comparison is needed. Since
progress is often the result of increased hardware capability,
a reliable TinyML hardware benchmark is required.
In this paper, we discuss the challenges and opportunities
associated with the development of a TinyML hardware
benchmark. Our short paper is a call to action for
establishing a common benchmark for TinyML workloads on
emerging TinyML hardware to foster the development of
TinyML applications. The points presented here reflect the
ongoing effort of the TinyMLPerf working group, which
currently comprises over 30 organizations and 75 members.
arXiv:2003.04821v2 [cs.PF] 21 May 2020
Table 1. Use Cases, Models, and Datasets
INPUT TYPE | USE CASES | MODEL TYPES | DATASETS
Audio | Audio wake words; context recognition; control words; keyword detection | DNN; CNN; RNN; LSTM | Speech Commands (Warden, 2018a); AudioSet (Gemmeke et al., 2017); ExtraSensory (Vaizman et al., 2017)
Image | Visual wake words; object detection; gesture recognition; object counting; text recognition | DNN; CNN; SVM; decision trees; KNN; linear | Visual Wake Words (Chowdhery et al., 2019); CIFAR10 (Krizhevsky et al., 2009); MNIST (LeCun & Cortes, 2010); ImageNet (Deng et al., 2009); DVS128 Gesture (Amir et al., 2017)
Physiological / behavioral metrics | Segmentation; forecasting; activity detection | DNN; decision tree; SVM; linear | PhysioNet (Goldberger et al., 2000); HAR (Cramariuc, 2019); DSA (Altun et al., 2010); Opportunity (Roggen et al., 2010); UCI EMG (Lobov et al., 2018)
Industry telemetry | Sensing (light, temp, etc.); anomaly detection; motor control; predictive maintenance | DNN; decision tree; SVM; linear; naive Bayes | UCI Air Quality (De Vito et al., 2008); UCI Gas (Vergara et al., 2012); NASA’s PCoE (Saxena & Goebel, 2008)
The rest of the paper is organized as follows. In Section 2,
we discuss the application landscape of TinyML, including
the existing use cases, models, and datasets. In Section 3, we
describe the existing TinyML hardware solutions, including
outlining improvements to general-purpose MCUs and the
development of novel architectures. In Section 4, we discuss
the inherent challenges of the field and how they complicate
the development of a benchmark. In Section 5, we describe
the existing benchmarks that relate to TinyML and identify
the gaps that still need to be filled. In Section 6, we
discuss the progress of the TinyMLPerf working group thus
far and describe the next steps. In Section 7, we conclude
the paper and discuss future work.
2 TINY USE CASES, MODELS & DATASETS
In this section we attempt to summarize the field of TinyML
by describing a set of representative use cases (Section
2.1), their relevant datasets (Section 2.2), and the model
architectures commonly applied to these specific use cases
(Section 2.3).
2.1 Use Cases
Despite the general lack of maturity within the field, there
are a number of well-established TinyML use cases. We
categorize the application landscape of TinyML in Table 1
by input type, which in the context of TinyML systems
plays a crucial role in the use case definition.
Audio wake words is already a fairly ubiquitous example of
always-on ML inference. Audio wake words is generally a
speech classification problem that achieves very low power
inference by limiting the label space, often to two labels:
“wake word” and “not wake word” (Zhang et al., 2017).
Anomaly detection and predictive maintenance are com-
monly deployed on MCUs in factory settings where audio,
motor bearing, or IMU data can be used to detect faults in
products or equipment.
Other deployed TinyML applications, like activity recog-
nition from IMU data (Hassan et al., 2018), rely on low
feature dimensionality to fit within the tight constraints of
the platforms. Some use cases have been proven viable, but
have yet to reach end users because they are too new, like
visual wake words (Chowdhery et al., 2019).
Many traditional ML use cases can be considered futuristic
TinyML tasks. As ultra-low-power inference hardware con-
tinues to improve, the threshold of viability expands. Tasks
like large label space image classification or object counting
are well suited for low-power always-on applications but
are currently too compute and memory hungry for today’s
TinyML hardware.
Furthermore, TinyML has a significant role to play in future
technology. For example, many of the fundamental features
of augmented reality (AR) glasses are always-on and battery-
powered. Due to tight real time constraints, these devices
cannot afford the latency of offloading computation to the
cloud, an edge server, or even an accompanying mobile
device. Thus, due to shared constraints, AR applications can
benefit significantly from progress in the field of TinyML.
2.2 Datasets
There are a number of open-source datasets that are relevant
to TinyML use cases. Table 1 breaks them down by the
type of data. Despite the availability of these datasets, the
majority of deployed TinyML models are trained on much
larger, proprietary datasets. The open-source datasets that
are competitively large are not TinyML specific. The lack
of large, TinyML focused, open-source datasets slows the
progress of academic research and limits the ability of a
benchmark to represent real workloads accurately.
2.3 Models
Table 1 lists common model types for TinyML use cases.
Although neural networks (NN) are a dominant force in
traditional ML, it is common to use non-NN-based solutions,
like decision trees (Kumar et al., 2017), for some TinyML
use cases due to their low compute and memory requirements.
Machine learning on MCU-class devices has only recently
become feasible; therefore, the community has yet to produce
models that are as widely accepted as MobileNets have
become for mobile devices. This makes the task of selecting
representative models challenging. However, immaturity also
brings opportunity, as our decisions can help direct future
progress. Selecting a subset of the currently available
models, outlining the rules for quality versus accuracy
trade-offs, and prescribing a measurement methodology that
can be faithfully reproduced will encourage the community
to develop new models, runtimes, and hardware that
progressively outperform one another.
Figure 1. A logarithmic comparison of the active power consumption
between TinyML systems and those supported by MLPerf.
TinyML systems can be up to four orders of magnitude smaller in
the power budget as compared to state-of-the-art MLPerf systems.
3 TINY HARDWARE CONSTRAINTS
TinyML hardware is defined by its ultra-low power
consumption, which is often in the range of 1 mW and below.
At the top of this range are efficient 32-bit MCUs, like those
based on the Arm Cortex-M7 or RISC-V PULP processors,
and at the bottom are novel ultra-low-power inference en-
gines. Even the largest TinyML devices consume drastically
less power than the smallest traditional ML devices. Figure
1 shows the logarithmic comparison of the active power
consumption between TinyML devices and those currently
supported by MLPerf (v0.5 inference results from the open
and closed divisions). TinyML devices can be up to four or-
ders of magnitude smaller in the power budget as compared
to state-of-the-art MLPerf systems.
The advent of low-power, cheap 32-bit MCUs has revolutionized
the compute capability at the very edge. Cortex-M
based platforms are now regularly performing tasks that
were previously infeasible at this scale, mostly due to support
for single instruction multiple data (SIMD) and digital
signal processing (DSP) instructions. This fast vector math
supports NN and highly efficient SVM implementations; it
also accelerates many feature computations using 8-bit
fixed-point arithmetic.
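As a rough illustration of what 8-bit fixed-point arithmetic involves, the following minimal sketch quantizes two small vectors to int8, computes their dot product with a wider integer accumulator, and scales the result back. It is plain Python rather than Cortex-M code, and the symmetric scaling scheme and the helper names (quantize, int8_dot, dequantize) are assumptions made for this example, not something taken from a specific library.

```python
def quantize(values, scale):
    """Map floats to int8 using a symmetric scale (real ~= scale * q)."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def int8_dot(qa, qb):
    """Integer dot product; the accumulator is wider than 8 bits."""
    acc = 0
    for a, b in zip(qa, qb):
        acc += a * b
    return acc


def dequantize(acc, scale_a, scale_b):
    """Convert the integer accumulator back to a real-valued result."""
    return acc * scale_a * scale_b


# Illustrative values only.
weights = [0.5, -0.25, 0.75, 1.0]
inputs = [1.0, 2.0, -1.0, 0.5]
w_scale = 1.0 / 127   # assumes weights lie in [-1, 1]
x_scale = 2.0 / 127   # assumes inputs lie in [-2, 2]

qw = quantize(weights, w_scale)
qx = quantize(inputs, x_scale)
approx = dequantize(int8_dot(qw, qx), w_scale, x_scale)
exact = sum(w * x for w, x in zip(weights, inputs))
```

The quantized result approximates the floating-point one with a small error, while the inner loop uses only integer multiply-accumulate operations, which is the pattern that SIMD and DSP instructions execute several lanes at a time.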
A feature of MCUs is the prevalence of on-chip SRAM
and embedded Flash. Thus, when models can fit within the
tight on-chip memory constraints, they are free of the costly
DRAM accesses that hamper traditional ML. Widespread
adoption and dispersion of TinyML are reliant on the capa-
bility of these platforms.
Although general-purpose MCUs provide flexibility, the
highest TinyML performance efficiency comes from specialized
hardware. Novel architectures can achieve performance
in the range of one microjoule per inference (Holleman,
2019). These specialized devices expand the boundaries of
ML to the ultra-low-power end of TinyML processors.
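Energy per inference follows from active power and latency (energy = power × time), so a milliwatt sustained for a millisecond corresponds to one microjoule. The sketch below works through that arithmetic; the two operating points are hypothetical numbers chosen only to show the calculation, not measurements of any device.

```python
def energy_per_inference_uj(active_power_mw, latency_ms):
    """Energy in microjoules: 1 mW * 1 ms = 1e-3 W * 1e-3 s = 1 uJ."""
    return active_power_mw * latency_ms


def inferences_per_joule(energy_uj):
    """How many inferences one joule of energy would pay for."""
    return 1e6 / energy_uj


# Hypothetical operating points, for illustration only.
mcu_uj = energy_per_inference_uj(active_power_mw=50.0, latency_ms=20.0)
accel_uj = energy_per_inference_uj(active_power_mw=0.5, latency_ms=2.0)
ratio = mcu_uj / accel_uj
```

Under these assumed numbers the first device spends 1000 µJ per inference and the second 1 µJ, which shows why both power and latency have to be reported for an energy comparison to be meaningful.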
4 CHALLENGES
TinyML systems present a number of unique challenges to
the design of a performance benchmark that can be used
to measure and quantify performance differences between
various systems systematically. We discuss the four primary
obstacles and postulate how they might be overcome.
4.1 Low Power
Low power consumption is one of the defining features of
TinyML systems. Therefore, a useful benchmark should
ostensibly profile the energy efficiency of each device. How-
ever, there are many challenges in fairly measuring energy
consumption. Firstly, as illustrated in Figure 1, TinyML
devices can consume drastically different amounts of power,
which makes maintaining accuracy across the range of de-
vices difficult.
Secondly, it is difficult to determine what falls under the
scope of the power measurement when data paths
and pre-processing steps can vary significantly between
devices. Other factors like chip peripherals and underlying
firmware can impact the measurements. Unlike traditional
high-power ML systems, TinyML systems do not have spare
cores to load the System-Under-Test (SUT) with minimal
overheads.
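One common way to turn a power measurement into an energy figure is to sample power over the inference window and integrate. The sketch below does this with the trapezoidal rule and optionally subtracts an idle baseline; the trace is synthetic, and the function names and the baseline-subtraction choice are assumptions for illustration, not a prescribed methodology.

```python
def integrate_energy_j(timestamps_s, power_w):
    """Trapezoidal integration of a sampled power trace, in joules."""
    energy = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        energy += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy


def inference_energy_j(timestamps_s, power_w, idle_power_w=0.0):
    """Energy above an idle baseline. Whether to subtract the baseline
    is the kind of scoping choice a benchmark would have to fix."""
    total = integrate_energy_j(timestamps_s, power_w)
    duration = timestamps_s[-1] - timestamps_s[0]
    return total - idle_power_w * duration


# Synthetic trace: constant 1 mW for 10 ms, sampled every 1 ms.
t = [i * 1e-3 for i in range(11)]
p = [1e-3] * 11
total_j = integrate_energy_j(t, p)                       # about 10 uJ
active_j = inference_energy_j(t, p, idle_power_w=2e-4)   # about 8 uJ
```

The two results differ by 20% for the same trace, which illustrates how much the definition of measurement scope alone can move a reported number.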
4.2 Limited Memory
Due to their small size, TinyML systems often have tight
memory constraints. While traditional ML systems like
smartphones cope with resource constraints in the order
of a few GBs, TinyML systems are typically coping with
resources that are two orders of magnitude smaller.
Memory is one of the primary motivating factors for the
creation of a TinyML specific benchmark. Traditional
ML benchmarks use inference models that have drastically
higher peak memory requirements (in the order of gigabytes)
than TinyML devices can provide. This also complicates
the deployment of a benchmarking suite as any overhead
can significantly impact power consumption or even make
the benchmark too big to fit. Individual benchmarks must
also cover a wide range of devices; therefore, multiple levels
of quantization and precision should be represented in the
benchmarking suite. Finally, a variety of benchmarks should
be chosen such that the diversity of the field is supported.
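To see why quantization level matters for fitting a benchmark on a device, the following sketch estimates weight storage at several bit widths and checks it against a memory budget. The parameter count, budget, and overhead are hypothetical, and the estimate deliberately ignores activation buffers and code size, so it is a lower bound rather than a full footprint model.

```python
def weight_bytes(num_params, bits_per_weight):
    """Storage for the weights alone, rounded up to whole bytes."""
    return (num_params * bits_per_weight + 7) // 8


def fits(num_params, bits_per_weight, budget_bytes, overhead_bytes=0):
    """True if the weights plus a fixed overhead fit in the budget."""
    needed = weight_bytes(num_params, bits_per_weight) + overhead_bytes
    return needed <= budget_bytes


NUM_PARAMS = 200_000    # hypothetical model size
BUDGET = 256 * 1024     # hypothetical 256 kB memory budget
OVERHEAD = 32 * 1024    # hypothetical runtime/benchmark overhead

report = {bits: fits(NUM_PARAMS, bits, BUDGET, OVERHEAD)
          for bits in (32, 16, 8, 4)}
```

With these assumed numbers, only the 8-bit and 4-bit variants fit, and the fixed overhead consumes an eighth of the budget, which is why benchmark harness overhead cannot be treated as negligible at this scale.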
4.3 Hardware Heterogeneity
Despite the field’s nascency, TinyML systems are already diverse
in their performance, power, and capabilities. Devices range
from general-purpose MCUs to novel architectures, like
event-based neural processors (Brainchip) or in-memory
compute (Kim et al., 2019). This heterogeneity poses a
number of challenges, as the system under test (SUT) will
not necessarily include otherwise standard features, like
a system clock or debug interface. Furthermore, the task
of normalizing performance results across heterogeneous
implementations is a key challenge.
Today’s state-of-the-art benchmarks are not designed to readily
handle these challenges. They need careful re-engineering
to be flexible enough to handle the extent of hardware
heterogeneity that is commonplace in the TinyML ecosystem.
4.4 Software Heterogeneity
There are three distinct methods for deploying models onto
TinyML systems: hand coding, code generation, and ML
interpreters.
Hand coding often produces the best results as it allows for
low-level, application-specific optimizations; however, the
task is time-consuming and the impact of the optimizations
is often opaque to anyone but the original design team.
Moreover, hand coding limits the ability to share knowledge
and adopt new methods, which is detrimental to the rate
of progress in TinyML. From a benchmarking perspective,
hand-coded submissions will likely produce the best numerical
results at the cost of reproducibility, comparability, and
time.
Code generation methods produce well-optimized code without
the significant effort of hand coding by abstracting and
automating system-level optimizations. However, code generation
does not address the issues with comparability, as
each major vendor has its own set of proprietary tools and
compilers, which also makes portability a challenge.
ML interpreters allow for significant portability as their
abstract structure is the same across platforms. TensorFlow
Lite for Microcontrollers, a popular ML framework for
TinyML, uses an interpreter to call individual kernels, like
convolution, at run time. The framework is independent
of the model architecture; therefore, new models can be
easily swapped in. Additionally, the reference kernels can
be individually optimized and changed to fit the platform.
This method comes with a small overhead in binary size
and performance. From a benchmarking perspective, this
abstraction isolates the impact of the model architecture
on system-level performance, which makes results more
generalizable.
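The interpreter idea can be sketched in a few lines: the model is data (a list of operations), and a registry maps each operation name to a kernel that can be replaced per platform. The toy below is written in Python for readability and is not the TensorFlow Lite for Microcontrollers API; the Interpreter class, the op names, and the kernels are all hypothetical.

```python
def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def dense_reference(x, weights, bias):
    """Straightforward reference kernel: y = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


class Interpreter:
    """Walks a list of (op_name, params) and dispatches to kernels."""

    def __init__(self):
        self.kernels = {}

    def register(self, op_name, fn):
        # Registering again under the same name swaps the kernel.
        self.kernels[op_name] = fn

    def run(self, model, x):
        for op_name, params in model:
            x = self.kernels[op_name](x, **params)
        return x


interp = Interpreter()
interp.register("dense", dense_reference)
interp.register("relu", relu)

# The model is plain data, independent of the kernel implementations.
model = [
    ("dense", {"weights": [[1.0, -1.0], [0.5, 0.5]], "bias": [0.0, -2.0]}),
    ("relu", {}),
]
output = interp.run(model, [2.0, 1.0])
```

A platform-specific build would call register("dense", ...) with an optimized kernel and run the same model description unchanged; the per-operation dictionary lookup stands in for the small dispatch overhead mentioned above.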
A benchmark suite must balance optimality with portability,
and comparability with representativeness. A TinyML
benchmark should support many options for model deployment,
but the impact of that choice on the results must be
carefully evaluated.
Table 2. Existing Benchmarks
BENCHMARK | ML? | POWER? | TINY?
CoreMark | × | √ | √
MLMark | √ | × | ×
MLPerf Inference | √ | √ | ×
TinyML requirements | √ | √ | √
5 RELATED WORK
There are a number of ML-related hardware benchmarks;
however, none accurately represents the performance
of TinyML workloads on tiny hardware. Table 2 shows a
sampling of the widely accepted industry benchmarks that
are directly applicable to the discussion on TinyML systems.
EEMBC CoreMark (Gal-On & Levy) has become the stan-
dard performance benchmark for MCU-class devices due
to its ease of implementation and use of real algorithms.
Yet, CoreMark does not profile full programs, nor does it
accurately represent machine learning inference workloads.
EEMBC MLMark (Torelli & Bangale) addresses these is-
sues by using actual ML inference workloads. However, the
supported models are far too large for MCU-class devices
and are not representative of TinyML workloads. They re-
quire far too much memory (GBs) and have significant run
times. Additionally, while CoreMark supports power mea-
surements with ULPMark-CM (EEMBC), MLMark does
not, which is critical for a TinyML benchmark.
MLPerf, a community-driven benchmarking effort, has
recently introduced a benchmarking suite for ML infer-
ence (Reddi et al., 2019) and has plans to add power mea-
surements. However, much like MLMark, the current
MLPerf inference benchmark precludes MCUs and other
resource-constrained platforms due to a lack of small bench-
marks and compatible implementations.
As Table 2 summarizes, there is a clear and distinct need for
a TinyML benchmark that caters to the unique needs of ML
workloads, makes power a first-class citizen and prescribes
a methodology that suits TinyML.
6 DIRECTION
To overcome these challenges, we adopt a set of principles
for the development of a robust TinyML benchmarking suite
and select a set of preliminary use cases.
6.1 Open and Closed Divisions
As previously stated, TinyML is a diverse field; therefore,
not all systems can be accommodated under strict rules.
However, without strict rules, direct comparison of the
hardware becomes more difficult. To address this issue, we
adopt MLPerf’s open and closed structure. More traditional
TinyML solutions can submit to the closed division, where
submissions must use a model that is considered equivalent
to the reference model. TinyML systems that fall outside
the bounds of the “closed” benchmark can submit results to
the open division, which allows submissions to deviate
as necessary from the closed reference. We believe this
structure increases the inclusivity of the benchmarking suite
while maintaining the comparability of the results.
6.2 Preliminary Use Cases
The group has selected three preliminary use cases to target:
visual wake words, audio wake words, and anomaly detec-
tion. Visual wake words is a binary image classification
task that indicates if a person is visible in the image or not.
Audio wake words refers to the common keyword-spotting
task (e.g., “Alexa”, “Ok Google”, and “Hey Siri”). Anomaly
detection is a broader use case that classifies time series data
as “normal” or “abnormal”.
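As a concrete picture of the normal/abnormal classification task, the sketch below fits a mean and standard deviation on known-normal readings and flags samples that fall more than three standard deviations away. This is a simple statistical baseline chosen for illustration, not the reference model for the benchmark; the readings are synthetic and the threshold of 3.0 is an assumption.

```python
from statistics import mean, stdev


def fit_baseline(normal_samples):
    """Estimate mean and standard deviation from known-normal data."""
    return mean(normal_samples), stdev(normal_samples)


def classify(samples, baseline, threshold=3.0):
    """Label each sample by its distance from the mean, in std devs."""
    mu, sigma = baseline
    return ["abnormal" if abs(s - mu) > threshold * sigma else "normal"
            for s in samples]


# Synthetic sensor readings, for illustration only.
training = [20.0, 20.5, 19.5, 20.2, 19.8, 20.1, 19.9, 20.0]
baseline = fit_baseline(training)
labels = classify([20.1, 19.7, 25.0, 20.3], baseline)
```

A detector of this kind needs only two stored parameters and a handful of arithmetic operations per sample, which is consistent with anomaly detection sitting at the lightweight end of the selected use cases.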
These use cases have been selected to represent the broad
range of TinyML. They encompass three distinct input data
types and range from relatively resource-hungry (visual
wake words) to lightweight (anomaly detection). Furthermore,
the models traditionally used for these use cases are
varied; therefore, the benchmarking suite can support a
diverse set of ML techniques.
6.3 Dataset Selection
The Visual Wake Words Dataset (Chowdhery et al., 2019)
has been selected as the benchmark dataset for the visual
wake word use case. The Visual Wake Words Dataset is
derived from the COCO image classification dataset (Lin
et al., 2014). The datasets for the anomaly detection and
audio wake words use cases are still being selected.
6.4 Model Selection
The group has selected two of the three reference models.
The DS-CNN described in Zhang et al. (2017) has been
selected for audio wake words, and the MobileNet (Howard
et al., 2017) used in the TensorFlow Lite for Microcontrollers
person detection example (TFLM-Person-Detection)
has been selected for visual wake words.
Table 3. TinyMLPerf Benchmarking Suite
USE CASE | DATASETS | MODEL
Audio wake words | Speech Commands (Warden, 2018a) | DS-CNN (Zhang et al., 2017)
Visual wake words | Visual Wake Words Dataset (Chowdhery et al., 2019) | DS-CNN (TFLM-Person-Detection)
Anomaly detection | Undecided | Undecided
6.5 Future work
Perfection is often the enemy of good; therefore, to fill
the community’s need for comparability, our priority is to
quickly establish a set of minimum viable benchmarks and
iteratively address deficiencies. The benchmarking suite
will continue to evolve to meet the needs of the community.
The next step is to select representative datasets for the
anomaly detection and audio wake words use cases, as well
as a reference model for the anomaly detection use case. We
plan to finish development and accept result submissions
before the end of 2020.
7 CONCLUSION
In conclusion, TinyML is an important and rapidly evolving
field that requires comparability amongst hardware inno-
vations to enable continued progress and stability. In this
paper, we reviewed the current landscape of TinyML, in-
cluding highlighting the need for a hardware benchmark.
Additionally, we analyzed challenges associated with devel-
oping said benchmark and discussed a path forward. We
hope this work can act as a call to action to establish a
community-driven, fair, and useful TinyML benchmark.
If you would like to contribute to the effort, join the working
group here: https://groups.google.com/forum/#!forum/mlperf-tiny
REFERENCES
Helium: Enhancing the capabilities of the smallest de-
vices. URL https://www.arm.com/why-arm/
technologies/helium.
Altun, K., Barshan, B., and Tunçel, O. Comparative study on
classifying human activities with miniature inertial and
magnetic sensors. Pattern Recognition, 43:3605–3620,
10 2010. doi: 10.1016/j.patcog.2010.04.019.
Amir, A., Taba, B., Berg, D., Melano, T., McKinstry, J.,
Nolfo, C. D., Nayak, T., Andreopoulos, A., Garreau,
G., Mendoza, M., Kusnitz, J., Debole, M., Esser, S.,
Delbruck, T., Flickner, M., and Modha, D. A low
power, fully event-based gesture recognition system. In
2017 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), pp. 7388–7397, July 2017. doi:
10.1109/CVPR.2017.781.
Brainchip. Akida neuromorphic sys-
tem on chip. URL https://www.
brainchipinc.com/products/
akida-neuromorphic-system-on-chip.
Chowdhery, A., Warden, P., Shlens, J., Howard, A.,
and Rhodes, R. Visual wake words dataset. CoRR,
abs/1906.05721, 2019. URL http://arxiv.org/
abs/1906.05721.
Cramariuc, A.-C. P. I. M. B. Precis har, 2019. URL http:
//dx.doi.org/10.21227/mene-ck48.
De Vito, S., Massera, E., Piga, M., and Martinotto, L. On
field calibration of an electronic nose for benzene estima-
tion in an urban pollution monitoring scenario. Sensors
and Actuators B Chemical, 129:750–757, 02 2008. doi:
10.1016/j.snb.2007.09.060.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-
Fei, L. ImageNet: A Large-Scale Hierarchical Image
Database. In CVPR09, 2009.
EEMBC. Ulpmark - an eembc benchmark. URL https:
//www.eembc.org/ulpmark/index.php.
Fedorov, I., Adams, R. P., Mattina, M., and Whatmough, P.
Sparse: Sparse architecture search for cnns on resource-
constrained microcontrollers. In Advances in Neural In-
formation Processing Systems 32, pp. 4978–4990. Curran
Associates, Inc., 2019.
Flamand, E., Rossi, D., Conti, F., Loi, I., Pullini, A., Roten-
berg, F., and Benini, L. Gap-8: A risc-v soc for ai at
the edge of the iot. In 2018 IEEE 29th International
Conference on Application-specific Systems, Architec-
tures and Processors (ASAP), pp. 1–4, July 2018. doi:
10.1109/ASAP.2018.8445101.
Gal-On, S. and Levy, M. Exploring coremark - a bench-
mark maximizing simplicity and efficacy. Technical re-
port. URL https://www.eembc.org/techlit/
articles/coremark-whitepaper.pdf.
Garofalo, A., Rusci, M., Conti, F., Rossi, D., and Benini,
L. Pulp-nn: accelerating quantized neural networks
on parallel ultra-low-power risc-v processors. Philo-
sophical Transactions of the Royal Society A: Mathe-
matical, Physical and Engineering Sciences, 378(2164):
20190155, Dec 2019. ISSN 1471-2962. doi: 10.1098/rsta.
2019.0155. URL http://dx.doi.org/10.1098/
rsta.2019.0155.
Gemmeke, J. F., Ellis, D. P. W., Freedman, D., Jansen, A.,
Lawrence, W., Moore, R. C., Plakal, M., and Ritter, M.
Audio set: An ontology and human-labeled dataset for
audio events. In Proc. IEEE ICASSP 2017, New Orleans,
LA, 2017.
Goldberger, A. L., Amaral, L. A. N., Glass, L., Hausdorff,
J. M., Ivanov, P. C., Mark, R. G., Mietus, J. E., Moody,
G. B., Peng, C.-K., and Stanley, H. E. PhysioBank, Phys-
ioToolkit, and PhysioNet: Components of a new research
resource for complex physiologic signals. Circulation,
101(23):e215–e220, 2000. Circulation Electronic Pages:
http://circ.ahajournals.org/content/101/23/e215.full
PMID:1085218; doi: 10.1161/01.CIR.101.23.e215.
Hassan, M. M., Uddin, M. Z., Mohamed, A., and Almogren,
A. A robust human activity recognition system using
smartphone sensors and deep learning. Future Generation
Computer Systems, 81:307–313, 2018.
Holleman, J. The speed and power advantage
of a purpose-built neural compute engine, Jun
2019. URL https://www.syntiant.com/post/
keyword-spotting-power-comparison.
Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang,
W., Weyand, T., Andreetto, M., and Adam, H. Mobilenets:
Efficient convolutional neural networks for mobile vision
applications. arXiv preprint arXiv:1704.04861, 2017.
Kim, H., Chen, Q., Yoo, T., Kim, T. T.-H., and Kim, B. A
1-16b precision reconfigurable digital in-memory com-
puting macro featuring column-mac architecture and bit-
serial computation. In ESSCIRC 2019-IEEE 45th Eu-
ropean Solid State Circuits Conference (ESSCIRC), pp.
345–348. IEEE, 2019.
Krizhevsky, A., Nair, V., and Hinton, G. Cifar-10 (canadian
institute for advanced research). 2009. URL
http://www.cs.toronto.edu/~kriz/cifar.html.
Kumar, A., Goyal, S., and Varma, M. Resource-efficient
machine learning in 2 KB RAM for the internet of
things. In Precup, D. and Teh, Y. W. (eds.), Pro-
ceedings of the 34th International Conference on Ma-
chine Learning, volume 70 of Proceedings of Ma-
chine Learning Research, pp. 1935–1944, International
Convention Centre, Sydney, Australia, 06–11 Aug
2017. PMLR. URL http://proceedings.mlr.
press/v70/kumar17a.html.
Lai, L., Suda, N., and Chandra, V. Cmsis-nn: Efficient
neural network kernels for arm cortex-m cpus, 2018.
LeCun, Y. and Cortes, C. MNIST handwritten digit
database. 2010. URL http://yann.lecun.com/
exdb/mnist/.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P.,
Ramanan, D., Dollár, P., and Zitnick, C. L. Microsoft coco:
Common objects in context. In European conference on
computer vision, pp. 740–755. Springer, 2014.
Lobov, S., Krilova, N., Kastalskiy, I., Kazantsev, V., and
Makarov, V. Latent factors limiting the performance
of semg-interfaces. Sensors, 18:1122, 04 2018. doi:
10.3390/s18041122.
Moons, B., Bankman, D., Yang, L., Murmann, B., and
Verhelst, M. Binareye: An always-on energy-accuracy-
scalable binary cnn processor with all memory on chip
in 28nm cmos. In 2018 IEEE Custom Integrated Circuits
Conference (CICC), pp. 1–4. IEEE, 2018.
Reddi, V. J., Cheng, C., Kanter, D., Mattson, P.,
Schmuelling, G., Wu, C.-J., Anderson, B., Breughe, M.,
Charlebois, M., Chou, W., Chukka, R., Coleman, C.,
Davis, S., Deng, P., Diamos, G., Duke, J., Fick, D., Gard-
ner, J. S., Hubara, I., Idgunji, S., Jablin, T. B., Jiao, J.,
John, T. S., Kanwar, P., Lee, D., Liao, J., Lokhmotov,
A., Massa, F., Meng, P., Micikevicius, P., Osborne, C.,
Pekhimenko, G., Rajan, A. T. R., Sequeira, D., Sirasao,
A., Sun, F., Tang, H., Thomson, M., Wei, F., Wu, E., Xu,
L., Yamada, K., Yu, B., Yuan, G., Zhong, A., Zhang, P.,
and Zhou, Y. Mlperf inference benchmark, 2019.
Roggen, D., Calatroni, A., Rossi, M., Holleczek, T., Förster,
K., Tröster, G., Lukowicz, P., Bannach, D., Pirkl, G.,
Ferscha, A., Doppler, J., Holzmann, C., Kurz, M., Holl,
G., Chavarriaga, R., Sagha, H., Bayati, H., Creatura,
M., and d. R. Millán, J. Collecting complex activity
datasets in highly rich networked sensor environments.
In 2010 Seventh International Conference on Networked
Sensing Systems (INSS), pp. 233–240, June 2010. doi:
10.1109/INSS.2010.5573462.
Saxena, A. and Goebel, K. Turbofan engine
degradation simulation data set, 2008. URL
http://ti.arc.nasa.gov/project/
prognostic-data-repository.
TensorFlow. Tensorflow lite for microcontrollers.
URL https://www.tensorflow.org/lite/
microcontrollers.
TFLM-Person-Detection. Tensorflow lite for mi-
crocontrollers person detection example. URL
https://github.com/tensorflow/
tensorflow/tree/master/tensorflow/
lite/micro/examples/person_detection.
tinyML Foundation. tinyml summit, 2019. URL https:
//www.tinymlsummit.org/.
Torelli, P. and Bangale, M. Measuring inference perfor-
mance of machine-learning frameworks on edge-class
devices with the mlmark benchmark. URL https:
//www.eembc.org/techlit/articles/
MLMARK-WHITEPAPER-FINAL-1.pdf.
Vaizman, Y., Ellis, K., and Lanckriet, G. Recognizing
detailed human context in the wild from smartphones and
smartwatches. IEEE Pervasive Computing, 16(4):62–74,
October 2017. ISSN 1558-2590. doi: 10.1109/MPRV.
2017.3971131.
Vergara, A., Vembu, S., Ayhan, T., Ryan, M., Homer, M.,
and Huerta, R. Chemical gas sensor drift compensation
using classifier ensembles. Sensors and Actuators B:
Chemical, 166–167:320–329, 05 2012. doi:
10.1016/j.snb.2012.01.074.
Wang, K., Liu, Z., Lin, Y., Lin, J., and Han, S. Haq:
Hardware-aware automated quantization with mixed pre-
cision. In Proceedings of the IEEE Conference on Com-
puter Vision and Pattern Recognition, pp. 8612–8620,
2019.
Warden, P. Speech commands: A dataset for limited-
vocabulary speech recognition, 2018a.
Warden, P. why the future of machine learning is tiny, 2018b.
URL https://petewarden.com/2018/06/11/
why-the-future-of-machine-learning-is-tiny/.
Zhang, Y., Suda, N., Lai, L., and Chandra, V. Hello edge:
Keyword spotting on microcontrollers, 2017.
