Ultra-low-resolution Infrared (IR) array sensors offer a low-cost,
energy-efficient, and privacy-preserving solution for people counting, with
applications such as occupancy monitoring. Previous work has shown that Deep
Learning (DL) can yield superior performance on this task. However, the
literature lacks an extensive comparative analysis of efficient DL
architectures for IR array-based people counting that considers not only
their accuracy but also the cost of deploying them on memory- and
energy-constrained Internet of Things (IoT) edge nodes. In this work, we
address this need by comparing six DL architectures on a novel dataset
composed of IR images collected from a commercial 8x8 array, which we made
openly available. With a wide architectural exploration of each model type, we
obtain a rich set of Pareto-optimal solutions, spanning cross-validated
balanced accuracy scores in the 55.70-82.70% range. When deployed on a
commercial Microcontroller (MCU) by STMicroelectronics, the STM32L4A6ZG, these
models occupy 0.41-9.28 kB of memory and require 1.10-7.74 ms per inference,
while consuming 17.18-120.43 μJ of energy. Our models are significantly
more accurate than a previous deterministic method (up to +39.9%), while being
up to 3.53x faster and more energy efficient. Further, our models' accuracy is
comparable to state-of-the-art DL solutions on similar resolution sensors,
despite a much lower complexity. All our models enable continuous, real-time
inference on an MCU-based IoT node, with years of autonomous operation without
battery recharging.

Comment: This article has been accepted for publication in IEEE Internet of Things Journal.
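
As a rough sanity check on the deployment figures quoted above, the short sketch below derives the average power implied by the reported energy and latency ranges, and the fraction of time the MCU would spend on inference at a given frame rate. It assumes that the lower endpoints (1.10 ms, 17.18 μJ) belong to one model and the upper endpoints (7.74 ms, 120.43 μJ) to another, which the abstract does not state explicitly. The 10 frames-per-second rate is a hypothetical value chosen only for illustration, and the function names are not taken from the paper.

```python
def average_power_mw(energy_uj: float, latency_ms: float) -> float:
    """Average power during one inference: microjoules / milliseconds = milliwatts."""
    return energy_uj / latency_ms


def inference_duty_cycle(latency_ms: float, frames_per_second: float) -> float:
    """Fraction of each second spent running inference at the given frame rate."""
    return latency_ms * frames_per_second / 1000.0


# (latency in ms, energy in uJ) endpoints quoted in the abstract.
# ASSUMPTION: each pair describes a single model.
endpoints = {
    "smallest model": (1.10, 17.18),
    "largest model": (7.74, 120.43),
}

# ASSUMPTION: hypothetical sensor frame rate, not a figure from the abstract.
HYPOTHETICAL_FPS = 10.0

for name, (latency_ms, energy_uj) in endpoints.items():
    power = average_power_mw(energy_uj, latency_ms)
    duty = inference_duty_cycle(latency_ms, HYPOTHETICAL_FPS)
    print(f"{name}: {power:.2f} mW while active, {duty:.2%} duty cycle")
```

Under these assumptions, both endpoints imply an active power of roughly 15.6 mW, so energy per inference scales almost linearly with latency, and even the slowest model would keep the MCU busy for under 10% of the time at the assumed frame rate.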