Compact recurrent neural networks for acoustic event detection on low-energy low-complexity platforms
Outdoor acoustic event detection is an exciting research field, but it is
challenged by the need for complex algorithms and deep learning techniques,
which typically require substantial computational, memory, and energy
resources. This challenge discourages IoT implementations, where efficient
use of resources is required. However, current embedded technologies and
microcontrollers have increased their capabilities without sacrificing
energy efficiency. This paper
addresses the application of sound event detection at the edge, by optimizing
deep learning techniques on resource-constrained embedded platforms for the
IoT. The contribution is two-fold: firstly, a two-stage student-teacher
approach is presented to make state-of-the-art neural networks for sound event
detection fit on current microcontrollers; secondly, we test our approach on an
ARM Cortex M4, particularly focusing on issues related to 8-bit quantization.
Our embedded implementation achieves 68% recognition accuracy on
UrbanSound8K, not far from state-of-the-art performance, with an inference
time of 125 ms per second of audio stream, a power consumption of 5.5 mW,
and a footprint of just 34.3 kB of RAM.
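The 8-bit quantization mentioned above can be illustrated with a minimal
post-training sketch. This is a generic symmetric per-tensor scheme, not
necessarily the exact variant used by the authors; the function names are
hypothetical.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization of float weights to int8.

    Maps the float range [-max|w|, +max|w|] onto [-127, 127] with a
    single scale factor, as commonly done for microcontroller inference.
    """
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for error analysis."""
    return q.astype(np.float32) * scale

# Example: quantize a small random weight tensor and measure the error.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, s = quantize_int8(w)
err = np.max(np.abs(w - dequantize(q, s)))
print(f"max abs quantization error: {err:.4f}")  # bounded by scale / 2
```

Storing `q` plus one float scale per tensor is what shrinks the model by
roughly 4x relative to 32-bit floats, at the cost of the rounding error
measured above.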
Sound Event Detection with Binary Neural Networks on Tightly Power-Constrained IoT Devices
Sound event detection (SED) is a hot topic in consumer and smart city
applications. Existing approaches based on Deep Neural Networks are very
effective, but highly demanding in terms of memory, power, and throughput when
targeting ultra-low power always-on devices.
Latency, availability, cost, and privacy requirements are pushing recent IoT
systems to process data on the node, close to the sensor, with a very
limited energy supply and tight constraints on memory size and processing
capabilities that preclude running state-of-the-art DNNs.
In this paper, we explore the combination of extreme quantization to a
small-footprint binary neural network (BNN) with the highly energy-efficient,
RISC-V-based (8+1)-core GAP8 microcontroller. Starting from an existing CNN for
SED whose footprint (815 kB) exceeds the 512 kB of memory available on our
platform, we retrain the network using binary filters and activations to match
these memory constraints. Fully binary neural networks typically incur an
accuracy drop of 12-18% on the challenging ImageNet object recognition task
compared to their equivalent full-precision baselines. This BNN
reaches a 77.9% accuracy, just 7% lower than the full-precision version, with
58 kB (7.2 times less) for the weights and 262 kB (2.4 times less) memory in
total. With our BNN implementation, we reach a peak throughput of 4.6 GMAC/s
and 1.5 GMAC/s over the full network, including preprocessing with Mel bins,
which corresponds to an efficiency of 67.1 GMAC/s/W and 31.3 GMAC/s/W,
respectively. Compared to the performance of an ARM Cortex-M4 implementation,
our system has a 10.3 times faster execution time and a 51.1 times higher
energy efficiency.
Comment: 6 pages, conference
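The core idea behind the BNN above is that, once weights and activations
are constrained to {-1, +1}, a dot product reduces to an XNOR followed by a
popcount. A minimal sketch of that equivalence (function names are
illustrative, not from the paper):

```python
import numpy as np

def binarize(x):
    """Binarize to {-1, +1} with the sign function (0 maps to +1)."""
    return np.where(x >= 0, 1, -1).astype(np.int8)

def binary_dot(a_fp, w_fp):
    """Dot product with binarized activations and weights.

    On a BNN accelerator the {-1, +1} products collapse to
    XNOR + popcount; the equivalent arithmetic identity is
    dot = 2 * popcount(xnor(a, w)) - n.
    """
    a, w = binarize(a_fp), binarize(w_fp)
    agree = np.sum(a == w)           # popcount of the XNOR result
    return 2 * int(agree) - a.size   # equals sum(a * w)

# Sanity check: the XNOR-popcount form matches the plain +/-1 product.
rng = np.random.default_rng(1)
a = rng.standard_normal(64)
w = rng.standard_normal(64)
assert binary_dot(a, w) == int(np.sum(binarize(a) * binarize(w)))
```

Packing 32 such +/-1 values into one machine word is what yields the large
throughput and memory gains reported for the GAP8 implementation.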
Near Sensor Artificial Intelligence on IoT Devices for Smart Cities
The IoT is in continuous evolution thanks to new technologies that open the
door to various applications. While the structure of the IoT network,
composed of a server, gateways, and nodes, has remained the same over the
years, the tasks of these components change according to new challenges:
the use of multimedia information and the large amount of data created by
millions of devices force the system to move from a cloud-centric to a
thing-centric approach, in which nodes partially process the information.
Computing at the sensor node level solves
well-known problems like scalability and privacy concerns. However, this
study's primary focus is the impact that bringing computation to the edge
has on energy: continuous transmission of multimedia data drains the
battery, whereas processing information on the node reduces the data
transferred to event-based alerts. Nevertheless, most of the foundational
services for IoT applications are provided by AI. Because of the complexity
of this class of algorithms, they are usually delegated to GPUs or to
devices with an energy budget orders of magnitude larger than that of an
IoT node, which should be energy-neutral and powered only by energy
harvesters. Enabling AI on IoT nodes
is a challenging task. From the software side,
this work explores the most recent compression techniques for NN,
enabling the reduction of state-of-the-art networks so that they fit on
microcontroller systems. From the hardware side, this thesis focuses on
hardware selection: it compares the efficiency of AI algorithms running
both on well-established microcontrollers and on state-of-the-art
processors. An additional contribution towards energy-efficient AI is the
exploration of hardware for the acquisition and pre-processing of sound
data, analyzing the data's quality for further classification. Moreover,
software-hardware co-design is the key point of this thesis to bring AI to
the very edge of the IoT network.
Decentralized Federated Learning for Epileptic Seizures Detection in Low-Power Wearable Systems
In healthcare, patient data privacy regulations prohibit data from being
moved outside the hospital, preventing international medical datasets from
being centralized for AI training. Federated learning (FL) is a
privacy-focused method that trains a global model by aggregating local
models from hospitals. Existing FL techniques adopt a central server-based
network topology, where the server assembles the local models trained in
each hospital into a global model. However, the server can be a single
point of failure, and models trained with FL usually perform worse than
those trained in a centralized manner when the patients' data are not
independent and identically distributed (non-IID) across hospitals. This
paper presents a decentralized FL framework comprising a training phase
with adaptive ensemble learning and a deployment phase using knowledge
distillation. The adaptive ensemble learning step in the training phase
yields, for each hospital, a specific model that is the optimal combination
of its local model and the models from other available hospitals; this step
addresses the non-IID challenge in each hospital. The deployment phase
adjusts the model's complexity to meet the resource constraints of wearable
systems. We evaluated the performance of our approach on edge computing
platforms using the EPILEPSIAE and TUSZ databases, which are public
epilepsy datasets.
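The paper's adaptive ensemble aggregation is more elaborate, but the basic
building block of any FL scheme is a weighted average of per-site model
weights. A minimal sketch of that step, with hypothetical names and toy
data:

```python
import numpy as np

def federated_average(local_weights, n_samples):
    """Aggregate per-hospital model weights, weighted by dataset size.

    local_weights: list of per-hospital models, each a list of np.ndarray
    n_samples:     number of training samples held at each hospital
    """
    total = sum(n_samples)
    coeffs = [n / total for n in n_samples]
    n_layers = len(local_weights[0])
    return [
        sum(c * layers[i] for c, layers in zip(coeffs, local_weights))
        for i in range(n_layers)
    ]

# Two toy "hospitals" with a single-layer model each.
h1 = [np.array([1.0, 2.0])]
h2 = [np.array([3.0, 4.0])]
avg = federated_average([h1, h2], n_samples=[100, 300])
print(avg[0])  # pulled toward h2, which holds more data
```

In the decentralized setting described above, each hospital would run such
an aggregation itself over the peers it can reach, rather than relying on a
central server.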
Towards Green Metaverse Networking: Technologies, Advancements and Future Directions
As the Metaverse is iteratively being defined, its potential to unleash the
next wave of digital disruption and create real-life value becomes increasingly
clear. With distinctive features of immersive experience, simultaneous
interactivity, and user agency, the Metaverse has the capability to transform
all walks of life. However, the enabling technologies of the Metaverse, i.e.,
digital twin, artificial intelligence, blockchain, and extended reality, are
known to be energy-hungry, therefore raising concerns about the sustainability
of its large-scale deployment and development. This article proposes Green
Metaverse Networking for the first time to optimize energy efficiencies of all
network components for Metaverse sustainable development. We first analyze
energy consumption, efficiency, and sustainability of energy-intensive
technologies in the Metaverse. Next, focusing on computation and networking, we
present major advancements related to energy efficiency and their integration
into the Metaverse. A case study of energy conservation by incorporating
semantic communication and stochastic resource allocation in the Metaverse is
presented. Finally, we outline the critical challenges of Metaverse sustainable
development, thereby indicating potential directions of future research towards
the green Metaverse.
Machine Learning for Microcontroller-Class Hardware -- A Review
Advancements in machine learning have opened a new opportunity to bring
intelligence to low-end Internet-of-Things nodes such as microcontrollers.
Conventional machine learning deployments have high memory and compute
footprints, hindering their direct deployment on ultra-resource-constrained
microcontrollers. This paper highlights the unique requirements of enabling
onboard machine learning for microcontroller class devices. Researchers use a
specialized model development workflow for resource-limited applications to
ensure the compute and latency budget is within the device limits while still
maintaining the desired performance. We characterize a closed-loop widely
applicable workflow of machine learning model development for microcontroller
class devices and show that several classes of applications adopt a specific
instance of it. We present both qualitative and numerical insights into
different stages of model development by showcasing several use cases. Finally,
we identify the open research challenges and unsolved questions demanding
careful consideration moving forward.
Comment: Accepted for publication at IEEE Sensors Journal
Efficient Neural Networks for Tiny Machine Learning: A Comprehensive Review
The field of Tiny Machine Learning (TinyML) has gained significant attention
due to its potential to enable intelligent applications on resource-constrained
devices. This review provides an in-depth analysis of the advancements in
efficient neural networks and the deployment of deep learning models on
ultra-low power microcontrollers (MCUs) for TinyML applications. It begins by
introducing neural networks and discussing their architectures and resource
requirements. It then explores MEMS-based applications on ultra-low power MCUs,
highlighting their potential for enabling TinyML on resource-constrained
devices. The core of the review centres on efficient neural networks for
TinyML. It covers techniques such as model compression, quantization, and
low-rank factorization, which optimize neural network architectures for minimal
resource utilization on MCUs. The paper then delves into the deployment of deep
learning models on ultra-low power MCUs, addressing challenges such as limited
computational capabilities and memory resources. Techniques like model pruning,
hardware acceleration, and algorithm-architecture co-design are discussed as
strategies to enable efficient deployment. Lastly, the review provides an
overview of current limitations in the field, including the trade-off between
model complexity and resource constraints. Overall, this review paper presents
a comprehensive analysis of efficient neural networks and deployment strategies
for TinyML on ultra-low-power MCUs. It identifies future research directions
for unlocking the full potential of TinyML applications on resource-constrained
devices.
Comment: 39 pages, 9 figures, 5 tables
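Of the compression techniques the review surveys, unstructured magnitude
pruning is among the simplest to sketch: drop the smallest-magnitude
fraction of the weights before deployment. The function below is a generic
illustration (names are hypothetical, not from the review):

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of weights.

    Unstructured magnitude pruning: a standard compression step before
    deploying a network on a memory-limited MCU. Ties at the threshold
    may prune slightly more than requested.
    """
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    threshold = np.sort(np.abs(w).ravel())[k - 1]
    pruned = w.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

w = np.array([[0.1, -0.8], [0.05, 0.9]])
p = magnitude_prune(w, sparsity=0.5)
print(p)  # the two smallest-magnitude entries are zeroed
```

On its own this only introduces zeros; the memory saving on an MCU comes
from then storing the tensor in a sparse or compressed format, which is why
pruning is usually combined with the quantization and factorization methods
discussed above.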
OWSNet: Towards Real-time Offensive Words Spotting Network for Consumer IoT Devices
Every modern household owns at least a dozen IoT devices, like smart
speakers, video doorbells, and smartwatches, most of which are equipped
with a keyword spotting (KWS) system-based digital voice assistant like
Alexa. State-of-the-art KWS systems require a large number of operations
and substantial computation and memory resources to achieve top
performance. In this paper, in contrast to existing resource-demanding KWS
systems, we propose a lightweight temporal-convolution-based KWS system
named OWSNet that can comfortably execute on a variety of IoT devices
around us and can accurately spot multiple keywords in real time without
disturbing the device's routine functionalities.
When OWSNet is deployed on consumer IoT devices placed in the workplace,
home, etc., in addition to spotting wake/trigger words like 'Hey Siri' and
'Alexa', it can also accurately spot offensive words in real time. If
regular wake words are spotted, it activates the voice assistant; if
offensive words are spotted, it starts to capture and stream audio data to
speech analytics APIs for autonomous detection of threats and insecurities
in the scene. The evaluation results show that OWSNet is faster than
state-of-the-art models, producing ~1-74 times faster inference on a
Raspberry Pi 4 and ~1-12 times faster inference on an NVIDIA Jetson Nano.
To optimize IoT use-case models like OWSNet, we also present a generic
multi-component ML model optimization sequence that can reduce the memory
and computation demands of a wide range of ML models, thus enabling their
execution on low-resource, low-cost, low-power IoT devices.
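The temporal convolutions underlying KWS models like OWSNet are dilated and
causal: each output sample depends only on current and past inputs, and the
dilation widens the receptive field cheaply. A single-channel sketch of
that building block (illustrative only, not the paper's code):

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, dilation=1):
    """One channel of a dilated causal 1-D convolution.

    Left-padding with zeros keeps the convolution causal: output t sees
    only x[t], x[t - dilation], x[t - 2*dilation], ... This is the core
    operation of temporal convolutional networks.
    """
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([
        sum(kernel[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

x = np.array([1.0, 2.0, 3.0, 4.0])
y = causal_dilated_conv1d(x, kernel=[0.5, 0.5], dilation=2)
print(y)  # each output averages x[t] and x[t-2]
```

Stacking such layers with exponentially growing dilations lets a small
model cover the roughly one-second audio context a keyword spans, which is
why this architecture fits real-time inference on constrained devices.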