Compact recurrent neural networks for acoustic event detection on low-energy low-complexity platforms
Outdoor acoustic event detection is an exciting research field, but it is
challenged by the need for complex algorithms and deep learning techniques,
typically requiring many computational, memory, and energy resources. This
challenge discourages IoT implementation, where an efficient use of resources
is required. However, current embedded technologies and microcontrollers have
increased their capabilities without penalizing energy efficiency. This paper
addresses the application of sound event detection at the edge, by optimizing
deep learning techniques on resource-constrained embedded platforms for the
IoT. The contribution is two-fold: firstly, a two-stage student-teacher
approach is presented to make state-of-the-art neural networks for sound event
detection fit on current microcontrollers; secondly, we test our approach on an
ARM Cortex-M4, particularly focusing on issues related to 8-bit quantization.
Our embedded implementation can achieve 68% recognition accuracy on
UrbanSound8K, not far from state-of-the-art performance, with an inference time
of 125 ms for each second of the audio stream, and power consumption of 5.5 mW
in just 34.3 kB of RAM.
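The abstract above does not say which 8-bit quantization scheme was used, so the following is only a minimal sketch of one common choice, symmetric per-tensor quantization, to illustrate what storing weights in 8 bits involves. The function names and the sample weights are hypothetical and are not taken from the paper.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to signed 8-bit integers."""
    max_abs = max(abs(w) for w in weights)
    # The scale maps the largest magnitude to 127; guard against an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float values from the int8 codes."""
    return [v * scale for v in q]


# Hypothetical weights, chosen only for illustration.
weights = [0.50, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight then occupies one byte instead of four, at the cost of a rounding error of at most half a step; note how the smallest weight in the example collapses to zero, which is the kind of effect that can make 8-bit deployment delicate.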
Sound Event Detection with Binary Neural Networks on Tightly Power-Constrained IoT Devices
Sound event detection (SED) is a hot topic in consumer and smart city
applications. Existing approaches based on Deep Neural Networks are very
effective, but highly demanding in terms of memory, power, and throughput when
targeting ultra-low power always-on devices.
Latency, availability, cost, and privacy requirements are pushing recent IoT
systems to process the data on the node, close to the sensor, with a very
limited energy supply and tight constraints on memory size and processing
capabilities that preclude running state-of-the-art DNNs.
In this paper, we explore the combination of extreme quantization to a
small-footprint binary neural network (BNN) with the highly energy-efficient,
RISC-V-based (8+1)-core GAP8 microcontroller. Starting from an existing CNN for
SED whose footprint (815 kB) exceeds the 512 kB of memory available on our
platform, we retrain the network using binary filters and activations to match
these memory constraints. (Fully) binary neural networks come with a natural
drop in accuracy of 12-18% on the challenging ImageNet object recognition
challenge compared to their equivalent full-precision baselines. This BNN
reaches a 77.9% accuracy, just 7% lower than the full-precision version, with
58 kB (7.2 times less) for the weights and 262 kB (2.4 times less) memory in
total. With our BNN implementation, we reach a peak throughput of 4.6 GMAC/s
and 1.5 GMAC/s over the full network, including preprocessing with Mel bins,
which corresponds to an efficiency of 67.1 GMAC/s/W and 31.3 GMAC/s/W,
respectively. Compared to the performance of an ARM Cortex-M4 implementation,
our system has a 10.3 times faster execution time and a 51.1 times higher
energy efficiency.
Comment: 6 pages conferenc
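As a rough illustration of why binary filters and activations are cheap to evaluate, the sketch below shows the XOR-and-popcount form of a dot product that binary networks commonly rely on. It is not the authors' GAP8 kernel; the bit encoding (1 stands for +1, 0 for -1) and the helper names are assumptions made for this example.

```python
def pack_signs(values):
    """Pack a list of +1/-1 values into an int, one bit each (1 encodes +1)."""
    bits = 0
    for i, v in enumerate(values):
        if v > 0:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1,+1} vectors from their bit encodings."""
    # XOR marks the positions where the signs differ; each of those contributes
    # -1 to the sum, and every matching position contributes +1.
    mismatches = bin(a_bits ^ b_bits).count("1")
    return n - 2 * mismatches


# Hypothetical binary activations and weights.
a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
reference = sum(x * y for x, y in zip(a, b))
fast = binary_dot(pack_signs(a), pack_signs(b), len(a))
```

Here `fast` matches the ordinary multiply-accumulate result in `reference`, but it is computed with one XOR and one bit count, and each value takes a single bit of storage.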
AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes
We propose a method named AudioFormer, which learns audio feature
representations through the acquisition of discrete acoustic codes and
subsequently fine-tunes them for audio classification tasks. Initially, we
introduce a novel perspective by considering the audio classification task as a
form of natural language understanding (NLU). Leveraging an existing neural
audio codec model, we generate discrete acoustic codes and utilize them to train
a masked language model (MLM), thereby obtaining audio feature representations.
Furthermore, we pioneer the integration of a Multi-Positive sample Contrastive
(MPC) learning approach. This method enables the learning of joint
representations among multiple discrete acoustic codes within the same audio
input. In our experiments, we treat discrete acoustic codes as textual data and
train a masked language model using a cloze-like methodology, ultimately
deriving high-quality audio representations. Notably, the MPC learning technique
effectively captures collaborative representations among distinct positive
samples. Our research outcomes demonstrate that AudioFormer attains
significantly improved performance compared to prevailing monomodal audio
classification models across multiple datasets, and even outperforms
audio-visual multimodal classification models on select datasets.
Specifically, our approach achieves remarkable results on datasets including
AudioSet (2M, 20K), and FSD50K, with performance scores of 53.9, 45.1, and
65.6, respectively. We have openly shared both the code and models:
https://github.com/LZH-0225/AudioFormer.git.
Comment: 9 pages, 4 figure
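The cloze-like training described above can be pictured with a small masking routine over a sequence of discrete codes. This is a sketch under assumptions, not the AudioFormer implementation: the mask token id, the 15% mask ratio, and the example code values are all hypothetical.

```python
import random

MASK_ID = -1      # hypothetical id reserved for the mask token
IGNORE = None     # label for positions the model is not asked to predict


def mask_codes(codes, mask_ratio=0.15, seed=0):
    """Return (masked_input, labels) for cloze-style training."""
    rng = random.Random(seed)
    n_mask = max(1, round(len(codes) * mask_ratio))
    positions = set(rng.sample(range(len(codes)), n_mask))
    # Hidden positions are replaced by the mask token in the input...
    masked = [MASK_ID if i in positions else c for i, c in enumerate(codes)]
    # ...and only those positions carry a prediction target.
    labels = [c if i in positions else IGNORE for i, c in enumerate(codes)]
    return masked, labels


# Hypothetical acoustic codes, standing in for the output of a neural codec.
codes = [412, 87, 903, 15, 660, 241, 78, 519, 333, 5]
masked, labels = mask_codes(codes)
```

A masked language model would receive `masked` as input and be trained to recover the original codes at the positions where `labels` is set, exactly as it would with word tokens in text.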
Near Sensor Artificial Intelligence on IoT Devices for Smart Cities
The IoT is continuously evolving thanks to new technologies that open the
door to various applications. While the structure of the IoT network, composed
of a server, gateways, and nodes, has remained the same over the years, the
tasks of these components change according to new challenges: the use of
multimedia information and the large amount of data created by millions of
devices force the system to move from a cloud-centric approach to a
thing-centric approach, where nodes partially process the information.
Computing at the sensor node level solves well-known problems like scalability
and privacy concerns. However, this study’s primary focus is on the impact that
bringing the computation to the edge has on energy: continuous transmission of
multimedia data drains the battery, whereas processing information on the node
reduces the data transferred to event-based alerts. Nevertheless, most of the
foundational services for IoT applications are provided by AI. Due to the
complexity of this class of algorithms, they are always delegated to GPUs or
devices with an energy budget that is orders of magnitude larger than that of
an IoT node, which should be energy-neutral and powered only by energy
harvesters. Enabling AI on IoT nodes is a challenging task. From the software
side, this work explores the most recent compression techniques for neural
networks (NNs), enabling the reduction of state-of-the-art networks to make
them fit in microcontroller systems. From the hardware side, this thesis
focuses on hardware selection. It compares the efficiency of AI algorithms
running on both well-established microcontrollers and state-of-the-art
processors. An additional contribution towards energy-efficient AI is the
exploration of hardware for acquisition and pre-processing of sound data,
analyzing the data’s quality for further classification. Moreover, the
combination of software and hardware co-design is the key point of this thesis
to bring AI to the very edge of the IoT network.
Teacher-Student Architecture for Knowledge Distillation: A Survey
Although deep neural networks (DNNs) have shown a strong capacity to solve
large-scale problems in many areas, such DNNs are hard to deploy in
real-world systems due to their voluminous parameters. To tackle this issue,
Teacher-Student architectures were proposed, where simple student networks with
a few parameters can achieve comparable performance to deep teacher networks
with many parameters. Recently, Teacher-Student architectures have been
effectively and widely embraced on various knowledge distillation (KD)
objectives, including knowledge compression, knowledge expansion, knowledge
adaptation, and knowledge enhancement. With the help of Teacher-Student
architectures, current studies are able to achieve multiple distillation
objectives through lightweight and generalized student networks. Different from
existing KD surveys that primarily focus on knowledge compression, this survey
first explores Teacher-Student architectures across multiple distillation
objectives. This survey presents an introduction to various knowledge
representations and their corresponding optimization objectives. Additionally,
we provide a systematic overview of Teacher-Student architectures with
representative learning algorithms and effective distillation schemes. This
survey also summarizes recent applications of Teacher-Student architectures
across multiple purposes, including classification, recognition, generation,
ranking, and regression. Lastly, potential research directions in KD are
investigated, focusing on architecture design, knowledge quality, and
theoretical studies of regression-based learning, respectively. Through this
comprehensive survey, industry practitioners and the academic community can
gain valuable insights and guidelines for effectively designing, learning, and
applying Teacher-Student architectures on various distillation objectives.
Comment: 20 pages. arXiv admin note: substantial text overlap with
arXiv:2210.1733
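For readers unfamiliar with the knowledge compression objective mentioned above, the following sketch shows the classic soft-target distillation loss: a KL divergence between temperature-softened teacher and student outputs. It is one well-known formulation rather than a summary of the survey's taxonomy, and the temperature value and logits are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical logits for a three-class problem.
teacher = [6.0, 2.0, -1.0]
close_student = [5.5, 2.2, -0.8]
far_student = [-1.0, 2.0, 6.0]
loss_close = distillation_loss(close_student, teacher)
loss_far = distillation_loss(far_student, teacher)
```

The loss is zero when the student reproduces the teacher's logits and grows as their softened distributions diverge, so `loss_close` is smaller than `loss_far`. In practice this term is usually combined with an ordinary cross-entropy loss on the ground-truth labels.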