Search CORE

21 research outputs found

Few-Shot Open-Set Learning for On-Device Customization of KeyWord Spotting Systems

Author: Rusci Manuele
Tuytelaars Tinne
Publication venue
Publication date: 03/06/2023
Field of study

A personalized KeyWord Spotting (KWS) pipeline typically requires the training of a Deep Learning model on a large set of user-defined speech utterances, preventing fast customization directly applied on-device. To fill this gap, this paper investigates few-shot learning methods for open-set KWS classification by combining a deep feature encoder with a prototype-based classifier. With user-defined keywords from 10 classes of the Google Speech Command dataset, our study reports an accuracy of up to 76% in a 10-shot scenario while the false acceptance rate of unknown data is kept to 5%. In the analyzed settings, the usage of the triplet loss to train an encoder with normalized output features performs better than the prototypical networks jointly trained with a generator of dummy unknown-class prototypes. This design is also more effective than encoders trained on a classification problem and features fewer parameters than other iso-accuracy approaches.Comment: Accepted at INTERSPEECH 202

arXiv.org e-Print Archive

A sub-mW IoT-endnode for always-on visual monitoring and smart triggering

Author: Benini Luca
Farella Elisabetta
Rossi Davide
Rusci Manuele
Publication venue
Publication date: 01/01/2017
Field of study

This work presents a fully-programmable Internet of Things (IoT) visual sensing node that targets sub-mW power consumption in always-on monitoring scenarios. The system features a spatial-contrast

128\mathrm{x}64

binary pixel imager with focal-plane processing. The sensor, when working at its lowest power mode (

10\mu W

at 10 fps), provides as output the number of changed pixels. Based on this information, a dedicated camera interface, implemented on a low-power FPGA, wakes up an ultra-low-power parallel processing unit to extract context-aware visual information. We evaluate the smart sensor on three always-on visual triggering application scenarios. Triggering accuracy comparable to RGB image sensors is achieved at nominal lighting conditions, while consuming an average power between

193\mu W

and

277\mu W

, depending on context activity. The digital sub-system is extremely flexible, thanks to a fully-programmable digital signal processing engine, but still achieves 19x lower power consumption compared to MCU-based cameras with significantly lower on-board computing capabilities.Comment: 11 pages, 9 figures, submitteted to IEEE IoT Journa

arXiv.org e-Print Archive

Crossref

Archivio della ricerca - Fondazione Bruno Kessler

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Ultra-Low Power IoT Smart Visual Sensing Devices for Always-ON Applications

Author: Rusci Manuele <1989>
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 27/04/2018
Field of study

This work presents the design of a Smart Ultra-Low Power visual sensor architecture that couples together an ultra-low power event-based image sensor with a parallel and power-optimized digital architecture for data processing. By means of mixed-signal circuits, the imager generates a stream of address events after the extraction and binarization of spatial gradients. When targeting monitoring applications, the sensing and processing energy costs can be reduced by two orders of magnitude thanks to either the mixed-signal imaging technology, the event-based data compression and the use of event-driven computing approaches. From a system-level point of view, a context-aware power management scheme is enabled by means of a power-optimized sensor peripheral block, that requests the processor activation only when a relevant information is detected within the focal plane of the imager. When targeting a smart visual node for triggering purpose, the event-driven approach brings a 10x power reduction with respect to other presented visual systems, while leading to comparable results in terms of detection accuracy. To further enhance the recognition capabilities of the smart camera system, this work introduces the concept of event-based binarized neural networks. By coupling together the theory of binarized neural networks and focal-plane processing, a 17.8% energy reduction is demonstrated on a real-world data classification with a performance drop of 3% with respect to a baseline system featuring commercial visual sensors and a Binary Neural Network engine. Moreover, if coupling the BNN engine with the event-driven triggering detection flow, the average power consumption can be as low as the sleep power of 0.3mW in case of infrequent events, which is 8x lower than a smart camera system featuring a commercial RGB imager

AMS Tesi di Dottorato

Reduced precision floating-point optimization for Deep Neural Network On-Device Learning on microcontrollers

Author: Davide Nadalini
Francesco Conti
Luca Benini
Manuele Rusci
Publication venue: Elsevier
Publication date: 01/01/2023
Field of study

Enabling On-Device Learning (ODL) for Ultra-Low-Power Micro-Controller Units (MCUs) is a key step for post-deployment adaptation and fine-tuning of Deep Neural Network (DNN) models in future TinyML applications. This paper tackles this challenge by introducing a novel reduced precision optimization technique for ODL primitives on MCU-class devices, leveraging the State-of-Art advancements in RISC-V RV32 architectures with support for vectorized 16-bit floating-point (FP16) Single-Instruction Multiple-Data (SIMD) operations. Our approach for the Forward and Backward steps of the Back Propagation training algorithm is composed of specialized shape transform operators and Matrix Multiplication (MM) kernels, accelerated with parallelization and loop unrolling. When evaluated on a single training step of a 2D Convolution layer, the SIMD-optimized FP16 primitives result up to 1.72x faster than the FP32 baseline on a RISC-V-based 8+1-core MCU. An average computing efficiency of 3.11 Multiply and Accumulate operations per clock cycle (MAC/clk) and 0.81 MAC/clk is measured for the end-to-end training tasks of a ResNet8 and a DS-CNN for Image Classification and Keyword Spotting, respectively - requiring 17.1 ms and 6.4 ms on the target platform to compute a training step on a single sample. Overall, our approach results more than two orders of magnitude faster than existing ODL software frameworks for single-core MCUs and outperforms by 1.6x previous FP32 parallel implementations on a Continual Learning setup.& COPY; 2023 Elsevier B.V. All rights reserved

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Reduced Precision Floating-Point Optimization for Deep Neural Network On-Device Learning on MicroControllers

Author: Benini Luca
Conti Francesco
Nadalini Davide
Rusci Manuele
Publication venue
Publication date: 30/05/2023
Field of study

\times

faster than the FP32 baseline on a RISC-V-based 8+1-core MCU. An average computing efficiency of 3.11 Multiply and Accumulate operations per clock cycle (MAC/clk) and 0.81 MAC/clk is measured for the end-to-end training tasks of a ResNet8 and a DS-CNN for Image Classification and Keyword Spotting, respectively -- requiring 17.1 ms and 6.4 ms on the target platform to compute a training step on a single sample. Overall, our approach results more than two orders of magnitude faster than existing ODL software frameworks for single-core MCUs and outperforms by 1.6

\times

previous FP32 parallel implementations on a Continual Learning setup.Comment: Pre-print version submitted to Elsevier's Future Generation Computer Systems journal. For the associated open-source release, see https://github.com/pulp-platform/pulp-trainli

arXiv.org e-Print Archive