25 research outputs found

    PULP-HD: Accelerating Brain-Inspired High-Dimensional Computing on a Parallel Ultra-Low Power Platform

    Full text link
    Computing with high-dimensional (HD) vectors, also referred to as hypervectors\textit{hypervectors}, is a brain-inspired alternative to computing with scalars. Key properties of HD computing include a well-defined set of arithmetic operations on hypervectors, generality, scalability, robustness, fast learning, and ubiquitous parallel operations. HD computing is about manipulating and comparing large patterns-binary hypervectors with 10,000 dimensions-making its efficient realization on minimalistic ultra-low-power platforms challenging. This paper describes HD computing's acceleration and its optimization of memory accesses and operations on a silicon prototype of the PULPv3 4-core platform (1.5mm2^2, 2mW), surpassing the state-of-the-art classification accuracy (on average 92.4%) with simultaneous 3.7×\times end-to-end speed-up and 2×\times energy saving compared to its single-core execution. We further explore the scalability of our accelerator by increasing the number of inputs and classification window on a new generation of the PULP architecture featuring bit-manipulation instruction extensions and larger number of 8 cores. These together enable a near ideal speed-up of 18.4×\times compared to the single-core PULPv3

    An Energy-Efficient IoT node for HMI applications based on an ultra-low power Multicore Processor

    Get PDF
    Developing wearable sensing technologies and unobtrusive devices is paving the way to the design of compelling applications for the next generation of systems for a smart IoT node for Human Machine Interaction (HMI). In this paper we present a smart sensor node for IoT and HMI based on a programmable Parallel Ultra-Low-Power (PULP) platform. We tested the system on a hand gesture recognition application, which is a preferred way of interaction in HMI design. A wearable armband with 8 EMG sensors is controlled by our IoT node, running a machine learning algorithm in real-time, recognizing up to 11 gestures with a power envelope of 11.84 mW. As a result, the proposed approach is capable to 35 hours of continuous operation and 1000 hours in standby. The resulting platform minimizes effectively the power required to run the software application and thus, it allows more power budget for high-quality AFE

    Hardware optimizations of dense binary hyperdimensional computing: Rematerialization of hypervectors, binarized bundling, and combinational associative memory

    Get PDF
    Brain-inspired hyperdimensional (HD) computing models neural activity patterns of the very size of the brain's circuits with points of a hyperdimensional space, that is, with hypervectors. Hypervectors are Ddimensional (pseudo)random vectors with independent and identically distributed (i.i.d.) components constituting ultra-wide holographic words: D = 10,000 bits, for instance. At its very core, HD computing manipulates a set of seed hypervectors to build composite hypervectors representing objects of interest. It demands memory optimizations with simple operations for an efficient hardware realization. In this article, we propose hardware techniques for optimizations of HD computing, in a synthesizable open-source VHDL library, to enable co-located implementation of both learning and classification tasks on only a small portion of Xilinx UltraScale FPGAs: (1)We propose simple logical operations to rematerialize the hypervectors on the fly rather than loading them from memory. These operations massively reduce the memory footprint by directly computing the composite hypervectors whose individual seed hypervectors do not need to be stored in memory. (2) Bundling a series of hypervectors over time requires a multibit counter per every hypervector component. We instead propose a binarized back-to-back bundling without requiring any counters. This truly enables onchip learning with minimal resources as every hypervector component remains binary over the course of training to avoid otherwise multibit components. (3) For every classification event, an associative memory is in charge of finding the closest match between a set of learned hypervectors and a query hypervector by using a distance metric. This operator is proportional to hypervector dimension (D), and hence may take O(D) cycles per classification event. Accordingly, we significantly improve the throughput of classification by proposing associative memories that steadily reduce the latency of classification to the extreme of a single cycle. (4) We perform a design space exploration incorporating the proposed techniques on FPGAs for a wearable biosignal processing application as a case study. Our techniques achieve up to 2.39 7 area saving, or 2,337 7 throughput improvement. The Pareto optimal HD architecture is mapped on only 18,340 configurable logic blocks (CLBs) to learn and classify five hand gestures using four electromyography sensors

    Optimized Biosignals Processing Algorithms for New Designs of Human Machine Interfaces on Parallel Ultra-Low Power Architectures

    Get PDF
    The aim of this dissertation is to explore Human Machine Interfaces (HMIs) in a variety of biomedical scenarios. The research addresses typical challenges in wearable and implantable devices for diagnostic, monitoring, and prosthetic purposes, suggesting a methodology for tailoring such applications to cutting edge embedded architectures. The main challenge is the enhancement of high-level applications, also introducing Machine Learning (ML) algorithms, using parallel programming and specialized hardware to improve the performance. The majority of these algorithms are computationally intensive, posing significant challenges for the deployment on embedded devices, which have several limitations in term of memory size, maximum operative frequency, and battery duration. The proposed solutions take advantage of a Parallel Ultra-Low Power (PULP) architecture, enhancing the elaboration on specific target architectures, heavily optimizing the execution, exploiting software and hardware resources. The thesis starts by describing a methodology that can be considered a guideline to efficiently implement algorithms on embedded architectures. This is followed by several case studies in the biomedical field, starting with the analysis of a Hand Gesture Recognition, based on the Hyperdimensional Computing algorithm, which allows performing a fast on-chip re-training, and a comparison with the state-of-the-art Support Vector Machine (SVM); then a Brain Machine Interface (BCI) to detect the respond of the brain to a visual stimulus follows in the manuscript. Furthermore, a seizure detection application is also presented, exploring different solutions for the dimensionality reduction of the input signals. The last part is dedicated to an exploration of typical modules for the development of optimized ECG-based applications

    Efficient Personalized Learning for Wearable Health Applications using HyperDimensional Computing

    Full text link
    Health monitoring applications increasingly rely on machine learning techniques to learn end-user physiological and behavioral patterns in everyday settings. Considering the significant role of wearable devices in monitoring human body parameters, on-device learning can be utilized to build personalized models for behavioral and physiological patterns, and provide data privacy for users at the same time. However, resource constraints on most of these wearable devices prevent the ability to perform online learning on them. To address this issue, it is required to rethink the machine learning models from the algorithmic perspective to be suitable to run on wearable devices. Hyperdimensional computing (HDC) offers a well-suited on-device learning solution for resource-constrained devices and provides support for privacy-preserving personalization. Our HDC-based method offers flexibility, high efficiency, resilience, and performance while enabling on-device personalization and privacy protection. We evaluate the efficacy of our approach using three case studies and show that our system improves the energy efficiency of training by up to 45.8×45.8\times compared with the state-of-the-art Deep Neural Network (DNN) algorithms while offering a comparable accuracy

    Low-Power Human-Machine Interfaces: Analysis And Design

    Get PDF
    Human-Machine Interaction (HMI) systems, once used for clinical applications, have recently reached a broader set of scenarios, such as industrial, gaming, learning, and health tracking thanks to advancements in Digital Signal Processing (DSP) and Machine Learning (ML) techniques. A growing trend is to integrate computational capabilities into wearable devices to reduce power consumption associated with wireless data transfer while providing a natural and unobtrusive way of interaction. However, current platforms can barely cope with the computational complexity introduced by the required feature extraction and classification algorithms without compromising the battery life and the overall intrusiveness of the system. Thus, highly-wearable and real-time HMIs are yet to be introduced. Designing and implementing highly energy-efficient biosignal devices demands a fine-tuning to meet the constraints typically required in everyday scenarios. This thesis work tackles these challenges in specific case studies, devising solutions based on bioelectrical signals, namely EEG and EMG, for advanced hand gesture recognition. The implementation of these systems followed a complete analysis to reduce the overall intrusiveness of the system through sensor design and miniaturization of the hardware implementation. Several solutions have been studied to cope with the computational complexity of the DSP algorithms, including commercial single-core and open-source Parallel Ultra Low Power architectures, that have been selected accordingly also to reduce the overall system power consumption. By further adding energy harvesting techniques combined with the firmware and hardware optimization, the systems achieved self-sustainable operation or a significant boost in battery life. The HMI platforms presented are entirely programmable and provide computational power to satisfy the requirements of the studies applications while employing only a fraction of the CPU resources, giving the perspective of further application more advanced paradigms for the next generation of real-time embedded biosignal processing

    Hyperdimensional computing: a fast, robust and interpretable paradigm for biological data

    Full text link
    Advances in bioinformatics are primarily due to new algorithms for processing diverse biological data sources. While sophisticated alignment algorithms have been pivotal in analyzing biological sequences, deep learning has substantially transformed bioinformatics, addressing sequence, structure, and functional analyses. However, these methods are incredibly data-hungry, compute-intensive and hard to interpret. Hyperdimensional computing (HDC) has recently emerged as an intriguing alternative. The key idea is that random vectors of high dimensionality can represent concepts such as sequence identity or phylogeny. These vectors can then be combined using simple operators for learning, reasoning or querying by exploiting the peculiar properties of high-dimensional spaces. Our work reviews and explores the potential of HDC for bioinformatics, emphasizing its efficiency, interpretability, and adeptness in handling multimodal and structured data. HDC holds a lot of potential for various omics data searching, biosignal analysis and health applications

    Optimizing AI at the Edge: from network topology design to MCU deployment

    Get PDF
    The first topic analyzed in the thesis will be Neural Architecture Search (NAS). I will focus on two different tools that I developed, one to optimize the architecture of Temporal Convolutional Networks (TCNs), a convolutional model for time-series processing that has recently emerged, and one to optimize the data precision of tensors inside CNNs. The first NAS proposed explicitly targets the optimization of the most peculiar architectural parameters of TCNs, namely dilation, receptive field, and the number of features in each layer. Note that this is the first NAS that explicitly targets these networks. The second NAS proposed instead focuses on finding the most efficient data format for a target CNN, with the granularity of the layer filter. Note that applying these two NASes in sequence allows an "application designer" to minimize the structure of the neural network employed, minimizing the number of operations or the memory usage of the network. After that, the second topic described is the optimization of neural network deployment on edge devices. Importantly, exploiting edge platforms' scarce resources is critical for NN efficient execution on MCUs. To do so, I will introduce DORY (Deployment Oriented to memoRY) -- an automatic tool to deploy CNNs on low-cost MCUs. DORY, in different steps, can manage different levels of memory inside the MCU automatically, offload the computation workload (i.e., the different layers of a neural network) to dedicated hardware accelerators, and automatically generates ANSI C code that orchestrates off- and on-chip transfers with the computation phases. On top of this, I will introduce two optimized computation libraries that DORY can exploit to deploy TCNs and Transformers on edge efficiently. I conclude the thesis with two different applications on bio-signal analysis, i.e., heart rate tracking and sEMG-based gesture recognition
    corecore