245 research outputs found

    A 16-nm SoC for Noise-Robust Speech and NLP Edge AI Inference With Bayesian Sound Source Separation and Attention-Based DNNs

    Get PDF
    The proliferation of personal artificial intelligence (AI) -assistant technologies with speech-based conversational AI interfaces is driving the exponential growth in the consumer Internet of Things (IoT) market. As these technologies are being applied to keyword spotting (KWS), automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS) applications, it is of paramount importance that they provide uncompromising performance for context learning in long sequences, which is a key benefit of the attention mechanism, and that they work seamlessly in polyphonic environments. In this work, we present a 25-mm 2^2 system-on-chip (SoC) in 16-nm FinFET technology, codenamed SM6, which executes end-to-end speech-enhancing attention-based ASR and NLP workloads. The SoC includes: 1) FlexASR, a highly reconfigurable NLP inference processor optimized for whole-model acceleration of bidirectional attention-based sequence-to-sequence (seq2seq) deep neural networks (DNNs); 2) a Markov random field source separation engine (MSSE), a probabilistic graphical model accelerator for unsupervised inference via Gibbs sampling, used for sound source separation; 3) a dual-core Arm Cortex A53 CPU cluster, which provides on-demand single Instruction/multiple data (SIMD) fast fourier transform (FFT) processing and performs various application logic (e.g., expectation–maximization (EM) algorithm and 8-bit floating-point (FP8) quantization); and 4) an always-on M0 subsystem for audio detection and power management. Measurement results demonstrate the efficiency ranges of 2.6–7.8 TFLOPs/W and 4.33–17.6 Gsamples/s/W for FlexASR and MSSE, respectively; MSSE denoising performance allowing 6 ×\times smaller ASR model to be stored on-chip with negligible accuracy loss; and 2.24-mJ energy consumption while achieving real-time throughput, end-to-end, and per-frame ASR latencies of 18 ms

    A Construction Kit for Efficient Low Power Neural Network Accelerator Designs

    Get PDF
    Implementing embedded neural network processing at the edge requires efficient hardware acceleration that couples high computational performance with low power consumption. Driven by the rapid evolution of network architectures and their algorithmic features, accelerator designs are constantly updated and improved. To evaluate and compare hardware design choices, designers can refer to a myriad of accelerator implementations in the literature. Surveys provide an overview of these works but are often limited to system-level and benchmark-specific performance metrics, making it difficult to quantitatively compare the individual effect of each utilized optimization technique. This complicates the evaluation of optimizations for new accelerator designs, slowing-down the research progress. This work provides a survey of neural network accelerator optimization approaches that have been used in recent works and reports their individual effects on edge processing performance. It presents the list of optimizations and their quantitative effects as a construction kit, allowing to assess the design choices for each building block separately. Reported optimizations range from up to 10'000x memory savings to 33x energy reductions, providing chip designers an overview of design choices for implementing efficient low power neural network accelerators

    Widening Access to Applied Machine Learning With TinyML

    Get PDF
    Broadening access to both computational and educational resources is crit- ical to diffusing machine learning (ML) innovation. However, today, most ML resources and experts are siloed in a few countries and organizations. In this article, we describe our pedagogical approach to increasing access to applied ML through a massive open online course (MOOC) on Tiny Machine Learning (TinyML). We suggest that TinyML, applied ML on resource-constrained embedded devices, is an attractive means to widen access because TinyML leverages low-cost and globally accessible hardware and encourages the development of complete, self-contained applications, from data collection to deployment. To this end, a collaboration between academia and industry produced a four part MOOC that provides application-oriented instruction on how to develop solutions using TinyML. The series is openly available on the edX MOOC platform, has no prerequisites beyond basic programming, and is designed for global learners from a variety of backgrounds. It introduces real-world applications, ML algorithms, data-set engineering, and the ethi- cal considerations of these technologies through hands-on programming and deployment of TinyML applications in both the cloud and on their own microcontrollers. To facili- tate continued learning, community building, and collaboration beyond the courses, we launched a standalone website, a forum, a chat, and an optional course-project com- petition. We also open-sourced the course materials, hoping they will inspire the next generation of ML practitioners and educators and further broaden access to cutting-edge ML technologies

    Tiny Machine Learning Environment: Enabling Intelligence on Constrained Devices

    Get PDF
    Running machine learning algorithms (ML) on constrained devices at the extreme edge of the network is problematic due to the computational overhead of ML algorithms, available resources on the embedded platform, and application budget (i.e., real-time requirements, power constraints, etc.). This required the development of specific solutions and development tools for what is now referred to as TinyML. In this dissertation, we focus on improving the deployment and performance of TinyML applications, taking into consideration the aforementioned challenges, especially memory requirements. This dissertation contributed to the construction of the Edge Learning Machine environment (ELM), a platform-independent open-source framework that provides three main TinyML services, namely shallow ML, self-supervised ML, and binary deep learning on constrained devices. In this context, this work includes the following steps, which are reflected in the thesis structure. First, we present the performance analysis of state-of-the-art shallow ML algorithms including dense neural networks, implemented on mainstream microcontrollers. The comprehensive analysis in terms of algorithms, hardware platforms, datasets, preprocessing techniques, and configurations shows similar performance results compared to a desktop machine and highlights the impact of these factors on overall performance. Second, despite the assumption that TinyML only permits models inference provided by the scarcity of resources, we have gone a step further and enabled self-supervised on-device training on microcontrollers and tiny IoT devices by developing the Autonomous Edge Pipeline (AEP) system. AEP achieves comparable accuracy compared to the typical TinyML paradigm, i.e., models trained on resource-abundant devices and then deployed on microcontrollers. Next, we present the development of a memory allocation strategy for convolutional neural networks (CNNs) layers, that optimizes memory requirements. This approach reduces the memory footprint without affecting accuracy nor latency. Moreover, e-skin systems share the main requirements of the TinyML fields: enabling intelligence with low memory, low power consumption, and low latency. Therefore, we designed an efficient Tiny CNN architecture for e-skin applications. The architecture leverages the memory allocation strategy presented earlier and provides better performance than existing solutions. A major contribution of the thesis is given by CBin-NN, a library of functions for implementing extremely efficient binary neural networks on constrained devices. The library outperforms state of the art NN deployment solutions by drastically reducing memory footprint and inference latency. All the solutions proposed in this thesis have been implemented on representative devices and tested in relevant applications, of which results are reported and discussed. The ELM framework is open source, and this work is clearly becoming a useful, versatile toolkit for the IoT and TinyML research and development community

    Tiny Machine Learning Environment: Enabling Intelligence on Constrained Devices

    Get PDF
    Running machine learning algorithms (ML) on constrained devices at the extreme edge of the network is problematic due to the computational overhead of ML algorithms, available resources on the embedded platform, and application budget (i.e., real-time requirements, power constraints, etc.). This required the development of specific solutions and development tools for what is now referred to as TinyML. In this dissertation, we focus on improving the deployment and performance of TinyML applications, taking into consideration the aforementioned challenges, especially memory requirements. This dissertation contributed to the construction of the Edge Learning Machine environment (ELM), a platform-independent open source framework that provides three main TinyML services, namely shallow ML, self-supervised ML, and binary deep learning on constrained devices. In this context, this work includes the following steps, which are reflected in the thesis structure. First, we present the performance analysis of state of the art shallow ML algorithms including dense neural networks, implemented on mainstream microcontrollers. The comprehensive analysis in terms of algorithms, hardware platforms, datasets, pre-processing techniques, and configurations shows similar performance results compared to a desktop machine and highlights the impact of these factors on overall performance. Second, despite the assumption that TinyML only permits models inference provided by the scarcity of resources, we have gone a step further and enabled self-supervised on-device training on microcontrollers and tiny IoT devices by developing the Autonomous Edge Pipeline (AEP) system. AEP achieves comparable accuracy compared to the typical TinyML paradigm, i.e., models trained on resource-abundant devices and then deployed on microcontrollers. Next, we present the development of a memory allocation strategy for convolutional neural networks (CNNs) layers, that optimizes memory requirements. This approach reduces the memory footprint without affecting accuracy nor latency. Moreover, e-skin systems share the main requirements of the TinyML fields: enabling intelligence with low memory, low power consumption, and low latency. Therefore, we designed an efficient Tiny CNN architecture for e-skin applications. The architecture leverages the memory allocation strategy presented earlier and provides better performance than existing solutions. A major contribution of the thesis is given by CBin-NN, a library of functions for implementing extremely efficient binary neural networks on constrained devices. The library outperforms state of the art NN deployment solutions by drastically reducing memory footprint and inference latency. All the solutions proposed in this thesis have been implemented on representative devices and tested in relevant applications, of which results are reported and discussed. The ELM framework is open source, and this work is clearly becoming a useful, versatile toolkit for the IoT and TinyML research and development community

    State of the art 2015: a literature review of social media intelligence capabilities for counter-terrorism

    Get PDF
    Overview This paper is a review of how information and insight can be drawn from open social media sources. It focuses on the specific research techniques that have emerged, the capabilities they provide, the possible insights they offer, and the ethical and legal questions they raise. These techniques are considered relevant and valuable in so far as they can help to maintain public safety by preventing terrorism, preparing for it, protecting the public from it and pursuing its perpetrators. The report also considers how far this can be achieved against the backdrop of radically changing technology and public attitudes towards surveillance. This is an updated version of a 2013 report paper on the same subject, State of the Art. Since 2013, there have been significant changes in social media, how it is used by terrorist groups, and the methods being developed to make sense of it.  The paper is structured as follows: Part 1 is an overview of social media use, focused on how it is used by groups of interest to those involved in counter-terrorism. This includes new sections on trends of social media platforms; and a new section on Islamic State (IS). Part 2 provides an introduction to the key approaches of social media intelligence (henceforth ‘SOCMINT’) for counter-terrorism. Part 3 sets out a series of SOCMINT techniques. For each technique a series of capabilities and insights are considered, the validity and reliability of the method is considered, and how they might be applied to counter-terrorism work explored. Part 4 outlines a number of important legal, ethical and practical considerations when undertaking SOCMINT work

    Optimizing AI at the Edge: from network topology design to MCU deployment

    Get PDF
    The first topic analyzed in the thesis will be Neural Architecture Search (NAS). I will focus on two different tools that I developed, one to optimize the architecture of Temporal Convolutional Networks (TCNs), a convolutional model for time-series processing that has recently emerged, and one to optimize the data precision of tensors inside CNNs. The first NAS proposed explicitly targets the optimization of the most peculiar architectural parameters of TCNs, namely dilation, receptive field, and the number of features in each layer. Note that this is the first NAS that explicitly targets these networks. The second NAS proposed instead focuses on finding the most efficient data format for a target CNN, with the granularity of the layer filter. Note that applying these two NASes in sequence allows an "application designer" to minimize the structure of the neural network employed, minimizing the number of operations or the memory usage of the network. After that, the second topic described is the optimization of neural network deployment on edge devices. Importantly, exploiting edge platforms' scarce resources is critical for NN efficient execution on MCUs. To do so, I will introduce DORY (Deployment Oriented to memoRY) -- an automatic tool to deploy CNNs on low-cost MCUs. DORY, in different steps, can manage different levels of memory inside the MCU automatically, offload the computation workload (i.e., the different layers of a neural network) to dedicated hardware accelerators, and automatically generates ANSI C code that orchestrates off- and on-chip transfers with the computation phases. On top of this, I will introduce two optimized computation libraries that DORY can exploit to deploy TCNs and Transformers on edge efficiently. I conclude the thesis with two different applications on bio-signal analysis, i.e., heart rate tracking and sEMG-based gesture recognition
    • 

    corecore