53 research outputs found

    Deep Spoken Keyword Spotting: An Overview

    Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams and has become a fast-growing technology thanks to the paradigm shift introduced by deep learning a few years ago. This has allowed the rapid embedding of deep KWS in a myriad of small electronic devices with different purposes, such as the activation of voice assistants. Prospects suggest sustained growth in the social use of this technology. Thus, it is not surprising that deep KWS has become a hot research topic among speech scientists, who constantly look for KWS performance improvement and computational complexity reduction. This context motivates this paper, in which we conduct a literature review of deep spoken KWS to assist practitioners and researchers who are interested in this technology. Specifically, this overview is comprehensive, covering a thorough analysis of deep KWS systems (which includes speech features, acoustic modeling and posterior handling), robustness methods, applications, datasets, evaluation metrics, performance of deep KWS systems and audio-visual KWS. The analysis performed in this paper allows us to identify a number of directions for future research, including directions adopted from automatic speech recognition research and directions that are unique to the problem of spoken KWS.

    Enabling Deep Intelligence on Embedded Systems

    As deep learning for resource-constrained systems becomes more popular, we see an increased number of intelligent embedded systems such as IoT devices, robots, autonomous vehicles, and the plethora of portable, wearable, and mobile devices that are feature-packed with a wide variety of machine learning tasks. However, the performance of DNNs (deep neural networks) running on an embedded system is significantly limited by the platform's CPU, memory, and battery size, and their scope is limited to simplistic inference tasks only. This dissertation proposes on-device deep learning algorithms and supporting hardware designs, enabling embedded systems to efficiently perform deep intelligent tasks (i.e., deep neural networks) that are high-memory-footprint, compute-intensive, and energy-hungry beyond their limited computing resources. We call such on-device deep intelligence on embedded systems Embedded Deep Intelligence. Specifically, we introduce resource-aware learning strategies devised to overcome the four fundamental constraints that embedded systems face on the way towards Embedded Deep Intelligence: in-memory multitask learning via the concept of Neural Weight Virtualization, adaptive real-time learning via the concept of SubFlow, opportunistic accelerated learning via the concept of Neuro.ZERO, and energy-aware intermittent learning. These tackle the problems of small memory size, dynamic timing constraints, low computing capability, and limited energy, respectively. Once deployed in the field with the proposed resource-aware learning strategies, embedded systems are not only able to perform deep inference tasks on sensor data but also update and re-train their learning models at run-time without requiring any help from any external system. Such an on-device learning capability of Embedded Deep Intelligence makes an embedded intelligent system real-time, privacy-aware, secure, autonomous, untethered, responsive, and adaptive without concern for its limited resources.
    Doctor of Philosophy

    Learning Sensory Representations with Minimal Supervision


    Scaling Up Task Execution on Resource-Constrained Systems

    The ubiquity of machine learning tasks on embedded systems with constrained resources has made efficient execution of neural networks on these systems under CPU, memory, and energy constraints increasingly important. Unlike high-end computing systems, where resources are abundant and reliable, resource-constrained systems have only limited computational capability, limited memory, and limited energy supply. This dissertation focuses on how to take full advantage of the limited resources of these systems in order to improve task execution efficiency from different aspects of the execution pipeline. While the existing literature primarily aims at solving the problem by shrinking the model size according to the resource constraints, this dissertation aims to improve the execution efficiency for a given set of tasks from the following two aspects. Firstly, we propose SmartON, which is the first batteryless active event detection system that considers both the event arrival pattern and the harvested energy to determine when the system should wake up and what the duty cycle should be. Secondly, we propose Antler, which exploits the affinity between all pairs of tasks in a multitask inference system to construct a compact graph representation of the task set for a given overall size budget. To support these algorithmic proposals, we propose the following hardware solutions. One is a controllable capacitor array that can expand the system’s energy storage on-the-fly. The other is a FRAM array that can accommodate multiple neural networks running on one system.
    Doctor of Philosophy

    A Review of Deep Learning Techniques for Speech Processing

    The field of speech processing has undergone a transformative shift with the advent of deep learning. The use of multiple processing layers has enabled the creation of models capable of extracting intricate features from speech data. This development has paved the way for unparalleled advancements in speech recognition, text-to-speech synthesis, automatic speech recognition, and emotion recognition, propelling the performance of these tasks to unprecedented heights. The power of deep learning techniques has opened up new avenues for research and innovation in the field of speech processing, with far-reaching implications for a range of industries and applications. This review paper provides a comprehensive overview of the key deep learning models and their applications in speech-processing tasks. We begin by tracing the evolution of speech processing research, from early approaches, such as MFCC and HMM, to more recent advances in deep learning architectures, such as CNNs, RNNs, transformers, conformers, and diffusion models. We categorize the approaches and compare their strengths and weaknesses for solving speech-processing tasks. Furthermore, we extensively cover various speech-processing tasks, datasets, and benchmarks used in the literature and describe how different deep-learning networks have been utilized to tackle these tasks. Additionally, we discuss the challenges and future directions of deep learning in speech processing, including the need for more parameter-efficient, interpretable models and the potential of deep learning for multimodal speech processing. By examining the field's evolution, comparing and contrasting different approaches, and highlighting future directions and challenges, we hope to inspire further research in this exciting and rapidly advancing field.