32 research outputs found

    Deep Spoken Keyword Spotting:An Overview

    Get PDF
    Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams and has become a fast-growing technology thanks to the paradigm shift introduced by deep learning a few years ago. This has allowed the rapid embedding of deep KWS in a myriad of small electronic devices with different purposes like the activation of voice assistants. Prospects suggest a sustained growth in terms of social use of this technology. Thus, it is not surprising that deep KWS has become a hot research topic among speech scientists, who constantly look for KWS performance improvement and computational complexity reduction. This context motivates this paper, in which we conduct a literature review into deep spoken KWS to assist practitioners and researchers who are interested in this technology. Specifically, this overview has a comprehensive nature by covering a thorough analysis of deep KWS systems (which includes speech features, acoustic modeling and posterior handling), robustness methods, applications, datasets, evaluation metrics, performance of deep KWS systems and audio-visual KWS. The analysis performed in this paper allows us to identify a number of directions for future research, including directions adopted from automatic speech recognition research and directions that are unique to the problem of spoken KWS

    Machine Learning for Microcontroller-Class Hardware -- A Review

    Full text link
    The advancements in machine learning opened a new opportunity to bring intelligence to the low-end Internet-of-Things nodes such as microcontrollers. Conventional machine learning deployment has high memory and compute footprint hindering their direct deployment on ultra resource-constrained microcontrollers. This paper highlights the unique requirements of enabling onboard machine learning for microcontroller class devices. Researchers use a specialized model development workflow for resource-limited applications to ensure the compute and latency budget is within the device limits while still maintaining the desired performance. We characterize a closed-loop widely applicable workflow of machine learning model development for microcontroller class devices and show that several classes of applications adopt a specific instance of it. We present both qualitative and numerical insights into different stages of model development by showcasing several use cases. Finally, we identify the open research challenges and unsolved questions demanding careful considerations moving forward.Comment: Accepted for publication at IEEE Sensors Journa

    Scaling Machine Learning Systems using Domain Adaptation

    Get PDF
    Machine-learned components, particularly those trained using deep learning methods, are becoming integral parts of modern intelligent systems, with applications including computer vision, speech processing, natural language processing and human activity recognition. As these machine learning (ML) systems scale to real-world settings, they will encounter scenarios where the distribution of the data in the real-world (i.e., the target domain) is different from the data on which they were trained (i.e., the source domain). This phenomenon, known as domain shift, can significantly degrade the performance of ML systems in new deployment scenarios. In this thesis, we study the impact of domain shift caused by variations in system hardware, software and user preferences on the performance of ML systems. After quantifying the performance degradation of ML models in target domains due to the various types of domain shift, we propose unsupervised domain adaptation (uDA) algorithms that leverage unlabeled data collected in the target domain to improve the performance of the ML model. At its core, this thesis argues for the need to develop uDA solutions while adhering to practical scenarios in which ML systems will scale. More specifically, we consider four scenarios: (i) opaque ML systems, wherein parameters of the source prediction model are not made accessible in the target domain, (ii) transparent ML systems, wherein source model parameters are accessible and can be modified in the target domain, (iii) ML systems where source and target domains do not have identical label spaces, and (iv) distributed ML systems, wherein the source and target domains are geographically distributed, their datasets are private and cannot be exchanged using adaptation. We study the unique challenges and constraints of each scenario and propose novel uDA algorithms that outperform state-of-the-art baselines

    Contributions and applications around low resource deep learning modeling

    Get PDF
    El aprendizaje profundo representa la vanguardia del aprendizaje automático en multitud de aplicaciones. Muchas de estas tareas requieren una gran cantidad de recursos computacionales, lo que limita su adopción en dispositivos integrados. El objetivo principal de esta tesis es estudiar métodos y algoritmos que permiten abordar problemas utilizando aprendizaje profundo con bajos recursos computacionales. Este trabajo también tiene como objetivo presentar aplicaciones de aprendizaje profundo en la industria. La primera contribución es una nueva función de activación para redes de aprendizaje profundo: la función de módulo. Los experimentos muestran que la función de activación propuesta logra resultados superiores en tareas de visión artificial cuando se compara con las alternativas encontradas en la literatura. La segunda contribución es una nueva estrategia para combinar modelos preentrenados usando destilación de conocimiento. Los resultados de este capítulo muestran que es posible aumentar significativamente la precisión de los modelos preentrenados más pequeños, lo que permite un alto rendimiento a un menor costo computacional. La siguiente contribución de esta tesis aborda el problema de la previsión de ventas en el campo de la logística. Se proponen dos sistemas de extremo a extremo con dos técnicas diferentes de aprendizaje profundo (modelos de secuencia a secuencia y transformadores). Los resultados de este capítulo concluyen que es posible construir sistemas integrales para predecir las ventas de múltiples productos individuales, en múltiples puntos de venta y en diferentes momentos con un único modelo de aprendizaje automático. El modelo propuesto supera las alternativas encontradas en la literatura. Finalmente, las dos últimas contribuciones pertenecen al campo de la tecnología del habla. El primero estudia cómo construir un sistema de reconocimiento de voz Keyword Spotting utilizando una versión eficiente de una red neuronal convolucional. En este estudio, el sistema propuesto es capaz de superar el rendimiento de todos los puntos de referencia encontrados en la literatura cuando se prueba contra las subtareas más complejas. El último estudio propone un modelo independiente de texto a voz de última generación capaz de sintetizar voz inteligible en miles de perfiles de voz, mientras genera un discurso con variaciones de prosodia significativas y expresivas. El enfoque propuesto elimina la dependencia de los modelos anteriores de un sistema de voz adicional, lo que hace que el sistema propuesto sea más eficiente en el tiempo de entrenamiento e inferencia, y permite operaciones fuera de línea y en el dispositivo.Deep learning is the state of the art for several machine learning tasks. Many of these tasks require large amount of computational resources, which limits their adoption in embedded devices. The main goal of this dissertation is to study methods and algorithms that allow to approach problems using deep learning with restricted computational resources. This work also aims at presenting applications of deep learning in industry. The first contribution is a new activation function for deep learning networks: the modulus function. The experiments show that the proposed activation function achieves superior results in computer vision tasks when compared with the alternatives found in the literature. The second contribution is a new strategy to combine pre-trained models using knowledge distillation. The results of this chapter show that it is possible to significantly increase the accuracy of the smallest pre-trained models, allowing high performance at a lower computational cost. The following contribution in this thesis tackles the problem of sales fore- casting in the field of logistics. Two end-to-end systems with two different deep learning techniques (sequence-to-sequence models and transformers) are pro- posed. The results of this chapter conclude that it is possible to build end-to-end systems to predict the sales of multiple individual products, at multiple points of sale and different times with a single machine learning model. The proposed model outperforms the alternatives found in the literature. Finally, the last two contributions belong to the speech technology field. The former, studies how to build a Keyword Spotting speech recognition system using an efficient version of a convolutional neural network. In this study, the proposed system is able to beat the performance of all the benchmarks found in the literature when tested against the most complex subtasks. The latter study proposes a standalone state-of-the-art text-to-speech model capable of synthesizing intelligible voice in thousands of voice profiles, while generating speech with meaningful and expressive prosody variations. The proposed approach removes the dependency of previous models on an additional voice system, which makes the proposed system more efficient at training and inference time, and enables offline and on-device operations

    Detection and Mitigation of Steganographic Malware

    Get PDF
    A new attack trend concerns the use of some form of steganography and information hiding to make malware stealthier and able to elude many standard security mechanisms. Therefore, this Thesis addresses the detection and the mitigation of this class of threats. In particular, it considers malware implementing covert communications within network traffic or cloaking malicious payloads within digital images. The first research contribution of this Thesis is in the detection of network covert channels. Unfortunately, the literature on the topic lacks of real traffic traces or attack samples to perform precise tests or security assessments. Thus, a propaedeutic research activity has been devoted to develop two ad-hoc tools. The first allows to create covert channels targeting the IPv6 protocol by eavesdropping flows, whereas the second allows to embed secret data within arbitrary traffic traces that can be replayed to perform investigations in realistic conditions. This Thesis then starts with a security assessment concerning the impact of hidden network communications in production-quality scenarios. Results have been obtained by considering channels cloaking data in the most popular protocols (e.g., TLS, IPv4/v6, and ICMPv4/v6) and showcased that de-facto standard intrusion detection systems and firewalls (i.e., Snort, Suricata, and Zeek) are unable to spot this class of hazards. Since malware can conceal information (e.g., commands and configuration files) in almost every protocol, traffic feature or network element, configuring or adapting pre-existent security solutions could be not straightforward. Moreover, inspecting multiple protocols, fields or conversations at the same time could lead to performance issues. Thus, a major effort has been devoted to develop a suite based on the extended Berkeley Packet Filter (eBPF) to gain visibility over different network protocols/components and to efficiently collect various performance indicators or statistics by using a unique technology. This part of research allowed to spot the presence of network covert channels targeting the header of the IPv6 protocol or the inter-packet time of generic network conversations. In addition, the approach based on eBPF turned out to be very flexible and also allowed to reveal hidden data transfers between two processes co-located within the same host. Another important contribution of this part of the Thesis concerns the deployment of the suite in realistic scenarios and its comparison with other similar tools. Specifically, a thorough performance evaluation demonstrated that eBPF can be used to inspect traffic and reveal the presence of covert communications also when in the presence of high loads, e.g., it can sustain rates up to 3 Gbit/s with commodity hardware. To further address the problem of revealing network covert channels in realistic environments, this Thesis also investigates malware targeting traffic generated by Internet of Things devices. In this case, an incremental ensemble of autoencoders has been considered to face the ''unknown'' location of the hidden data generated by a threat covertly exchanging commands towards a remote attacker. The second research contribution of this Thesis is in the detection of malicious payloads hidden within digital images. In fact, the majority of real-world malware exploits hiding methods based on Least Significant Bit steganography and some of its variants, such as the Invoke-PSImage mechanism. Therefore, a relevant amount of research has been done to detect the presence of hidden data and classify the payload (e.g., malicious PowerShell scripts or PHP fragments). To this aim, mechanisms leveraging Deep Neural Networks (DNNs) proved to be flexible and effective since they can learn by combining raw low-level data and can be updated or retrained to consider unseen payloads or images with different features. To take into account realistic threat models, this Thesis studies malware targeting different types of images (i.e., favicons and icons) and various payloads (e.g., URLs and Ethereum addresses, as well as webshells). Obtained results showcased that DNNs can be considered a valid tool for spotting the presence of hidden contents since their detection accuracy is always above 90% also when facing ''elusion'' mechanisms such as basic obfuscation techniques or alternative encoding schemes. Lastly, when detection or classification are not possible (e.g., due to resource constraints), approaches enforcing ''sanitization'' can be applied. Thus, this Thesis also considers autoencoders able to disrupt hidden malicious contents without degrading the quality of the image

    Learning Sensory Representations with Minimal Supervision

    Get PDF
    corecore