9 research outputs found

    Keyword Spotting for Hearing Assistive Devices Robust to External Speakers

    Keyword spotting (KWS) is experiencing an upswing due to the pervasiveness of small electronic devices that allow interaction with them via speech. Often, KWS systems are speaker-independent, which means that any person (user or not) might trigger them. For applications like KWS for hearing assistive devices this is unacceptable, as only the user should be allowed to operate them. In this paper, we propose a KWS system for hearing assistive devices that is robust to external speakers. A state-of-the-art deep residual network for small-footprint KWS serves as the basis to build upon. Following a multi-task learning scheme, this system is extended to jointly perform KWS and own-voice/external-speaker detection with a negligible increase in the number of parameters. For the experiments, we generate from the Google Speech Commands Dataset a speech corpus that emulates hearing aids as the capturing device. Our results show that this multi-task deep residual network achieves a relative KWS accuracy improvement of around 32% with respect to a system that does not deal with external speakers.
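    As a rough illustration of the multi-task extension described above, the sketch below shows a small residual trunk shared by two output heads, one for keyword classification and one for own-voice/external-speaker detection. This is a hedged, minimal PyTorch example: the channel count, block count, head sizes, and loss weighting are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Basic residual block with two 3x3 convolutions (illustrative sizes)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        y = torch.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return torch.relu(x + y)  # residual connection

class MultiTaskKWS(nn.Module):
    """Shared residual trunk with a KWS head and an own-voice detection head."""
    def __init__(self, n_keywords: int = 12, channels: int = 45, n_blocks: int = 3):
        super().__init__()
        self.stem = nn.Conv2d(1, channels, 3, padding=1, bias=False)
        self.blocks = nn.Sequential(*[ResBlock(channels) for _ in range(n_blocks)])
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.kws_head = nn.Linear(channels, n_keywords)  # keyword classes
        self.ovd_head = nn.Linear(channels, 2)           # own voice vs. external speaker

    def forward(self, mel):  # mel: (batch, 1, n_mels, frames)
        h = self.pool(self.blocks(self.stem(mel))).flatten(1)
        return self.kws_head(h), self.ovd_head(h)

# Joint objective: weighted sum of the two cross-entropy losses
# (the 0.5 weight is an assumption, not a value reported in the paper).
model = MultiTaskKWS()
kws_logits, ovd_logits = model(torch.randn(8, 1, 40, 101))
kws_targets = torch.randint(0, 12, (8,))
ovd_targets = torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(kws_logits, kws_targets) \
     + 0.5 * nn.functional.cross_entropy(ovd_logits, ovd_targets)
```

    Because both heads share the same trunk, the second task adds only two small linear layers, which is consistent with the negligible parameter increase the abstract describes.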

    Honkling: In-Browser Personalization for Ubiquitous Keyword Spotting

    Used for simple voice commands and wake-word detection, keyword spotting (KWS) is the task of detecting pre-determined keywords in a stream of utterances. A common implementation of KWS transmits audio samples over the network and detects target keywords in the cloud with neural networks, because on-device application development presents compatibility issues across edge devices and offers limited support for deep learning. Unfortunately, such an architecture can lead to unpleasant user experiences because network latency is not deterministic. Furthermore, the client-server architecture raises privacy concerns because users lose control over the audio data once it leaves the edge device. In this thesis, I present Honkling, a novel JavaScript-based KWS system. Unlike previous KWS systems, Honkling operates purely on the client side: it is decentralized and serverless. Because it is implemented in JavaScript, Honkling can be deployed directly in the browser, achieving higher compatibility and efficiency than the existing client-server architecture. A comprehensive efficiency evaluation on desktops, laptops, and mobile devices shows that in-browser keyword detection takes only 0.5 seconds and achieves a high accuracy of 94% on the Google Speech Commands dataset. An empirical study finds that Honkling's accuracy is inconsistent in practice due to differences in accent. To ensure high detection accuracy for every user, I explore fine-tuning the trained model with personalized user recordings. Thorough experiments show that such a process can increase absolute accuracy by up to 10% with only five recordings per keyword. Furthermore, the study shows that in-browser fine-tuning takes only eight seconds in the presence of hardware acceleration.
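    The personalization step described above amounts to few-shot fine-tuning of an already-trained KWS model on a handful of user recordings per keyword. The sketch below is a hedged, conceptual version in PyTorch (Honkling itself runs in the browser on top of a JavaScript deep-learning stack rather than PyTorch); the optimizer, learning rate, batch size, and epoch count are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def personalize(model: nn.Module, feats: torch.Tensor, labels: torch.Tensor,
                epochs: int = 10) -> nn.Module:
    """Fine-tune a pre-trained KWS model on ~5 user recordings per keyword.

    `feats` holds precomputed audio features (e.g. log-Mel spectrograms) and
    `labels` the corresponding keyword indices; both names are hypothetical.
    """
    loader = DataLoader(TensorDataset(feats, labels), batch_size=4, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)  # model is assumed to return class logits
            loss.backward()
            optimizer.step()
    return model
```

    With only a few recordings per keyword, a short loop like this is cheap enough to run on end-user hardware, which is the property the thesis exploits for in-browser fine-tuning.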

    Experimental implementation of a neural network optical channel equalizer in restricted hardware using pruning and quantization

    The deployment of optical channel equalizers based on artificial neural networks (NNs) on edge-computing devices is critically important for the next generation of optical communication systems. However, this remains a highly challenging problem, mainly due to the computational complexity of the NNs required for the efficient equalization of nonlinear optical channels with large dispersion-induced memory. To implement an NN-based optical channel equalizer in hardware, a substantial complexity reduction is needed while keeping the performance of the simplified NN model at an acceptable level. In this work, we address the complexity reduction problem by applying pruning and quantization techniques to an NN-based optical channel equalizer. We use an exemplary NN architecture, the multi-layer perceptron (MLP), to mitigate the impairments of 30 GBd transmission over 1000 km of standard single-mode fiber, and demonstrate that it is feasible to reduce the equalizer's memory by up to 87.12% and its complexity by up to 78.34% without noticeable performance degradation. In addition, we give a precise definition of the computational complexity of a compressed NN-based equalizer in the digital signal processing (DSP) sense. Further, we examine how hardware with different CPU and GPU features affects the power consumption and latency of the compressed equalizer. We also verify the developed technique experimentally by implementing the reduced NN equalizer on two standard edge-computing hardware units, the Raspberry Pi 4 and the Nvidia Jetson Nano, which process data generated by simulating the signal's propagation through the optical-fiber system.
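    To make the compression pipeline concrete, the sketch below applies magnitude (L1) pruning followed by post-training dynamic INT8 quantization to a toy MLP equalizer in PyTorch. The layer widths, 50% sparsity, tap count, and input layout are assumptions for illustration, not the settings or results reported in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

taps = 41  # assumed dispersion-memory window, in symbols
mlp = nn.Sequential(
    nn.Linear(2 * taps, 256), nn.ReLU(),  # real + imaginary part per received symbol
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 2),                    # equalized I/Q of the center symbol
)

# 1) Unstructured L1 (magnitude) pruning of every Linear weight matrix.
for layer in mlp:
    if isinstance(layer, nn.Linear):
        prune.l1_unstructured(layer, name="weight", amount=0.5)
        prune.remove(layer, "weight")  # make the pruned zeros permanent

# 2) Post-training dynamic quantization of the remaining weights to int8.
mlp_int8 = torch.quantization.quantize_dynamic(mlp, {nn.Linear}, dtype=torch.qint8)

rx_window = torch.randn(1, 2 * taps)  # toy window of received-signal samples
print(mlp_int8(rx_window))
```

    A model compressed this way can then be exported and benchmarked on edge hardware such as the Raspberry Pi 4 or Jetson Nano, which is the kind of deployment the paper evaluates.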

    An Experimental Analysis of Multi-Perspective Convolutional Neural Networks

    Modelling the similarity of sentence pairs is an important problem in natural language processing and information retrieval, with applications in tasks such as paraphrase identification and answer selection in question answering. The Multi-Perspective Convolutional Neural Network (MP-CNN) is a model that improved on the previous state of the art in 2015 and has remained popular for sentence similarity tasks. However, until now, there has not been a rigorous study of how the model actually achieves competitive accuracy. In this thesis, we report on a series of detailed experiments that break down each component's contribution to MP-CNN's accuracy and its effect on model robustness. We find that two key components of MP-CNN are non-essential for competitive accuracy and that they make the model less robust to changes in hyperparameters. Furthermore, we suggest simple changes to the architecture and show experimentally that removing these two major components and incorporating the small changes improves MP-CNN's accuracy, pushing its scores closer to more recent work on competitive semantic textual similarity and answer selection datasets, while using eight times fewer parameters.
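    The sketch below is a loose, hedged illustration of the multi-perspective idea at MP-CNN's core: convolve each sentence's word embeddings with filters of several widths, pool each feature map over time, and feed a comparison of the two sentence vectors to a classifier. Dimensions, filter widths, and the single cosine comparison feature are assumptions for illustration; the actual MP-CNN also uses per-dimension filters, multiple pooling types, and a richer structured similarity-measurement layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HolisticEncoder(nn.Module):
    """Encode a sentence with convolutions of several filter widths."""
    def __init__(self, emb_dim: int = 300, n_filters: int = 200, widths=(1, 2, 3)):
        super().__init__()
        self.convs = nn.ModuleList(nn.Conv1d(emb_dim, n_filters, w) for w in widths)

    def forward(self, x):          # x: (batch, seq_len, emb_dim)
        x = x.transpose(1, 2)      # -> (batch, emb_dim, seq_len)
        # Max-pool each filter width's feature map over time, then concatenate.
        return torch.cat([conv(x).max(dim=2).values for conv in self.convs], dim=1)

class PairSimilarity(nn.Module):
    """Compare two sentence vectors and classify the pair (e.g. paraphrase or not)."""
    def __init__(self):
        super().__init__()
        self.encoder = HolisticEncoder()
        # 600-dim vector per sentence (200 filters x 3 widths) plus one cosine feature.
        self.classifier = nn.Linear(600 * 2 + 1, 2)

    def forward(self, s1, s2):
        a, b = self.encoder(s1), self.encoder(s2)
        cos = F.cosine_similarity(a, b, dim=1).unsqueeze(1)
        return self.classifier(torch.cat([a, b, cos], dim=1))

# Toy usage: batches of pre-embedded sentences (embeddings are random here).
s1 = torch.randn(4, 20, 300)
s2 = torch.randn(4, 18, 300)
logits = PairSimilarity()(s1, s2)  # (4, 2)
```

    Ablating pieces of a model like this (filter widths, pooling types, comparison features) is the style of component-by-component analysis the thesis carries out on the full MP-CNN.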

    Doing Things with Words: The New Consequences of Writing in the Age of AI

    Exploring the entanglement between artificial intelligence (AI) and writing, this thesis asks: what does writing with AI do? And how can this doing be made visible, since the consequences of information and communication technologies (ICTs) are so often opaque? To propose one set of answers to these questions, I begin by working with Google Smart Compose, the word-prediction AI Google launched to more than a billion global users in 2018, by way of a novel method I call AI interaction experiments. In these experiments, I transcribe texts into Gmail and Google Docs, carefully documenting Smart Compose’s interventions and output. Wedding these experiments to existing scholarship, I argue that writing with AI does three things: it engages writers in asymmetrical economic relations with Big Tech; it entangles unwitting writers in climate crisis by virtue of the vast resources, as Bender et al. (2021), Crawford (2021), and Strubell et al. (2019) have pointed out, required to train and sustain AI models; and it perpetuates linguistic racism, further embedding harmful politics of race and representation in everyday life. In making these arguments, my purpose is to intervene in normative discourses surrounding technology, exposing hard-to-see consequences so that we (people in the academy, critical media scholars, educators, and especially those of us in dominant groups) may envision better futures. Toward both exposure and reimagining, my dissertation’s primary contributions are research-creational works. Research-creational interventions accompany each of the three major chapters of this work, drawing attention to the economic, climate, and race relations that word-prediction AI conceals and to the otherwise opaque premises on which it rests. The broader wager of my dissertation is that what technologies do and what they are are inseparable: the relations a technology enacts must be exposed, and they must necessarily figure into how we understand the technology itself. Because writing with AI enacts particular economic, climate, and race relations, these relations must figure into our understanding of what it means to write with AI and, because of AI’s increasing entanglement with acts of writing, into our very understanding of what it means to write.