185 research outputs found

    Best of Both Worlds: Making High Accuracy Non-incremental Transformer-based Disfluency Detection Incremental

    While Transformer-based text classifiers pre-trained on large volumes of text have yielded significant improvements on a wide range of computational linguistics tasks, their implementations have been unsuitable for live incremental processing thus far, operating only on the level of complete sentence inputs. We address the challenge of introducing methods for word-by-word left-to-right incremental processing to Transformers such as BERT, models without an intrinsic sense of linear order. We modify the training method and live decoding of non-incremental models to detect speech disfluencies with minimum latency and without pre-segmentation of dialogue acts. We experiment with several decoding methods to predict the rightward context of the word currently being processed using a GPT-2 language model and apply a BERT-based disfluency detector to sequences, including predicted words. We show our method of incrementalising Transformers maintains most of their high non-incremental performance while operating strictly incrementally. We also evaluate our models’ incremental performance to establish the trade-off between incremental performance and final performance, using different prediction strategies. We apply our system to incremental speech recognition results as they arrive into a live system and achieve state-of-the-art results in this setting.

    “The sleep data looks way better than I feel.” An autoethnographic account and diffractive reading of sleep-tracking

    Sleep-tracking products promise their users better sleep by focusing on behavior change, but often neglect the contextual and individual factors contributing to sleep quality and quantity. Making good sleep for productive scheduling a personal responsibility does not necessarily lead to better sleep and may cause stress and anxiety. In an autoethnographic study, the first author of this paper tracked her sleep for one month using a diary, body maps and an Oura ring and compared her subjectively felt sleep experience with the data produced by the Oura app. A thematic analysis of the data resulted in four themes describing the relationship between the user-researcher and her wearable sleep-tracker: (1) good sleep scores are motivating, (2) experience that matches the data leads to sense-making, (3) contradictory information from the app leads to frustration, and (4) the sleep-tracker competes with other social agents. A diffractive reading of the data and research process, following Karen Barad's methodology, resulted in a discussion of how data passes through the analog and digital apparatus and what contextual factors are left out but still significantly impact sleep quality and quantity. We add to a canon of sleep research recommending a move away from representing sleep in terms of comparison and competition, uncoupling it from neoliberal capitalistic productivity and self-improvement narratives, which are often key contributing factors to bad sleep in the first place.

    Detecting Alzheimer's Disease Using Interactional and Acoustic Features from Spontaneous Speech

    Alzheimer’s Disease (AD) is a form of dementia that manifests as cognitive decline, including impaired memory and language, and changes in behaviour. Speech data has proven valuable for inferring cognitive status, is used in many health assessment tasks, and can be easily elicited in natural settings. Much work focuses on analysis using linguistic features; here, we focus on non-linguistic features and their use in distinguishing AD patients from similar-age non-AD patients with other health conditions in the Carolinas Conversation Collection (CCC) dataset. We used two types of features: patterns of interaction, including pausing behaviour and floor control, and acoustic features, including pitch, amplitude, energy, and cepstral coefficients. Fusion of the two kinds of features, combined with feature selection, obtains very promising classification results: classification accuracy of 90% using standard models such as support vector machines and logistic regression. We also obtain promising results using interactional features alone (87% accuracy), which can be easily extracted from natural conversations in daily life and thus have the potential for future implementation as a noninvasive method for AD diagnosis and monitoring.

    Alzheimer's Dementia Recognition Using Acoustic, Lexical, Disfluency and Speech Pause Features Robust to Noisy Inputs

    We present two multimodal fusion-based deep learning models that consume ASR-transcribed speech and acoustic data simultaneously to classify whether a speaker in a structured diagnostic task has Alzheimer's Disease and to what degree, evaluating on the ADReSSo Challenge 2021 data. Our best model, a BiLSTM with highway layers using words, word probabilities, disfluency features, pause information, and a variety of acoustic features, achieves an accuracy of 84% and an RMSE of 4.26 when predicting MMSE cognitive scores. While predicting cognitive decline is more challenging, our models show improvement using the multimodal approach and word probabilities, disfluency and pause information over word-only models. We show considerable gains for AD classification using multimodal fusion and gating, which can effectively deal with noisy inputs from acoustic features and ASR hypotheses. Comment: INTERSPEECH 2021. arXiv admin note: substantial text overlap with arXiv:2106.0966

    Modelling Incremental Self-Repair Processing in Dialogue

    PhD thesis. Self-repairs, where speakers repeat themselves, reformulate or restart what they are saying, are pervasive in human dialogue. These phenomena provide a window into real-time human language processing. For explanatory adequacy, a model of dialogue must include mechanisms that account for them. Artificial dialogue agents also need this capability for more natural interaction with human users. This thesis investigates the structure of self-repair and its function in the incremental construction of meaning in interaction. A corpus study shows how the range of self-repairs seen in dialogue cannot be accounted for by looking at surface form alone. More particularly, it analyses a string-alignment approach and shows how it is insufficient, and provides requirements for a suitable model of incremental context and an ontology of self-repair function. An information-theoretic model is developed which addresses these issues, along with a system that automatically detects self-repairs and edit terms on transcripts incrementally with minimal latency, achieving state-of-the-art results. Additionally, it is shown to have practical use in the psychiatric domain. The thesis goes on to present a dialogue model to interpret and generate repaired utterances incrementally. When processing repaired rather than fluent utterances, it achieves the same degree of incremental interpretation and incremental representation. Practical implementation methods are presented for an existing dialogue system. Finally, a more pragmatically oriented approach is presented to model self-repairs in a psycholinguistically plausible way. This is achieved through extending the dialogue model to include a probabilistic semantic framework to perform incremental inference in a reference resolution domain.
    The thesis concludes that at least as fine-grained a model of context as word-by-word is required for realistic models of self-repair, and context must include linguistic action sequences and information update effects. The way dialogue participants process self-repairs to make inferences in real time, rather than filter out their disfluency effects, has been modelled formally and in practical systems. Funding: Engineering and Physical Sciences Research Council (EPSRC) Doctoral Training Account (DTA) scholarship from the School of Electronic Engineering and Computer Science at Queen Mary University of London.

    Incremental Semantics for Dialogue Processing: Requirements, and a Comparison of Two Approaches

    Truly interactive dialogue systems need to construct meaning on at least a word-by-word basis. We propose desiderata for incremental semantics for dialogue models and systems, a task not heretofore attempted thoroughly. After laying out the desirable properties, we illustrate how they are met by current approaches, comparing two incremental semantic processing frameworks: Dynamic Syntax enriched with Type Theory with Records (DS-TTR) and Robust Minimal Recursion Semantics with incremental processing (RMRS-IP). We conclude that these approaches are not significantly different with regard to their semantic representation construction; however, their purported role within semantic models and dialogue models is where they diverge.

    It's Not What You Do, It's How You Do It: Grounding Uncertainty for a Simple Robot

    Hough J, Schlangen D. It's Not What You Do, It's How You Do It: Grounding Uncertainty for a Simple Robot. In: Proceedings of the 2017 Conference on Human-Robot Interaction (HRI2017). 2017