Best of Both Worlds: Making High Accuracy Non-incremental Transformer-based Disfluency Detection Incremental
While Transformer-based text classifiers pre-trained on large volumes of text have yielded significant improvements on a wide range of computational linguistics tasks, their implementations have been unsuitable for live incremental processing thus far, operating only on the level of complete sentence inputs. We address the challenge of introducing methods for word-by-word left-to-right incremental processing to Transformers such as BERT, models without an intrinsic sense of linear order. We modify the training method and live decoding of non-incremental models to detect speech disfluencies with minimum latency and without pre-segmentation of dialogue acts. We experiment with several decoding methods to predict the rightward context of the word currently being processed using a GPT-2 language model, and apply a BERT-based disfluency detector to sequences including the predicted words. We show that our method of incrementalising Transformers maintains most of their high non-incremental performance while operating strictly incrementally. We also evaluate our models' incremental performance under different prediction strategies to establish the trade-off between incremental and final performance. We apply our system to incremental speech recognition results as they arrive in a live system and achieve state-of-the-art results in this setting.
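The decoding loop described above — extend the current prefix with predicted right context, classify the whole sequence, but commit only the tag for the newest real word — can be sketched as follows. `predict_continuation` and `classify_disfluencies` are hypothetical stand-ins for the GPT-2 language model and the BERT-based detector, not the authors' actual implementation.

```python
# Sketch of incremental disfluency tagging with predicted right context.
# Both models are replaced by trivial stand-ins so the control flow is runnable.

def predict_continuation(prefix, n_words=2):
    """Hypothetical LM stand-in: predict n_words of rightward context."""
    return ["<pred>"] * n_words

def classify_disfluencies(tokens):
    """Hypothetical classifier stand-in: tag an immediate word repetition
    as a disfluency ('E'), everything else as fluent ('F')."""
    return ["E" if i > 0 and tokens[i - 1] == tok else "F"
            for i, tok in enumerate(tokens)]

def incremental_tags(words, n_pred=2):
    """Process word by word: at each step, classify the prefix extended
    with predicted words, but commit only the tag of the newest real word."""
    committed, prefix = [], []
    for w in words:
        prefix.append(w)
        extended = prefix + predict_continuation(prefix, n_pred)
        tags = classify_disfluencies(extended)
        committed.append(tags[len(prefix) - 1])  # current word's tag only
    return committed

print(incremental_tags("i i like this".split()))  # ['F', 'E', 'F', 'F']
```

The key property is that a tag, once committed, is never revised, which is what makes the output usable by downstream components operating in real time.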
“The sleep data looks way better than I feel.” An autoethnographic account and diffractive reading of sleep-tracking
Sleep-tracking products promise their users improved sleep by focusing on behavior change, but often neglect the contextual and individual factors contributing to sleep quality and quantity. Making good sleep a personal responsibility in the service of productive scheduling does not necessarily lead to better sleep and may cause stress and anxiety. In an autoethnographic study, the first author of this paper tracked her sleep for one month using a diary, body maps and an Oura ring, and compared her subjectively felt sleep experience with the data produced by the Oura app. A thematic analysis of the data resulted in four themes describing the relationship between the user-researcher and her wearable sleep-tracker: (1) good sleep scores are motivating, (2) experience that matches the data leads to sense-making, (3) contradictory information from the app leads to frustration, and (4) the sleep-tracker competes with other social agents. A diffractive reading of the data and research process, following Karen Barad's methodology, resulted in a discussion of how data passes through the analog and digital apparatus, and of what contextual factors are left out but still significantly impact sleep quality and quantity. We add to a canon of sleep research recommending a move away from representing sleep in terms of comparison and competition, uncoupling it from the neoliberal capitalist productivity and self-improvement narratives which are often key contributing factors to bad sleep in the first place.
Detecting Alzheimer's Disease Using Interactional and Acoustic Features from Spontaneous Speech
Alzheimer’s Disease (AD) is a form of dementia that manifests as cognitive decline, including in memory and language, and as changes in behavior. Speech data has proven valuable for inferring cognitive status, is used in many health assessment tasks, and can be easily elicited in natural settings. Much work focuses on analysis using linguistic features; here, we focus on non-linguistic features and their use in distinguishing AD patients from similar-age non-AD patients with other health conditions in the Carolinas Conversation Collection (CCC) dataset. We used two types of features: patterns of interaction, including pausing behaviour and floor control, and acoustic features, including pitch, amplitude, energy, and cepstral coefficients. Fusing the two kinds of features, combined with feature selection, obtains very promising classification results: classification accuracy of 90% using standard models such as support vector machines and logistic regression. We also obtain promising results using interactional features alone (87% accuracy), which can be easily extracted from natural conversations in daily life and thus have potential for future use as a noninvasive method for AD diagnosis and monitoring.
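The fusion-plus-feature-selection pipeline described above can be sketched with scikit-learn. The data here is synthetic random noise and the feature counts are illustrative, not the paper's actual feature sets or the CCC data.

```python
# Sketch of early fusion of interactional and acoustic features, followed by
# univariate feature selection and a linear SVM. All data is synthetic.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 120
interactional = rng.normal(size=(n, 8))   # e.g. pause rates, floor-control stats
acoustic = rng.normal(size=(n, 40))       # e.g. pitch, energy, cepstral features
X = np.hstack([interactional, acoustic])  # early fusion: concatenate feature sets
y = rng.integers(0, 2, size=n)            # AD vs. non-AD labels (synthetic)

clf = make_pipeline(StandardScaler(),
                    SelectKBest(f_classif, k=16),  # keep the 16 strongest features
                    SVC(kernel="linear"))
scores = cross_val_score(clf, X, y, cv=5)
print(round(scores.mean(), 3))
```

With random labels the cross-validated accuracy hovers around chance; the point of the sketch is the pipeline shape, in which selection and scaling are fit inside each fold to avoid leakage.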
Alzheimer's Dementia Recognition Using Acoustic, Lexical, Disfluency and Speech Pause Features Robust to Noisy Inputs
We present two multimodal fusion-based deep learning models that consume ASR-transcribed speech and acoustic data simultaneously to classify whether a speaker in a structured diagnostic task has Alzheimer's Disease and to what degree, evaluating on the ADReSSo challenge 2021 data. Our best model, a BiLSTM with highway layers using words, word probabilities, disfluency features, pause information, and a variety of acoustic features, achieves an accuracy of 84% and an RMSE of 4.26 when predicting MMSE cognitive scores. While predicting cognitive decline is more challenging, our models show improvement over word-only models when using the multimodal approach together with word probabilities, disfluency and pause information. We show considerable gains for AD classification using multimodal fusion and gating, which can effectively deal with noisy inputs from acoustic features and ASR hypotheses.
Comment: INTERSPEECH 2021.
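The gating mechanism mentioned above can be sketched in isolation: a learned sigmoid gate decides, per dimension, how much weight to give the text stream versus the noisy acoustic stream. Dimensions and weights below are illustrative, not the paper's model.

```python
# Minimal sketch of gated multimodal fusion. The gate z is computed from the
# concatenated modalities; the fused vector is a per-dimension convex
# combination of the two inputs, so noisy dimensions can be down-weighted.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(text_vec, acoustic_vec, W_g, b_g):
    """z = sigmoid(W_g [t; a] + b_g);  fused = z * t + (1 - z) * a."""
    concat = np.concatenate([text_vec, acoustic_vec])
    z = sigmoid(W_g @ concat + b_g)
    return z * text_vec + (1.0 - z) * acoustic_vec

rng = np.random.default_rng(1)
d = 4
t = rng.normal(size=d)                 # lexical/text representation
a = rng.normal(size=d)                 # acoustic representation
W_g = rng.normal(size=(d, 2 * d))      # gate weights (random here; learned in practice)
b_g = np.zeros(d)
fused = gated_fusion(t, a, W_g, b_g)
print(fused.shape)  # (4,)
```

Because the gate output lies in (0, 1), each fused dimension stays between the corresponding text and acoustic values, which is what makes the mechanism robust to a single noisy modality.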
Modelling Incremental Self-Repair Processing in Dialogue.
PhD thesis. Self-repairs, where speakers repeat themselves, reformulate or restart what they are saying, are pervasive in human dialogue. These phenomena provide a window into real-time human language processing. For explanatory adequacy, a model of dialogue must include mechanisms that account for them. Artificial dialogue agents also need this capability for more natural interaction with human users. This thesis investigates the structure of self-repair and its function in the incremental construction of meaning in interaction.

A corpus study shows how the range of self-repairs seen in dialogue cannot be accounted for by looking at surface form alone. More particularly, it analyses a string-alignment approach and shows how it is insufficient, provides requirements for a suitable model of incremental context, and presents an ontology of self-repair function.

An information-theoretic model is developed which addresses these issues, along with a system that automatically detects self-repairs and edit terms on transcripts incrementally with minimal latency, achieving state-of-the-art results. Additionally, it is shown to have practical use in the psychiatric domain.

The thesis goes on to present a dialogue model to interpret and generate repaired utterances incrementally. When processing repaired rather than fluent utterances, it achieves the same degree of incremental interpretation and incremental representation. Practical implementation methods are presented for an existing dialogue system.

Finally, a more pragmatically oriented approach is presented to model self-repairs in a psycholinguistically plausible way. This is achieved by extending the dialogue model to include a probabilistic semantic framework that performs incremental inference in a reference-resolution domain.

The thesis concludes that a model of context at least as fine-grained as word-by-word is required for realistic models of self-repair, and that context must include linguistic action sequences and information-update effects. The way dialogue participants process self-repairs to make inferences in real time, rather than filtering out their disfluency effects, has been modelled formally and in practical systems.

Funded by an Engineering and Physical Sciences Research Council (EPSRC) Doctoral Training Account (DTA) scholarship from the School of Electronic Engineering and Computer Science at Queen Mary University of London.
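A toy version of incremental repair detection can illustrate what word-by-word processing with minimal latency means in practice. The rough-copy heuristic below, flagging a word that repeats recent material, is only the kind of surface-form baseline the corpus study shows to be insufficient, not the thesis's information-theoretic model.

```python
# Toy incremental self-repair detector: as each word arrives, decide
# immediately (one-word latency) whether it looks like a repair onset,
# i.e. a "rough copy" of recently uttered material. Illustrative only;
# surface form alone cannot account for the full range of self-repairs.

def detect_repairs(words, window=3):
    """Return one boolean per word: flagged as a possible repair onset
    the moment it arrives, with no later revision."""
    flags = []
    for i, w in enumerate(words):
        # repair onset if this word repeats one of the last `window` words
        flags.append(w in words[max(0, i - window):i])
    return flags

print(detect_repairs("i go i mean i went home".split()))
# [False, False, True, False, True, False, False]
```

The two flagged positions are the restarts after "go" and "mean"; a substitution repair with no lexical overlap ("the red... the blue one") would be missed, which is exactly the limitation of string alignment the thesis identifies.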
Incremental Semantics for Dialogue Processing: Requirements, and a Comparison of Two Approaches
Truly interactive dialogue systems need to construct meaning on at least a word-by-word basis. We propose desiderata for incremental semantics for dialogue models and systems, a task that has not previously been attempted thoroughly. After laying out the desirable properties, we illustrate how they are met by current approaches, comparing two incremental semantic processing frameworks: Dynamic Syntax enriched with Type Theory with Records (DS-TTR) and Robust Minimal Recursion Semantics with incremental processing (RMRS-IP). We conclude that these approaches are not significantly different with regard to their semantic representation construction; however, their purported role within semantic models and dialogue models is where they diverge.
It's Not What You Do, It's How You Do It: Grounding Uncertainty for a Simple Robot
Hough J, Schlangen D. It's Not What You Do, It's How You Do It: Grounding Uncertainty for a Simple Robot. In: Proceedings of the 2017 Conference on Human-Robot Interaction (HRI2017). 2017.