Are language production problems apparent in adults who no longer meet diagnostic criteria for attention-deficit/hyperactivity disorder?
In this study, we examined sentence production in a sample of adults (N = 21) who had had attention-deficit/hyperactivity disorder (ADHD) as children but as adults no longer met DSM-IV diagnostic criteria (APA, 2000). This "remitted" group was assessed on a sentence production task. On each trial, participants saw two objects and a verb. Their task was to construct a sentence using the objects as arguments of the verb. Results showed more ungrammatical and disfluent utterances with one particular type of verb (i.e., participles). In a second set of analyses, we compared the remitted group to both control participants and a "persistent" group, who had ADHD as children and as adults. Results showed that remitters were more likely than controls to produce ungrammatical utterances and to make repair disfluencies, and they patterned more similarly to ADHD participants. Conclusions focus on language output in remitted ADHD and the role of executive functions in language production.
Helping, I Mean Assessing Psychiatric Communication: An Application of Incremental Self-Repair Detection
18th SemDial Workshop on the Semantics and Pragmatics of Dialogue (DialWatt), 1-3 September 2014, Edinburgh, Scotland.
Self-repair is pervasive in dialogue, and models thereof have long been a focus of research, particularly for disfluency detection in speech recognition and spoken dialogue systems. However, the generality of such models across domains has received little attention. In this paper we investigate the application of an automatic incremental self-repair detection system, STIR, developed on the Switchboard corpus of telephone speech, to a new domain: psychiatric consultations. We find that word-level accuracy is reduced markedly by the differences in annotation schemes and transcription conventions between corpora, which has implications for the generalisability of all repair detection systems. However, overall rates of repair are detected accurately, promising a useful resource for clinical dialogue studies.
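To illustrate the structure such systems target (the reparandum followed by its repair), here is a minimal sketch of detecting simple repetition-type self-repairs; this toy heuristic is an assumption-laden stand-in, not the STIR system itself:

```python
def mark_repetition_repairs(tokens):
    """Mark simple repetition disfluencies of the form 'I I' or
    'to the to the': the first copy is labelled the reparandum,
    the second the repair. A toy heuristic, not the STIR system."""
    marked = []
    i = 0
    n = len(tokens)
    while i < n:
        found = False
        # try the longest repeated span first (up to 3 words)
        for w in (3, 2, 1):
            if i + 2 * w <= n and tokens[i:i + w] == tokens[i + w:i + 2 * w]:
                marked.append(("reparandum", tokens[i:i + w]))
                marked.append(("repair", tokens[i + w:i + 2 * w]))
                i += 2 * w
                found = True
                break
        if not found:
            marked.append(("fluent", [tokens[i]]))
            i += 1
    return marked

print(mark_repetition_repairs("I I mean to to the store".split()))
```

Real incremental detectors also handle substitutions and deletions ("to Boston, uh, to Denver") and must commit to labels word by word, which this batch sketch ignores.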
DISCO: A Large Scale Human Annotated Corpus for Disfluency Correction in Indo-European Languages
Disfluency correction (DC) is the process of removing disfluent elements like
fillers, repetitions and corrections from spoken utterances to create readable
and interpretable text. DC is a vital post-processing step applied to Automatic
Speech Recognition (ASR) outputs, before subsequent processing by downstream
language understanding tasks. Existing DC research has primarily focused on
English due to the unavailability of large-scale open-source datasets. Towards
the goal of multilingual disfluency correction, we present a high-quality
human-annotated DC corpus covering four important Indo-European languages:
English, Hindi, German and French. We provide extensive analysis of results of
state-of-the-art DC models across all four languages obtaining F1 scores of
97.55 (English), 94.29 (Hindi), 95.89 (German) and 92.97 (French). To
demonstrate the benefits of DC on downstream tasks, we show that DC leads to
5.65 points increase in BLEU scores on average when used in conjunction with a
state-of-the-art Machine Translation (MT) system. We release code to run our
experiments along with our annotated dataset here.
Comment: Accepted at EMNLP 2023 Findings
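As a concrete picture of what DC removes, here is a minimal rule-based sketch that drops a few common fillers and collapses immediate repetitions; the filler list is illustrative, and this baseline is far simpler than the neural DC models evaluated in the paper:

```python
# Illustrative single-word fillers; real filler inventories are
# language-specific and larger.
FILLERS = {"uh", "um", "well"}

def correct_disfluencies(utterance):
    """Rule-based disfluency correction: drop common fillers and
    collapse immediate word repetitions. A toy baseline, not the
    neural DC models evaluated in the paper."""
    tokens = utterance.lower().split()
    # remove single-word fillers
    tokens = [t for t in tokens if t not in FILLERS]
    # collapse immediate repetitions ("the the" -> "the")
    cleaned = []
    for t in tokens:
        if not cleaned or cleaned[-1] != t:
            cleaned.append(t)
    return " ".join(cleaned)

print(correct_disfluencies("well I I want uh want to to book a ticket"))
```

Note that such rules cannot recover corrections ("to Boston, I mean Denver"), which is why learned models are needed for the F1 scores reported above.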
Effects of Duration, Locality, and Surprisal in Speech Disfluency Prediction in English Spontaneous Speech
This study examines the role of two influential theories of language processing, Surprisal Theory and Dependency Locality Theory (DLT), in predicting disfluencies (fillers and reparandums) in the Switchboard corpus of English conversational speech. Using Generalized Linear Mixed Models for this task, we incorporate syntactic factors (DLT-inspired costs and syntactic surprisal) in addition to lexical surprisal and duration, thus going beyond the local lexical frequency and predictability used in previous work on modelling word durations in Switchboard speech. Our results indicate that, compared to fluent words, words preceding disfluencies tend to have lower lexical surprisal (hence higher activation levels) and lower syntactic complexity (low DLT costs and low syntactic surprisal, except for reparandums). Disfluencies tend to occur before upcoming difficulties, i.e., high lexical surprisal words (low activation levels) with high syntactic complexity (high DLT costs and high syntactic surprisal). Further, we see that reparandums behave much like disfluent fillers, with differences possibly arising because effects are present in the word choice of the reparandum, i.e., in the disfluency itself rather than in its surroundings. Moreover, words preceding disfluencies tend to be function words and have longer durations compared to their fluent counterparts, and word duration is a very effective predictor of disfluencies. Overall, speakers may be leveraging the differences in access between content and function words during planning as part of a mechanism to adapt to disfluencies while coordinating planning and articulation.
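The modelling setup can be sketched as a regression from duration, surprisal, and DLT-style cost features to a binary disfluency outcome. The sketch below uses synthetic data and a plain logistic regression fit by gradient ascent; the feature names, coefficients, and data are invented for illustration, and the paper's mixed models with speaker-level random effects are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def zscore(x):
    return (x - x.mean()) / x.std()

# Toy per-word predictors mirroring the study's feature set (synthetic):
# duration of the current word, plus lexical surprisal and a DLT-style
# integration cost of the upcoming word.
n = 500
duration = rng.normal(0.3, 0.1, n)
surprisal_next = rng.normal(8.0, 2.0, n)
dlt_cost_next = rng.poisson(2, n).astype(float)

# Synthetic labels: disfluencies are more likely before high-surprisal,
# high-cost words and after longer words (the direction reported above).
true_logit = (-1.5 + 0.4 * zscore(duration)
              + 0.8 * zscore(surprisal_next)
              + 0.5 * zscore(dlt_cost_next))
y = rng.random(n) < 1 / (1 + np.exp(-true_logit))

X = np.column_stack([np.ones(n), zscore(duration),
                     zscore(surprisal_next), zscore(dlt_cost_next)])

# Plain logistic regression fit by gradient ascent; a GLMM would add
# random intercepts per speaker on top of these fixed effects.
w = np.zeros(4)
for _ in range(3000):
    p = 1 / (1 + np.exp(-X @ w))
    w += 0.1 * X.T @ (y - p) / n

print(w)  # weights on the three predictors should come out positive
```

Standardizing the predictors, as here, also makes the fitted coefficients directly comparable in magnitude, which matters when arguing that duration is the strongest predictor.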
A novel multimodal dynamic fusion network for disfluency detection in spoken utterances
Disfluency, though originating from human spoken utterances, is primarily
studied as a uni-modal text-based Natural Language Processing (NLP) task. Based
on early-fusion and self-attention-based multimodal interaction between text
and acoustic modalities, in this paper, we propose a novel multimodal
architecture for disfluency detection from individual utterances. Our
architecture leverages a multimodal dynamic fusion network that adds minimal
parameters over an existing text encoder commonly used in prior art to leverage
the prosodic and acoustic cues hidden in speech. Through experiments, we show
that our proposed model achieves state-of-the-art results on the widely used
English Switchboard corpus for disfluency detection and outperforms prior
unimodal and multimodal systems in the literature by a significant margin. In
addition, we make
a thorough qualitative analysis and show that, unlike text-only systems, which
suffer from spurious correlations in the data, our system overcomes this
problem through additional cues from speech signals. We make all our code
publicly available on GitHub.
Comment: Submitted to ICASSP 2023. arXiv admin note: text overlap with
arXiv:2203.1679
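The fusion idea can be sketched schematically: concatenate a text embedding with acoustic features, gate the fused vector, and score it. Everything below (dimensions, the gating form, the random weights) is an invented illustration of early fusion in general, not the paper's dynamic fusion network:

```python
import numpy as np

rng = np.random.default_rng(1)

def fuse_and_score(text_emb, acoustic_feats, W_gate, W_out):
    """Gated early fusion of a text embedding with acoustic features
    (e.g. pitch/energy statistics), then a linear disfluency score.
    A schematic sketch, not the paper's architecture."""
    fused = np.concatenate([text_emb, acoustic_feats])
    gate = 1 / (1 + np.exp(-W_gate @ fused))   # per-dimension gate
    return W_out @ (gate * fused)              # scalar logit

d_text, d_ac = 8, 4
W_gate = rng.normal(size=(d_text + d_ac, d_text + d_ac)) * 0.1
W_out = rng.normal(size=d_text + d_ac) * 0.1

text_emb = rng.normal(size=d_text)       # stand-in for a text encoder output
acoustic_feats = rng.normal(size=d_ac)   # stand-in for prosodic features
score = fuse_and_score(text_emb, acoustic_feats, W_gate, W_out)
print(float(score))
```

The gate is one simple way to let the model downweight an unreliable modality per example, which is the intuition behind letting acoustic cues override spurious text-only correlations.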