    Parsing Speech: A Neural Approach to Integrating Lexical and Acoustic-Prosodic Information

    In conversational speech, the acoustic signal provides cues that help listeners disambiguate difficult parses. For automatically parsing spoken utterances, we introduce a model that integrates transcribed text and acoustic-prosodic features using a convolutional neural network over energy and pitch trajectories coupled with an attention-based recurrent neural network that accepts text and prosodic features. We find that different types of acoustic-prosodic features are individually helpful, and together give statistically significant improvements in parse and disfluency detection F1 scores over a strong text-only baseline. For this study with known sentence boundaries, error analyses show that the main benefit of acoustic-prosodic features is in sentences with disfluencies, attachment decisions are most improved, and transcription errors obscure gains from prosody. Comment: Accepted in NAACL HLT 201

    Detecting disfluency in spontaneous speech

    Not all disfluencies are equal: The effects of disfluent repetitions on language comprehension

    Disfluencies can affect language comprehension, but to date, most studies have focused on disfluent pauses such as er. We investigated whether disfluent repetitions in speech have discernible effects on listeners during language comprehension, and whether repetitions affect the linguistic processing of subsequent words in speech in ways which have been previously observed with ers. We used event-related potentials (ERPs) to measure participants’ neural responses to disfluent repetitions of words relative to acoustically identical words in fluent contexts, as well as to unpredictable and predictable words that occurred immediately post-disfluency and in fluent utterances. We additionally measured participants’ recognition memories for the predictable and unpredictable words. Repetitions elicited an early-onsetting relative positivity (100–400 ms post-stimulus), clearly demonstrating listeners’ sensitivity to the presence of disfluent repetitions. Unpredictable words elicited an N400 effect. Importantly, there was no evidence that this effect, thought to reflect the difficulty of semantically integrating unpredictable compared to predictable words, differed quantitatively between fluent and disfluent utterances. Furthermore, there was no evidence that the memorability of words was affected by the presence of a preceding repetition. These findings contrast with previous research which demonstrated an N400 attenuation of, and an increase in memorability for, words that were preceded by an er. However, in a later (600–900 ms) time window, unpredictable words following a repetition elicited a relative positivity. Reanalysis of previous data confirmed the presence of a similar effect following an er. The effect may reflect difficulties in resuming linguistic processing following any disruption to speech.

    Disfluency as... er... delay: An investigation into the immediate and lasting consequences of disfluency and temporal delay using EEG and mixed-effects modelling

    Difficulties in speech production are often marked by disfluency: fillers, hesitations, prolongations, repetitions and repairs. In recent years a body of work has emerged that demonstrates that listeners are sensitive to disfluency, and that this affects their expectations for upcoming speech, as well as their attention to the speech stream. This thesis investigates the extent to which delay may be responsible for triggering these effects. The experiments reported in this thesis build on an Event-Related Potential (ERP) paradigm developed by Corley et al. (2007), in which participants listened to sentences manipulated for both fluency and predictability. Corley et al. reported an attenuated N400 effect for words following disfluent ers, and interpreted this as indicating that the extent to which listeners made predictions was reduced following an er. In the current set of experiments, various noisy interruptions were added to Corley et al.'s paradigm, time-matched to the disfluent fillers. These manipulations allowed investigation of whether the same effects could be triggered by delay alone, in the absence of a cue indicating that the speaker was experiencing difficulty. The first experiment, which contrasted disfluent ers with artificial beeps, revealed a small but significant reduction in N400 effect amplitude for words affected by ers but not by beeps. The second experiment, in which ers were contrasted with speaker-generated coughs, revealed no fluency effects on the N400 effect. A third experiment combined the designs of Experiments 1 and 2 to verify whether the difference between them could be characterised as a context effect; one potential explanation for the difference between the outcomes of Experiments 1 and 2 is that the interpretation of an er is affected by the surrounding stimuli. However, in Experiment 3, once again no effect of fluency on the magnitude of the N400 effect was found.
Taken together, the results of these three studies lead to the question of whether er's attenuation effect on the N400 is robust. In a second part to each study, listeners took part in a surprise recognition memory test, comprising words which had been the critical words in the previous task intermixed with new words which had not appeared anywhere in the sentences previously heard. Participants were significantly more successful at recognising words which had been unpredictable in their contexts, and, importantly, for Experiments 1 and 2, were significantly more successful at recognising words which had featured in disfluent or interrupted sentences. There was no difference between the recognition rates of words which had been disfluent and those which were affected by a noisy interruption. Collard et al. (2008) demonstrated that disfluency could raise attention to the speech stream, and the finding that interrupted words are equally well remembered leads to the suggestion that any noisy interruption can raise attention. Overall, the finding of memory benefits in response to disfluency, in the absence of attenuated N400 effects, leads to the suggestion that different elements of disfluencies may be responsible for triggering these effects. The studies in this thesis also extend previous work by being designed to yield enough trials in the memory test portion of each experiment to permit ERP analysis of the memory data. Whilst clear ERP memory effects remained elusive, important progress was made in that memory ERPs were generated from a disfluency paradigm, and this provided a testing ground on which to demonstrate the use of linear mixed-effects modelling as an alternative to ANOVA analysis for ERPs. Mixed-effects models allow the analysis of unbalanced datasets, such as those generated in many memory experiments.
Additionally, we demonstrate the ability to include crossed random effects for subjects and items, and when this is applied to the ERPs from the listening section of Experiment 1, the effect of fluency on N400 amplitude is no longer significant. Taken together, the results from the studies reported in this thesis suggest that temporal delay or disruption in speech can trigger raised attention, but do not necessarily trigger changes in listeners' expectations.

    Disfluencies affect language comprehension: evidence from event-related potentials and recognition memory

    Everyday speech is littered with disfluencies such as filled pauses, silent pauses, repetitions and repairs which reflect a speaker’s language production difficulties. But what are the effects on language comprehension? This thesis took a novel approach to the study of disfluencies by combining an investigation of the immediate effects on language processing with an investigation of the longer-term effects for the representation of language in memory. A series of experiments is reported which reflects the first attempt at a systematic investigation of the effects of different types of disfluencies on language comprehension. The experiments focused on the effects of three types of disfluencies (ers, silent pauses, and repetitions) on the comprehension of subsequent words. Critical words were either straightforward continuations of the pre-interrupted speech or a repair word which corrected the pre-interrupted speech. In addition, the effects that occur when er, repetition, and repair disfluencies themselves are processed were assessed. ERPs showed that the N400 effect elicited in response to contextually unpredictable compared to predictable words was attenuated by the presence of a pre-target er, reflecting a reduction in the standard difference where unpredictable words are more difficult to integrate into their contexts. This finding suggests that ers may reduce the extent to which listeners make predictions about upcoming words. In addition, words preceded by an er were more likely to be correctly recognised in a subsequent memory test. These findings demonstrate a longer-term consequence for representation which may reflect heightened attention during processing. Silent pauses did not affect the N400 but there was some indication of an effect on recognition memory. Repetition disfluencies did not affect the N400 or recognition memory. These findings demonstrate the importance of the nature of the disruption to speech.
For all types of disfluent utterances, unpredictable words elicited a Late Positive Complex (LPC), possibly reflecting processes associated with memory retrieval and control as listeners attempted to resume structural fluency after any interruption. Ers themselves elicited standard attention-related ERP effects: the Mismatch Negativity (MMN) and P300 effects, supporting the possibility that ers heighten attention. Repetition disfluencies elicited a right posterior positivity, reflecting detection of the disfluency and possibly syntactic reanalysis. Repair disfluencies elicited an early frontal negativity, possibly related to the detection of a word category violation, and a P600 effect, reflecting syntactic reanalysis. The presence of an er preceding the repair eliminated the early negativity, but had no effect on the P600, suggesting that ers may prepare listeners for the possibility of an upcoming repair, but that they do not reduce the difficulty associated with reanalysis. Taken together, the results from the studies reported in the thesis support an account of disfluency processing which incorporates both prediction and attention.

    Comparing Different Methods for Disfluency Structure Detection

    This paper presents a number of experiments focusing on assessing the performance of different machine learning methods on the identification of disfluencies and their distinct structural regions over speech data. Several machine learning methods have been applied, namely Naive Bayes, Logistic Regression, Classification and Regression Trees (CARTs), J48 and Multilayer Perceptron. Our experiments show that CARTs outperform the other methods on the identification of the distinct structural disfluent regions. Reported experiments are based on audio segmentation and prosodic features, calculated from a corpus of university lectures in European Portuguese, containing about 32h of speech and about 7.7% of disfluencies. The set of features automatically extracted from the forced alignment corpus proved to be discriminant of the regions contained in the production of a disfluency. This work shows that using fully automatic prosodic features, disfluency structural regions can be reliably identified using CARTs, where the best results achieved correspond to 81.5% precision, 27.6% recall, and 41.2% F-measure. The best results concern the detection of the interregnum, followed by the detection of the interruption point.
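
As a quick arithmetic check on the figures quoted in this abstract, the F-measure is commonly defined as the harmonic mean of precision and recall. The sketch below assumes the abstract reports the balanced F1 (beta = 1), which the abstract does not state explicitly; the function name `f_measure` is illustrative and does not come from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F-beta).

    beta = 1 gives the balanced F1 score. Assumption: this is the
    variant the abstract calls "F-measure".
    """
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Best CART figures quoted in the abstract: 81.5% precision, 27.6% recall.
precision, recall = 0.815, 0.276
f1 = f_measure(precision, recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.412"
```

Under that assumption the quoted precision and recall reproduce the reported 41.2% F-measure. The low recall dominates the harmonic mean, which is why the score sits much closer to 27.6% than to 81.5%.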

    Are language production problems apparent in adults who no longer meet diagnostic criteria for attention-deficit/hyperactivity disorder?

    In this study, we examined sentence production in a sample of adults (N = 21) who had had attention-deficit/hyperactivity disorder (ADHD) as children, but as adults no longer met DSM-IV diagnostic criteria (APA, 2000). This “remitted” group was assessed on a sentence production task. On each trial, participants saw two objects and a verb. Their task was to construct a sentence using the objects as arguments of the verb. Results showed more ungrammatical and disfluent utterances with one particular type of verb (i.e., participle). In a second set of analyses, we compared the remitted group to both control participants and a “persistent” group, who had ADHD as children and as adults. Results showed that remitters were more likely to produce ungrammatical utterances and to make repair disfluencies compared to controls, and they patterned more similarly to ADHD participants. Conclusions focus on language output in remitted ADHD, and the role of executive functions in language production.