Parsing Speech: A Neural Approach to Integrating Lexical and Acoustic-Prosodic Information
In conversational speech, the acoustic signal provides cues that help
listeners disambiguate difficult parses. For automatically parsing spoken
utterances, we introduce a model that integrates transcribed text and
acoustic-prosodic features using a convolutional neural network over energy and
pitch trajectories coupled with an attention-based recurrent neural network
that accepts text and prosodic features. We find that different types of
acoustic-prosodic features are individually helpful, and together give
statistically significant improvements in parse and disfluency detection F1
scores over a strong text-only baseline. For this study with known sentence
boundaries, error analyses show that the main benefit of acoustic-prosodic
features is in sentences with disfluencies, attachment decisions are most
improved, and transcription errors obscure gains from prosody. Comment: Accepted in NAACL HLT 201
Not all disfluencies are are equal: The effects of disfluent repetitions on language comprehension
Disfluencies can affect language comprehension, but to date, most studies have focused on disfluent pauses such as er. We investigated whether disfluent repetitions in speech have discernible effects on listeners during language comprehension, and whether repetitions affect the linguistic processing of subsequent words in speech in ways which have been previously observed with ers. We used event-related potentials (ERPs) to measure participants’ neural responses to disfluent repetitions of words relative to acoustically identical words in fluent contexts, as well as to unpredictable and predictable words that occurred immediately post-disfluency and in fluent utterances. We additionally measured participants’ recognition memories for the predictable and unpredictable words. Repetitions elicited an early-onsetting relative positivity (100–400 ms post-stimulus), clearly demonstrating listeners’ sensitivity to the presence of disfluent repetitions. Unpredictable words elicited an N400 effect. Importantly, there was no evidence that this effect, thought to reflect the difficulty of semantically integrating unpredictable compared to predictable words, differed quantitatively between fluent and disfluent utterances. Furthermore, there was no evidence that the memorability of words was affected by the presence of a preceding repetition. These findings contrast with previous research which demonstrated an N400 attenuation of, and an increase in memorability for, words that were preceded by an er. However, in a later (600–900 ms) time window, unpredictable words following a repetition elicited a relative positivity. Reanalysis of previous data confirmed the presence of a similar effect following an er. The effect may reflect difficulties in resuming linguistic processing following any disruption to speech.
Disfluency as... er... delay: An investigation into the immediate and lasting consequences of disfluency and temporal delay using EEG and mixed-effects modelling
Difficulties in speech production are often marked by disfluency: fillers, hesitations, prolongations, repetitions and repairs. In recent years a body of work has emerged that demonstrates that listeners are sensitive to disfluency, and that this affects their expectations for upcoming speech, as well as their attention to the speech stream. This thesis investigates the extent to which delay may be responsible for triggering these effects.
The experiments reported in this thesis build on an Event-Related Potential (ERP) paradigm developed by Corley et al. (2007), in which participants listened to sentences manipulated by both fluency and predictability. Corley et al. reported an attenuated N400 effect for words following disfluent ers, and interpreted this as indicating that the extent to which listeners made predictions was reduced following an er. In the current set of experiments, various noisy interruptions were added to Corley et al.'s paradigm, time-matched to the disfluent fillers. These manipulations allowed investigation of whether the same effects could be triggered by delay alone, in the absence of a cue indicating that the speaker was experiencing difficulty.
The first experiment, which contrasted disfluent ers with artificial beeps, revealed a small but significant reduction in N400 effect amplitude for words affected by ers but not by beeps. The second experiment, in which ers were contrasted with speaker-generated coughs, revealed no fluency effects on the N400 effect. A third experiment combined the designs of Experiments 1 and 2 to verify whether the difference between them could be characterised as a context effect; one potential explanation for the difference between the outcomes of Experiments 1 and 2 is that the interpretation of an er is affected by the surrounding stimuli. However, in Experiment 3, once again no effect of fluency on the magnitude of the N400 effect was found. Taken together, the results of these three studies raise the question of whether er's attenuation effect on the N400 is robust.
In a second part to each study, listeners took part in a surprise recognition memory test, comprising words which had been the critical words in the previous task intermixed with new words which had not appeared anywhere in the sentences previously heard. Participants were significantly more successful at recognising words which had been unpredictable in their contexts, and, importantly, for Experiments 1 and 2, were significantly more successful at recognising words which had featured in disfluent or interrupted sentences. There was no difference between the recognition rates of words which had been disfluent and those which were affected by a noisy interruption. Collard et al. (2008) demonstrated that disfluency could raise attention to the speech stream, and the finding that interrupted words are equally well remembered leads to the suggestion that any noisy interruption can raise attention. Overall, the finding of memory benefits in response to disfluency, in the absence of attenuated N400 effects, leads to the suggestion that different elements of disfluencies may be responsible for triggering these effects.
The studies in this thesis also extend previous work by being designed to yield enough trials in the memory test portion of each experiment to permit ERP analysis of the memory data. Whilst clear ERP memory effects remained elusive, important progress was made in that memory ERPs were generated from a disfluency paradigm, and this provided a testing ground on which to demonstrate the use of linear mixed-effects modelling as an alternative to ANOVA analysis for ERPs. Mixed-effects models allow the analysis of unbalanced datasets, such as those generated in many memory experiments. Additionally, we demonstrate the ability to include crossed random effects for subjects and items, and when this is applied to the ERPs from the listening section of Experiment 1, the effect of fluency on N400 amplitude is no longer significant.
Taken together, the results from the studies reported in this thesis suggest that temporal delay or disruption in speech can trigger raised attention, but does not necessarily trigger changes in listeners' expectations.
Disfluencies affect language comprehension: evidence from event-related potentials and recognition memory
Everyday speech is littered with disfluencies such as filled pauses, silent pauses,
repetitions and repairs which reflect a speaker’s language production difficulties.
But what are the effects on language comprehension?
This thesis took a novel approach to the study of disfluencies by combining an
investigation of the immediate effects on language processing with an investigation
of the longer-term effects for the representation of language in memory. A series of
experiments is reported which reflects the first attempt at a systematic investigation
of the effects of different types of disfluencies on language comprehension.
The experiments focused on the effects of three types of disfluencies—ers, silent
pauses, and repetitions—on the comprehension of subsequent words. Critical words
were either straightforward continuations of the pre-interrupted speech or a repair
word which corrected the pre-interrupted speech. In addition, the effects that occur
when er, repetition, and repair disfluencies themselves are processed were assessed.
ERPs showed that the N400 effect elicited in response to contextually unpredictable
compared to predictable words was attenuated by the presence of a pre-target er,
reflecting a reduction in the standard difference where unpredictable words are more
difficult to integrate into their contexts. This finding suggests that ers may reduce
the extent to which listeners make predictions about upcoming words. In addition, words preceded by an er were more likely to be correctly recognised in a subsequent
memory test. These findings demonstrate a longer-term consequence for representation
which may reflect heightened attention during processing. Silent pauses did not
affect the N400 but there was some indication of an effect on recognition memory.
Repetition disfluencies did not affect the N400 or recognition memory. These findings
demonstrate the importance of the nature of the disruption to speech. For all
types of disfluent utterances, unpredictable words elicited a Late Positive Complex
(LPC), possibly reflecting processes associated with memory retrieval and control
as listeners attempted to resume structural fluency after any interruption.
Ers themselves elicited standard attention-related ERP effects: the Mismatch Negativity
(MMN) and P300 effects, supporting the possibility that ers heighten attention.
Repetition disfluencies elicited a right posterior positivity, reflecting detection
of the disfluency and possibly syntactic reanalysis. Repair disfluencies elicited an
early frontal negativity, possibly related to the detection of a word category violation,
and a P600 effect, reflecting syntactic reanalysis. The presence of an er
preceding the repair eliminated the early negativity, but had no effect on the P600,
suggesting that ers may prepare listeners for the possibility of an upcoming repair,
but that they do not reduce the difficulty associated with reanalysis.
Taken together, the results from the studies reported in the thesis support an account
of disfluency processing which incorporates both prediction and attention.
Comparing Different Methods for Disfluency Structure Detection
This paper presents a number of experiments assessing the performance of different machine learning methods on the identification of disfluencies and their distinct structural regions in speech data. Several machine learning methods were applied, namely Naive Bayes, Logistic Regression, Classification and Regression Trees (CARTs), J48 and Multilayer Perceptron. Our experiments show that CARTs outperform the other methods on the identification of the distinct structural disfluent regions. The reported experiments are based on audio segmentation and prosodic features, calculated from a corpus of university lectures in European Portuguese, containing about 32 h of speech and about 7.7% of disfluencies. The set of features automatically extracted from the forced alignment corpus proved discriminative for the regions involved in the production of a disfluency. This work shows that, using fully automatic prosodic features, disfluency structural regions can be reliably identified using CARTs, where the best results achieved correspond to 81.5% precision, 27.6% recall, and 41.2% F-measure. The best results concern the detection of the interregnum, followed by the detection of the interruption point.
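The precision, recall, and F-measure figures quoted in the abstract above are related by a standard formula. The following is a minimal sketch, assuming the reported F-measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state the weighting, and the function name `f_measure` is a hypothetical helper, not something taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score, the weighted harmonic mean of precision and recall.

    beta = 1.0 gives the balanced F1 score (assumed here; the abstract
    does not say which weighting was used).
    """
    if precision <= 0.0 or recall <= 0.0:
        # The harmonic mean is defined as zero when either term is zero.
        return 0.0
    beta_sq = beta * beta
    return (1.0 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Figures quoted in the abstract: 81.5% precision, 27.6% recall.
score = f_measure(0.815, 0.276)
print(f"{score:.1%}")  # prints 41.2%
```

Under that assumption the computed value agrees with the 41.2% quoted, and it shows how the low recall dominates the score despite the high precision.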
Are language production problems apparent in adults who no longer meet diagnostic criteria for attention-deficit/hyperactivity disorder?
In this study, we examined sentence production in a sample of adults (N = 21) who had had attention-deficit/hyperactivity disorder (ADHD) as children, but as adults no longer met DSM-IV diagnostic criteria (APA, 2000). This “remitted” group was assessed on a sentence production task. On each trial, participants saw two objects and a verb. Their task was to construct a sentence using the objects as arguments of the verb. Results showed more ungrammatical and disfluent utterances with one particular type of verb (i.e., participle). In a second set of analyses, we compared the remitted group to both control participants and a “persistent” group, who had ADHD as children and as adults. Results showed that remitters were more likely to produce ungrammatical utterances and to make repair disfluencies compared to controls, and they patterned more similarly to ADHD participants. Conclusions focus on language output in remitted ADHD, and the role of executive functions in language production