Design choices in imaging speech comprehension: An Activation Likelihood Estimation (ALE) meta-analysis
The localisation of spoken language comprehension is debated extensively: is processing located anteriorly or posteriorly in the left temporal lobe, and is it left- or bilaterally organised? An Activation Likelihood Estimation (ALE) analysis was conducted on functional MRI and PET studies investigating speech comprehension to identify the neural network involved in comprehension processing. Furthermore, the analysis aimed to establish the effect of four design choices (scanning paradigm, non-speech baseline, the presence of a task, and the type of stimulus material) on this comprehension network. The analysis included 57 experiments contrasting intelligible with less intelligible or unintelligible stimuli. A large comprehension network was found across the Superior Temporal Sulcus (STS), Middle Temporal Gyrus (MTG), and Superior Temporal Gyrus (STG) bilaterally, in left Inferior Frontal Gyrus (IFG), left Precentral Gyrus, and in Supplementary Motor Area (SMA) and pre-SMA. The core network for post-lexical processing was restricted to the temporal lobes bilaterally, with the highest ALE values located anterior to Heschl's Gyrus. Activations in the ALE comprehension network outside the temporal lobes (left IFG, SMA/pre-SMA, and Precentral Gyrus) were driven by the use of sentences instead of words, the scanning paradigm, or the type of non-speech baseline.
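At its core, ALE smooths each experiment's reported activation foci with a 3D Gaussian kernel into a modelled activation (MA) map and then takes, at every voxel, the probabilistic union across experiments (ALE = 1 − Π(1 − MA_i)). The sketch below illustrates only that core computation under simplifying assumptions (a fixed 10 mm FWHM kernel, an unnormalised Gaussian, and hypothetical function names); the published algorithm uses sample-size-dependent, normalised kernels and permutation-based thresholding.

```python
import numpy as np

def modelled_activation_map(foci_mm, grid_shape, voxel_size=2.0, fwhm=10.0):
    """Smooth one experiment's activation foci into a modelled activation (MA)
    map of per-voxel probabilities in [0, 1] (simplified, unnormalised kernel).

    foci_mm: (n, 3) array of focus coordinates in mm within the grid.
    """
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))         # FWHM -> sigma
    axes = [np.arange(s) * voxel_size for s in grid_shape]    # mm coordinates
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    ma = np.zeros(grid_shape)
    for focus in np.asarray(foci_mm, dtype=float):
        d2 = np.sum((grid - focus) ** 2, axis=-1)
        # Combine foci within an experiment by taking the maximum probability.
        ma = np.maximum(ma, np.exp(-d2 / (2.0 * sigma ** 2)))
    return ma

def ale_map(ma_maps):
    """Per-voxel ALE value: probabilistic union across experiments,
    ALE = 1 - prod(1 - MA_i)."""
    product = np.ones_like(ma_maps[0])
    for ma in ma_maps:
        product *= 1.0 - ma
    return 1.0 - product
```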
Automatic imitation of human and computer-generated vocal stimuli
Observing someone perform an action automatically activates neural substrates associated with executing that action. This covert response, or automatic imitation, is measured behaviourally using the stimulus–response compatibility (SRC) task. In an SRC task, participants are presented with compatible and incompatible response–distractor pairings (e.g., an instruction to say “ba” paired with an audio recording of “da” as an example of an incompatible trial). Automatic imitation is measured as the difference in response times (RTs) or accuracy between incompatible and compatible trials. Larger automatic imitation effects have been interpreted as reflecting a larger covert imitation response. Past results suggest that an action’s biological status affects automatic imitation: human-produced manual actions show enhanced automatic imitation effects compared with computer-generated actions. According to the integrated theory of language comprehension and production, action observation triggers a simulation process, involving covert imitation, to recognise and interpret observed speech actions. Human-generated actions are predicted to result in increased automatic imitation because the simulation process is predicted to engage more for actions produced by a speaker who is more similar to the listener. We conducted an online SRC task that presented participants with human and computer-generated speech stimuli to test this prediction. Participants responded faster on compatible than incompatible trials, showing an overall automatic imitation effect. However, the human-generated and computer-generated vocal stimuli evoked similar automatic imitation effects. These results suggest that computer-generated speech stimuli evoke the same covert imitative response as human stimuli, contradicting predictions from the integrated theory of language comprehension and production.
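For concreteness, the automatic imitation effect described above reduces to a per-participant difference score: mean RT on incompatible trials minus mean RT on compatible trials. A minimal sketch of that computation follows; the file and column names are hypothetical, not taken from the study.

```python
import pandas as pd

# Hypothetical trial-level data: one row per trial with columns
# "participant", "condition" ("compatible" or "incompatible"),
# "rt_ms" (response time in ms), and "correct" (True/False).
trials = pd.read_csv("src_trials.csv")

# RT analyses typically use correct responses only.
correct = trials[trials["correct"]]

# Mean RT per participant and condition.
mean_rt = (correct.groupby(["participant", "condition"])["rt_ms"]
                  .mean()
                  .unstack("condition"))

# Automatic imitation effect: incompatible minus compatible RT.
mean_rt["automatic_imitation_ms"] = (mean_rt["incompatible"]
                                     - mean_rt["compatible"])
print(mean_rt["automatic_imitation_ms"].describe())
```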
Automatic imitation of speech is enhanced for non-native sounds
Simulation accounts of speech perception posit that speech is covertly imitated to support perception in a top-down manner. Behaviourally, covert imitation is measured through the stimulus-response compatibility (SRC) task. In each trial of a speech SRC task, participants produce a target speech sound whilst perceiving a speech distractor that either matches the target (compatible condition) or does not (incompatible condition). The degree to which the distractor is covertly imitated is captured by the automatic imitation effect, computed as the difference in response times (RTs) between compatible and incompatible trials. Simulation accounts disagree on whether covert imitation is enhanced when speech perception is challenging or instead when the speech signal is most familiar to the speaker. To test these accounts, we conducted three experiments in which participants completed SRC tasks with native and non-native sounds. Experiment 1 uncovered larger automatic imitation effects in an SRC task with non-native sounds than with native sounds. Experiment 2 replicated the finding online, demonstrating its robustness and the applicability of speech SRC tasks online. Experiment 3 intermixed native and non-native sounds within a single SRC task to disentangle effects of perceiving non-native sounds from confounding effects of producing non-native speech actions. This last experiment confirmed that automatic imitation is enhanced for non-native speech distractors, supporting a compensatory function of covert imitation in speech perception. The experiment also uncovered a separate effect whereby producing non-native speech actions further enhanced automatic imitation effects.
Motor imagery of speech: the involvement of primary motor cortex in manual and articulatory motor imagery
Motor imagery refers to the phenomenon of imagining performing an action without action execution. Motor imagery and motor execution are assumed to share a similar underlying neural system that involves primary motor cortex (M1). Previous studies have focused on motor imagery of manual actions, but articulatory motor imagery has not been investigated. In this study, transcranial magnetic stimulation (TMS) was used to elicit motor-evoked potentials (MEPs) from an articulatory muscle [orbicularis oris (OO)] as well as from a hand muscle [first dorsal interosseous (FDI)]. Twenty participants were asked to execute or imagine performing a simple squeezing task involving a pair of tweezers, which was comparable across both effectors. MEPs were elicited at six time points (50, 150, 250, 350, 450, and 550 ms post-stimulus) to track the time course of M1 involvement in both the lip and hand tasks. The results showed increased MEP amplitudes for action execution compared to rest for both effectors at 350, 450, and 550 ms, but we found no evidence of increased cortical activation for motor imagery. The results indicate that motor imagery of simple manual or articulatory tasks does not involve M1. The results have implications for models of mental imagery of simple articulatory gestures, in that no evidence was found for somatotopic activation of lip muscles in sub-phonemic contexts during motor imagery of such tasks, suggesting that motor simulation of relatively simple actions does not involve M1.
The causal role of left and right superior temporal gyri in speech perception in noise: A Transcranial Magnetic Stimulation Study
Successful perception of speech in everyday listening conditions requires effective listening strategies to overcome common acoustic distortions, such as background noise. Convergent evidence from neuroimaging and clinical studies identifies activation within the temporal lobes as key to successful speech perception. However, current neurobiological models disagree on whether the left temporal lobe is sufficient for successful speech perception or whether bilateral processing is required. We addressed this issue using TMS to selectively disrupt processing in either the left or the right superior temporal gyrus (STG) of healthy participants, testing whether the left temporal lobe alone is sufficient or whether both left and right STG are essential. Participants repeated keywords from sentences presented in background noise in a speech reception threshold task while receiving online repetitive TMS separately to the left STG, right STG, or vertex, or while receiving no TMS. Results show an equal drop in performance following application of TMS to either the left or the right STG during the task. A separate group of participants performed a visual discrimination threshold task to control for the confounding side effects of TMS. Results show no effect of TMS on the control task, supporting the notion that the results of Experiment 1 can be attributed to modulation of cortical functioning in STG rather than to side effects associated with online TMS. These results indicate that successful speech perception in everyday listening conditions requires both left and right STG and thus have ramifications for our understanding of the neural organization of spoken language processing.
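The speech reception threshold (SRT) task mentioned above is typically run adaptively, adjusting the signal-to-noise ratio (SNR) trial by trial so that performance converges on a criterion level (e.g., 50% of keywords correct). The abstract does not specify the exact procedure used; the sketch below is a generic one-up/one-down staircase with illustrative parameters only.

```python
def run_srt_staircase(score_trial, n_trials=20, start_snr_db=0.0, step_db=2.0):
    """Simple adaptive SNR track for estimating a speech reception threshold.

    score_trial(snr_db) should present one sentence in noise at the given SNR
    and return True if the listener repeated the keyword(s) correctly.
    The SNR decreases after a correct response and increases after an error,
    so the track converges on roughly 50% correct.
    """
    snr = start_snr_db
    track = []
    for _ in range(n_trials):
        correct = score_trial(snr)
        track.append(snr)
        snr += -step_db if correct else step_db
    # Illustrative estimate: mean SNR over the final trials of the track
    # (published procedures usually average over staircase reversals).
    return sum(track[-8:]) / len(track[-8:])
```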
The relevance of the availability of visual speech cues during adaptation to noise-vocoded speech
Purpose:
This study first aimed to establish whether viewing specific parts of the speaker's face (eyes or mouth), compared to viewing the whole face, affected adaptation to distorted noise-vocoded sentences. Second, it aimed to replicate results on the processing of distorted speech from lab-based experiments in an online setup.
Method:
We monitored recognition accuracy online while participants were listening to noise-vocoded sentences. We first established whether participants were able to perceive and adapt to audiovisual four-band noise-vocoded sentences when the entire moving face was visible (AV Full). Four further groups were then tested: a group in which participants viewed the moving lower part of the speaker's face (AV Mouth), a group in which participants viewed only the moving upper part of the face (AV Eyes), a group in which participants could see neither the moving lower nor the moving upper face (AV Blocked), and a group in which participants saw an image of a still face (AV Still).
Results:
Participants repeated around 40% of the keywords correctly and adapted during the experiment, but only when the moving mouth was visible. In contrast, performance was at floor level, and no adaptation took place, in the conditions in which the moving mouth was occluded.
Conclusions:
First, the results show the importance of being able to observe relevant visual speech information from the speaker's mouth region, but not from the eyes/upper face region, when listening and adapting to distorted sentences online. Second, the results demonstrate that it is feasible to run speech perception and adaptation studies online, but that not all findings reported for lab studies replicate.
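Noise-vocoding, used in the study above, removes spectral detail while preserving temporal envelope cues: the signal is split into a small number of frequency bands, each band's amplitude envelope is extracted, and the envelopes modulate band-limited noise. A minimal four-band sketch is given below; the filter design and band edges are illustrative assumptions, not the exact parameters used in the study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(signal, fs, n_bands=4, lo=100.0, hi=8000.0):
    """Noise-vocode a mono speech signal: per band, extract the amplitude
    envelope and use it to modulate band-limited white noise."""
    edges = np.geomspace(lo, hi, n_bands + 1)    # logarithmic band edges (Hz)
    noise = np.random.randn(len(signal))
    out = np.zeros(len(signal))
    for low, high in zip(edges[:-1], edges[1:]):
        sos = butter(4, [low, high], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, signal)           # band-limited speech
        envelope = np.abs(hilbert(band))          # amplitude envelope
        carrier = sosfiltfilt(sos, noise)         # band-limited noise carrier
        out += envelope * carrier
    return out / np.max(np.abs(out))              # normalise peak level
```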
Transcranial magnetic stimulation and motor evoked potentials in speech perception research
Transcranial magnetic stimulation (TMS) has been employed in speech processing research to manipulate brain activity and to establish cortical excitability by eliciting motor evoked potentials (MEPs). We discuss the history, methodological underpinnings, key contributions, and future directions for studying speech processing using TMS and by eliciting MEPs. Furthermore, we discuss specific challenges that are encountered when examining speech processing using TMS or by measuring MEPs. We suggest that future research may benefit from using TMS in conjunction with neuroimaging methods such as functional magnetic resonance imaging or electroencephalography, and from the development of new stimulation protocols addressing cortico-cortical inhibition/facilitation and interhemispheric connectivity during speech processing.
Eye Gaze and Perceptual Adaptation to Audiovisual Degraded Speech
Purpose:
Visual cues from a speaker's face may benefit perceptual adaptation to degraded speech, but current evidence is limited. We aimed to replicate results from previous studies to establish the extent to which visual speech cues can lead to greater adaptation over time, extending existing results to a real-time adaptation paradigm (i.e., without a separate training period). A second aim was to investigate whether eye gaze patterns toward the speaker's mouth were related to better perception, hypothesizing that listeners who looked more at the speaker's mouth would show greater adaptation.
Method:
A group of listeners (n = 30) was presented with 90 noise-vocoded sentences in audiovisual format, whereas a control group (n = 29) was presented with the audio signal only. Recognition accuracy was measured throughout, and eye tracking was used to measure fixations toward the speaker's eyes and mouth in the audiovisual group.
Results:
Previous studies were partially replicated: The audiovisual group had better recognition throughout and adapted slightly more rapidly, but both groups showed an equal amount of improvement overall. Longer fixations on the speaker's mouth in the audiovisual group were related to better overall accuracy. An exploratory analysis further demonstrated that the duration of fixations to the speaker's mouth decreased over time.
Conclusions:
The results suggest that visual cues may not benefit adaptation to degraded speech as much as previously thought. Longer fixations on a speaker's mouth may play a role in successfully decoding visual speech cues; however, this will need to be confirmed in future research to fully understand how patterns of eye gaze are related to audiovisual speech recognition. All materials, data, and code are available at https://osf.io/2wqkf/
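Relating eye gaze to recognition accuracy, as in the study above, typically involves computing how much of each participant's fixation time falls within an area of interest (AOI) such as the speaker's mouth. A minimal sketch follows; the file name, column names, and AOI coordinates are hypothetical.

```python
import pandas as pd

# Hypothetical fixation-level data: one row per fixation with columns
# "participant", "x", "y" (screen coordinates in pixels), and "duration_ms".
fixations = pd.read_csv("fixations.csv")

# Hypothetical rectangular AOI around the speaker's mouth (pixels).
MOUTH_AOI = dict(x_min=760, x_max=1160, y_min=620, y_max=820)

in_mouth = (fixations["x"].between(MOUTH_AOI["x_min"], MOUTH_AOI["x_max"])
            & fixations["y"].between(MOUTH_AOI["y_min"], MOUTH_AOI["y_max"]))

# Proportion of total fixation time spent on the mouth, per participant.
total_time = fixations.groupby("participant")["duration_ms"].sum()
mouth_time = fixations[in_mouth].groupby("participant")["duration_ms"].sum()
mouth_proportion = (mouth_time / total_time).fillna(0.0)
print(mouth_proportion.sort_values(ascending=False))
```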
- …