Interaction of perceptual grouping and crossmodal temporal capture in tactile apparent-motion
Previous studies have shown that in tasks requiring participants to report the direction of apparent motion, task-irrelevant mono-beeps can "capture" visual motion perception when the beeps occur temporally close to the visual stimuli. However, the contributions of the relative timing of multimodal events and of the event structure, which modulates uni- and/or crossmodal perceptual grouping, remain unclear. To examine this question and extend the investigation to the tactile modality, the current experiments presented tactile two-tap apparent-motion streams, with an SOA of 400 ms between successive left-/right-hand middle-finger taps, accompanied by task-irrelevant, non-spatial auditory stimuli. The streams were presented for 90 seconds, and participants' task was to continuously report the perceived (left- or rightward) direction of tactile motion. In Experiment 1, each tactile stimulus was paired with an auditory beep, though odd-numbered taps were paired with an asynchronous beep, with audiotactile SOAs ranging from -75 ms to 75 ms. The perceived direction of tactile motion varied systematically with audiotactile SOA, indicative of a temporal-capture effect. In Experiment 2, two audiotactile SOAs, one short (75 ms) and one long (325 ms), were compared. The long-SOA condition preserved the crossmodal event structure (so the temporal-capture dynamics should have been similar to those in Experiment 1), but both beeps now occurred temporally close to the taps on one side (the even-numbered taps). The two SOAs were found to produce opposite modulations of apparent motion, indicative of an influence of crossmodal grouping. In Experiment 3, only odd-numbered (not even-numbered) taps were paired with auditory beeps. This abolished the temporal-capture effect; instead, a dominant percept of apparent motion from the audiotactile side to the tactile-only side was observed, independently of the SOA variation. These findings suggest that asymmetric crossmodal grouping leads to an attentional modulation of apparent motion, which inhibits crossmodal temporal-capture effects.
Enhanced Visual Temporal Resolution in Autism Spectrum Disorders
Cognitive functions that depend on the accurate sequencing of events, such as action planning and execution, verbal and nonverbal communication, and social interaction, rely on well-tuned coding of temporal event structure. Visual temporal event-structure coding was tested in 17 high-functioning adolescents and adults with autism spectrum disorder (ASD) and in mental- and chronological-age-matched typically developing (TD) individuals using a perceptual simultaneity paradigm. Visual simultaneity thresholds were lower in individuals with ASD than in TD individuals, suggesting that autism may be characterised by increased parsing of temporal event structure, with a decreased capability for integration over time. Lower perceptual simultaneity thresholds in ASD were also related to increased developmental communication difficulties. These results are linked to a detail-focussed and local processing bias.
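The simultaneity threshold reported above is the kind of quantity typically obtained by fitting a psychometric function to the proportion of "simultaneous" responses across stimulus onset asynchronies and reading off its width. The abstract does not describe the fitting procedure, so the following is only a generic, illustrative sketch using simulated responses (all numbers invented):

```python
# Generic sketch (not the study's analysis): estimating a simultaneity
# threshold as the width of a Gaussian-shaped window fitted to the
# proportion of "simultaneous" responses. All data below are simulated.
import numpy as np
from scipy.optimize import curve_fit

soas_ms = np.array([0, 17, 33, 50, 67, 83, 100])      # stimulus onset asynchronies
p_simultaneous = np.array([0.98, 0.95, 0.80, 0.55, 0.30, 0.12, 0.05])

def window(soa, width):
    """Probability of a 'simultaneous' response as a Gaussian window of SOA."""
    return np.exp(-0.5 * (soa / width) ** 2)

(width,), _ = curve_fit(window, soas_ms, p_simultaneous, p0=[50.0])
print(f"simultaneity threshold ~ {width:.0f} ms (lower = finer temporal parsing)")
```

A lower fitted width corresponds to the finer parsing of temporal event structure that the ASD group showed in this study.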
Audio-Visual Speech Timing Sensitivity Is Enhanced in Cluttered Conditions
Events encoded in separate sensory modalities, such as audition and vision, can seem to be synchronous across a relatively broad range of physical timing differences. This may suggest that the precision of audio-visual timing judgments is inherently poor. Here we show that this is not necessarily true. We contrast timing sensitivity for isolated streams of audio and visual speech, and for streams of audio and visual speech accompanied by additional, temporally offset, visual speech streams. We find that the precision with which synchronous streams of audio and visual speech are identified is enhanced by the presence of additional streams of asynchronous visual speech. Our data suggest that timing perception is shaped by selective grouping processes, which can result in enhanced precision in temporally cluttered environments. The imprecision suggested by previous studies might therefore be a consequence of examining isolated pairs of audio and visual events. We argue that when an isolated pair of cross-modal events is presented, they tend to group perceptually and to seem synchronous as a consequence. We have revealed greater precision by providing multiple visual signals, possibly allowing a single auditory speech stream to group selectively with the most synchronous visual candidate. The grouping processes we have identified might be important in daily life, such as when we attempt to follow a conversation in a crowded room.
Neural mechanisms of interstimulus interval-dependent responses in the primary auditory cortex of awake cats
Background: Primary auditory cortex (AI) neurons show qualitatively distinct response features to successive acoustic signals depending on the inter-stimulus interval (ISI). Such ISI-dependent AI responses are believed to underlie, at least partially, categorical perception of click trains (elemental vs. fused quality) and of stop consonant-vowel syllables (e.g., the /da/-/ta/ continuum). Methods: Single-unit recordings were conducted on 116 AI neurons in awake cats. Rectangular clicks were presented either alone (single-click paradigm) or in trains with variable ISIs of 2–480 ms (click-train paradigm). Response features of AI neurons were quantified as a function of ISI: one measure was related to the degree of stimulus locking (temporal modulation transfer function, tMTF) and another was based on firing rate (rate modulation transfer function, rMTF). An additional modeling study was performed to gain insight into the neurophysiological bases of the observed responses. Results: In the click-train paradigm, the majority of AI neurons ("synchronization type"; n = 72) showed stimulus-locking responses at long ISIs. The shorter cutoff ISI for stimulus-locking responses was on average ~30 ms and was level-tolerant, in accordance with the perceptual boundary of click trains and of consonant-vowel syllables. The tMTFs of these neurons were either band-pass or low-pass. The single-click paradigm revealed, at maximum, four response periods in the following order: 1st excitation, 1st suppression, 2nd excitation, then 2nd suppression. The 1st excitation and 1st suppression were found exclusively in the synchronization type, implying that the temporal interplay between excitation and suppression underlies stimulus-locking responses. Among these neurons, those showing the 2nd suppression had band-pass tMTFs, whereas those with low-pass tMTFs never showed the 2nd suppression, implying that tMTF shape is mediated through the 2nd suppression. The recovery time course of excitability suggested the involvement of short-term plasticity. The observed phenomena were well captured by a single-cell model that incorporated AMPA, GABA-A, NMDA, and GABA-B receptors as well as short-term plasticity of thalamocortical synaptic connections. Conclusion: Overall, the results suggest that the ISI-dependent responses of the majority of AI neurons are configured through the temporal interplay of excitation and suppression (inhibition), along with short-term plasticity.
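The stimulus-locking measure summarized by a tMTF is conventionally quantified as vector strength, and the rMTF as a firing rate per ISI; the abstract does not spell out these definitions, so the sketch below simply illustrates the standard measures on hypothetical spike times:

```python
# Minimal sketch of the standard measures behind a tMTF and rMTF:
# vector strength (degree of locking to the click period) and firing rate.
# These definitions are assumed, not quoted from the study; spike times are hypothetical.
import numpy as np

def vector_strength(spike_times_ms, isi_ms):
    """Resultant length of spike phases relative to the click period (0 = no locking, 1 = perfect)."""
    phases = 2.0 * np.pi * (np.asarray(spike_times_ms) % isi_ms) / isi_ms
    return np.hypot(np.mean(np.cos(phases)), np.mean(np.sin(phases)))

def firing_rate(spike_times_ms, train_duration_ms):
    """Mean firing rate (spikes/s) over the click train."""
    return 1000.0 * len(spike_times_ms) / train_duration_ms

# Spikes tightly locked to a 30 ms ISI click train give a vector strength near 1.
spikes = [2, 31, 62, 91, 122, 151]
print(vector_strength(spikes, isi_ms=30), firing_rate(spikes, train_duration_ms=180))
```

Computing both measures across the 2–480 ms ISI range yields the temporal and rate modulation transfer functions described above.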
Optimal perceived timing: integrating sensory information with dynamically updated expectations
The environment has a temporal structure, and knowing when a stimulus will appear translates into increased perceptual performance. Here we investigated how the human brain exploits temporal regularity in stimulus sequences for perception. We find that the timing of stimuli that occasionally deviate from a regularly paced sequence is perceptually distorted. Stimuli presented earlier than expected are perceptually delayed, whereas stimuli presented on time or later than expected are perceptually accelerated. This result suggests that the brain regularizes slightly deviant stimuli, with an asymmetry that leads to the perceptual acceleration of expected stimuli. We present a Bayesian model for the combination of dynamically updated expectations, in the form of the a priori probability of encountering future stimuli, with incoming sensory information. The asymmetries in the results are accounted for by the asymmetries in the distributions involved in the computational process.
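The core computation here is Bayesian combination of an expectation about onset time with a sensory measurement. As a minimal sketch only, assuming symmetric Gaussians (the paper's actual model uses asymmetric distributions to produce the asymmetry described above), the perceived time is a precision-weighted average of the expected and measured times:

```python
# Illustrative sketch: precision-weighted combination of an expected onset time
# (prior, from the regular sequence) with a sensory measurement (likelihood).
# Symmetric Gaussians are assumed here purely for illustration.

def perceived_time(expected_ms, prior_sd, measured_ms, sensory_sd):
    """Posterior mean of two Gaussians: a reliability-weighted average."""
    w_prior = 1.0 / prior_sd ** 2
    w_sense = 1.0 / sensory_sd ** 2
    return (w_prior * expected_ms + w_sense * measured_ms) / (w_prior + w_sense)

# A stimulus arriving 50 ms earlier than expected is perceived closer to the
# expected onset, i.e. perceptually delayed relative to its physical timing.
print(perceived_time(expected_ms=1000, prior_sd=60, measured_ms=950, sensory_sd=40))
# -> ~965 ms
```

The pull toward the expectation reproduces the perceptual delay of early stimuli; the reported acceleration of on-time and late stimuli is what the asymmetric distributions in the paper's model are there to capture.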
The Natural Statistics of Audiovisual Speech
Humans, like other animals, are exposed to a continuous stream of signals, which are dynamic, multimodal, extended, and time-varying in nature. This complex input space must be transduced and sampled by our sensory systems and transmitted to the brain, where it can guide the selection of appropriate actions. To simplify this process, it has been suggested that the brain exploits statistical regularities in the stimulus space. Tests of this idea have largely been confined to unimodal signals and natural scenes. One important class of multisensory signals for which a quantitative input-space characterization is unavailable is human speech. We do not understand what signals our brain has to actively piece together from an audiovisual speech stream to arrive at a percept, versus what is already embedded in the signal structure of the stream itself. In essence, we do not have a clear understanding of the natural statistics of audiovisual speech. In the present study, we identified the following major statistical features of audiovisual speech. First, we observed robust correlations and close temporal correspondence between the area of the mouth opening and the acoustic envelope. Second, we found the strongest correlation between the area of the mouth opening and vocal-tract resonances. Third, we observed that both the area of the mouth opening and the voice envelope are temporally modulated in the 2–7 Hz frequency range. Finally, we show that the timing of mouth movements relative to the onset of the voice is consistently between 100 and 300 ms. We interpret these data in the context of recent neural theories of speech which suggest that speech communication is a reciprocally coupled, multisensory event, whereby the outputs of the signaler are matched to the neural processes of the receiver.
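The first of these measures boils down to correlating two time series, the mouth-opening area and the acoustic amplitude envelope, with the lag of peak correlation indexing their temporal correspondence. The sketch below is purely illustrative, using smoothed random signals as stand-ins for real mouth-area and envelope recordings:

```python
# Illustrative sketch of the correlation/lag analysis described above, using
# hypothetical stand-in signals rather than real mouth-area and envelope data.
import numpy as np

rng = np.random.default_rng(0)
fs = 100.0                                   # common sampling rate (Hz), assumed
n = 1000                                     # 10 s of signal
kernel = np.hanning(25); kernel /= kernel.sum()
mouth_area = np.convolve(rng.standard_normal(n), kernel, mode="same")  # smooth, slow signal
envelope = np.roll(mouth_area, 15) + 0.05 * rng.standard_normal(n)     # envelope lags by 150 ms

max_lag = int(0.5 * fs)                      # search lags up to +/-500 ms
lags = np.arange(-max_lag, max_lag + 1)
def corr_at(lag):
    return np.corrcoef(mouth_area[max_lag + lag : n - max_lag + lag],
                       envelope[max_lag : n - max_lag])[0, 1]
xc = np.array([corr_at(l) for l in lags])
best = lags[np.argmax(xc)]
print(f"peak r = {xc.max():.2f} at a mouth-area lead of {-best * 1000 / fs:.0f} ms")
```

The same envelope time series, Fourier-transformed, gives the kind of 2–7 Hz modulation spectrum referred to in the third observation.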
Monkeys and Humans Share a Common Computation for Face/Voice Integration
Speech production involves movement of the mouth and other regions of the face, resulting in visual motion cues. These visual cues enhance the intelligibility and detection of auditory speech. As such, face-to-face speech is fundamentally a multisensory phenomenon. If speech is fundamentally multisensory, this should be reflected in the evolution of vocal communication: similar behavioral effects should be observed in other primates. Old World monkeys share vocal-production biomechanics with humans and communicate face-to-face with vocalizations. It is unknown, however, whether they, too, combine faces and voices to enhance their perception of vocalizations. We show that they do: monkeys combine faces and voices in noisy environments to enhance their detection of vocalizations. Their behavior parallels that of humans performing an identical task. We explored what common computational mechanism(s) could explain the pattern of results we observed across species. Standard explanations or models, such as the principle of inverse effectiveness and a "race" model, failed to account for the behavior patterns. Conversely, a "superposition" model, positing the linear summation of activity patterns in response to the visual and auditory components of vocalizations, served as a straightforward but powerful explanatory mechanism for the observed behaviors in both species. As such, it represents a putative homologous mechanism for integrating faces and voices across primates.
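The model comparison in this abstract has a concrete quantitative form: a race model predicts that multisensory detection is only as fast as the faster of two independent unisensory detections, whereas a superposition model sums the two activity streams toward a single threshold and can therefore be faster still. The toy simulation below, with entirely invented parameters, is only meant to illustrate that contrast, not to reproduce the authors' model fits:

```python
# Toy contrast between a race model (faster of two independent unisensory
# detections) and a superposition model (visual and auditory activity summed
# toward one threshold). All parameters are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
n_trials, threshold = 10_000, 1.0
drift_v, drift_a, drift_sd = 0.004, 0.005, 0.001   # per-ms accumulation rates

def detection_time(mean_drift):
    """Time (ms) for a noisy linear accumulator to reach threshold."""
    rates = mean_drift + drift_sd * rng.standard_normal(n_trials)
    return threshold / np.clip(rates, 1e-4, None)

rt_v, rt_a = detection_time(drift_v), detection_time(drift_a)
rt_race = np.minimum(rt_v, rt_a)                   # race: faster modality wins
rt_super = detection_time(drift_v + drift_a)       # superposition: rates add

print(f"visual {rt_v.mean():.0f} ms, auditory {rt_a.mean():.0f} ms, "
      f"race {rt_race.mean():.0f} ms, superposition {rt_super.mean():.0f} ms")
```

In this toy setup the superposition prediction is markedly faster than the race bound; the abstract reports that only a superposition-style account captured both species' behavior.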