34 research outputs found

    Do Eye Movements During Shape Discrimination Reveal an Underlying Geometric Structure?

    Using a psychophysical approach coupled with eye-tracking measures, we varied the length and width of shape stimuli to determine the objective parameters that corresponded to subjective square/rectangle judgments. Participants viewed a two-dimensional shape stimulus and made a two-alternative forced-choice judgment of whether it was a square or a rectangle. Participants’ gaze was tracked throughout the task to explore directed visual attention to the vertical and horizontal axes of space. Behavioral results provide threshold values for two-dimensional square/rectangle perception, and eye-tracking data indicated that participants directed attention to the major and minor principal axes. Results are consistent with the use of the major and minor principal axes of space for shape perception and may have theoretical and empirical implications for orientation via geometric cues.

    A Causal Inference Model Explains Perception of the McGurk Effect and Other Incongruent Audiovisual Speech.

    Audiovisual speech integration combines information from auditory speech (talker's voice) and visual speech (talker's mouth movements) to improve perceptual accuracy. However, if the auditory and visual speech emanate from different talkers, integration decreases accuracy. Therefore, a key step in audiovisual speech perception is deciding whether auditory and visual speech have the same source, a process known as causal inference. A well-known illusion, the McGurk effect, consists of incongruent audiovisual syllables, such as auditory "ba" + visual "ga" (AbaVga), that are integrated to produce a fused percept ("da"). This illusion raises two fundamental questions: first, given the incongruence between the auditory and visual syllables in the McGurk stimulus, why are they integrated; and second, why does the McGurk effect not occur for other, very similar syllables (e.g., AgaVba)? We describe a simplified model of causal inference in multisensory speech perception (CIMS) that predicts the perception of arbitrary combinations of auditory and visual speech. We applied this model to behavioral data collected from 60 subjects perceiving both McGurk and non-McGurk incongruent speech stimuli. The CIMS model successfully predicted both the audiovisual integration observed for McGurk stimuli and the lack of integration observed for non-McGurk stimuli. An identical model without causal inference failed to accurately predict perception for either form of incongruent speech. The CIMS model uses causal inference to provide a computational framework for studying how the brain performs one of its most important tasks, integrating auditory and visual speech cues to allow us to communicate with others.

    Published estimates of group differences in multisensory integration are inflated.

    A common measure of multisensory integration is the McGurk effect, an illusion in which incongruent auditory and visual speech are integrated to produce an entirely different percept. Published studies report that participants who differ in age, gender, culture, native language, or traits related to neurological or psychiatric disorders also differ in their susceptibility to the McGurk effect. These group-level differences are used as evidence for fundamental alterations in sensory processing between populations. Using empirical data and statistical simulations tested under a range of conditions, we show that published estimates of group differences in the McGurk effect are inflated when only statistically significant (p < 0.05) results are published. With a sample size typical of published studies, a group difference of 10% would be reported as 31%. As a consequence of this inflation, follow-up studies often fail to replicate published reports of large between-group differences. Inaccurate estimates of effect sizes and replication failures are especially problematic in studies of clinical populations involving expensive and time-consuming interventions, such as training paradigms to improve sensory processing. Reducing effect size inflation and increasing replicability require increasing the number of participants by an order of magnitude compared with current practice.
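
    The inflation mechanism described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' code and does not attempt to reproduce their 31% figure: the Beta-distributed susceptibility scores, the normal-approximation significance test, the sample sizes, and the names `simulate_study` and `significance_filter` are all assumptions chosen for illustration. It simulates many two-group studies with a true difference of 10 percentage points and compares the average difference across all studies with the average across only those that reach p < 0.05.

```python
import math
import random
import statistics


def sample_variance(xs):
    """Unbiased sample variance, written out to keep the simulation fast."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def simulate_study(n_per_group, mean_a, mean_b, rng):
    """Simulate one two-group study; return (observed difference, significant?).

    Assumption: each participant's susceptibility is drawn from a Beta
    distribution with alpha + beta = 1, so the group mean equals alpha.
    """
    a = [rng.betavariate(mean_a, 1.0 - mean_a) for _ in range(n_per_group)]
    b = [rng.betavariate(mean_b, 1.0 - mean_b) for _ in range(n_per_group)]
    diff = statistics.fmean(b) - statistics.fmean(a)
    se = math.sqrt((sample_variance(a) + sample_variance(b)) / n_per_group)
    # Assumption: normal approximation to a two-sided test at p < 0.05.
    return diff, abs(diff / se) > 1.96


def significance_filter(n_per_group, true_diff=0.10, n_studies=2000, seed=1):
    """Return (mean difference over all studies, mean over significant ones)."""
    rng = random.Random(seed)
    base = 0.5 - true_diff / 2
    results = [simulate_study(n_per_group, base, base + true_diff, rng)
               for _ in range(n_studies)]
    all_diffs = [d for d, _ in results]
    published = [d for d, sig in results if sig]
    return statistics.fmean(all_diffs), statistics.fmean(published)


if __name__ == "__main__":
    for n in (20, 200):
        mean_all, mean_published = significance_filter(n)
        print(f"n={n}: all studies {mean_all:.3f}, "
              f"significant only {mean_published:.3f}")
```

    Under these assumptions, the average over all simulated studies stays near the true difference, while the average over significant studies is larger, and the gap narrows as the group size grows by an order of magnitude.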

    Modeling of multisensory speech perception without causal inference.

    (A) There are two possible causal structures for a given audiovisual speech stimulus. If there is a common cause (C = 1), a single talker generates the auditory and visual speech. Alternatively, if there is not a common cause (C = 2), two separate talkers generate the auditory and visual speech. (B) We generate multisensory representations in a two-dimensional representational space. The prototypes of the syllables “ba,” “da,” and “ga” (location of text labels) are mapped into the representational space with locations determined by pairwise confusability. The x-axis represents auditory features; the y-axis represents visual features. (C) Encoding the auditory “ba” + visual “ga” (AbaVga) McGurk stimulus. The unisensory components of the stimulus are encoded with noise that is independent across modalities. On three trials in which an identical AbaVga stimulus is presented (represented as 1, 2, 3) the encoded representations of the auditory and visual components differ because of sensory noise, although they are centered on the prototype (gray ellipses show 95% probability region across all presentations). Shapes of ellipses reflect reliability of each modality: for auditory “ba” (ellipse labeled A), the ellipse has its short axis along the auditory x-axis; visual “ga” (ellipse labeled V) has its short axis along the visual y-axis. (D) On each trial, the unisensory representations are integrated using Bayes’ rule to produce an integrated representation that is located between the unisensory components in representational space. Numbers show the actual location of the integrated unisensory representations from C. Because of reliability weighting, the integrated representations are closer to “ga” along the visual y-axis, but closer to “ba” along the auditory x-axis (ellipse shows 95% probability region across all presentations). (E) Without causal inference (non-CIMS), the AV representation is the final representation. On most trials, the representation lies in the “da” region of representational space (numbers and 95% probability ellipse from D). (F) A linear decision rule is applied, resulting in a model prediction of exclusively “da” percepts across trials. (G) Behavioral data from 60 subjects reporting their percept of auditory “ba” + visual “ga”. Across trials, subjects reported the “ba” percept for 57% of trials and “da” for 40% of trials. (H) Encoding the auditory “ga” + visual “ba” (AgaVba) incongruent non-McGurk stimulus. The unisensory components are encoded with modality-specific noise; the auditory “ga” ellipse has its short axis along the auditory axis, the visual “ba” ellipse has its short axis along the visual axis. (I) Across many trials, the integrated representation (AV) is closer to “ga” along the auditory x-axis, but closer to “ba” along the visual y-axis. (J) Over many trials, the integrated representation is found most often in the “da” region of perceptual space. (K) Across trials, the non-CIMS model predicts “da” for the non-McGurk stimulus. (L) Behavioral data from 60 subjects reporting their perception of AgaVba. Subjects reported “ga” on 96% of trials.
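
    The reliability weighting in panels C to F can be sketched as precision-weighted fusion of two independent Gaussian estimates, computed separately on each axis. This is a minimal illustration, not the model's actual implementation: the prototype coordinates, the variances, and the names `fuse` and `nearest_prototype` are hypothetical, sensory noise is omitted, and a nearest-prototype rule (whose boundaries are straight lines) stands in for the caption's linear decision rule.

```python
# Hypothetical prototype locations: (auditory x, visual y).
PROTOTYPES = {"ba": (0.0, 0.0), "da": (0.2, 0.8), "ga": (1.0, 1.0)}


def fuse(aud, vis, var_aud, var_vis):
    """Fuse auditory and visual estimates axis by axis.

    Each modality is weighted by its precision (1 / variance), which is the
    Bayes-optimal combination for independent Gaussian noise.
    """
    fused = []
    for a, v, va, vv in zip(aud, vis, var_aud, var_vis):
        w_a, w_v = 1.0 / va, 1.0 / vv
        fused.append((w_a * a + w_v * v) / (w_a + w_v))
    return tuple(fused)


def nearest_prototype(point, prototypes):
    """Return the label of the prototype closest to the point (Euclidean)."""
    def squared_distance(name):
        return sum((p - q) ** 2 for p, q in zip(point, prototypes[name]))
    return min(prototypes, key=squared_distance)


if __name__ == "__main__":
    # AbaVga: auditory "ba" is reliable on x, visual "ga" is reliable on y.
    # The variances below are assumed values, not fitted parameters.
    fused = fuse(PROTOTYPES["ba"], PROTOTYPES["ga"],
                 var_aud=(1.0, 9.0), var_vis=(9.0, 1.0))
    print(fused, nearest_prototype(fused, PROTOTYPES))
```

    With these assumed values the fused point stays near “ba” on the auditory axis and near “ga” on the visual axis, which places it closest to the hypothetical “da” prototype.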

    Generalizability of models tested with other audiovisual syllables.

    (A) Behavior for congruent syllables. Each row represents a different congruent audiovisual syllable (AbaVba, AdaVda, AgaVga). Subjects made a three-alternative forced choice (ba, ga, da). The colors within each row show how often subjects reported each choice when presented with each syllable (e.g., for AbaVba, they always reported “ba”). (B) Non-CIMS model predictions for congruent syllables. Rows show syllables, colors across columns within each row show how often the model predicted that percept (darker colors indicate higher percentages). (C) CIMS model predictions for congruent syllables. (D) Behavior for incongruent syllables. Each row represents a different incongruent audiovisual syllable. Subjects made a three-alternative forced choice (ba, ga, da). The colors within each row show how often subjects reported each choice when presented with each syllable (e.g., for AbaVda, they more often reported “ba”, less often reported “da”, never reported “ga”). (E) Non-CIMS model predictions for incongruent syllables. Rows show syllables, colors across columns within each row show how often the model predicted that percept (darker colors indicate higher percentages). (F) CIMS model predictions for incongruent syllables.

    Multivariate fMRI responses in superior temporal cortex predict visual contributions to, and individual differences in, the intelligibility of noisy speech

    Humans have the unique ability to decode the rapid stream of language elements that constitute speech, even when it is contaminated by noise. Two reliable observations about noisy speech perception are that seeing the face of the talker improves intelligibility and that individuals differ in their ability to perceive noisy speech. We introduce a multivariate BOLD fMRI measure that explains both observations. In two independent fMRI studies, clear and noisy speech were presented in visual, auditory and audiovisual formats to thirty-seven participants who rated intelligibility. An event-related design was used to sort noisy speech trials by their intelligibility. Individual-differences multidimensional scaling was applied to fMRI response patterns in superior temporal cortex and the dissimilarity between responses to clear speech and noisy (but intelligible) speech was measured. Neural dissimilarity was lower for audiovisual speech than auditory-only speech, corresponding to the greater intelligibility of noisy audiovisual speech. Dissimilarity was lower in participants with better noisy speech perception, corresponding to individual differences. These relationships held for both single word and entire sentence stimuli, suggesting that they were driven by intelligibility rather than the specific stimuli tested. A neural measure of perceptual intelligibility may aid in the development of strategies for helping those with impaired speech perception.

    A Laboratory Study of the McGurk Effect in 324 Monozygotic and Dizygotic Twins

    Multisensory integration of information from the talker's voice and the talker's mouth facilitates human speech perception. A popular assay of audiovisual integration is the McGurk effect, an illusion in which incongruent visual speech information categorically changes the percept of auditory speech. There is substantial interindividual variability in susceptibility to the McGurk effect. To better understand possible sources of this variability, we examined the McGurk effect in 324 native Mandarin speakers, consisting of 73 monozygotic (MZ) and 89 dizygotic (DZ) twin pairs. When tested with 9 different McGurk stimuli, some participants never perceived the illusion and others always perceived it. Within participants, perception was similar across time (r = 0.55 at a 2-year retest in 150 participants), suggesting that McGurk susceptibility reflects a stable trait rather than short-term perceptual fluctuations. To examine the effects of shared genetics and prenatal environment, we compared McGurk susceptibility between MZ and DZ twins. Both twin types had significantly greater correlation than unrelated pairs (r = 0.28 for MZ twins and r = 0.21 for DZ twins), suggesting that the genes and environmental factors shared by twins contribute to individual differences in multisensory speech perception. Conversely, the existence of substantial differences within twin pairs (even MZ co-twins) and the overall low percentage of explained variance (5.5%) argue against a deterministic view of individual differences in multisensory integration.