
    On the Perceptual Organization of Speech

    A general account of auditory perceptual organization has developed over the past two decades. It relies on primitive devices akin to the Gestalt principles of organization to assign sensory elements to probable groupings and invokes secondary schematic processes to confirm or repair the possible organization. Although this conceptualization is intended to apply universally, the variety and arrangement of the acoustic constituents of speech violate Gestalt principles at numerous junctures, yet speech nonetheless coheres perceptually. The authors report three experiments on organization in phonetic perception, using sine wave synthesis to evade the Gestalt rules and the schematic processes alike. The findings falsify a general auditory account, showing that phonetic perceptual organization is achieved by a specific sensitivity to the acoustic modulations characteristic of speech signals.
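
    Sine wave synthesis here refers to replacing a natural utterance with three or four time-varying sinusoids that follow the estimated center frequencies and amplitudes of its formants, preserving the gross acoustic modulations while discarding the fine structure that Gestalt-style grouping would normally exploit. The sketch below shows only the core synthesis step; it assumes per-sample formant frequency and amplitude tracks supplied by an external formant tracker, and all names are illustrative rather than taken from the article.

    import numpy as np

    def sine_wave_speech(freq_tracks, amp_tracks, fs):
        """Sum a few sinusoids whose instantaneous frequencies and amplitudes
        follow per-sample formant tracks (one array per 'formant')."""
        out = np.zeros(len(freq_tracks[0]))
        for freqs, amps in zip(freq_tracks, amp_tracks):
            # Accumulate phase sample by sample so each sinusoid can follow
            # an arbitrary frequency trajectory over time.
            phase = 2.0 * np.pi * np.cumsum(np.asarray(freqs)) / fs
            out += np.asarray(amps) * np.sin(phase)
        return out / len(freq_tracks)

    # Toy input: three flat "formants" over 0.5 s at 16 kHz.
    fs, n = 16000, 8000
    freqs = [np.full(n, f) for f in (500.0, 1500.0, 2500.0)]
    amps = [np.full(n, a) for a in (1.0, 0.6, 0.3)]
    replica = sine_wave_speech(freqs, amps, fs)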

    Are the Products of Statistical Learning Abstract or Stimulus-Specific?

    Learners can segment potential lexical units from syllable streams when statistically variable transitional probabilities between adjacent syllables are the only cues to word boundaries. Here we examine the nature of the representations that result from statistical learning by assessing learners’ ability to generalize across acoustically different stimuli. In three experiments, we compare two possibilities: that the products of statistical segmentation processes are abstract and generalizable representations, or, alternatively, that products of statistical learning are stimulus-bound and restricted to perceptually similar instances. In Experiment 1, learners segmented units from statistically predictable streams, and recognized these units when they were acoustically transformed by temporal reversals. In Experiment 2, learners were able to segment units from temporally reversed syllable streams, but were only able to generalize in conditions of mild acoustic transformation. In Experiment 3, learners were able to recognize statistically segmented units after a voice change but were unable to do so when the novel voice was mildly distorted. Together these results suggest that representations that result from statistical learning can be abstracted to some degree, but not in all listening conditions.
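
    The statistical cue at issue is the forward transitional probability TP(B|A) = frequency(AB) / frequency(A) between adjacent syllables, which is high within words and drops at word boundaries. The sketch below is a rough illustration rather than the procedure used in these experiments; the syllable inventory and the boundary-at-local-dip heuristic are assumptions made for the example.

    import random
    from collections import Counter

    def transitional_probabilities(syllables):
        """Forward transitional probabilities TP(B|A) = count(AB) / count(A)
        over a stream of syllables."""
        pair_counts = Counter(zip(syllables, syllables[1:]))
        first_counts = Counter(syllables[:-1])
        return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

    def segment_at_dips(syllables, tps):
        """Posit a word boundary wherever the TP of a transition is lower than
        the TPs of both neighboring transitions (a local dip)."""
        values = [tps[pair] for pair in zip(syllables, syllables[1:])]
        cuts = [i + 1 for i in range(1, len(values) - 1)
                if values[i] < values[i - 1] and values[i] < values[i + 1]]
        spans = zip([0] + cuts, cuts + [len(syllables)])
        return ["".join(syllables[a:b]) for a, b in spans]

    # Familiarization stream built from four trisyllabic "words" in random order.
    words = [["tu", "pi", "ro"], ["go", "la", "bu"],
             ["bi", "da", "ku"], ["pa", "do", "ti"]]
    random.seed(1)
    stream = [syl for _ in range(60) for syl in random.choice(words)]
    tps = transitional_probabilities(stream)
    print(segment_at_dips(stream, tps)[:6])   # mostly recovers tupiro, golabu, ...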

    Perceptual Restoration of Temporally Distorted Speech in L1 vs. L2: Local Time Reversal and Modulation Filtering

    Speech remains intelligible even when its temporal envelope is distorted. The current study investigates how native and non-native speakers perceptually restore temporally distorted speech. Participants were native English speakers (NS) and native Japanese speakers of English as a second language (NNS). In Experiment 1, participants listened to “locally time-reversed speech,” in which every x ms of the speech signal was reversed on the temporal axis. The local time reversal shifted constituents of the speech signal forward or backward from their original positions, and the amplitude envelope of the speech was altered as a function of reversed-segment length. In Experiment 2, participants listened to “modulation-filtered speech,” in which the modulation frequency components of speech were low-pass filtered at a particular cut-off frequency, so the temporal envelope was altered as a function of cut-off frequency. The results show that speech becomes gradually unintelligible as the length of the reversed segments increases (Experiment 1) and as the cut-off frequency is lowered (Experiment 2). Both experiments yielded equivalent levels of intelligibility across the six levels of degradation for native and non-native speakers, respectively, which raises the question of whether local time reversal can be analyzed in the modulation frequency domain by simply converting the length of the reversed segments (ms) into frequency (Hz).
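
    Both manipulations are straightforward to describe in signal-processing terms. The sketch below is only an approximation of the stimuli used here: local time reversal flips each fixed-length chunk of the waveform in place, and the modulation filtering is reduced to a single broadband envelope that is low-pass filtered and reimposed on a tone carrier, whereas the actual stimuli were presumably built from multiple sub-bands. Parameter values and function names are illustrative.

    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    def locally_time_reverse(x, fs, segment_ms):
        """Reverse every segment_ms-long chunk of the waveform in place,
        keeping the order of the chunks themselves intact."""
        seg = max(1, int(round(fs * segment_ms / 1000.0)))
        out = x.copy()
        for start in range(0, len(x), seg):
            out[start:start + seg] = x[start:start + seg][::-1]
        return out

    def modulation_lowpass(x, fs, cutoff_hz, carrier_hz=1000.0):
        """Crude single-band stand-in for modulation filtering: take the
        Hilbert envelope, low-pass it at cutoff_hz, and reimpose it on a tone."""
        env = np.abs(hilbert(x))
        b, a = butter(4, cutoff_hz / (fs / 2.0), btype="low")
        env = np.maximum(filtfilt(b, a, env), 0.0)
        t = np.arange(len(x)) / fs
        return env * np.sin(2.0 * np.pi * carrier_hz * t)

    # Example with a 1-s synthetic signal at 16 kHz in place of recorded speech.
    fs = 16000
    x = np.random.randn(fs)
    reversed_50ms = locally_time_reverse(x, fs, segment_ms=50)
    smoothed_8hz = modulation_lowpass(x, fs, cutoff_hz=8.0)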

    On segments and syllables in the sound structure of language: Curve-based approaches to phonology and the auditory representation of speech.

    http://msh.revues.org/document7813.html
    Recent approaches to the syllable reintroduce continuous, mathematically describable accounts of sound objects conceived as “curves”. Psycholinguistic research on spoken language perception usually refers to symbolic and highly hierarchized approaches to the syllable, which strongly differentiate segments (phones) from syllables. Recent work on the auditory bases of speech perception demonstrates the ability of listeners to extract phonetic information even when strong degradations of the speech signal have been produced in the spectro-temporal domain. Implications of these observations for the modelling of syllables in the fields of speech perception and phonology are discussed.

    Windows into Sensory Integration and Rates in Language Processing: Insights from Signed and Spoken Languages

    This dissertation explores the hypothesis that language processing proceeds in "windows" that correspond to representational units, where sensory signals are integrated according to time-scales that correspond to the rate of the input. To investigate universal mechanisms, a comparison of signed and spoken languages is necessary. Underlying the seemingly effortless process of language comprehension is the perceiver's knowledge about the rate at which linguistic form and meaning unfold in time and the ability to adapt to variations in the input. The vast body of work in this area has focused on speech perception, where the goal is to determine how linguistic information is recovered from acoustic signals. Testing some of these theories in the visual processing of American Sign Language (ASL) provides a unique opportunity to better understand how sign languages are processed and which aspects of speech perception models are in fact about language perception across modalities. The first part of the dissertation presents three psychophysical experiments investigating temporal integration windows in sign language perception by testing the intelligibility of locally time-reversed sentences. The findings demonstrate the contribution of modality to the time-scales of these windows, with signing successfully integrated over longer durations (~250-300 ms) than speech (~50-60 ms), while also pointing to modality-independent mechanisms, with integration occurring over durations that correspond to the size of linguistic units. The second part of the dissertation focuses on production rates in sentences taken from natural conversations in English, Korean, and ASL. Data on word, sign, morpheme, and syllable rates suggest that while the rate of words and signs can vary from language to language, the relationship between the rate of syllables and the rate of morphemes is relatively consistent among these typologically diverse languages. The results on rates in ASL also complement the findings of the perception experiments by confirming that the time-scales at which phonological units fluctuate in production match the temporal integration windows in perception. These results are consistent with the hypothesis that there are modality-independent time pressures on language processing, and the discussion provides a synthesis of converging findings from other domains of research and proposes ideas for future investigations.

    Stimulus and cognitive factors in cortical entrainment to speech

    Understanding speech is a difficult computational problem, yet the human brain does it with ease. Entrainment of oscillatory neural activity to acoustic features of speech is an example of dynamic coupling between cortical activity and sensory inputs. The phenomenon may be a bottom-up, sensory-driven neurophysiological mechanism that supports speech processing. However, cognitive top-down factors such as linguistic knowledge and attentional focus affect speech perception, especially in challenging real-world environments. It is unclear how these top-down influences affect cortical entrainment to speech. We used electroencephalography to measure cortical entrainment to speech under conditions of acoustic and cognitive interference. By manipulating the bottom-up sensory features of the acoustic scene, we found evidence of top-down influences of attentional selection and linguistic processing on speech-entrained activity.
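
    The abstract does not spell out an analysis pipeline, but a common first-pass measure of envelope tracking is the lagged correlation between the low-frequency speech envelope and each EEG channel (more rigorous analyses use coherence, phase-locking, or temporal response functions). The sketch below makes several simplifying assumptions, including nearest-sample resampling and arbitrary cut-offs and lag ranges.

    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    def speech_envelope(audio, fs_audio, fs_eeg, lp_hz=8.0):
        """Broadband temporal envelope of the stimulus, low-pass filtered and
        crudely resampled (nearest sample) to the EEG sampling rate."""
        env = np.abs(hilbert(audio))
        b, a = butter(3, lp_hz / (fs_audio / 2.0), btype="low")
        env = filtfilt(b, a, env)
        idx = np.linspace(0, len(env) - 1, int(len(env) * fs_eeg / fs_audio))
        return env[idx.astype(int)]

    def lagged_correlation(eeg_channel, env, fs_eeg, max_lag_ms=300):
        """Pearson correlation between one EEG channel and the envelope over a
        range of stimulus-to-brain lags; a peak suggests envelope tracking."""
        max_lag = int(fs_eeg * max_lag_ms / 1000)
        n = min(len(eeg_channel), len(env)) - max_lag
        lags = np.arange(max_lag + 1)
        r = [np.corrcoef(env[:n], eeg_channel[lag:lag + n])[0, 1] for lag in lags]
        return lags / fs_eeg, np.array(r)

    # Example with synthetic data: 60 s of audio at 16 kHz, EEG at 128 Hz.
    fs_audio, fs_eeg = 16000, 128
    audio = np.random.randn(60 * fs_audio)
    eeg = np.random.randn(60 * fs_eeg)
    lags_s, r = lagged_correlation(eeg, speech_envelope(audio, fs_audio, fs_eeg), fs_eeg)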

    Does seeing an Asian face make speech sound more accented?

    Prior studies have reported that seeing an Asian face makes American English sound more accented. The current study investigates whether this effect is perceptual, or if it instead occurs at a later decision stage. We first replicated the finding that showing static Asian and Caucasian faces can shift people’s reports about the accentedness of speech accompanying the pictures. When we changed the static pictures to dubbed videos, reducing the demand characteristics, the shift in reported accentedness largely disappeared. By including unambiguous items along with the original ambiguous items, we introduced a contrast bias and actually reversed the shift, with the Asian-face videos yielding lower judgments of accentedness than the Caucasian-face videos. By changing to a mixed rather than blocked design, so that the ethnicity of the videos varied from trial to trial, we eliminated the difference in accentedness rating. Finally, we tested participants’ perception of accented speech using the selective adaptation paradigm. After establishing that an auditory-only accented adaptor shifted the perception of how accented test words are, we found that no such adaptation effect occurred when the adapting sounds relied on visual information (Asian vs. Caucasian videos) to influence the accentedness of an ambiguous auditory adaptor. Collectively, the results demonstrate that visual information can affect the interpretation, but not the perception, of accented speech.
    Support was provided by Ministerio de Ciencia E Innovacion, Grant PSI2014-53277, Centro de Excelencia Severo Ochoa, Grant SEV-2015-0490, and by the National Science Foundation under Grant IBSS-1519908.

    Recalibration of auditory phoneme perception by lipread and lexical information


    Prosodic temporal alignment of co-speech gestures to speech facilitates referent resolution

    Using a referent detection paradigm, we examined whether listeners can determine which object speakers are referring to by using the temporal alignment between the motion speakers impose on objects and their labeling utterances. Stimuli were created by videotaping speakers labeling a novel creature. Without being explicitly instructed to do so, speakers moved the creature during labeling. Trajectories of these motions were used to animate photographs of the creature. Participants in subsequent perception studies heard these labeling utterances while seeing side-by-side animations of two identical creatures, in which only the target creature moved as originally intended by the speaker. Using the cross-modal temporal relationship between speech and referent motion, participants identified which creature the speaker was labeling, even when the labeling utterances were low-pass filtered to remove their semantic content or replaced by tone analogues. However, when the prosodic structure was eliminated by reversing the speech signal, participants no longer detected the referent as readily. These results provide strong support for a prosodic cross-modal alignment hypothesis. Speakers produce a perceptible link between the motion they impose upon a referent and the prosodic structure of their speech, and listeners readily use this prosodic cross-modal relationship to resolve referential ambiguity in word-learning situations.