On the Perceptual Organization of Speech
A general account of auditory perceptual organization has developed over the past two decades. It relies on primitive devices akin to the Gestalt principles of organization to assign sensory elements to probable groupings, and invokes secondary schematic processes to confirm or to repair the resulting organization. Although this conceptualization is intended to apply universally, the variety and arrangement of the acoustic constituents of speech violate Gestalt principles at numerous junctures, yet cohere perceptually nonetheless. The authors report 3 experiments on organization in phonetic perception, using sine-wave synthesis to evade the Gestalt rules and the schematic processes alike. These findings falsify a general auditory account, showing that phonetic perceptual organization is achieved by specific sensitivity to the acoustic modulations characteristic of speech signals.
Are the Products of Statistical Learning Abstract or Stimulus-Specific?
Learners can segment potential lexical units from syllable streams when statistically variable transitional probabilities between adjacent syllables are the only cues to word boundaries. Here we examine the nature of the representations that result from statistical learning by assessing learners' ability to generalize across acoustically different stimuli. In three experiments, we compare two possibilities: that the products of statistical segmentation are abstract, generalizable representations, or, alternatively, that they are stimulus-bound and restricted to perceptually similar instances. In Experiment 1, learners segmented units from statistically predictable streams and recognized these units when they were acoustically transformed by temporal reversals. In Experiment 2, learners were able to segment units from temporally reversed syllable streams, but were only able to generalize under conditions of mild acoustic transformation. In Experiment 3, learners were able to recognize statistically segmented units after a voice change, but were unable to do so when the novel voice was mildly distorted. Together these results suggest that the representations that result from statistical learning can be abstracted to some degree, but not in all listening conditions.
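The segmentation cue described in this abstract can be made concrete with a small sketch. The following is an illustrative toy implementation of transitional-probability segmentation in the spirit of the paradigm, not the materials or analysis of these experiments; the function names, the lexicon, and the 0.5 default threshold are assumptions:

```python
from collections import defaultdict

def transitional_probabilities(syllables):
    """TP(A -> B) = count(A followed by B) / count(A as the first member of a pair)."""
    pair_counts = defaultdict(int)
    first_counts = defaultdict(int)
    for a, b in zip(syllables, syllables[1:]):
        pair_counts[(a, b)] += 1
        first_counts[a] += 1
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(syllables, tps, threshold=0.5):
    """Posit a word boundary wherever the transitional probability dips below threshold."""
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tps[(a, b)] < threshold:
            words.append(current)
            current = []
        current.append(b)
    words.append(current)
    return words
```

On a stream built from trisyllabic nonsense words, within-word transitions have TP = 1.0 while cross-word transitions are lower, so thresholding the dips recovers the word boundaries.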
Perceptual Restoration of Temporally Distorted Speech in L1 vs. L2: Local Time Reversal and Modulation Filtering
Speech is intelligible even when its temporal envelope is distorted. The current study investigates how native and non-native speakers perceptually restore temporally distorted speech. Participants were native English speakers (NS) and native Japanese speakers who spoke English as a second language (NNS). In Experiment 1, participants listened to "locally time-reversed speech," in which every x-ms portion of the speech signal was reversed on the temporal axis. Here, local time reversal shifted the constituents of the speech signal forward or backward from their original positions, and the amplitude envelope of the speech was altered as a function of reversed-segment length. In Experiment 2, participants listened to "modulation-filtered speech," in which the modulation frequency components of speech were low-pass filtered at a particular cut-off frequency. Here, the temporal envelope of the speech was altered as a function of cut-off frequency. The results suggest that speech becomes gradually unintelligible as the length of the reversed segments increases (Experiment 1) and as a lower cut-off frequency is imposed (Experiment 2). Both experiments showed equivalent levels of speech intelligibility across six levels of degradation for native and non-native speakers respectively, which raises the question of whether the regular occurrence of local time reversal can be discussed in the modulation frequency domain, by simply converting the length of the reversed segments (ms) into frequency (Hz).
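The "locally time-reversed speech" manipulation described here is straightforward to sketch: chop the waveform into fixed-length segments and flip each one on the time axis. This is a minimal illustrative implementation, not the stimulus-generation code of the study; the function name and parameters are assumptions:

```python
import numpy as np

def locally_time_reverse(signal, segment_ms, sample_rate):
    """Reverse each consecutive segment_ms-long chunk of the signal on the time axis.

    The global order of segments is preserved; only the samples inside
    each segment are flipped, so the overall signal duration is unchanged.
    """
    seg_len = int(round(sample_rate * segment_ms / 1000))
    out = signal.copy()
    for start in range(0, len(signal), seg_len):
        out[start:start + seg_len] = signal[start:start + seg_len][::-1]
    return out
```

Because every sample is retained and only reordered within its segment, the long-term spectrum is largely preserved while local temporal structure is disrupted, which is what makes the manipulation useful for probing temporal integration.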
On segments and syllables in the sound structure of language: Curve-based approaches to phonology and the auditory representation of speech.
http://msh.revues.org/document7813.html
Recent approaches to the syllable reintroduce continuous, mathematically describable representations of sound objects conceived as "curves". Psycholinguistic research on spoken-language perception usually relies on symbolic and highly hierarchized accounts of the syllable that strictly differentiate segments (phones) from syllables. Recent work on the auditory bases of speech perception demonstrates listeners' ability to extract phonetic information even when severe degradations of the speech signal have been imposed in the spectro-temporal domain. Implications of these observations for the modelling of syllables in the fields of speech perception and phonology are discussed.
Windows into Sensory Integration and Rates in Language Processing: Insights from Signed and Spoken Languages
This dissertation explores the hypothesis that language processing proceeds in "windows" that correspond to representational units, where sensory signals are integrated according to time-scales that correspond to the rate of the input. To investigate universal mechanisms, a comparison of signed and spoken languages is necessary. Underlying the seemingly effortless process of language comprehension is the perceiver's knowledge about the rate at which linguistic form and meaning unfold in time and the ability to adapt to variations in the input.
The vast body of work in this area has focused on speech perception, where the goal is to determine how linguistic information is recovered from acoustic signals. Testing some of these theories in the visual processing of American Sign Language (ASL) provides a unique opportunity to better understand how sign languages are processed and which aspects of speech perception models are in fact about language perception across modalities.
The first part of the dissertation presents three psychophysical experiments investigating temporal integration windows in sign language perception by testing the intelligibility of locally time-reversed sentences. The findings demonstrate the contribution of modality for the time-scales of these windows, where signing is successively integrated over longer durations (~ 250-300 ms) than in speech (~ 50-60 ms), while also pointing to modality-independent mechanisms, where integration occurs in durations that correspond to the size of linguistic units. The second part of the dissertation focuses on production rates in sentences taken from natural conversations of English, Korean, and ASL. Data from word, sign, morpheme, and syllable rates suggest that while the rate of words and signs can vary from language to language, the relationship between the rate of syllables and morphemes is relatively consistent among these typologically diverse languages. The results from rates in ASL also complement the findings in perception experiments by confirming that time-scales at which phonological units fluctuate in production match the temporal integration windows in perception.
These results are consistent with the hypothesis that there are modality-independent time pressures on language processing; the discussion synthesizes converging findings from other domains of research and proposes ideas for future investigations.
Stimulus and cognitive factors in cortical entrainment to speech
Understanding speech is a difficult computational problem, yet the human brain does it with ease. Entrainment of oscillatory neural activity to acoustic features of speech is an example of dynamic coupling between cortical activity and sensory inputs. The phenomenon may be a bottom-up, sensory-driven neurophysiological mechanism that supports speech processing. However, cognitive top-down factors such as linguistic knowledge and attentional focus affect speech perception, especially in challenging real-world environments. It is unclear how these top-down influences affect cortical entrainment to speech. We used electroencephalography to measure cortical entrainment to speech under conditions of acoustic and cognitive interference. By manipulating the bottom-up, sensory features of the acoustic scene, we found evidence of top-down influences of attentional selection and linguistic processing on speech-entrained activity.
Does seeing an Asian face make speech sound more accented?
Published online: 17 May 2017. Prior studies have reported that seeing an Asian face makes American English sound more accented. The current study investigates whether this effect is perceptual, or if it instead occurs at a later decision stage. We first replicated the finding that showing static Asian and Caucasian faces can shift people's reports about the accentedness of speech accompanying the pictures. When we changed the static pictures to dubbed videos, reducing the demand characteristics, the shift in reported accentedness largely disappeared. By including unambiguous items along with the original ambiguous items, we introduced a contrast bias and actually reversed the shift, with the Asian-face videos yielding lower judgments of accentedness than the Caucasian-face videos. By changing to a mixed rather than blocked design, so that the ethnicity of the videos varied from trial to trial, we eliminated the difference in accentedness rating. Finally, we tested participants' perception of accented speech using the selective adaptation paradigm. After establishing that an auditory-only accented adaptor shifted the perception of how accented test words are, we found that no such adaptation effect occurred when the adapting sounds relied on visual information (Asian vs. Caucasian videos) to influence the accentedness of an ambiguous auditory adaptor. Collectively, the results demonstrate that visual information can affect the interpretation, but not the perception, of accented speech. Support was provided by Ministerio de Ciencia e Innovación, Grant PSI2014-53277; Centro de Excelencia Severo Ochoa, Grant SEV-2015-0490; and the National Science Foundation, Grant IBSS-1519908.
Prosodic temporal alignment of co-speech gestures to speech facilitates referent resolution
Using a referent detection paradigm, we examined whether listeners can determine the object speakers are referring to by using the temporal alignment between the motion speakers impose on objects and their labeling utterances. Stimuli were created by videotaping speakers labeling a novel creature. Without being explicitly instructed to do so, speakers moved the creature during labeling. Trajectories of these motions were used to animate photographs of the creature. Participants in subsequent perception studies heard these labeling utterances while seeing side-by-side animations of two identical creatures in which only the target creature moved as originally intended by the speaker. Using the cross-modal temporal relationship between speech and referent motion, participants identified which creature the speaker was labeling, even when the labeling utterances were low-pass filtered to remove their semantic content or replaced by tone analogues. However, when the prosodic structure was eliminated by reversing the speech signal, participants no longer detected the referent as readily. These results provide strong support for a prosodic cross-modal alignment hypothesis: speakers produce a perceptible link between the motion they impose upon a referent and the prosodic structure of their speech, and listeners readily use this prosodic cross-modal relationship to resolve referential ambiguity in word-learning situations.
Listening under pressure: the downside of motivation
The desire for self-improvement is critical to human performance and learning outcomes. Paradoxically, however, being subjected to increased performance pressure can also result in "choking under pressure". No studies have experimentally examined the extent to which motivation impacts native speech processing. This dissertation manipulated performance pressure in listeners and systematically examined its impact across three speech-processing experiments. Sixty adult native English listeners and 45 non-native listeners with poorer English proficiency completed the three experiments twice: once to establish a baseline, and again to measure changes in performance. In these experiments using native English speech, listeners detected (illusory) sound changes, categorized phonemes under lexical interference, and recognized words in noise. After baseline testing, half of the participants in each language group were instructed to work, with a fictitious partner, towards a performance-contingent monetary reward; the other half, as controls, simply performed the tasks a second time. This study demonstrated a negative impact of performance pressure on native listeners in all experiments. Relative to the controls, the motivation group was more susceptible to illusions, failed to ignore lexical interference despite prior exposure, and recognized fewer words in cognitively demanding listening situations. Unexpectedly, relative to native listeners, non-native listeners perceived it as less important to perform well, and those in the high-pressure group requested a significantly greater amount of money for improvement. These language-group differences in task-related attitudes might be a confounding factor that moderates the effect of motivation.
By illustrating a complex interaction among motivation, listener status, and performance-induced demands, this dissertation highlights the importance of motivation in speech science.