
    Asymmetries in the perception of synthesized speech

    It was previously observed [1] that the order of presentation of paired stimuli influenced the number of “different” responses in same-different tasks in speech synthesis evaluation. This paper investigates this phenomenon within the context of cognitive psychology and demonstrates that, as the cognitive psychology literature suggests, there is an effect relating to the prototypicality of the stimulus. Index Terms: speech synthesis, evaluation, perception, Blizzard Challenge.
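
    For readers unfamiliar with same-different paradigms, the sketch below shows one way such an order effect could be tabulated: count how often listeners answer “different” for each presentation order of a pair. The Python snippet and its trial records are purely illustrative and are not taken from the paper.

```python
# Hypothetical sketch: tabulating "different" responses by presentation order
# in a same-different task, to expose an order asymmetry. The trial records
# and labels below are illustrative, not data from the paper.
from collections import defaultdict

trials = [
    # (first_stimulus, second_stimulus, response) -- made-up records
    ("natural", "synthetic", "different"),
    ("synthetic", "natural", "same"),
    ("natural", "synthetic", "different"),
    ("synthetic", "natural", "different"),
]

counts = defaultdict(lambda: {"different": 0, "total": 0})
for first, second, response in trials:
    order = f"{first}->{second}"
    counts[order]["total"] += 1
    if response == "different":
        counts[order]["different"] += 1

for order, c in counts.items():
    rate = c["different"] / c["total"]
    print(f"{order}: {c['different']}/{c['total']} 'different' responses ({rate:.0%})")
```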

    Asymmetric discrimination of non-speech tonal analogues of vowels

    Published in final edited form as: J Exp Psychol Hum Percept Perform. 2019 February; 45(2): 285–300. doi: 10.1037/xhp0000603. Directional asymmetries reveal a universal bias in vowel perception favoring extreme vocalic articulations, which lead to acoustic vowel signals with dynamic formant trajectories and well-defined spectral prominences due to the convergence of adjacent formants. The present experiments investigated whether this bias reflects speech-specific processes or general properties of spectral processing in the auditory system. Toward this end, we examined whether analogous asymmetries in perception arise with non-speech tonal analogues that approximate some of the dynamic and static spectral characteristics of naturally-produced /u/ vowels executed with more versus less extreme lip gestures. We found a qualitatively similar but weaker directional effect with two-component tones varying in both the dynamic changes and proximity of their spectral energies. In subsequent experiments, we pinned down the phenomenon using tones that varied in one or both of these two acoustic characteristics. We found comparable asymmetries with tones that differed exclusively in their spectral dynamics, and no asymmetries with tones that differed exclusively in their spectral proximity or both spectral features. We interpret these findings as evidence that dynamic spectral changes are a critical cue for eliciting asymmetries in non-speech tone perception, but that the potential contribution of general auditory processes to asymmetries in vowel perception is limited. Accepted manuscript.
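
    As a concrete illustration of the kind of non-speech stimuli described above, the sketch below generates a two-component tone whose upper component glides toward the lower one, alongside a static counterpart. It is only a minimal approximation under assumed frequencies and durations, not the authors' stimulus-generation procedure.

```python
# Minimal sketch (not the authors' exact stimuli): a two-component tone whose
# upper component glides toward the lower one, mimicking the converging
# spectral prominences of a more extreme /u/. Frequencies, glide depth, and
# duration are illustrative assumptions.
import numpy as np

def two_component_tone(f_low, f_high, glide_hz, dur=0.3, sr=44100):
    """Two sinusoids; the upper component glides downward by `glide_hz`."""
    t = np.linspace(0, dur, int(sr * dur), endpoint=False)
    f_upper = f_high - glide_hz * (t / dur)            # linear downward glide
    phase_upper = 2 * np.pi * np.cumsum(f_upper) / sr  # integrate instantaneous frequency
    lower = np.sin(2 * np.pi * f_low * t)
    upper = np.sin(phase_upper)
    tone = lower + upper
    return tone / np.max(np.abs(tone))                 # normalize amplitude

more_extreme = two_component_tone(300.0, 900.0, glide_hz=200.0)  # converging components
less_extreme = two_component_tone(300.0, 900.0, glide_hz=0.0)    # static components
```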

    Disentangling the effects of phonation and articulation: Hemispheric asymmetries in the auditory N1m response of the human brain

    BACKGROUND: The cortical activity underlying the perception of vowel identity has typically been addressed by manipulating the first and second formant frequency (F1 & F2) of the speech stimuli. These two values, originating from articulation, are already sufficient for the phonetic characterization of vowel category. In the present study, we investigated how the spectral cues caused by articulation are reflected in cortical speech processing when combined with phonation, the other major part of speech production manifested as the fundamental frequency (F0) and its harmonic integer multiples. To study the combined effects of articulation and phonation we presented vowels with either high (/a/) or low (/u/) formant frequencies which were driven by three different types of excitation: a natural periodic pulseform reflecting the vibration of the vocal folds, an aperiodic noise excitation, or a tonal waveform. The auditory N1m response was recorded with whole-head magnetoencephalography (MEG) from ten human subjects in order to resolve whether brain events reflecting articulation and phonation are specific to the left or right hemisphere of the human brain. RESULTS: The N1m responses for the six stimulus types displayed a considerable dynamic range of 115–135 ms, and were elicited faster (~10 ms) by the high-formant /a/ than by the low-formant /u/, indicating an effect of articulation. While excitation type had no effect on the latency of the right-hemispheric N1m, the left-hemispheric N1m elicited by the tonally excited /a/ was some 10 ms earlier than that elicited by the periodic and the aperiodic excitation. The amplitude of the N1m in both hemispheres was systematically stronger to stimulation with natural periodic excitation. Also, stimulus type had a marked (up to 7 mm) effect on the source location of the N1m, with periodic excitation resulting in more anterior sources than aperiodic and tonal excitation. CONCLUSION: The auditory brain areas of the two hemispheres exhibit differential tuning to natural speech signals, observable already in the passive recording condition. The variations in the latency and strength of the auditory N1m response can be traced back to the spectral structure of the stimuli. More specifically, the combined effects of the harmonic comb structure originating from the natural voice excitation caused by the fluctuating vocal folds and the location of the formant frequencies originating from the vocal tract lead to asymmetric behaviour of the left and right hemispheres.
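
    To make the stimulus manipulation concrete, here is a minimal source-filter sketch of the three excitation types (periodic pulse train, aperiodic noise, and a single tone) passed through formant resonators. The formant and bandwidth values are illustrative placeholders for an /a/-like spectrum; this is a sketch of the general technique, not the study's actual stimulus code.

```python
# Source-filter sketch under assumed parameter values: three excitation types
# filtered through two second-order formant resonators.
import numpy as np
from scipy.signal import lfilter

SR = 16000  # sample rate (Hz), an assumption

def excitation(kind, f0=120.0, dur=0.2):
    n = int(SR * dur)
    if kind == "periodic":                     # glottal-like pulse train
        sig = np.zeros(n)
        sig[::int(SR / f0)] = 1.0
    elif kind == "noise":                      # aperiodic excitation
        sig = np.random.randn(n)
    else:                                      # "tonal": single sinusoid
        t = np.arange(n) / SR
        sig = np.sin(2 * np.pi * f0 * t)
    return sig

def formant_filter(sig, freq, bw):
    """Two-pole resonator at `freq` Hz with bandwidth `bw` Hz."""
    r = np.exp(-np.pi * bw / SR)
    theta = 2 * np.pi * freq / SR
    a = [1.0, -2 * r * np.cos(theta), r ** 2]
    return lfilter([1.0 - r], a, sig)

# Placeholder formants for an /a/-like vowel, driven by periodic excitation:
vowel_a = formant_filter(formant_filter(excitation("periodic"), 700, 80), 1100, 90)
```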

    Orienting asymmetries in dogs’ responses to different communicatory components of human speech

    It is well established that in human speech perception the left hemisphere (LH) of the brain is specialized for processing intelligible phonemic (segmental) content (e.g., [1–3]), whereas the right hemisphere (RH) is more sensitive to prosodic (suprasegmental) cues [4, 5]. Despite evidence that a range of mammal species show LH specialization when processing conspecific vocalizations [6], the presence of hemispheric biases in domesticated animals’ responses to the communicative components of human speech has never been investigated. Human speech is familiar and relevant to domestic dogs (Canis familiaris), who are known to perceive both segmental phonemic cues [7–10] and suprasegmental speaker-related [11, 12] and emotional [13] prosodic cues. Using the head-orienting paradigm, we presented dogs with manipulated speech and tones differing in segmental or suprasegmental content and recorded their orienting responses. We found that dogs showed a significant LH bias when presented with a familiar spoken command in which the salience of meaningful phonemic (segmental) cues was artificially increased but a significant RH bias in response to commands in which the salience of intonational or speaker-related (suprasegmental) vocal cues was increased. Our results provide insights into mechanisms of interspecific vocal perception in a domesticated mammal and suggest that dogs may share ancestral or convergent hemispheric specializations for processing the different functional communicative components of speech with human listeners.
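
    A simple way to quantify such orienting biases is a laterality index over left versus right head turns. The sketch below is illustrative only: the counts are made up, and the reading of a rightward turn as a left-hemisphere (contralateral) advantage is the conventional interpretation of the head-orienting paradigm rather than anything computed in the paper.

```python
# Illustrative laterality index over head-orienting responses (made-up counts).
def laterality_index(right_turns, left_turns):
    """Positive values indicate a rightward (left-hemisphere) orienting bias."""
    total = right_turns + left_turns
    return (right_turns - left_turns) / total if total else 0.0

print(laterality_index(right_turns=18, left_turns=7))   # e.g. segmental-salient condition
print(laterality_index(right_turns=6, left_turns=19))   # e.g. suprasegmental-salient condition
```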

    Directional asymmetries reveal a universal bias in adult vowel perception

    Published online 21 April 2017. Research on cross-language vowel perception in both infants and adults has shown that for many vowel contrasts, discrimination is easier when the same pair of vowels is presented in one direction compared to the reverse direction. According to one account, these directional asymmetries reflect a universal bias favoring “focal” vowels (i.e., vowels whose adjacent formants are close in frequency, which concentrates acoustic energy into a narrower spectral region). An alternative, but not mutually exclusive, account is that such effects reflect an experience-dependent bias favoring prototypical instances of native-language vowel categories. To disentangle the effects of focalization and prototypicality, the authors first identified a region of phonetic space in which vowels were consistently categorized as /u/ by both Canadian-English and Canadian-French listeners but nevertheless varied in stimulus goodness (i.e., the best Canadian-French /u/ exemplars were more focal than the best Canadian-English /u/ exemplars). In subsequent AX discrimination tests, both Canadian-English and Canadian-French listeners performed better at discriminating changes from less to more focal /u/’s compared to the reverse, regardless of variation in prototypicality. These findings demonstrate a universal bias favoring vowels with greater formant convergence that operates independently of biases related to language-specific prototype categorization. This research was supported by NSERC Discovery Grant No. 105397 to L.P. and NSERC Discovery Grant No. 312395 to L.M.
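
    For orientation, a toy “focalization” score might simply grow as adjacent formants approach one another. The sketch below is a simplified illustration under assumed /u/ formant values; it is not the focalization metric used in the study.

```python
# Toy focalization score: the closer two adjacent formants are, the more
# "focal" the vowel. Formant values below are placeholders for /u/.
def focalization(f1, f2):
    """Higher values = adjacent formants closer together = more focal."""
    return 1.0 / abs(f2 - f1)

more_focal_u = focalization(f1=300.0, f2=750.0)    # formants converge
less_focal_u = focalization(f1=300.0, f2=1000.0)   # formants farther apart
print(more_focal_u > less_focal_u)                 # True: higher focality score
```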

    Speaker Normalization Using Cortical Strip Maps: A Neural Model for Steady State Vowel Identification

    Auditory signals of speech are speaker-dependent, but representations of language meaning are speaker-independent. Such a transformation enables speech to be understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by Adaptive Resonance Theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models. National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624).
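
    As background only, the snippet below illustrates the general goal of speaker normalization with a much simpler, well-known technique (log-mean formant normalization). It is not the cortical strip-map model described in the abstract, and the formant values are illustrative.

```python
# Log-mean formant normalization: map speaker-dependent formants onto a roughly
# speaker-independent scale by removing each speaker's mean log-formant.
# Formant values below are illustrative, not from the Peterson-Barney database.
import numpy as np

def log_mean_normalize(formants):
    """Subtract the speaker's mean log-formant from each log-formant value."""
    log_f = np.log(np.asarray(formants, dtype=float))
    return log_f - log_f.mean()

# The "same" vowel produced by speakers with longer vs. shorter vocal tracts:
speaker_a = log_mean_normalize([270.0, 2290.0, 3010.0])
speaker_b = log_mean_normalize([310.0, 2790.0, 3310.0])
print(np.round(speaker_a, 2), np.round(speaker_b, 2))  # now closely aligned
```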

    Asymmetry in vowel perception in L1: evidence from articulatory synthesis of an [i]-[e] continuum

    Also available at: http://www.geocities.com/ch_karypidis/docs/conferences/Karypidis_et_alii_AISV2005_en.pdf. For the past 25 years, a debate on whether vowel discrimination is affected by stimulus presentation order has been ongoing, and the role of peripheral vowels in our perception has been under careful examination. In earlier studies, the method used to synthesize a vowel continuum has been to fragment, at equidistant points, the F1/F2 Euclidean distance between two prototypes (best exemplars of two different vowel categories). Nonetheless, the resulting sounds were rather unrealistic, inasmuch as some of them were assigned formant-value combinations that cannot be produced by a human vocal tract. Furthermore, the assignment of fixed F3 and F4 values generated a false spectral peak (around 3100 Hz and thus close to that of [i]) which induced the listeners to identify more [i]'s than they should have. Evidence from a recent study on vowel prototypes suggests that [i] has a very narrow perception zone, despite its acoustic stability and peripherality and notwithstanding the absence of a mid-close [e] in the system. Bearing these methodological inconsistencies in mind, we opted to prepare our stimuli using articulatory synthesis. We therefore synthesized a prototypic French [i] (stimulus no. 1, the most extreme) and then modified its parameters (jaw height and tongue position), gradually and in 9 steps, towards a prototypic French [e] (stimulus no. 10, the least extreme). We subsequently submitted the 10-vowel continuum to 34 native French listeners by conducting: a) an identification test in which listeners were requested to identify as [i] or [e] seven repetitions of each stimulus, presented in random order; and b) a discrimination test in which listeners were presented with 34 stimulus combinations [18 one-step pairs (9 stimulus combinations, 2 orders) and 16 two-step pairs (8 stimulus combinations, 2 orders)] and were asked whether the two vowels were the same or different. The ISI (inter-stimulus interval) was fixed at 250 ms and every pair was presented five times. Results from the identification test reveal a clear quantal perception of the two categories. The discrimination results demonstrate that: a) discrimination is more difficult when a more extreme (on the F2' dimension) stimulus is presented second, and b) discrimination is significantly easier in the 2-step condition, in both orders of presentation.
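
    The discrimination-pair design described above is easy to reconstruct in code. The following sketch simply enumerates the 18 one-step and 16 two-step pairs (each in both presentation orders) for the 10-step continuum; it is bookkeeping only, not the authors' experimental software.

```python
# Enumerate discrimination pairs for a 10-step [i]-[e] continuum:
# one-step and two-step pairs, each in both presentation orders.
steps = range(1, 11)                       # stimuli 1..10 along the continuum

def make_pairs(step_size):
    pairs = []
    for a in steps:
        b = a + step_size
        if b <= 10:
            pairs.append((a, b))           # more-extreme stimulus first
            pairs.append((b, a))           # reverse presentation order
    return pairs

one_step = make_pairs(1)                   # 18 pairs
two_step = make_pairs(2)                   # 16 pairs
print(len(one_step), len(two_step), len(one_step) + len(two_step))  # 18 16 34
```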

    Speaker Normalization Using Cortical Strip Maps: A Neural Model for Steady State Vowel Categorization

    Auditory signals of speech are speaker-dependent, but representations of language meaning are speaker-independent. The transformation from speaker-dependent to speaker-independent language representations enables speech to be learned and understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by Adaptive Resonance Theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models. National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624).

    Evaluation of evoked potentials to dyadic tones after cochlear implantation

    Auditory evoked potentials are tools widely used to assess auditory cortex functions in clinical contexts. However, in cochlear implant users, electrophysiological measures are challenging due to implant-created artefacts in the EEG. Here, we used independent component analysis to reduce cochlear implant-related artefacts in event-related EEGs of cochlear implant users (n = 12), which allowed detailed spatio-temporal evaluation of auditory evoked potentials by means of dipole source analysis. The present study examined hemispheric asymmetries of auditory evoked potentials to musical sounds in cochlear implant users to evaluate the effect of this type of implantation on neuronal activity. In particular, implant users were presented with two dyadic tonal intervals in an active oddball design and in a passive listening condition. Principally, the results show that independent component analysis is an efficient approach that enables the study of neurophysiological mechanisms of restored auditory function in cochlear implant users. Moreover, our data indicate altered hemispheric asymmetries for dyadic tone processing in implant users compared with listeners with normal hearing (n = 12). We conclude that the evaluation of auditory evoked potentials is of major relevance to understanding auditory cortex function after cochlear implantation and could be of substantial clinical value by indicating the maturation/reorganization of the auditory system after implantation.
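
    As a rough illustration of the artefact-reduction approach mentioned here, the sketch below applies independent component analysis to an EEG recording using MNE-Python. The file name, filter settings, component count, and excluded component indices are hypothetical; in practice, artefactual components are identified by inspection rather than hard-coded.

```python
# Hedged sketch: ICA-based reduction of implant-related EEG artefacts with
# MNE-Python. All file names and parameter choices below are assumptions.
import mne
from mne.preprocessing import ICA

raw = mne.io.read_raw_fif("ci_user_raw.fif", preload=True)  # hypothetical recording
raw.filter(l_freq=1.0, h_freq=40.0)                         # band-pass before ICA

ica = ICA(n_components=20, random_state=0)
ica.fit(raw)

ica.exclude = [0, 3]           # placeholder: components judged to be implant artefact
clean = ica.apply(raw.copy())  # artefact-reduced data for evoked-potential analysis
```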