    Cross-lingual talker discrimination

    This paper describes a talker discrimination experiment in which native English listeners were presented with two sentences spoken by bilingual talkers (English/German and English/Finnish) and asked to judge whether the sentences were spoken by the same person. Equal numbers of cross-lingual and matched-language trials were presented. The experiments showed that listeners can complete this task well: they discriminate between talkers significantly better than chance. However, listeners are significantly less accurate on cross-lingual talker trials than on matched-language pairs, and no significant difference was found on this task between German and Finnish. Bias (B'') and sensitivity (A') values are presented to analyse the listeners' behaviour in more detail. The results are promising for the evaluation of EMIME, a project covering speech-to-speech translation with speaker adaptation.

    Talker discrimination across languages

    This study investigated the extent to which listeners are able to discriminate between bilingual talkers in three language pairs: English-German, English-Finnish and English-Mandarin. Native English listeners were presented with two sentences spoken by bilingual talkers and asked to judge whether the sentences were spoken by the same person. Equal numbers of cross-language and matched-language trials were presented. The results show that native English listeners carry out this task well, achieving percent-correct levels well above chance for all three language pairs. Previous research has shown this for English-German; this study shows that listeners extend the ability to Finnish and Mandarin, languages that are quite distinct from English in both genetic and phonetic terms. However, listeners are significantly less accurate on cross-language talker trials (English-foreign) than on matched-language trials (English-English and foreign-foreign). Understanding listeners' behaviour in cross-language talker discrimination using natural speech is a first step towards principled evaluation techniques for synthesis systems in which the goal is for the synthesised voice to sound like the original speaker, for instance in speech-to-speech translation, voice conversion and voice reconstruction. Keywords: human speech perception, talker discrimination, cross-language
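    The claim that percent-correct levels are "well above chance" is typically backed by an exact binomial test against the 50% chance level of a same/different task. A sketch under those assumptions (the trial counts below are hypothetical, not the paper's data):

```python
from math import comb


def binom_p_above_chance(correct, trials, chance=0.5):
    """One-sided exact binomial p-value: the probability of scoring
    `correct` or more out of `trials` if the listener responded at chance."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical listener: 70 correct out of 100 same/different trials.
p = binom_p_above_chance(70, 100)
print(p < 0.05)  # True: performance is significantly above chance
```

The same test applied separately to cross-language and matched-language trials supports the comparison the abstract draws between the two conditions.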

    Cross-Lingual Speaker Discrimination Using Natural and Synthetic Speech

    This paper describes speaker discrimination experiments in which native English listeners were presented with either natural speech stimuli in English and Mandarin, synthetic speech stimuli in English and Mandarin, or natural Mandarin speech and synthetic English speech. In each experiment, listeners were asked to decide whether the sentences were spoken by the same person. We found that the results for Mandarin/English speaker discrimination are very similar to those of previous work on German/English and Finnish/English speaker discrimination. We conclude from this and previous work that listeners are able to identify speakers across languages and across speech types, but the combination of these two factors makes the speaker discrimination task too difficult for listeners to perform successfully, given the present quality of cross-language speaker-adapted speech synthesis. Index Terms: speaker discrimination, speaker adaptation, HMM-based speech synthesis

    The Zero Resource Speech Challenge 2017

    We describe a new challenge aimed at discovering subword and word units from raw speech, the follow-up to the Zero Resource Speech Challenge 2015. It aims at constructing systems that generalize across languages and adapt to new speakers. The design features and evaluation metrics of the challenge are presented and the results of seventeen models are discussed. Comment: IEEE ASRU (Automatic Speech Recognition and Understanding) 2017, Okinawa, Japan.

    The effect of music on auditory perception in cochlear-implant users and normal-hearing listeners


    Perception of linguistic rhythm by newborn infants

    Previous studies have shown that newborn infants are able to discriminate between certain languages, and it has been suggested that they do so by categorizing varieties of speech rhythm. To confirm this hypothesis, however, it is necessary to show that newborns still discriminate languages when all speech cues other than rhythm are removed. Here, we conducted a series of experiments assessing discrimination between Dutch and Japanese by newborn infants, using a speech resynthesis technique to progressively degrade non-rhythmical properties of the sentences. When the stimuli are resynthesized using identical phonemes and artificial intonation contours for the two languages, thereby preserving only their rhythmic structure, newborns are still able to discriminate the languages. We conclude that newborns can classify languages according to their type of rhythm, and that this ability may help them bootstrap other phonological properties of their native language.

    Rapid Adaptation of Foreign-accented HMM-based Speech Synthesis

    This paper presents findings on listeners' perception of speaker identity in synthetic speech. Specifically, we investigated the effect on the perceived identity of a speaker of using differently accented average voice models and limited amounts (five and fifteen sentences) of a speaker's data to create the synthetic stimuli. A speaker discrimination task was used to measure speaker identity. Native English listeners were presented with natural and synthetic speech stimuli in English and were asked to decide whether the sentences were spoken by the same person. An accent rating task was also carried out to measure the perceived accents of the synthetic speech stimuli. The results show that listeners, for the most part, perform as well at speaker discrimination when the stimuli have been created using five or fifteen adaptation sentences as when using 105 sentences. Furthermore, the accent of the average voice model does not affect listeners' speaker discrimination performance, even though the accent rating task shows that listeners perceive different accents in the synthetic stimuli: listeners do not base their speaker similarity decisions on perceived accent. Index Terms: speech synthesis, rapid adaptation

    Forensic voice discrimination: the effect of speech type and background noise on performance

    In forensic settings, lay (non-expert) listeners may be required to compare voice samples for identity. In two experiments we investigated the effect of background noise and variations in speaking style on performance. In each trial, participants heard two recordings, responded whether the voices belonged to the same person, and provided a confidence rating. In Experiment 1, the first recording featured read speech, while the second featured read or spontaneous speech; both recordings were presented in quiet or with background noise. Accuracy was highest when the recordings featured the same speaking style. In Experiment 2, background noise occurred in either the first or the second recording; accuracy was higher when it occurred in the second. Overall, the results show that both speaking style and background noise can disrupt accuracy. While there is a relationship between confidence and accuracy in all conditions, it is variable. The forensic implications of these findings are discussed.

    The effect of intensive auditory training on auditory skills and on speech intelligibility of prelingual cochlear implanted adolescents and adults

    Abstract. Aim of the study: to assess the effect of intensive auditory training, using the modified version of the Arabic rehabilitation program for adults, on both auditory skills and degree of speech intelligibility. Materials and methods: the study was conducted on 30 patients, divided into two groups according to the intensity of the auditory training; each group included 15 patients (10 males and 5 females). Both groups received the usual therapy program provided for cochlear-implanted patients; group I received additional therapy beyond the usual form. The Minimal Auditory Capabilities Test (MAC Test) was used to assess auditory perception abilities and the Speech Intelligibility Rating Scale (SIR) to assess speech production skills, before implantation and at 3, 6, 12 and 18 months post-operatively. Results: a significant difference was found between the two groups in spondee discrimination at the post-operative assessments at 3, 6, 12 and 18 months (P < 0.05). A highly significant difference was found for spondee recognition, sentence identification and high-context sentence recognition at the 18-month assessment (P < 0.01), and a significant mean difference (P < 0.05) in speech intelligibility scores at 18 months post-implantation was found between the two groups. Conclusion: this study demonstrates the effectiveness of the modified form of the Arabic adult rehabilitation program; more intensive auditory rehabilitation may yield better improvement in the auditory abilities and speech intelligibility of the prelingually deafened adult cochlear-implanted population.