246 research outputs found

    Vocal caricatures reveal signatures of speaker identity

    What are the features that impersonators select to elicit a speaker’s identity? We built a voice database of public figures (targets) and imitations produced by professional impersonators. They produced one imitation based on their memory of the target (caricature) and another one after listening to the target audio (replica). A set of naive participants then judged identity and similarity of pairs of voices. Identity was better evoked by the caricatures, and replicas were perceived to be closer to the targets in terms of voice similarity. We used these data to map relevant acoustic dimensions for each task. Our results indicate that speaker identity is mainly associated with vocal tract features, while perception of voice similarity is related to vocal fold parameters. We therefore show the way in which acoustic caricatures emphasize identity features at the cost of losing similarity, which allows drawing an analogy with caricatures in the visual space.
    Affiliation: Lopez, Sabrina Laura. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Física de Buenos Aires. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Física de Buenos Aires; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Física. Laboratorio de Sistemas Dinámicos; Argentina.
    Affiliation: Riera, Pablo Ernesto. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología. Laboratorio de Acustica y Percepción Sonora; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina.
    Affiliation: Assaneo, María Florencia. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Física de Buenos Aires. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Física de Buenos Aires; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Física. Laboratorio de Sistemas Dinámicos; Argentina.
    Affiliation: Eguia, Manuel Camilo. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología. Laboratorio de Acustica y Percepción Sonora; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina.
    Affiliation: Sigman, Mariano. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Física de Buenos Aires. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Física de Buenos Aires; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Física. Laboratorio de Neurociencia Integrativa; Argentina. Universidad Torcuato Di Tella; Argentina.
    Affiliation: Trevisan, Marcos Alberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Física de Buenos Aires. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Física de Buenos Aires; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Física. Laboratorio de Sistemas Dinámicos; Argentina

    Vocal caricatures reveal signatures of speaker identity

    What are the features that impersonators select to elicit a speaker’s identity? We built a voice database of public figures (targets) and imitations produced by professional impersonators. They produced one imitation based on their memory of the target (caricature) and another one after listening to the target audio (replica). A set of naive participants then judged identity and similarity of pairs of voices. Identity was better evoked by the caricatures, and replicas were perceived to be closer to the targets in terms of voice similarity. We used these data to map relevant acoustic dimensions for each task. Our results indicate that speaker identity is mainly associated with vocal tract features, while perception of voice similarity is related to vocal fold parameters. We therefore show the way in which acoustic caricatures emphasize identity features at the cost of losing similarity, which allows drawing an analogy with caricatures in the visual space.
    Affiliation: López, Sabrina. Dynamical Systems Lab, IFIBA-Physics dept, University of Buenos Aires, Pabellón 1, Ciudad Universitaria, CABA 1428EGA, Argentina.
    Affiliation: Riera, Pablo. Acoustics and Sound Perception Lab, Universidad of Quilmes, Roque Saénz Peña 352, Bernal, Buenos Aires B1876BXD, Argentina.
    Affiliation: Assaneo, María Florencia. Dynamical Systems Lab, IFIBA-Physics dept, University of Buenos Aires, Pabellón 1, Ciudad Universitaria, CABA 1428EGA, Argentina.
    Affiliation: Eguía, Manuel. Acoustics and Sound Perception Lab, Universidad of Quilmes, Roque Saénz Peña 352, Bernal, Buenos Aires B1876BXD, Argentina

    Consonant and vowel differences in Czech English with suppressed and accentuated foreignness

    The objective of this thesis is to identify those features of the Czech accent in English that are the most salient in the perception of the Czech listener and that may disturb the communication process. The purpose of the introductory chapter is to familiarize the reader with the subject of the foreign accent, to provide a brief summary of the current state of research and to introduce a series of empirical studies. The research part of the thesis analyzes the individual realizations of the selected speech sounds /θ, ð, ŋ, r, w, æ, ɜː/ and ventures to draw meaningful conclusions from the results. The material analyzed consists of a total of 3568 speech sound tokens, recorded by 9 male and 19 female speakers. Each respondent produced two recordings of a read text, one in the British standard mode and another where the speaker imitated the Czech foreign accent. The 3568 tokens were individually rated and the two modes were then compared for each speaker. The results showed /r/ to be favoured by the largest number of speakers as an indicator of the Czech accent, while /θ, ð, ŋ/ often had the same rating in both modes. However, additional factors such as speaker proficiency and the number of tokens of individual speech sounds must be taken into consideration before any final conclusions can be drawn from the raw data. ...
    Keywords: consonant, vowel, foreign accent, Czech English
    Institute of Phonetics, Faculty of Arts

    Voice Mimicry Attacks Assisted by Automatic Speaker Verification

    In this work, we simulate a scenario where a publicly available ASV system is used to enhance mimicry attacks against another closed-source ASV system. Specifically, ASV technology is used to perform a similarity search between the voices of recruited attackers (6) and potential target speakers (7,365) from the VoxCeleb corpora to find the closest targets for each of the attackers. In addition, we consider 'median', 'furthest', and 'common' targets to serve as reference points. Our goal is to gain insight into how well similarity rankings transfer from the attacker's ASV system to the attacked ASV system, whether the attackers are able to improve their attacks by mimicking, and how the properties of the attackers' voices change due to mimicking. We address these questions through ASV experiments, listening tests, and prosodic and formant analyses. For the ASV experiments, we use i-vector technology on the attacker side and x-vectors on the attacked side. For the listening tests, we recruit listeners through crowdsourcing. The results of the ASV experiments indicate that the speaker similarity scores transfer well from one ASV system to another. Both the ASV experiments and the listening tests reveal that the mimicry attempts do not, in general, help in bringing the attacker's scores closer to the target's. A detailed analysis shows that mimicking does not improve attacks when the natural voices of attackers and targets are similar to each other. The analysis of prosody and formants suggests that the attackers were able to considerably change their speaking rates when mimicking, but the changes in F0 and formants were modest. Overall, the results suggest that untrained impersonators do not pose a high threat to ASV systems, but the use of ASV systems to attack other ASV systems is a potential threat.
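
    The target-selection step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes speaker embeddings are plain numeric vectors and uses cosine similarity as a stand-in for the ASV score, which the abstract does not specify. The function names, the toy 2-D vectors, and the target labels are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def select_targets(attacker, targets):
    """Rank candidate targets by similarity to the attacker (assumed scoring)
    and pick the closest, median, and furthest ones as reference points."""
    ranked = sorted(targets, key=lambda name: cosine(attacker, targets[name]),
                    reverse=True)
    return {
        "closest": ranked[0],
        "median": ranked[len(ranked) // 2],
        "furthest": ranked[-1],
    }

# Toy 2-D "embeddings"; real i-vectors have hundreds of dimensions.
attacker = [1.0, 0.0]
targets = {
    "A": [0.9, 0.1],
    "B": [0.5, 0.5],
    "C": [0.0, 1.0],
    "D": [-1.0, 0.0],
    "E": [0.6, -0.8],
}
picks = select_targets(attacker, targets)
print(picks)
```

    Checking whether rankings transfer between systems would then amount to repeating the ranking with scores from a second, independent system and comparing the two orderings.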

    Can We Use Speaker Recognition Technology to Attack Itself? Enhancing Mimicry Attacks Using Automatic Target Speaker Selection

    A slightly shorter version has been submitted to IEEE ICASSP 2019. We consider technology-assisted mimicry attacks in the context of automatic speaker verification (ASV). We use ASV itself to select targeted speakers to be attacked by human-based mimicry. We recorded 6 naive mimics for whom we select target celebrities from the VoxCeleb1 and VoxCeleb2 corpora (7,365 potential targets) using an i-vector system. The attacker attempts to mimic the selected target, with the utterances subjected to ASV tests using an independently developed x-vector system. Our main finding is negative: even if some of the attacker scores against the target speakers were slightly increased, our mimics did not succeed in spoofing the x-vector system. Interestingly, however, the relative ordering of the selected targets (closest, furthest, median) is consistent between the systems, which suggests some level of transferability between the systems

    Imitation, Awareness, and Folk Linguistic Artifacts

    Imitations are sophisticated performances displaying regular patterns. The study of imitation allows linguists to understand speakers' perceptions of sociolinguistic variation. In this dissertation, I analyze imitations of non-native accents in order to answer two questions: what can imitation reveal about perception, and how are folk linguistic artifacts (Preston 1996) involved in imitation? These questions are approached from the framework of folk linguistic awareness (Preston 1996). By redefining the concept of salience according to the modes of folk linguistic awareness, I am able to consider more precisely how imitation reflects salience. I address both of these questions by eliciting imitations from speakers in which folk artifacts are present. For my investigation, twenty speakers read a short passage in English. Ten were non-native speakers of American English (NNAE) and ten were native speakers of American English (AE). The AE speakers were recorded reading the passage in their regular voice and with two types of imitated accents: free imitations, which were spontaneously produced, and modeled imitations, which were produced directly after hearing the NNAE speakers. Free imitations revealed folk linguistic artifacts, while modeled imitations were more reflective of the immediate target. Participants listened to the authentic and imitated accents and were asked to determine the accent and authenticity of each speaker. I found that there was no significant difference in pitch and vowels between free and modeled AE imitations, which indicated that these aspects of imitations are largely based on folk linguistic artifacts. Listeners were able to determine which voices were authentic and which were imitated. Listeners were also able to identify the speakers' accents, perhaps aided by the folk artifact status of these particular accents. 
Listeners were better at identifying the accents of free imitations than modeled imitations, which suggested that listeners prefer imitations that are solely based on folk artifacts. Overall, I found that imitation is a valuable tool for the analysis of speech perception. The modes of folk linguistic awareness are useful in interpreting imitations and understanding salience. This research shows that folk linguistic artifacts are the foundation of imitations and an important tool in perceptual categorization

    Analysis of anonymization strategies in English

    The objective of this thesis is to identify those specific aspects of written style which native speakers of English modify when attempting to anonymize their texts. The conclusions are based on the analysis of 20 texts by 10 authors, all of whom are native speakers of English. Two texts dealing with the same topic were produced by each participant; one was written as an official letter of complaint, and the other was written as an anonymous letter. The bulk of the results are grounded in a qualitative stylistic analysis of the individual texts, with merely a brief survey of quantitative methods. The purpose of the introductory chapter is to familiarize the reader with the subject of forensic authorship analysis, to provide a brief summary of the current state of research, and to introduce a series of empirical studies. The practical part of the thesis presents the qualitative stylistic analysis, provides a shorter summary of the quantitative analysis, and finally ventures to draw meaningful conclusions from the results. The results showed that the majority of authors manipulated the style/register of the texts and the specific lexical choices, whereas none of the 10 authors made alterations to spelling and only 2 authors chose to change the punctuation in the anonymous text. However, ...
    Keywords: anonymization strategies, forensic linguistics, authorship attribution, idiolect
    Department of the English Language and ELT Methodology, Faculty of Arts

    Speakers are more cooperative and less individual when interacting in larger group sizes

    Introduction: Cooperation, acoustically signaled through vocal convergence, is facilitated when group members are more similar. Excessive vocal convergence may, however, weaken individual recognizability. This study aimed to explore whether constraints on convergence can arise in circumstances where interlocutors need to enhance their vocal individuality. Therefore, we tested the effects of group size (3 and 5 interactants) on vocal convergence and individualization in a social communication scenario in which individual recognition by voice is at stake. Methods: In an interactive game, players had to recognize each other through their voices while solving a cooperative task online. Vocal similarity was quantified through similarities in speaker i-vectors obtained through probabilistic linear discriminant analysis (PLDA). Speaker recognition performance was measured through the system Equal Error Rate (EER). Results: Vocal similarity between speakers increased with a larger group size, which indicates more cooperative vocal behavior. At the same time, there was an increase in EER for the same speakers between the smaller and the larger group size, meaning a decrease in overall recognition performance. Discussion: The decrease in vocal individualization in the larger group size suggests that ingroup cooperation and social cohesion conveyed through acoustic convergence have priority over individualization in larger groups of unacquainted speakers
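
    For readers unfamiliar with the Equal Error Rate mentioned in this abstract, the following is a minimal sketch of how an EER can be estimated from verification scores. It is not the system used in the study: the threshold sweep is one simple approximation, and the score lists are invented toy values that only illustrate how a higher EER corresponds to weaker recognition.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping every observed score as a threshold
    and returning the error rate where false acceptance and false rejection
    are closest to each other."""
    best_gap, best_eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        # False rejection: same-speaker trials scoring below the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        # False acceptance: different-speaker trials at or above the threshold.
        far = sum(s >= t for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer

# Invented toy scores: well-separated voices versus more similar voices.
eer_separated = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
eer_converged = equal_error_rate([0.9, 0.6, 0.5, 0.3], [0.7, 0.55, 0.2, 0.1])
print(eer_separated, eer_converged)
```

    In the toy data, the second set has more overlap between same-speaker and different-speaker scores, so its EER is higher, which mirrors the direction of the effect reported for the larger group size.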