Design Choices for X-vector Based Speaker Anonymization
The recently proposed x-vector based anonymization scheme converts any input
voice into that of a random pseudo-speaker. In this paper, we present a
flexible pseudo-speaker selection technique as a baseline for the first
VoicePrivacy Challenge. We explore several design choices for the distance
metric between speakers, the region of x-vector space where the pseudo-speaker
is picked, and gender selection. To assess the strength of anonymization
achieved, we consider attackers using an x-vector based speaker verification
system who may use original or anonymized speech for enrollment, depending on
their knowledge of the anonymization scheme. The Equal Error Rate (EER)
achieved by the attackers and the decoding Word Error Rate (WER) over
anonymized data are reported as the measures of privacy and utility.
Experiments are performed using datasets derived from LibriSpeech to find the
optimal combination of design choices in terms of privacy and utility.
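The pseudo-speaker selection explored above can be illustrated in a few lines: pick candidate x-vectors from an external pool by a distance metric, then average a random subset to form the pseudo-speaker. This is a hedged sketch only, not the challenge baseline's actual implementation; the function name and the parameter values (pool of candidates, number averaged) are hypothetical.

```python
import numpy as np

def select_pseudo_xvector(source, pool, n_farthest=200, n_average=100, rng=None):
    """Illustrative distance-based pseudo-speaker selection (hypothetical API).

    Rank pool x-vectors by cosine distance from the source speaker, keep the
    n_farthest candidates, then average a random subset of them to obtain the
    pseudo-speaker x-vector.
    """
    rng = rng if rng is not None else np.random.default_rng()
    # Cosine distance = 1 - cosine similarity, computed on unit-normalized vectors.
    src = source / np.linalg.norm(source)
    unit_pool = pool / np.linalg.norm(pool, axis=1, keepdims=True)
    dist = 1.0 - unit_pool @ src
    farthest = np.argsort(dist)[-n_farthest:]          # most dissimilar candidates
    chosen = rng.choice(farthest, size=n_average, replace=False)
    return pool[chosen].mean(axis=0)                   # mean vector = pseudo-speaker
```

Averaging many pool vectors is what makes the pseudo-speaker "random" yet plausible: it lies inside the x-vector space spanned by real speakers.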
Language-independent speaker anonymization using orthogonal Householder neural network
Speaker anonymization aims to conceal a speaker's identity while preserving
content information in speech. Current mainstream neural-network speaker
anonymization systems disentangle speech into prosody-related, content, and
speaker representations. The speaker representation is then anonymized by a
selection-based speaker anonymizer that uses a mean vector over a set of
randomly selected speaker vectors from an external pool of English speakers.
However, the resulting anonymized vectors are subject to severe privacy leakage
against powerful attackers, reduction in speaker diversity, and language
mismatch problems for unseen language speaker anonymization. To generate
diverse, language-neutral speaker vectors, this paper proposes an anonymizer
based on an orthogonal Householder neural network (OHNN). Specifically, the
OHNN acts like a rotation to transform the original speaker vectors into
anonymized speaker vectors, which are constrained to follow the distribution
over the original speaker vector space. A basic classification loss is
introduced to ensure that anonymized speaker vectors from different speakers
have unique speaker identities. To further protect speaker identities, an
improved classification loss and similarity loss are used to push
original-anonymized sample pairs away from each other. Experiments on
VoicePrivacy Challenge datasets in English and the AISHELL-3 dataset in
Mandarin demonstrate the proposed anonymizer's effectiveness.
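The building block behind such an anonymizer is that a product of Householder reflections is an orthogonal matrix, so applying it rotates speaker vectors without changing their norms, keeping anonymized vectors in the geometry of the original space. A minimal numpy sketch of that property (not the paper's trained network; names are illustrative):

```python
import numpy as np

def householder_orthogonal(vs):
    """Compose Householder reflections H_i = I - 2 v_i v_i^T (v_i unit-norm)
    into a single orthogonal matrix. In an OHNN-style layer the v_i would be
    learned parameters; here they are just given vectors."""
    d = vs.shape[1]
    Q = np.eye(d)
    for v in vs:
        v = v / np.linalg.norm(v)
        # Left-multiply by H = I - 2 v v^T without forming H explicitly.
        Q = Q - 2.0 * np.outer(v, v @ Q)
    return Q
```

Each reflection has determinant -1, so an even number of them composes to a proper rotation; either way `Q @ Q.T` is the identity and `Q @ x` preserves the norm of any speaker vector `x`.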
Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques
The growing use of voice user interfaces has led to a surge in the collection
and storage of speech data. While data collection allows for the development of
efficient tools powering most speech services, it also poses serious privacy
issues for users as centralized storage makes private personal speech data
vulnerable to cyber threats. With the increasing use of voice-based digital
assistants like Amazon's Alexa, Google's Home, and Apple's Siri, and with the
increasing ease with which personal speech data can be collected, the risk of
malicious use of voice cloning and of speaker, gender, or pathology recognition
has increased.
This thesis proposes solutions for anonymizing speech and evaluating the
degree of the anonymization. In this work, anonymization refers to making
personal speech data unlinkable to an identity while maintaining the usefulness
(utility) of the speech signal (e.g., access to linguistic content). We start
by identifying several challenges that evaluation protocols need to consider to
evaluate the degree of privacy protection properly. We clarify how
anonymization systems must be configured for evaluation purposes and highlight
that many practical deployment configurations do not permit privacy evaluation.
Furthermore, we study and examine the most common voice conversion-based
anonymization system and identify its weak points before suggesting new methods
to overcome some limitations. We isolate all components of the anonymization
system to evaluate the degree of speaker personally identifiable information (PPI) associated with each of them.
Then, we propose several transformation methods for each component to reduce as
much as possible speaker PPI while maintaining utility. We promote
anonymization algorithms based on quantization-based transformation as an
alternative to the most-used and well-known noise-based approach. Finally, we
devise a new attack method to invert anonymization.
Comment: PhD thesis, Pierre Champion | Université de Lorraine - INRIA Nancy | for associated source code, see https://github.com/deep-privacy/SA-toolki
Privacy and utility of x-vector based speaker anonymization
We study the scenario where individuals (speakers) contribute to the publication of an anonymized speech corpus. Data users then leverage this public corpus to perform downstream tasks (such as training automatic speech recognition systems), while attackers may try to de-anonymize it based on auxiliary knowledge they collect. Motivated by this scenario, speaker anonymization aims to conceal the speaker identity while preserving the quality and usefulness of speech data. In this paper, we study x-vector based speaker anonymization, the leading approach in the recent VoicePrivacy Challenge, which converts an input utterance into that of a random pseudo-speaker. We show that the strength of the anonymization varies significantly depending on how the pseudo-speaker is selected. In particular, we investigate four design choices: the distance measure between speakers, the region of x-vector space where the pseudo-speaker is mapped, gender selection, and whether to use speaker- or utterance-level assignment. We assess the quality of anonymization from the perspective of the three actors involved in our threat model, namely the speaker, the user, and the attacker. To measure privacy and utility, we use respectively the linkability score achieved by the attackers and the decoding word error rate incurred by an ASR model trained with the anonymized data. Experiments on the LibriSpeech dataset confirm that the optimal combination of design choices yields state-of-the-art performance in terms of privacy protection as well as utility. Experiments on the Mozilla Common Voice dataset show that the best design choices with 50 speakers guarantee the same anonymization level against re-identification attacks as raw speech with 20,000 speakers.
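The linkability score used as the privacy measure above can be illustrated with a histogram-based estimate of the global linkability measure D_sys (assuming equal priors for mated and non-mated score pairs). This is a sketch for intuition, not the challenge's official implementation:

```python
import numpy as np

def global_linkability(mated, nonmated, bins=50):
    """Histogram estimate of global linkability D_sys: the average, over mated
    scores, of max(0, 2*P(mated|score) - 1), with P(mated|score) derived from
    the likelihood ratio under equal priors. 0 = unlinkable, 1 = fully linkable."""
    lo = min(mated.min(), nonmated.min())
    hi = max(mated.max(), nonmated.max())
    edges = np.linspace(lo, hi, bins + 1)
    p_m, _ = np.histogram(mated, bins=edges, density=True)
    p_n, _ = np.histogram(nonmated, bins=edges, density=True)
    idx = np.clip(np.digitize(mated, edges) - 1, 0, bins - 1)
    lr = p_m[idx] / np.maximum(p_n[idx], 1e-12)        # likelihood ratio per score
    d_local = np.maximum(2.0 * (lr / (lr + 1.0)) - 1.0, 0.0)
    return d_local.mean()
```

Perfect anonymization drives the mated and non-mated score distributions together, so the likelihood ratio goes to 1 and D_sys to 0.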
A comparative study of speech anonymization metrics
Speech anonymization techniques have recently been proposed for preserving speakers' privacy. They aim at concealing speakers' identities while preserving the spoken content. In this study, we compare three metrics proposed in the literature to assess the level of privacy achieved. We exhibit through simulation the differences and blind spots of some metrics. In addition, we conduct experiments on real data and state-of-the-art anonymization techniques to study how they behave in a practical scenario. We show that the application-independent log-likelihood-ratio cost function C_llr^min provides a more robust evaluation of privacy than the equal error rate (EER), and that detection-based metrics provide different information from linkability metrics. Interestingly, the results on real data indicate that current anonymization design choices do not induce a regime where the differences between those metrics become apparent.
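The two detection-based metrics compared above can be sketched as follows. `eer` sweeps a threshold over attacker scores to find where false-acceptance and false-rejection rates cross; `cllr` computes the uncalibrated C_llr on scores assumed to be natural-log likelihood ratios (the minimized C_llr^min additionally requires optimal score calibration, e.g. via isotonic regression, which is omitted from this sketch):

```python
import numpy as np

def eer(tar, non):
    """Equal error rate: threshold where FAR (non-target accepted) and
    FRR (target rejected) are closest."""
    thr = np.sort(np.concatenate([tar, non]))
    far = np.array([(non >= t).mean() for t in thr])
    frr = np.array([(tar < t).mean() for t in thr])
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0

def cllr(tar, non):
    """Application-independent log-likelihood-ratio cost (uncalibrated C_llr).
    Scores are assumed to be natural-log likelihood ratios."""
    return 0.5 * (np.mean(np.log2(1.0 + np.exp(-tar)))
                  + np.mean(np.log2(1.0 + np.exp(non))))
```

A system whose scores carry no speaker information gives C_llr near 1 (and EER near 50%); well-separated, well-calibrated scores drive both toward 0.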
A False Sense of Privacy: Towards a Reliable Evaluation Methodology for the Anonymization of Biometric Data
Biometric data contains distinctive human traits such as facial features or gait patterns. Because biometric data identifies individuals so precisely, it is used effectively in identification and authentication systems; for the same reason, privacy protection becomes indispensable. Anonymization is the technique most widely used to provide it: anonymization techniques obfuscate or remove the information in biometric records that allows them to be linked back to the individuals who generated them. However, our ability to develop effective anonymization depends equally on the effectiveness of the methods employed to evaluate anonymization performance. In this paper, we assess the state-of-the-art methods used to evaluate the performance of anonymization techniques for facial images and for gait patterns. We demonstrate that these evaluation methods have serious and frequent shortcomings. In particular, we find that their underlying assumptions are unwarranted: state-of-the-art methods generally assume a difficult recognition scenario and thus a weak adversary, which causes them to grossly overestimate the performance of the anonymization. Therefore, we propose a strong adversary which is aware of the anonymization in place and which implements an appropriate measure of anonymization performance. We also improve the selection process for the evaluation dataset, reducing the number of identities it contains while ensuring that these identities remain easily distinguishable from one another. Our novel evaluation methodology surpasses the state-of-the-art because it measures worst-case performance and so delivers a highly reliable evaluation of biometric anonymization techniques.
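The anonymization-aware adversary boils down to enrolling the attacker's recognizer on anonymized data as well, then linking each anonymized trial sample to the closest enrolled identity. A hypothetical cosine-scoring sketch of that evaluation loop (the paper's actual pipeline uses trained face/gait recognizers; names and shapes here are illustrative):

```python
import numpy as np

def informed_attack_accuracy(enroll_emb, trial_emb, enroll_ids, trial_ids):
    """Closed-set linkage accuracy of an anonymization-aware attacker.

    Both enrollment and trial embeddings are assumed to come from *anonymized*
    data (the attacker knows the anonymization and applies it to enrollment
    material too). Each trial is assigned the identity of its most cosine-
    similar enrolled embedding; the return value is the fraction linked
    correctly, so higher means weaker anonymization.
    """
    e = enroll_emb / np.linalg.norm(enroll_emb, axis=1, keepdims=True)
    t = trial_emb / np.linalg.norm(trial_emb, axis=1, keepdims=True)
    pred = np.asarray(enroll_ids)[np.argmax(t @ e.T, axis=1)]
    return (pred == np.asarray(trial_ids)).mean()
```

Evaluating with this informed enrollment, rather than original-data enrollment, is what closes the gap between reported and worst-case anonymization performance.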
Evaluation of Speaker Anonymization on Emotional Speech
Speech data carries a range of personal information, such as the speaker's identity and emotional state, and these attributes can be used for malicious purposes. With the development of virtual assistants, a new generation of privacy threats has emerged. Current studies have addressed the topic of preserving speech privacy; one of them, the VoicePrivacy initiative, aims to promote the development of privacy preservation tools for speech technology. The task selected for the VoicePrivacy 2020 Challenge (VPC) is speaker anonymization: the goal is to hide the source speaker's identity while preserving the linguistic information. The baseline of the VPC makes use of voice conversion. This paper studies the impact of the VPC speaker anonymization baseline system on the emotional information present in speech utterances. Evaluation is performed following the VPC rules regarding the attackers' knowledge about the anonymization system. Our results show that the VPC baseline system does not suppress speakers' emotions against informed attackers. When comparing anonymized speech to original speech, the emotion recognition performance is degraded by 15% relative on IEMOCAP data, similar to the degradation observed for the automatic speech recognition used to evaluate the preservation of the linguistic information.
Privacy-Protecting Techniques for Behavioral Data: A Survey
Our behavior (the way we talk, walk, or think) is unique and can be used as a biometric trait. It also correlates with sensitive attributes like emotions. Hence, techniques to protect individuals' privacy against unwanted inferences are required. To consolidate knowledge in this area, we systematically reviewed applicable anonymization techniques. We taxonomize and compare existing solutions regarding privacy goals, conceptual operation, advantages, and limitations. Our analysis shows that some behavioral traits (e.g., voice) have received much attention, while others (e.g., eye gaze, brainwaves) are mostly neglected. We also find that the evaluation methodology of behavioral anonymization techniques can be further improved.