Introducing the VoicePrivacy initiative
The VoicePrivacy initiative aims to promote the development of privacy preservation tools for speech technology by gathering a new community to define the tasks of interest and the evaluation methodology, and benchmarking solutions through a series of challenges. In this paper, we formulate the voice anonymization task selected for the VoicePrivacy 2020 Challenge and describe the datasets used for system development and evaluation. We also present the attack models and the associated objective and subjective evaluation metrics. We introduce two anonymization baselines and report objective evaluation results.
Design Choices for X-vector Based Speaker Anonymization
The recently proposed x-vector based anonymization scheme converts any input
voice into that of a random pseudo-speaker. In this paper, we present a
flexible pseudo-speaker selection technique as a baseline for the first
VoicePrivacy Challenge. We explore several design choices for the distance
metric between speakers, the region of x-vector space where the pseudo-speaker
is picked, and gender selection. To assess the strength of anonymization
achieved, we consider attackers using an x-vector based speaker verification
system who may use original or anonymized speech for enrollment, depending on
their knowledge of the anonymization scheme. The Equal Error Rate (EER)
achieved by the attackers and the decoding Word Error Rate (WER) over
anonymized data are reported as the measures of privacy and utility.
Experiments are performed using datasets derived from LibriSpeech to find the
optimal combination of design choices in terms of privacy and utility.
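The two measures named in this abstract, EER for privacy and WER for utility, can be illustrated with a minimal sketch. This is not the challenge's evaluation code: the function names `equal_error_rate` and `word_error_rate` are hypothetical, the EER is approximated by sweeping every observed score as a threshold, and the WER is a plain word-level edit distance divided by the reference length. For anonymization, a higher attacker EER indicates stronger privacy and a lower WER indicates better preserved utility.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER of a verification system.

    target_scores: scores of same-speaker trials (higher = more similar).
    nontarget_scores: scores of different-speaker trials.
    Assumption: the EER is taken at the observed threshold where the false
    acceptance rate (FAR) and false rejection rate (FRR) are closest.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")
    best_gap, best_eer = float("inf"), None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # A trial is accepted when its score is at or above the threshold.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Evaluation toolkits typically interpolate the FAR and FRR curves rather than picking the nearest observed threshold, so values from this sketch may differ slightly from reported figures.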
Distinguishable Speaker Anonymization based on Formant and Fundamental Frequency Scaling
Speech data on the Internet are proliferating exponentially because of the
emergence of social media, and the sharing of such personal data raises obvious
security and privacy concerns. One solution to mitigate these concerns involves
concealing speaker identities before sharing speech data, also referred to as
speaker anonymization. In our previous work, we have developed an automatic
speaker verification (ASV)-model-free anonymization framework to protect
speaker privacy while preserving speech intelligibility. Although the framework
ranked first in the VoicePrivacy 2022 Challenge, the anonymization was
imperfect, since the speaker distinguishability of the anonymized speech
deteriorated. To address this issue, in this paper we directly model the
formant distribution and fundamental frequency (F0) to represent speaker
identity and anonymize the source speech by uniformly scaling the formants and
F0. By directly scaling the formants and F0, the speaker distinguishability
degradation of the anonymized speech caused by the introduction of other
speakers is prevented. The experimental results demonstrate that our proposed
framework can improve the speaker distinguishability and significantly
outperforms our previous framework in voice distinctiveness. Furthermore, our
proposed method can also trade off privacy against utility by using different
scaling factors. Comment: Submitted to ICASSP 202
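The idea of uniform scaling described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `scale_f0` and `scale_formants` are hypothetical names, the F0 track is assumed to be a list of per-frame values in Hz with 0.0 marking unvoiced frames, and formant scaling is approximated by linearly warping the frequency axis of a single-frame spectral envelope so that a peak at bin k moves to roughly bin k * beta.

```python
def scale_f0(f0_track, alpha):
    """Multiply voiced F0 values (Hz) by alpha; unvoiced frames (0.0) stay 0.0."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return [f * alpha if f > 0 else 0.0 for f in f0_track]


def scale_formants(envelope, beta):
    """Warp a spectral envelope so that its peaks shift by a factor of beta.

    envelope: magnitudes on a linear frequency grid for one frame.
    Output bin k reads the input at position k / beta using linear
    interpolation; positions past the last bin reuse the last value.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    n = len(envelope)
    warped = []
    for k in range(n):
        src = k / beta
        lo = int(src)
        if lo >= n - 1:
            warped.append(envelope[-1])
        else:
            frac = src - lo
            warped.append((1 - frac) * envelope[lo] + frac * envelope[lo + 1])
    return warped
```

With scaling factors close to 1 the voice changes little, and factors further from 1 alter it more, which is one way to read the privacy-utility trade-off mentioned in the abstract.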
Deep Learning-based F0 Synthesis for Speaker Anonymization
Voice conversion for speaker anonymization is an emerging concept for privacy
protection. In a deep learning setting, this is achieved by extracting multiple
features from speech, altering the speaker identity, and waveform synthesis.
However, many existing systems do not modify fundamental frequency (F0)
trajectories, which convey prosody information and can reveal speaker identity.
Moreover, mismatch between F0 and other features can degrade speech quality and
intelligibility. In this paper, we formally introduce a method that synthesizes
F0 trajectories from other speech features and evaluate its reconstruction
capabilities. Then we test our approach within a speaker anonymization
framework, comparing it to a baseline and a state-of-the-art F0 modification
that utilizes speaker information. The results show that our method improves
both speaker anonymity, measured by the equal error rate, and utility, measured
by the word error rate. Comment: 5 pages, 4 figures, 6 tables, accepted to EUSIPCO 202
Hiding speaker's sex in speech using zero-evidence speaker representation in an analysis/synthesis pipeline
The use of modern vocoders in an analysis/synthesis pipeline allows us to
investigate high-quality voice conversion that can be used for privacy
purposes. Here, we propose to transform the speaker embedding and the pitch in
order to hide the sex of the speaker. ECAPA-TDNN-based speaker representation
fed into a HiFiGAN vocoder is protected using a neural-discriminant analysis
approach, which is consistent with the zero-evidence concept of privacy. This
approach significantly reduces the information in speech related to the
speaker's sex while preserving speech content and some consistency in the
resulting protected voices. Comment: Accepted to ICASSP 202
New Challenges for Content Privacy in Speech and Audio
Privacy in speech and audio has many facets. A particularly under-developed
area of privacy in this domain involves consideration for information related
to content and context. Speech content can include words and their meaning or
even stylistic markers, pathological speech, intonation patterns, or emotion.
More generally, audio captured in-the-wild may contain background speech or
reveal contextual information such as markers of location, room
characteristics, paralinguistic sounds, or other audible events. Audio
recording devices and speech technologies are becoming increasingly commonplace
in everyday life. At the same time, commercialised speech and audio
technologies do not provide consumers with a range of privacy choices. Even
where privacy is regulated or protected by law, technical solutions to privacy
assurance and enforcement fall short. This position paper introduces three
important and timely research challenges for content privacy in speech and
audio. We highlight current gaps and opportunities, and identify focus areas,
that could have significant implications for developing ethical and safer
speech technologies. Comment: Accepted for publication in ISCA SPSC Symposium 202
Evaluation of Speaker Anonymization on Emotional Speech
Speech data carries a range of personal information, such as the speaker's identity and emotional state. These attributes can be used for malicious purposes. With the development of virtual assistants, a new generation of privacy threats has emerged. Current studies have addressed the topic of preserving speech privacy. One of them, the VoicePrivacy initiative, aims to promote the development of privacy preservation tools for speech technology. The task selected for the VoicePrivacy 2020 Challenge (VPC) is speaker anonymization. The goal is to hide the source speaker's identity while preserving the linguistic information. The VPC baseline makes use of voice conversion. This paper studies the impact of the VPC speaker anonymization baseline system on the emotional information present in speech utterances. Evaluation is performed following the VPC rules regarding the attackers' knowledge about the anonymization system. Our results show that the VPC baseline system does not suppress speakers' emotions against informed attackers. When comparing anonymized speech to original speech, the emotion recognition performance is degraded by 15% relative to IEMOCAP data, similar to the degradation observed for automatic speech recognition used to evaluate the preservation of the linguistic information.