The development of privacy-preserving automatic speaker verification systems
has been the focus of a number of studies with the intent of allowing users to
authenticate themselves without risking the privacy of their voice. However,
current privacy-preserving methods assume that the template voice
representations (or speaker embeddings) used for authentication are extracted
locally by the user. This poses two important issues. First, knowledge of the
speaker embedding extraction model may create security and robustness
liabilities for the authentication system, as it might help attackers craft
adversarial examples able to mislead the system. Second, from the point of
view of a service provider, the speaker embedding extraction model is arguably
one of the most valuable components in the system and, as such, disclosing it
would be highly undesirable. In this work, we show how
speaker embeddings can be extracted while keeping both the speaker's voice and
the service provider's model private, using Secure Multiparty Computation.
Further, we show that it is possible to obtain reasonable trade-offs between
security and computational cost. This work is complementary to those showing
how authentication may be performed privately, and thus can be considered as
another step towards fully private automatic speaker recognition.

Comment: Accepted for publication at Interspeech 202