Can spoofing countermeasure and speaker verification systems be jointly optimised?
Spoofing countermeasure (CM) and automatic speaker verification (ASV)
sub-systems can be used in tandem with a backend classifier as a solution to
the spoofing aware speaker verification (SASV) task. The two sub-systems are
typically trained independently to solve different tasks. While our previous
work demonstrated the potential of joint optimisation, it also showed a
tendency to over-fit to speakers and a lack of sub-system complementarity.
Using only a modest quantity of auxiliary data collected from new speakers, we
show that joint optimisation degrades the performance of separate CM and ASV
sub-systems, but that it nonetheless improves complementarity, thereby
delivering superior SASV performance. Using standard SASV evaluation data and
protocols, joint optimisation reduces the equal error rate by 27% relative to
performance obtained using fixed, independently-optimised sub-systems under
like-for-like training conditions.
Comment: Accepted to ICASSP 2023. Code will be available soon
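The equal error rate (EER) and the relative reduction quoted in this abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the threshold sweep and all scores and rates below are hypothetical, chosen only to show the arithmetic. In the SASV setting, the non-target set would be assumed to contain both zero-effort impostor and spoofed trials.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: the error rate at the threshold where the false
    rejection rate (FRR) and false acceptance rate (FAR) are closest."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # A trial is accepted when its score is >= the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline, improved):
    """Relative reduction of an error rate with respect to a baseline."""
    return (baseline - improved) / baseline


# Hypothetical scores: higher means "more likely a bona fide target".
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(targets, nontargets))

# Hypothetical error rates chosen only so that the reduction equals 27%.
print(relative_reduction(2.0, 1.46))
```

A relative reduction of 27% means that the improved error rate is 73% of the baseline, whatever the absolute values are.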
On the potential of jointly-optimised solutions to spoofing attack detection and automatic speaker verification
The spoofing-aware speaker verification (SASV) challenge was designed to
promote the study of jointly-optimised solutions to accomplish the
traditionally separately-optimised tasks of spoofing detection and speaker
verification. Jointly-optimised systems have the potential to operate in
synergy as a better performing solution to the single task of reliable speaker
verification. However, none of the 23 submissions to SASV 2022 are jointly
optimised. We have hence sought to determine why separately-optimised
sub-systems perform best or why joint optimisation was not successful.
Experiments reported in this paper show that joint optimisation is successful
in improving robustness to spoofing but that it degrades speaker verification
performance. The findings suggest that spoofing detection and speaker
verification sub-systems should be optimised jointly in a manner which reflects
the differences in how information provided by each sub-system is complementary
to that provided by the other. Progress will also likely depend upon the
collection of data from a larger number of speakers.
Comment: Accepted to IberSPEECH 2022 Conference
Spoofing attack augmentation: can differently-trained attack models improve generalisation?
A reliable deepfake detector or spoofing countermeasure (CM) should be robust
in the face of unpredictable spoofing attacks. To encourage the learning of
more generalisable artefacts, rather than those specific only to known
attacks, CMs are usually exposed to a broad variety of different attacks during
training. Even so, the performance of deep-learning-based CM solutions is
known to vary, sometimes substantially, when they are retrained with different
initialisations, hyper-parameters or training data partitions. We show in this
paper that the potency of spoofing attacks, also deep-learning-based, can
similarly vary according to training conditions, sometimes resulting in
substantial degradations to detection performance. Nevertheless, while a
RawNet2 CM model is vulnerable when only modest adjustments are made to the
attack algorithm, those based upon graph attention networks and self-supervised
learning are reassuringly robust. The focus upon training data generated with
different attack algorithms might not be sufficient on its own to ensure
generalisability; some form of spoofing attack augmentation at the algorithm
level can be complementary.
Comment: Accepted to ICASSP 202
Malafide: a novel adversarial convolutive noise attack against deepfake and spoofing detection systems
We present Malafide, a universal adversarial attack against automatic speaker
verification (ASV) spoofing countermeasures (CMs). By introducing convolutional
noise using an optimised linear time-invariant filter, Malafide attacks can be
used to compromise CM reliability while preserving other speech attributes such
as quality and the speaker's voice. In contrast to other adversarial attacks
proposed recently, Malafide filters are optimised independently of the input
utterance and duration, are tuned instead to the underlying spoofing attack,
and require the optimisation of only a small number of filter coefficients.
Even so, they degrade CM performance estimates by an order of magnitude, even
in black-box settings, and can also be configured to overcome integrated CM and
ASV subsystems. Integrated solutions that use self-supervised learning CMs,
however, are more robust, under both black-box and white-box settings.
Comment: Accepted at INTERSPEECH 202
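The notion of convolutive noise from a linear time-invariant filter can be sketched as follows. This is only an illustration of applying a fixed FIR filter to a waveform, under the assumption that the filter is causal and that the output is truncated to the input length. The adversarial optimisation of the coefficients described in the abstract is not shown, and the function name and tap values are hypothetical.

```python
def apply_fir_filter(signal, taps):
    """Convolve a waveform with a fixed set of FIR filter taps.

    The same taps are applied to every sample, so the operation is
    linear and time-invariant and does not depend on the utterance
    or on its duration. The output has the same length as the input.
    """
    output = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:  # causal filter: only current and past samples
                acc += h * signal[n - k]
        output.append(acc)
    return output


# Hypothetical waveform samples and filter coefficients.
waveform = [1.0, 1.0, 1.0, 0.0]
taps = [0.5, 0.5]
print(apply_fir_filter(waveform, taps))
```

Because only the taps are free parameters, an attack of this form would need to optimise just a handful of coefficients, consistent with the small number of filter coefficients mentioned in the abstract.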