Use of the harmonic phase in synthetic speech detection
Special Session paper: recent PhD thesis description. This PhD dissertation was written by Jon Sanchez and supervised by Inma Hernáez and Ibon Saratxaga. It was defended at the University of the Basque Country on 5 February 2016. The committee members were Dr. Alfonso Ortega Giménez (UniZar), Dr. Daniel Erro Eslava (UPV/EHU) and Dr. Enric Monte Moreno (UPC). The dissertation was awarded a "sobresaliente cum laude" (outstanding cum laude) qualification. This work has been partially funded by the Spanish Ministry of Economy and Competitiveness with FEDER support (RESTORE project, TEC2015-67163-C2-1-R) and the Basque Government (ELKAROLA project, KK-2015/00098).
Uncovering the Deceptions: An Analysis on Audio Spoofing Detection and Future Prospects
Audio has become an increasingly crucial biometric modality due to its
ability to provide an intuitive way for humans to interact with machines. It is
currently used in a range of applications, from person
authentication to banking to virtual assistants. Research has shown that these
systems are also susceptible to spoofing attacks. Therefore, protecting
audio processing systems against fraudulent activities, such as identity theft,
financial fraud, and spreading misinformation, is of paramount importance. This
paper reviews the current state-of-the-art techniques for detecting audio
spoofing and discusses the current challenges along with open research
problems. The paper further highlights the importance of considering the
ethical and privacy implications of audio spoofing detection systems. Lastly,
the work aims to accentuate the need for building more robust and generalizable
methods, the integration of automatic speaker verification and countermeasure
systems, and better evaluation protocols. Comment: Accepted in IJCAI 202
DNN Filter Bank Cepstral Coefficients for Spoofing Detection
With the development of speech synthesis techniques, automatic speaker
verification systems face a serious challenge from spoofing attacks. In order to
improve the reliability of speaker verification systems, we develop a new
filter bank based cepstral feature, deep neural network filter bank cepstral
coefficients (DNN-FBCC), to distinguish between natural and spoofed speech. The
deep neural network filter bank is automatically generated by training a filter
bank neural network (FBNN) using natural and synthetic speech. By adding
restrictions on the training rules, the learned weight matrix of FBNN is
band-limited and sorted by frequency, similar to the normal filter bank. Unlike
the manually designed filter bank, the learned filter bank has different filter
shapes in different channels, which can capture the differences between natural
and synthetic speech more effectively. The experimental results on the ASVspoof
2015 database show that the Gaussian mixture model maximum-likelihood
(GMM-ML) classifier trained on the new feature performs better than the
state-of-the-art linear frequency cepstral coefficients (LFCC) based
classifier, especially on detecting unknown attacks.
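DNN-FBCC keeps the usual filter bank cepstral chain (power spectrum, filter bank, log, DCT) and, according to the abstract, changes only the filter bank. The minimal NumPy sketch below runs that chain with two banks: a hand-designed linear triangular bank of the kind used for LFCC, and a stand-in for a learned bank. The `band_limited` helper is a hypothetical way to impose the "band-limited and sorted by frequency" restriction (each filter is confined to its own band and kept non-negative); the actual FBNN training rules are not given here, and the random weights only stand in for trained FBNN weights. The frame length (512), 20 filters and 13 coefficients are illustrative assumptions.

```python
import numpy as np


def band_edges(n_filters: int, n_fft_bins: int) -> np.ndarray:
    # n_filters + 2 equally spaced edges on a linear frequency axis (in FFT bins)
    return np.linspace(0, n_fft_bins - 1, n_filters + 2)


def linear_triangular_filterbank(n_filters: int, n_fft_bins: int) -> np.ndarray:
    """Hand-designed triangular filters, as in LFCC: shape (n_filters, n_fft_bins)."""
    edges = band_edges(n_filters, n_fft_bins)
    bins = np.arange(n_fft_bins)
    fb = np.zeros((n_filters, n_fft_bins))
    for m in range(n_filters):
        left, centre, right = edges[m], edges[m + 1], edges[m + 2]
        rising = (bins - left) / (centre - left)
        falling = (right - bins) / (right - centre)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def band_limited(weights: np.ndarray, n_filters: int) -> np.ndarray:
    """Hypothetical constraint: zero each row outside its own band and make it
    non-negative, so row m only covers band m and rows stay frequency-sorted."""
    n_fft_bins = weights.shape[1]
    edges = band_edges(n_filters, n_fft_bins)
    bins = np.arange(n_fft_bins)
    mask = np.array([(bins > edges[m]) & (bins < edges[m + 2]) for m in range(n_filters)])
    return np.abs(weights) * mask


def dct_ii_matrix(n_out: int, n_in: int) -> np.ndarray:
    """Orthonormal DCT-II basis, keeping the first n_out coefficients."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    mat = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    mat[0] /= np.sqrt(2.0)
    return mat


def filterbank_cepstra(frames: np.ndarray, fb: np.ndarray, n_ceps: int) -> np.ndarray:
    """frames: (n_frames, frame_len) -> cepstral coefficients (n_frames, n_ceps)."""
    window = np.hamming(frames.shape[1])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2  # (n_frames, n_fft_bins)
    energies = power @ fb.T                                    # (n_frames, n_filters)
    log_energies = np.log(energies + 1e-10)                    # avoid log(0)
    return log_energies @ dct_ii_matrix(n_ceps, fb.shape[0]).T


rng = np.random.default_rng(0)
frames = rng.standard_normal((10, 512))        # 10 placeholder frames of 512 samples
n_bins = 512 // 2 + 1                           # rfft output length
fb = linear_triangular_filterbank(20, n_bins)
lfcc = filterbank_cepstra(frames, fb, 13)

learned = band_limited(rng.standard_normal((20, n_bins)), 20)  # stand-in for FBNN weights
dnn_like = filterbank_cepstra(frames, learned, 13)
print(lfcc.shape, dnn_like.shape)
```

Because both banks share the same band layout, the only difference between the two feature sets is the shape of each filter within its band, which is the property the abstract credits for capturing natural/synthetic differences.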
Robust Audio Anti-Spoofing with Fusion-Reconstruction Learning on Multi-Order Spectrograms
Robust audio anti-spoofing has become increasingly challenging due to
recent advancements in deepfake techniques. While spectrograms have
demonstrated their capability for anti-spoofing, the complementary information
present in multi-order spectral patterns has not been well explored, which
limits their effectiveness against varying spoofing attacks. Therefore, we propose
a novel deep learning method with a spectral fusion-reconstruction strategy,
namely S2pecNet, to utilise multi-order spectral patterns for robust audio
anti-spoofing representations. Specifically, spectral patterns up to
second-order are fused in a coarse-to-fine manner and two branches are designed
for the fine-level fusion from the spectral and temporal contexts. A
reconstruction from the fused representation to the input spectrograms further
reduces the potential fused information loss. Our method achieved
state-of-the-art performance with an equal error rate (EER) of 0.77% on a widely used dataset,
the ASVspoof 2019 LA Challenge.
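The equal error rate is the operating point at which the miss rate (bona fide speech rejected) equals the false-alarm rate (spoofed speech accepted), so an EER of 0.77% means both errors are about 0.77% at that threshold. The sketch below is a minimal illustration, assuming higher scores mean "more likely bona fide" and picking the candidate threshold where the two rates are closest; official ASVspoof evaluation scripts may compute the EER somewhat differently (for example, with interpolation), so values can differ slightly.

```python
import numpy as np


def equal_error_rate(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for a detector where higher score = bona fide."""
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    # Every observed score is a candidate decision threshold
    thresholds = np.unique(np.concatenate([bonafide, spoof]))
    # Miss: bona fide scored below the threshold; false alarm: spoof at or above it
    miss = np.array([np.mean(bonafide < t) for t in thresholds])
    false_alarm = np.array([np.mean(spoof >= t) for t in thresholds])
    # Nearest crossing of the two error curves
    best = int(np.argmin(np.abs(miss - false_alarm)))
    eer = (miss[best] + false_alarm[best]) / 2.0
    return float(eer), float(thresholds[best])


eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.6, 0.4, 0.2, 0.1])
print(f"EER = {eer:.2%} at threshold {threshold}")  # EER = 25.00% at threshold 0.6
```

Averaging the two rates at the closest crossing is a simple convention for score sets where the curves never meet exactly; with large evaluation sets the gap between them is small.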