MP3 Compression To Diminish Adversarial Noise in End-to-End Speech Recognition

Author: Andronic Iustina
Kürzinger Ludwig
Rigoll Gerhard
Rosas Edgar Ricardo Chavez
Seeber Bernhard U.
Publication date: 25/07/2020

Audio Adversarial Examples (AAE) are specially crafted inputs designed to trick Automatic Speech Recognition (ASR) systems into misclassification. The present work proposes MP3 compression as a means to decrease the impact of Adversarial Noise (AN) in audio samples transcribed by ASR systems. To this end, we generated AAEs with the Fast Gradient Sign Method for an end-to-end, hybrid CTC-attention ASR system. Our method is then validated by two objective indicators: (1) Character Error Rates (CER) that measure the speech decoding performance of four ASR models trained on uncompressed as well as MP3-compressed data sets, and (2) Signal-to-Noise Ratio (SNR) estimated for both uncompressed and MP3-compressed AAEs that are reconstructed in the time domain by feature inversion. We found that MP3 compression applied to AAEs indeed reduces the CER when compared to uncompressed AAEs. Moreover, feature-inverted (reconstructed) AAEs had significantly higher SNRs after MP3 compression, indicating that AN was reduced. In contrast to AN, MP3 compression applied to utterances augmented with regular noise resulted in more transcription errors, giving further evidence that MP3 encoding is effective in diminishing only AN. Comment: Submitted and accepted at SPECOM 2020 conference

arXiv.org e-Print Archive
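
The two objective indicators named in the abstract above can be sketched in plain Python. This is a minimal illustration, not the authors' evaluation code: CER is computed as the character-level Levenshtein distance divided by the reference length, and the SNR helper assumes that the clean utterance is the reference signal and that the sample-wise difference from it is the noise. The paper's exact SNR estimation for feature-inverted AAEs may differ, and all function names here are hypothetical.

```python
import math


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (cost 0 if chars match)
            )
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = (substitutions + deletions + insertions) / reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def snr_db(clean, degraded) -> float:
    """SNR in dB, assuming (degraded - clean) is the noise signal."""
    if len(clean) != len(degraded):
        raise ValueError("signals must have equal length")
    signal_power = sum(s * s for s in clean)
    noise_power = sum((d - s) ** 2 for s, d in zip(clean, degraded))
    if noise_power == 0:
        return math.inf  # identical signals: no noise at all
    return 10.0 * math.log10(signal_power / noise_power)
```

For example, a hypothesis that drops one character of an 11-character reference ("hello world" vs. "hello word") yields a CER of 1/11, about 0.09. Under this definition, a higher SNR after MP3 compression means the reconstructed signal lies closer to the clean one.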

Audio Adversarial Examples for Robust Hybrid CTC/Attention Speech Recognition

Author: Kürzinger Ludwig
Li Lujun
Rigoll Gerhard
Rosas Edgar Ricardo Chavez
Watzel Tobias
Publication date: 21/07/2020

Recent advances in Automatic Speech Recognition (ASR) have demonstrated that end-to-end systems are able to achieve state-of-the-art performance. There is a trend towards deeper neural networks; however, these ASR models are also more complex and more vulnerable to specially crafted noisy data. Such Audio Adversarial Examples (AAE) were previously demonstrated on ASR systems that use Connectionist Temporal Classification (CTC), as well as on attention-based encoder-decoder architectures. Following the idea of the hybrid CTC/attention ASR system, this work proposes algorithms for generating AAEs that combine both approaches into a joint CTC-attention gradient method. Evaluation is performed using a hybrid CTC/attention end-to-end ASR model on two reference sentences as a case study, as well as on the TEDlium v2 speech recognition task. We then demonstrate the application of this algorithm for adversarial training to obtain a more robust ASR model. Comment: To be published at SPECOM 202

arXiv.org e-Print Archive
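
The joint CTC-attention gradient idea can be illustrated with a single Fast Gradient Sign Method (FGSM) step. This sketch assumes, hypothetically, that the joint gradient is a convex combination weighted by lam, mirroring the λ-weighted multi-task loss commonly used to train hybrid CTC/attention models; the paper's actual algorithms may combine the two gradients differently. Real CTC and attention losses are replaced by toy quadratic surrogates with closed-form gradients so that the example runs with NumPy alone, and all function names are hypothetical.

```python
import numpy as np


def fgsm(x: np.ndarray, grad: np.ndarray, epsilon: float) -> np.ndarray:
    """FGSM: one step of size epsilon along sign(dL/dx), increasing the loss."""
    return x + epsilon * np.sign(grad)


def joint_gradient(grad_ctc: np.ndarray, grad_att: np.ndarray, lam: float) -> np.ndarray:
    """Hypothetical joint gradient: lam * dL_ctc/dx + (1 - lam) * dL_att/dx."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * grad_ctc + (1.0 - lam) * grad_att


# Toy stand-ins for the two decoder losses (hypothetical quadratic surrogates,
# NOT real CTC or attention losses): L(x) = 0.5 * ||x - target||^2,
# so dL/dx = x - target.
def toy_loss_grad(x: np.ndarray, target: np.ndarray) -> np.ndarray:
    return x - target


x = np.array([0.2, -0.4, 0.1])  # stand-in for one input feature frame
grad = joint_gradient(
    toy_loss_grad(x, np.array([0.0, 0.0, 0.0])),   # "CTC" branch
    toy_loss_grad(x, np.array([1.0, -1.0, 0.5])),  # "attention" branch
    lam=0.3,
)
x_adv = fgsm(x, grad, epsilon=0.05)  # perturbed frame, bounded by epsilon per feature
```

Here the joint gradient is [-0.5, 0.3, -0.25], so every feature moves by exactly epsilon in the direction that increases the combined loss. For adversarial training, such perturbed inputs would be mixed into training batches; that step is not shown.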