Fooling deep neural networks with adversarial input have exposed a
significant vulnerability in the current state-of-the-art systems in multiple
domains. Both black-box and white-box approaches have been used to either
replicate the model itself or to craft examples which cause the model to fail.
In this work, we propose a framework which uses multi-objective evolutionary
optimization to perform both targeted and un-targeted black-box attacks on
Automatic Speech Recognition (ASR) systems. We apply this framework on two ASR
systems: Deepspeech and Kaldi-ASR, which increases the Word Error Rates (WER)
of these systems by upto 980%, indicating the potency of our approach. During
both un-targeted and targeted attacks, the adversarial samples maintain a high
acoustic similarity of 0.98 and 0.97 with the original audio.Comment: Published in Interspeech 201