VenoMave: Targeted Poisoning Against Speech Recognition
The wide adoption of Automatic Speech Recognition (ASR) has remarkably enhanced
human-machine interaction. Prior research has demonstrated that modern ASR
systems are susceptible to adversarial examples, i.e., malicious audio inputs
that lead to misclassification by the victim's model at run time. Whether ASR
systems are also vulnerable to data-poisoning attacks remains an open question.
In such an attack, the manipulation happens during the training phase: an
adversary injects malicious inputs into the training set to
compromise the neural network's integrity and performance. Prior work in the
image domain demonstrated several types of data-poisoning attacks, but these
results cannot directly be applied to the audio domain. In this paper, we
present the first data-poisoning attack against ASR, called VenoMave. We
evaluate our attack on an ASR system that detects sequences of digits. When
poisoning only 0.17% of the dataset on average, we achieve an attack success
rate of 86.67%. To demonstrate the practical feasibility of our attack, we also
evaluate if the target audio waveform can be played over the air via simulated
room transmissions. In this more realistic threat model, VenoMave still
maintains a success rate of up to 73.33%. We further extend our evaluation to the
Speech Commands corpus and demonstrate the scalability of VenoMave to a larger
vocabulary. In a transcription test with human listeners, we verify that more
than 85% of the poisoned samples' original text can be correctly transcribed.
We conclude that data-poisoning attacks against ASR pose a real threat: we can
craft poisons for arbitrary target input files while the poison samples remain
inconspicuous.
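The threat model described above can be illustrated with a minimal sketch. Note that this is a conceptual illustration only: the helper names (`craft_poison`, `inject_poisons`), the uniform-noise perturbation, and the budget handling are hypothetical assumptions for exposition, not the paper's actual gradient-based poison-crafting procedure.

```python
import random

def craft_poison(waveform, perturbation_scale=0.01):
    # Hypothetical stand-in for poison crafting: add a small perturbation so
    # the sample still sounds like its original transcript to a human listener.
    # (The real attack optimizes the perturbation toward a target transcription.)
    return [s + perturbation_scale * random.uniform(-1.0, 1.0) for s in waveform]

def inject_poisons(train_set, poisons, budget=0.0017):
    # The adversary controls only a small fraction of the training data
    # (the abstract reports an average poison budget of 0.17%).
    n_poison = max(1, int(budget * len(train_set)))
    return train_set + poisons[:n_poison]

# Example: poison a toy training set of 1000 audio clips within budget.
train = [[0.0] * 16 for _ in range(1000)]
poisons = [craft_poison([0.0] * 16) for _ in range(5)]
poisoned_train = inject_poisons(train, poisons, budget=0.0017)
```

The key point the sketch captures is the attacker's constraint: the poisons must be few (a fixed budget relative to the training-set size) and perceptually inconspicuous, while the victim trains normally on the contaminated data.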