PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection

Chen, Bocheng; Guo, Hanqing; Wang, Guangjing; Wang, Yuanda; Xiao, Li; Yan, Qiben

Publication date: 13 September 2023

Abstract

In this paper, we propose PhantomSound, a query-efficient black-box attack against voice assistants. Existing black-box adversarial attacks on voice assistants either apply substitute models or leverage intermediate model outputs to estimate the gradients for crafting adversarial audio samples. However, these approaches require a large number of queries and a lengthy training stage. PhantomSound leverages a decision-based attack to produce effective adversarial audio samples and reduces the number of queries by optimizing the gradient estimation. In our experiments, we attack 4 different speech-to-text APIs under 3 real-world scenarios to demonstrate the real-time attack impact. The results show that PhantomSound is practical and robust in attacking 5 popular commercial voice-controllable devices over the air, and that it bypasses 3 liveness detection mechanisms with a success rate above 95%. Benchmark results show that PhantomSound can generate adversarial examples and launch the attack within a few minutes. Compared with state-of-the-art black-box attacks, we significantly improve query efficiency and reduce the cost of a successful untargeted and targeted adversarial attack by 93.1% and 65.5%, respectively, using merely ~300 queries (~5 minutes) and ~1,500 queries (~25 minutes).

Comment: RAID 202
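The abstract does not spell out PhantomSound's own gradient estimator. For orientation only, the sketch below shows the generic hard-label (decision-based) gradient estimate popularized by HopSkipJumpAttack: the attacker never sees scores or logits, only whether each probe succeeds, and recovers the direction of the decision boundary from those yes/no answers. This is not the authors' method. `estimate_gradient_direction`, `toy_oracle`, the 16-sample signal, and the linear decision rule are all hypothetical stand-ins for a speech-to-text API that only reveals whether the transcription matches the target.

```python
import numpy as np


def estimate_gradient_direction(x, is_adversarial, num_queries=100, delta=1e-3, rng=None):
    """Estimate the decision-boundary normal at x from hard-label queries only.

    HopSkipJumpAttack-style Monte Carlo estimate: probe x + delta * u_b for
    random unit directions u_b, record +1 if the oracle reports success and
    -1 otherwise, then average the directions weighted by those signs.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Draw random directions and scale each one to unit L2 norm.
    u = rng.standard_normal((num_queries,) + x.shape)
    norms = np.linalg.norm(u.reshape(num_queries, -1), axis=1)
    u /= norms.reshape((num_queries,) + (1,) * x.ndim)

    # One oracle call per direction: this loop is the entire query cost.
    signs = np.array([1.0 if is_adversarial(x + delta * ub) else -1.0 for ub in u])

    # Subtract the mean sign as a baseline to reduce variance.
    weights = signs - signs.mean()
    if np.allclose(weights, 0.0):
        weights = signs  # every probe agreed; fall back to the raw signs
    grad = np.tensordot(weights, u, axes=1) / num_queries
    return grad / (np.linalg.norm(grad) + 1e-12)


# Hypothetical stand-in for a speech-to-text API: a linear "decision" over a
# 16-sample signal, returning True when the (toy) transcription is the target.
rng = np.random.default_rng(0)
w = rng.standard_normal(16)
queries = {"n": 0}


def toy_oracle(audio):
    queries["n"] += 1
    return float(w @ audio) > 0.0


# Start from a point projected exactly onto the decision boundary.
x_boundary = rng.standard_normal(16)
x_boundary -= (w @ x_boundary) / (w @ w) * w

g = estimate_gradient_direction(x_boundary, toy_oracle, num_queries=1000, rng=rng)
cosine = float(g @ w / np.linalg.norm(w))
print(f"queries used: {queries['n']}, cosine to true normal: {cosine:.3f}")
```

In this sketch every probe costs exactly one oracle call, so `num_queries` trades estimate quality against query budget, which is the quantity the abstract's query counts measure.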

Available version: arXiv.org e-Print Archive (oai:arXiv.org:2309.06960)