Weighted-Sampling Audio Adversarial Example Attack
Recent studies have highlighted audio adversarial examples as a ubiquitous
threat to state-of-the-art automatic speech recognition systems. Thorough
studies on how to effectively generate adversarial examples are essential to
prevent potential attacks. Despite much research on this topic, the efficiency
and robustness of existing methods remain unsatisfactory. In this paper, we
propose weighted-sampling audio adversarial examples, which focus on the number
and the weights of the distortion points to strengthen the attack. Further, we
apply a denoising method in the loss function to make the adversarial attack
more imperceptible. Experiments show that our method is the first in the field
to generate audio adversarial examples with low noise and high robustness at a
minute-level time cost.
Comment: https://aaai.org/Papers/AAAI/2020GB/AAAI-LiuXL.9260.pd
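The idea of folding a denoising term into the attack objective can be sketched numerically. This is a minimal illustration under toy assumptions: a quadratic objective stands in for a real ASR model, and the names `combined_loss`, `attack_fn`, and the weight `alpha` are illustrative, not the paper's actual formulation.

```python
import numpy as np

def combined_loss(delta, audio, attack_fn, alpha=0.1):
    # Attack term (toy stand-in for the ASR targeting loss) plus a
    # denoising-style penalty that discourages sample-to-sample jitter
    # in the perturbation, keeping the noise less perceptible.
    attack = attack_fn(audio + delta)
    denoise = np.mean(np.diff(delta) ** 2)
    return attack + alpha * denoise

rng = np.random.default_rng(0)
audio = rng.standard_normal(32)
target = rng.standard_normal(32)
attack_fn = lambda x: np.mean((x - target) ** 2)  # toy "model" objective

# Optimize the perturbation by finite-difference gradient descent.
delta = np.zeros_like(audio)
lr, eps = 0.2, 1e-4
for _ in range(100):
    base = combined_loss(delta, audio, attack_fn)
    grad = np.zeros_like(delta)
    for i in range(delta.size):
        d = delta.copy()
        d[i] += eps
        grad[i] = (combined_loss(d, audio, attack_fn) - base) / eps
    delta -= lr * grad

initial = combined_loss(np.zeros_like(audio), audio, attack_fn)
final = combined_loss(delta, audio, attack_fn)
```

In a real attack the quadratic objective would be replaced by the recognizer's loss on a target transcription; the structure of the combined objective is the point here.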
Phonemic Adversarial Attack against Audio Recognition in Real World
Recently, adversarial attacks for audio recognition have attracted much
attention. However, most existing studies rely on coarse-grained audio features
at the instance level to generate adversarial noise, which leads to expensive
generation time and weak universal attacking ability. Motivated by the
observation that all speech is composed of fundamental phonemes, this paper
proposes a phonemic adversarial attack (PAT) paradigm, which attacks the
fine-grained audio features at the phoneme level commonly shared across audio
instances to generate phonemic adversarial noise, yielding more general
attacking ability with fast generation speed.
Specifically, to accelerate generation, a phoneme-density-balanced sampling
strategy is introduced that estimates phoneme density and samples fewer, but
phonemically richer, audio instances as training data, substantially
alleviating the heavy dependency on large training datasets. Moreover, to
promote universal attacking ability, the phonemic noise is optimized
asynchronously with a sliding window, which enhances phoneme diversity and thus
captures the critical fundamental phonemic patterns. Extensive experiments
comprehensively investigate the proposed PAT framework and demonstrate that it
outperforms the SOTA baselines by large margins (i.e., at least an 11X speedup
and a 78% improvement in attacking ability).
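The sampling strategy described above can be sketched as a greedy selection. This is an illustrative stand-in, not the paper's actual density estimator: the data format and the "most new phonemes first" criterion are assumptions made for the example.

```python
def density_balanced_sample(utterances, k):
    # Greedy sketch: repeatedly pick the utterance that contributes the
    # most phonemes not yet covered, so a small training sample still
    # spans the phoneme inventory instead of repeating common phonemes.
    chosen, covered = [], set()
    pool = list(utterances)
    for _ in range(min(k, len(pool))):
        best = max(pool, key=lambda u: len(set(u["phonemes"]) - covered))
        chosen.append(best)
        covered |= set(best["phonemes"])
        pool.remove(best)
    return chosen

# Hypothetical corpus with ARPABET-style phoneme labels.
corpus = [
    {"id": 0, "phonemes": ["AA", "B", "K"]},
    {"id": 1, "phonemes": ["AA", "AA", "B"]},
    {"id": 2, "phonemes": ["S", "T", "IY", "M"]},
    {"id": 3, "phonemes": ["K", "B"]},
]
sample = density_balanced_sample(corpus, 2)
# Picks utterance 2 (4 new phonemes), then utterance 0 (3 new phonemes).
```

Selecting two utterances here already covers seven distinct phonemes, which is the effect the strategy aims for: less data, broader phonemic coverage.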
Privacy-preserving and Privacy-attacking Approaches for Speech and Audio -- A Survey
In contemporary society, voice-controlled devices, such as smartphones and
home assistants, have become pervasive due to their advanced capabilities and
functionality. The always-on nature of their microphones offers users the
convenience of readily accessing these devices. However, recent research and
events have revealed that such voice-controlled devices are prone to various
forms of malicious attacks, making safeguarding against them a growing concern
for both users and researchers. Despite the numerous studies
that have investigated adversarial attacks and privacy preservation for images,
a conclusive study of this nature has not been conducted for the audio domain.
Therefore, this paper aims to examine existing approaches for
privacy-preserving and privacy-attacking strategies for audio and speech. To
achieve this goal, we classify the attack and defense scenarios into several
categories and provide detailed analysis of each approach. We also interpret
the dissimilarities between the various approaches, highlight their
contributions, and examine their limitations. Our investigation reveals that
voice-controlled devices based on neural networks are inherently susceptible to
specific types of attacks. Although it is possible to enhance the robustness of
such models to certain forms of attack, more sophisticated approaches are
required to comprehensively safeguard user privacy.