Noise Flooding for Detecting Audio Adversarial Examples Against Automatic Speech Recognition
Neural models enjoy widespread use across a variety of tasks and have grown
to become crucial components of many industrial systems. Despite their
effectiveness and extensive popularity, they are not without their exploitable
flaws. Initially applied to computer vision systems, the generation of
adversarial examples is a process in which seemingly imperceptible
perturbations are made to an image with the purpose of inducing a deep
learning-based classifier to misclassify it. Due to recent trends in
speech processing, this has become a noticeable issue in speech recognition
models. In late 2017, an attack was shown to be quite effective against the
Speech Commands classification model. Limited-vocabulary speech classifiers,
such as the Speech Commands model, are used quite frequently in a variety of
applications, particularly in managing automated attendants in telephony
contexts. As such, adversarial examples produced by this attack could have
real-world consequences. While previous work in defending against these
adversarial examples has investigated using audio preprocessing to reduce or
distort adversarial noise, this work explores the idea of flooding particular
frequency bands of an audio signal with random noise in order to detect
adversarial examples. This technique of flooding, which does not require
retraining or modifying the model, is inspired by work done in computer vision
and builds on the idea that speech classifiers are relatively robust to natural
noise. A combined defense incorporating 5 different frequency bands for
flooding the signal with noise outperformed other existing defenses in the
audio space, detecting adversarial examples with 91.8% precision and 93.5%
recall.

Comment: Orally presented at the 18th IEEE International Symposium on Signal
Processing and Information Technology (ISSPIT) in Louisville, Kentucky, USA,
December 2018. 5 pages, 2 figures.
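To make the flooding idea concrete, here is a minimal sketch (not the authors' implementation): it floods one frequency band at a time with random noise and flags the input as adversarial if the model's prediction flips, relying on the observation that benign speech is relatively robust to natural noise. The classify function, band list, and noise scale are illustrative assumptions, not the paper's tuned values.

    import numpy as np

    def band_limited_noise(n_samples, sr, f_lo, f_hi, scale):
        """White noise restricted to [f_lo, f_hi] Hz via an FFT mask."""
        noise = np.random.randn(n_samples)
        spectrum = np.fft.rfft(noise)
        freqs = np.fft.rfftfreq(n_samples, d=1.0 / sr)
        spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0
        filtered = np.fft.irfft(spectrum, n=n_samples)
        return scale * filtered / (np.abs(filtered).max() + 1e-12)

    def flooding_detector(audio, sr, classify, bands, scale=0.05):
        # `classify` (waveform -> label) and `bands` (a list of
        # (f_lo, f_hi) tuples in Hz) are assumptions of this sketch.
        base_label = classify(audio)
        for f_lo, f_hi in bands:
            flooded = audio + band_limited_noise(len(audio), sr,
                                                 f_lo, f_hi, scale)
            if classify(flooded) != base_label:
                return True  # label unstable under natural noise: suspicious
        return False

A combined detector over several bands, as in the paper, would aggregate these per-band checks rather than stopping at the first flip.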
Universal Fourier Attack for Time Series
A wide variety of adversarial attacks have been proposed and explored using
image and audio data. These attacks are notoriously easy to generate digitally
when the attacker can directly manipulate the input to a model, but are much
more difficult to implement in the real world. In this paper we present a
universal, time-invariant attack for general time series data such that the
attack has a frequency spectrum primarily composed of the frequencies present
in the original data. The universality of the attack makes it fast and easy to
implement as no computation is required to add it to an input, while time
invariance is useful for real-world deployment. Additionally, the frequency
constraint ensures the attack can withstand filtering. We demonstrate the
effectiveness of the attack in two different domains, speech recognition and
unintended radiated emission, and show that the attack is robust against common
transform-and-compare defense pipelines.
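The construction below is a rough sketch of the frequency constraint only, assuming the perturbation is built offline from a set of clean examples; the actual attack is presumably optimized against the target model, which this sketch omits. It precomputes one fixed perturbation whose magnitude spectrum follows the data, so adding it to an input requires no computation and filtering that preserves the signal also preserves the attack.

    import numpy as np

    def universal_fourier_perturbation(examples, eps=0.01, seed=0):
        # `examples`: array of shape (n_examples, n_samples) of clean data
        # (an assumption of this sketch). The perturbation reuses the
        # average magnitude spectrum of the data with random phases, so
        # its energy sits in the same frequency bands as the signal.
        rng = np.random.default_rng(seed)
        n = examples.shape[1]
        avg_mag = np.abs(np.fft.rfft(examples, axis=1)).mean(axis=0)
        phases = rng.uniform(0.0, 2.0 * np.pi, size=avg_mag.shape)
        delta = np.fft.irfft(avg_mag * np.exp(1j * phases), n=n)
        return eps * delta / (np.abs(delta).max() + 1e-12)

    # Universality: the same precomputed delta is added to every input
    # at test time, i.e. x_adv = x + delta.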
Leveraging Domain Features for Detecting Adversarial Attacks Against Deep Speech Recognition in Noise
In recent years, significant progress has been made in deep model-based
automatic speech recognition (ASR), leading to its widespread deployment in the
real world. At the same time, adversarial attacks against deep ASR systems are
highly successful. Various methods have been proposed to defend ASR systems
from these attacks. However, existing classification-based methods focus on the
design of deep learning models while lacking exploration of domain-specific
features. This work leverages filter bank-based features to better capture the
characteristics of attacks for improved detection. Furthermore, the paper
analyses the potential of using the speech and non-speech parts separately in
detecting adversarial attacks. Finally, considering the adverse environments in
which ASR systems may be deployed, we study the impact of acoustic noise of
various types and signal-to-noise ratios. Extensive experiments show that the
inverse filter bank features generally perform better in both clean and noisy
environments, that detection is effective using either the speech or the
non-speech part, and that acoustic noise can largely degrade detection
performance.
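As an illustration of such domain features, the sketch below computes log filter bank energies with librosa; the "inverse" variant here simply mirrors the mel filters so that resolution is finer at high frequencies, where adversarial perturbation energy is often assumed to concentrate. The mirroring is one plausible reading of an inverse filter bank, not necessarily the paper's exact construction.

    import numpy as np
    import librosa

    def filterbank_features(audio, sr, n_fft=512, n_mels=40, inverse=False):
        # Power spectrogram: shape (1 + n_fft // 2, frames).
        power = np.abs(librosa.stft(audio, n_fft=n_fft)) ** 2
        # Triangular mel filters: shape (n_mels, 1 + n_fft // 2).
        mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
        if inverse:
            # Mirror each filter across the frequency axis so the dense
            # filters cover high frequencies (an assumption of this sketch).
            mel_fb = mel_fb[:, ::-1]
        feats = np.log(mel_fb @ power + 1e-10)
        return feats.T  # (frames, n_mels), input to a binary detector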
Robustness of Adversarial Attacks in Sound Event Classification
An adversarial attack is a method of generating perturbations to the input of a machine learning model so that the model's output becomes incorrect. The perturbed inputs are known as adversarial examples. In this paper, we investigate the robustness of adversarial examples to simple input transformations such as MP3 compression, resampling, white noise, and reverb in the task of sound event classification. By performing this analysis, we aim to provide insight into the strengths and weaknesses of current adversarial attack algorithms, as well as a baseline for defenses against adversarial attacks. Our work shows that adversarial attacks are not robust to simple input transformations. White noise is the most consistent method of defending against adversarial attacks, with the highest success rate averaged across all models and attack algorithms.
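A sketch of this kind of evaluation, assuming a classify (waveform -> label) function: each transform is applied to an adversarial input, and the defense "succeeds" if the model recovers the true label afterwards. The SNR and resampling factor are illustrative choices, not the paper's settings.

    import numpy as np
    import librosa

    def defense_success(audio_adv, sr, classify, true_label, snr_db=30):
        # Returns per-transform booleans: True means the transform undoes
        # the attack. `classify` (waveform -> label) is an assumption.
        results = {}

        # White noise at a fixed signal-to-noise ratio.
        sig_pow = np.mean(audio_adv ** 2)
        noise_std = np.sqrt(sig_pow / 10 ** (snr_db / 10))
        noise = np.random.randn(len(audio_adv)) * noise_std
        results["white_noise"] = classify(audio_adv + noise) == true_label

        # Down-up resampling discards content above the intermediate
        # Nyquist frequency; truncate back to the original length.
        low = librosa.resample(audio_adv, orig_sr=sr, target_sr=sr // 2)
        back = librosa.resample(low, orig_sr=sr // 2, target_sr=sr)
        results["resample"] = classify(back[: len(audio_adv)]) == true_label

        return results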
Privacy-preserving and Privacy-attacking Approaches for Speech and Audio -- A Survey
In contemporary society, voice-controlled devices, such as smartphones and
home assistants, have become pervasive due to their advanced capabilities and
functionality. The always-on nature of their microphones offers users the
convenience of readily accessing these devices. However, recent research and
events have revealed that such voice-controlled devices are prone to various
forms of malicious attacks, making safeguarding against such attacks a growing
concern for both users and researchers. Despite the numerous studies
that have investigated adversarial attacks and privacy preservation for images,
a conclusive study of this nature has not been conducted for the audio domain.
Therefore, this paper aims to examine existing approaches for
privacy-preserving and privacy-attacking strategies for audio and speech. To
achieve this goal, we classify the attack and defense scenarios into several
categories and provide a detailed analysis of each approach. We also discuss
the differences between the various approaches, highlight their
contributions, and examine their limitations. Our investigation reveals that
voice-controlled devices based on neural networks are inherently susceptible to
specific types of attacks. Although it is possible to enhance the robustness of
such models to certain forms of attack, more sophisticated approaches are
required to comprehensively safeguard user privacy.