Universal Adversarial Perturbations for Speech Recognition Systems
In this work, we demonstrate the existence of universal adversarial audio
perturbations that cause mis-transcription of audio signals by automatic speech
recognition (ASR) systems. We propose an algorithm to find a single
quasi-imperceptible perturbation which, when added to an arbitrary speech
signal, will most likely fool the victim speech recognition model. Our
experiments demonstrate the proposed technique by crafting
audio-agnostic universal perturbations for the state-of-the-art ASR system
Mozilla DeepSpeech. Additionally, we show that such perturbations generalize to
a significant extent to models that were not available during training, by
performing a transferability test on a WaveNet-based ASR system.
Comment: Published as a conference paper at INTERSPEECH 201
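To make the idea of a single, input-agnostic perturbation concrete, here is a toy sketch. It is not the paper's algorithm. The victim is a hypothetical two-class linear classifier standing in for an ASR model, the loss is cross-entropy rather than a transcription loss, and "quasi-imperceptible" is modeled, as an assumption, by an L∞ bound `eps`. The loop visits each input, and whenever the current shared perturbation `v` has not yet changed that input's prediction, it takes a signed gradient-ascent step and projects `v` back into the `eps` box. The names `ToyLinearClassifier`, `universal_perturbation` and `fooling_rate` are illustrative only.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

class ToyLinearClassifier:
    """Hypothetical stand-in for an ASR model: predicts argmax of W @ x."""
    def __init__(self, W):
        self.W = np.asarray(W, dtype=float)

    def predict(self, x):
        return int(np.argmax(self.W @ x))

    def loss_grad(self, x, label):
        # Gradient of the cross-entropy loss for `label` w.r.t. the input x.
        p = softmax(self.W @ x)
        p[label] -= 1.0
        return self.W.T @ p

def universal_perturbation(model, X, eps, step, epochs=10):
    """Build one perturbation v, shared by all inputs, with max |v_i| <= eps."""
    v = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x in X:
            y = model.predict(x)
            if model.predict(x + v) == y:  # this input is not fooled yet
                g = model.loss_grad(x + v, y)
                # Ascend the loss, then project back onto the L-infinity ball.
                v = np.clip(v + step * np.sign(g), -eps, eps)
    return v

def fooling_rate(model, X, v):
    """Fraction of inputs whose prediction changes when v is added."""
    return float(np.mean([model.predict(x + v) != model.predict(x) for x in X]))

# Usage on made-up data: three inputs that the toy model assigns to class 0.
model = ToyLinearClassifier(np.eye(2))
X = np.array([[1.0, 0.93], [0.5, 0.46], [2.0, 1.97]])
v = universal_perturbation(model, X, eps=0.2, step=0.05)
print(fooling_rate(model, X, v))  # 1.0
```

A transferability check in the same spirit would compute `fooling_rate` with the same `v` against a second, independently built model.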
Feature Learning from Spectrograms for Assessment of Personality Traits
Several methods have recently been proposed to analyze speech and
automatically infer the personality of the speaker. These methods often rely on
prosodic and other hand-crafted speech processing features extracted with
off-the-shelf toolboxes. To achieve high accuracy, numerous features are
typically extracted using complex and highly parameterized algorithms. In this
paper, a new method based on feature learning and spectrogram analysis is
proposed to simplify the feature extraction process while maintaining a high
level of accuracy. The proposed method learns a dictionary of discriminant
features from patches extracted from the spectrogram representations of training
speech segments. Each speech segment is then encoded using the dictionary, and
the resulting feature set is used to perform classification of personality
traits. Experiments indicate that the proposed method achieves state-of-the-art
results with a significant reduction in complexity when compared to the most
recent reference methods. The number of features and the difficulties linked to
the feature extraction process are greatly reduced, as only one type of
descriptor is used, whose 6 parameters can be tuned automatically. In
contrast, the simplest reference method uses 4 types of descriptors to which 6
functionals are applied, resulting in over 20 parameters to be tuned.
Comment: 12 pages, 3 figures
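The pipeline described above (spectrogram, patches, dictionary, per-segment encoding, classifier) can be sketched with NumPy alone. This is an illustrative sketch under stated assumptions, not the paper's method. The dictionary is learned with plain k-means, an unsupervised technique swapped in here (the paper's dictionary is described as discriminant, which this sketch does not attempt). Each segment is encoded as a normalized histogram of nearest-atom assignments. The FFT size, hop, patch width, stride and dictionary size are arbitrary assumptions, and all function names are hypothetical.

```python
import numpy as np

def spectrogram(signal, n_fft=256, hop=128):
    """Magnitude spectrogram (freq x frames) from a Hann-windowed STFT."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T  # (n_fft // 2 + 1, n_frames)

def extract_patches(spec, width=8, stride=4):
    """Flattened full-height patches of `width` consecutive frames (log scale)."""
    log_spec = np.log1p(spec)
    cols = range(0, log_spec.shape[1] - width + 1, stride)
    return np.stack([log_spec[:, c:c + width].ravel() for c in cols])

def sq_dists(patches, atoms):
    """Squared Euclidean distance from every patch to every atom."""
    return ((patches[:, None, :] - atoms[None, :, :]) ** 2).sum(axis=2)

def learn_dictionary(patches, k=16, iters=20, seed=0):
    """Plain k-means; the centroids serve as dictionary atoms (assumed stand-in)."""
    rng = np.random.default_rng(seed)
    atoms = patches[rng.choice(len(patches), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = sq_dists(patches, atoms).argmin(axis=1)
        for j in range(k):
            members = patches[assign == j]
            if len(members):  # keep the old atom if its cluster is empty
                atoms[j] = members.mean(axis=0)
    return atoms

def encode(patches, atoms):
    """Fixed-length segment descriptor: normalized nearest-atom histogram."""
    nearest = sq_dists(patches, atoms).argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(atoms)).astype(float)
    return hist / hist.sum()

# Usage on synthetic one-second "segments" (noisy tones, not speech).
rng = np.random.default_rng(1)
sr = 16000
t = np.arange(sr) / sr
segments = [np.sin(2 * np.pi * f * t) + 0.1 * rng.standard_normal(sr)
            for f in (220, 440, 880, 1760)]
train_patches = np.vstack([extract_patches(spectrogram(s)) for s in segments])
atoms = learn_dictionary(train_patches, k=16)
features = np.stack([encode(extract_patches(spectrogram(s)), atoms)
                     for s in segments])
print(features.shape)  # (4, 16)
```

Each row of `features` could then be passed to any off-the-shelf classifier for the personality-trait labels. The encoding step is what turns variable-length segments into fixed-length vectors.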
- …