Robust Automatic Speech Recognition via WavAugment Guided Phoneme Adversarial Training
Developing a practically robust automatic speech recognition (ASR) system is
challenging since the model should not only maintain the original performance
on clean samples, but also achieve consistent efficacy under small volume
perturbations and large domain shifts. To address this problem, we propose a
novel WavAugment Guided Phoneme Adversarial Training (wapat). wapat uses
adversarial examples in phoneme space as augmentation to make the model
invariant to minor fluctuations in phoneme representation and preserve the
performance on clean samples. In addition, wapat utilizes the phoneme
representation of augmented samples to guide the generation of adversaries,
which helps to find more stable and diverse gradient directions, resulting in
improved generalization. Extensive experiments demonstrate the effectiveness of
wapat on the End-to-end Speech Challenge Benchmark (ESB). Notably, SpeechLM-wapat
outperforms the original model by 6.28% WER reduction on ESB, achieving a new
state of the art.
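The idea of perturbing a phoneme-level representation adversarially can be illustrated with a small sketch. This is not wapat itself: the abstract does not give the training procedure, so the example below only shows a generic FGSM-style signed-gradient step in a feature space, using a hypothetical linear softmax classifier (`W`), a hypothetical feature vector (`h`), and an assumed step size `eps`. The augmentation-guided part of the method is not modeled.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(W, h, label):
    """Loss of a linear softmax classifier on one feature vector h."""
    return -np.log(softmax(W @ h)[label])

def fgsm_in_feature_space(W, h, label, eps):
    """One signed-gradient step on h, bounded by eps in every coordinate."""
    p = softmax(W @ h)
    p[label] -= 1.0              # dLoss/dlogits = probabilities - one-hot
    grad_h = W.T @ p             # chain rule through logits = W @ h
    return h + eps * np.sign(grad_h)

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 8))      # assumption: 5 phoneme classes, 8-dim features
h = rng.normal(size=8)           # assumption: one clean phoneme-level feature
h_adv = fgsm_in_feature_space(W, h, label=2, eps=0.05)

loss_clean = cross_entropy(W, h, 2)
loss_adv = cross_entropy(W, h_adv, 2)
```

Training on both `h` and `h_adv` with the same label is the usual way such examples are used as augmentation; for this linear model the perturbed loss cannot be lower than the clean loss because the loss is convex in `h`.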
Learning An Invariant Speech Representation
Recognition of speech, and in particular the ability to generalize and learn
from small sets of labelled examples like humans do, depends on an appropriate
representation of the acoustic input. We formulate the problem of finding
robust speech features for supervised learning with small sample complexity as
a problem of learning representations of the signal that are maximally
invariant to intraclass transformations and deformations. We propose an
extension of a theory for unsupervised learning of invariant visual
representations to the auditory domain and empirically evaluate its validity
for voiced speech sound classification. Our version of the theory requires the
memory-based, unsupervised storage of acoustic templates -- such as specific
phones or words -- together with all the transformations of each that normally
occur. A quasi-invariant representation for a speech segment can be obtained by
projecting it to each template orbit, i.e., the set of transformed signals, and
computing the associated one-dimensional empirical probability distributions.
The computations can be performed by modules of filtering and pooling, and
extended to hierarchical architectures. In this paper, we apply a single-layer,
multicomponent representation for phonemes and demonstrate improved accuracy
and decreased sample complexity for vowel classification compared to standard
spectral, cepstral and perceptual features.
Comment: CBMM Memo No. 022, 5 pages, 2 figures
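The projection-and-pooling step described in this abstract can be sketched in a few lines. The example is a toy under stated assumptions: cyclic shifts stand in for the stored transformations of a template (the abstract does not say which transformations are used), the function name `orbit_signature`, the bin count, and the value range are all hypothetical, and a plain histogram serves as the one-dimensional empirical distribution.

```python
import numpy as np

def orbit_signature(x, template, n_bins=8, value_range=(-50.0, 50.0)):
    """Empirical distribution of projections of x onto a template orbit.

    The orbit here is every cyclic shift of `template` (an assumption).
    Projections outside `value_range` are dropped by np.histogram.
    """
    orbit = np.stack([np.roll(template, k) for k in range(len(template))])
    projections = orbit @ x                  # "filtering": one dot product per orbit element
    counts, _ = np.histogram(projections, bins=n_bins, range=value_range)
    return counts / len(projections)         # "pooling": normalized histogram

template = np.array([1, -2, 0, 3, 1, -1], dtype=float)
x = np.array([2, 0, -1, 1, 3, -2], dtype=float)

sig = orbit_signature(x, template)
sig_shifted = orbit_signature(np.roll(x, 2), template)
```

Because shifting `x` only permutes the set of projections onto the full orbit, `sig` and `sig_shifted` are identical in this toy setting. With several templates, the per-template histograms would be concatenated into one multicomponent feature vector.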
Adversarial Examples in the Physical World: A Survey
Deep neural networks (DNNs) have demonstrated high vulnerability to
adversarial examples. Besides the attacks in the digital world, the practical
implications of adversarial examples in the physical world present significant
challenges and safety concerns. However, current research on physical
adversarial examples (PAEs) lacks a comprehensive understanding of their unique
characteristics, leading to limited significance and understanding. In this
paper, we address this gap by thoroughly examining the characteristics of PAEs
within a practical workflow encompassing training, manufacturing, and
re-sampling processes. By analyzing the links between physical adversarial
attacks, we identify manufacturing and re-sampling as the primary sources of
distinct attributes and particularities in PAEs. Leveraging this knowledge, we
develop a comprehensive analysis and classification framework for PAEs based on
their specific characteristics, covering over 100 studies on physical-world
adversarial examples. Furthermore, we investigate defense strategies against
PAEs and identify open challenges and opportunities for future research. We aim
to provide a fresh, thorough, and systematic understanding of PAEs, thereby
promoting the development of robust adversarial learning and its application in
open-world scenarios.
Comment: Adversarial examples, physical-world scenarios, attacks and defense
A Survey on Physical Adversarial Attack in Computer Vision
Over the past decade, deep learning has revolutionized conventional tasks
that rely on hand-crafted feature extraction with its strong feature learning
capability, leading to substantial enhancements in traditional tasks. However,
deep neural networks (DNNs) have been demonstrated to be vulnerable to
adversarial examples crafted by malicious tiny noise, which is imperceptible to
human observers but can make DNNs output the wrong result. Existing adversarial
attacks can be categorized into digital and physical adversarial attacks. The
former is designed to pursue strong attack performance in lab environments
while hardly remaining effective when applied to the physical world. In
contrast, the latter focuses on developing physically deployable attacks, thus
exhibiting more robustness in complex physical environmental conditions.
Recently, with the increasing deployment of DNN-based systems in the real
world, strengthening the robustness of these systems has become urgent, and
exploring physical adversarial attacks exhaustively is a precondition. To
this end, this paper reviews the evolution of physical adversarial attacks
against DNN-based computer vision tasks, expecting to provide beneficial
information for developing stronger physical adversarial attacks. Specifically,
we first propose a taxonomy to categorize the current physical adversarial
attacks and group them. Then, we discuss the existing physical attacks and
focus on the techniques for improving the robustness of physical attacks under
complex physical environmental conditions. Finally, we discuss the open issues
of current physical adversarial attacks and give promising
directions.