An Effective Label Noise Model for DNN Text Classification
Because large, human-annotated datasets suffer from labeling errors, it is
crucial to be able to train deep neural networks in the presence of label
noise. While training image classification models with label noise has
received much attention, training text classification models has not. In this
paper, we propose an approach to training deep networks that is robust to label
noise. This approach introduces a non-linear processing layer (noise model)
that models the statistics of the label noise into a convolutional neural
network (CNN) architecture. The noise model and the CNN weights are learned
jointly from noisy training data, which prevents the model from overfitting to
erroneous labels. Through extensive experiments on several text classification
datasets, we show that this approach enables the CNN to learn better sentence
representations and is robust even to extreme label noise. We find that proper
initialization and regularization of this noise model are critical. Further, in
contrast to results focusing on large batch sizes for mitigating label noise
in image classification, we find that altering the batch size does not have
much effect on classification performance.
Comment: Accepted at NAACL-HLT 2019 Main Conference Long paper
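The noise-model idea above can be pictured as a small extra layer on top of the classifier. Below is a minimal PyTorch sketch of one common formulation, not the paper's exact code: a learnable class-to-class transition matrix, initialised near the identity, maps the base CNN's clean-label posteriors to the noisy-label distribution used for training. The wrapper name and the initialisation scheme are illustrative assumptions.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLabelWrapper(nn.Module):
    """Base classifier + learnable label-noise transition matrix (illustrative)."""
    def __init__(self, base_model: nn.Module, num_classes: int, init_noise: float = 0.1):
        super().__init__()
        self.base_model = base_model
        # Row-stochastic transition matrix p(noisy label | clean label),
        # initialised close to the identity ("labels are mostly clean").
        init = torch.full((num_classes, num_classes), init_noise / (num_classes - 1))
        init.fill_diagonal_(1.0 - init_noise)
        self.noise_logits = nn.Parameter(init.log())

    def forward(self, x):
        clean_probs = F.softmax(self.base_model(x), dim=-1)   # p(clean | x)
        transition = F.softmax(self.noise_logits, dim=-1)     # p(noisy | clean)
        noisy_probs = clean_probs @ transition                 # p(noisy | x)
        return noisy_probs.clamp_min(1e-8).log()               # log-probs for F.nll_loss

# Train the wrapper on the noisy labels, e.g. F.nll_loss(model(x), noisy_y);
# at test time, predict with base_model alone.
```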
Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging
Environmental audio tagging aims to predict only the presence or absence of
certain acoustic events in the acoustic scene of interest. In this paper we make
contributions to audio tagging in two parts, respectively, acoustic modeling
and feature learning. We propose to use a shrinking deep neural network (DNN)
framework incorporating unsupervised feature learning to handle the multi-label
classification task. For the acoustic modeling, a large set of contextual
frames of the chunk is fed into the DNN to perform multi-label
classification for the expected tags, since only chunk-level (or
utterance-level) rather than frame-level labels are available. Dropout and
background noise aware training are also adopted to improve the generalization
capability of the DNNs. For the unsupervised feature learning, we propose to
use a symmetric or asymmetric deep de-noising auto-encoder (sDAE or aDAE) to
generate new data-driven features from the Mel-Filter Banks (MFBs) features.
The new features, which are smoothed against background noise and more compact
with contextual information, can further improve the performance of the DNN
baseline. Compared with the standard Gaussian Mixture Model (GMM) baseline of
the DCASE 2016 audio tagging challenge, our proposed method obtains a
significant equal error rate (EER) reduction from 0.21 to 0.13 on the
development set. The proposed aDAE system achieves a relative 6.7% EER reduction
compared with the strong DNN baseline on the development set. Finally, the
results also show that our approach obtains state-of-the-art performance
with 0.15 EER on the evaluation set of the DCASE 2016 audio tagging task, while
the EER of the first-prize system in this challenge is 0.17.
Comment: 10 pages, DCASE 2016 challenge
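As a rough illustration of the unsupervised feature-learning step, the sketch below shows a symmetric denoising auto-encoder over Mel filter-bank (MFB) frames in PyTorch; the layer sizes, Gaussian corruption, and helper names are assumptions rather than the paper's exact configuration.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SymmetricDAE(nn.Module):
    """Encoder-decoder over single MFB frames; the bottleneck gives new features."""
    def __init__(self, n_mfb: int = 40, bottleneck: int = 20):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_mfb, 128), nn.ReLU(),
                                     nn.Linear(128, bottleneck), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(bottleneck, 128), nn.ReLU(),
                                     nn.Linear(128, n_mfb))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_step(model, clean_mfb, optimizer, noise_std: float = 0.1):
    """One denoising step: corrupted input, clean reconstruction target."""
    noisy = clean_mfb + noise_std * torch.randn_like(clean_mfb)
    loss = F.mse_loss(model(noisy), clean_mfb)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# After training, model.encoder(mfb_frames) yields the data-driven features that
# are fed (alone or alongside the original MFBs) to the tagging DNN.
```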
On Mixup Training: Improved Calibration and Predictive Uncertainty for Deep Neural Networks
Mixup~\cite{zhang2017mixup} is a recently proposed method for training deep
neural networks where additional samples are generated during training by
convexly combining random pairs of images and their associated labels. While
simple to implement, it has been shown to be a surprisingly effective method of
data augmentation for image classification: DNNs trained with mixup show
noticeable gains in classification performance on a number of image
classification benchmarks. In this work, we discuss a hitherto untouched aspect
of mixup training -- the calibration and predictive uncertainty of models
trained with mixup. We find that DNNs trained with mixup are significantly
better calibrated -- i.e., the predicted softmax scores are much better
indicators of the actual likelihood of a correct prediction -- than DNNs
trained in the regular fashion. We conduct experiments on a number of image
classification architectures and datasets -- including large-scale datasets
like ImageNet -- and find this to be the case. Additionally, we find that
merely mixing features does not result in the same calibration benefit and that
the label smoothing in mixup training plays a significant role in improving
calibration. Finally, we also observe that mixup-trained DNNs are less prone to
over-confident predictions on out-of-distribution and random-noise data. We
conclude that the typical overconfidence seen in neural networks, even on
in-distribution data, is likely a consequence of training with hard labels,
suggesting that mixup be employed for classification tasks where predictive
uncertainty is a significant concern.
Comment: NeurIPS 2019
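For reference, mixup itself is only a few lines; the sketch below follows the formulation in Zhang et al. (2017), with an illustrative default for alpha.
```python
import numpy as np
import torch
import torch.nn.functional as F

def mixup_batch(x, y, alpha: float = 0.2):
    """Convexly combine a batch with a shuffled copy of itself."""
    lam = float(np.random.beta(alpha, alpha))
    perm = torch.randperm(x.size(0))
    mixed_x = lam * x + (1.0 - lam) * x[perm]
    return mixed_x, y, y[perm], lam

def mixup_loss(logits, y_a, y_b, lam):
    """Mix the two label targets with the same coefficient."""
    return lam * F.cross_entropy(logits, y_a) + (1.0 - lam) * F.cross_entropy(logits, y_b)

# Training-loop fragment:
# mixed_x, y_a, y_b, lam = mixup_batch(batch_x, batch_y)
# loss = mixup_loss(model(mixed_x), y_a, y_b, lam)
```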
Acoustic scene classification using teacher-student learning with soft-labels
Acoustic scene classification identifies an input segment into one of the
pre-defined classes using spectral information. The spectral information of
acoustic scenes may not be mutually exclusive due to common acoustic properties
across different classes, such as babble noises included in both airports and
shopping malls. However, conventional training procedure based on one-hot
labels does not consider the similarities between different acoustic scenes. We
exploit teacher-student learning with the aim of deriving soft-labels that
capture common acoustic properties shared among different acoustic scenes. In
teacher-student learning, the teacher network produces soft-labels, based on
which the student network is trained. We investigate various methods to extract
soft-labels that better represent similarities across different scenes. Such
attempts include extracting soft-labels from multiple audio segments that are
labeled as the same acoustic scene. Experimental results demonstrate the
potential of our approach, showing a classification accuracy of 77.36% on the
DCASE 2018 task 1 validation set.
Comment: Accepted for presentation at Interspeech 2019
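A minimal sketch of the teacher-student objective described above follows; whether a hard-label cross-entropy term is kept, and the temperature and weighting values, are assumptions rather than the paper's settings.
```python
import torch
import torch.nn.functional as F

def soft_label_loss(student_logits, teacher_logits, hard_labels,
                    temperature: float = 2.0, alpha: float = 0.5):
    """KL divergence to the teacher's soft labels, mixed with ordinary cross-entropy."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, hard_labels)
    return alpha * kd + (1.0 - alpha) * ce
```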
Real-time, Universal, and Robust Adversarial Attacks Against Speaker Recognition Systems
As the popularity of voice user interfaces (VUIs) has exploded in recent years,
speaker recognition systems have emerged as an important means of identifying a
speaker in many security-critical applications and services. In this paper, we
propose the first real-time, universal, and robust adversarial attack against
the state-of-the-art deep neural network (DNN) based speaker recognition
system. By adding an audio-agnostic universal perturbation to an arbitrary
enrolled speaker's voice input, the attack causes the DNN-based speaker
recognition system to identify the speaker as any target (i.e.,
adversary-desired) speaker. In addition, we improve the robustness of our
attack by modeling the sound distortions caused by physical over-the-air
propagation through estimating the room impulse response (RIR). Experiments
using a public dataset of 109 English speakers demonstrate the effectiveness
and robustness of our proposed attack, with a high attack success rate of over
90%. The attack launching time also achieves a 100X speedup over contemporary
non-universal attacks.
Comment: Published as a conference paper at ICASSP 2020
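A rough sketch of how such a universal perturbation might be optimised is shown below: one shared perturbation is updated across many utterances toward the adversary's target label, and each step convolves the perturbed audio with a randomly drawn room impulse response (RIR) to simulate over-the-air playback. The speaker_model interface, RIR handling, and all hyper-parameters are placeholders, not the authors' implementation.
```python
import torch
import torch.nn.functional as F

def optimize_universal_perturbation(speaker_model, utterances, rirs, target_label,
                                    steps: int = 1000, lr: float = 1e-3, eps: float = 0.05):
    """Learn one perturbation shared across utterances (illustrative only)."""
    delta = torch.zeros_like(utterances[0], requires_grad=True)   # shared perturbation
    opt = torch.optim.Adam([delta], lr=lr)
    target = torch.tensor([target_label])
    for step in range(steps):
        x = utterances[step % len(utterances)].unsqueeze(0)        # (1, T) waveform
        rir = rirs[step % len(rirs)].flip(-1).view(1, 1, -1)       # simulate over-the-air playback
        adv = F.conv1d((x + delta).unsqueeze(1), rir, padding=rir.shape[-1] - 1)
        loss = F.cross_entropy(speaker_model(adv.squeeze(1)), target)  # push toward target speaker
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                                # keep the perturbation small
    return delta.detach()
```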
M2H-GAN: A GAN-based Mapping from Machine to Human Transcripts for Speech Understanding
Deep learning is at the core of recent spoken language understanding (SLU)
related tasks. More precisely, deep neural networks (DNNs) have drastically
increased the performance of SLU systems, and numerous architectures have been
proposed. In the real-life context of theme identification of telephone
conversations, it is common to hold both a manual, human-produced transcript
(TRS) and an automatically produced transcript (ASR) of each conversation.
Nonetheless, due to production constraints, only the ASR transcripts are
considered to build automatic classifiers; TRS transcripts are only used to
measure the performance of ASR systems. Moreover, the recent classification
accuracies obtained by DNN-based systems are close to those reached by humans,
and it becomes difficult to further increase performance by considering only
the ASR transcripts. This paper proposes to distill the TRS knowledge available
during the training phase into the ASR representation by using a new generative
adversarial network, called M2H-GAN, that generates a TRS-like version of an
ASR document to improve theme identification performance.
Comment: Submitted at INTERSPEECH 2019
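The adversarial mapping idea can be sketched as follows, purely as a hypothetical illustration: a generator turns an ASR-side document representation into a TRS-like one while a discriminator tries to tell generated representations from real TRS ones. The layer sizes, losses, and names are assumptions, not the M2H-GAN architecture itself.
```python
import torch
import torch.nn as nn

dim = 256  # document-representation size (assumption)
generator = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
discriminator = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, 1))
bce = nn.BCEWithLogitsLoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

def adversarial_step(asr_repr, trs_repr):
    """One GAN step: the discriminator learns real-vs-mapped, the generator learns to fool it."""
    fake = generator(asr_repr)
    # Discriminator: TRS representations are "real", mapped ASR ones are "fake".
    d_loss = (bce(discriminator(trs_repr), torch.ones(trs_repr.size(0), 1))
              + bce(discriminator(fake.detach()), torch.zeros(asr_repr.size(0), 1)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()
    # Generator: make the mapped ASR representation look like a TRS one.
    g_loss = bce(discriminator(fake), torch.ones(asr_repr.size(0), 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```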
Recent Progresses in Deep Learning based Acoustic Models (Updated)
In this paper, we summarize recent progress made in deep learning based
acoustic models and the motivation and insights behind the surveyed techniques.
We first discuss acoustic models that can effectively exploit variable-length
contextual information, such as recurrent neural networks (RNNs), convolutional
neural networks (CNNs), and their various combinations with other models. We
then describe acoustic models that are optimized end-to-end with emphasis on
feature representations learned jointly with the rest of the system, the
connectionist temporal classification (CTC) criterion, and the attention-based
sequence-to-sequence model. We further illustrate robustness issues in speech
recognition systems, and discuss acoustic model adaptation, speech enhancement
and separation, and robust training strategies. We also cover modeling
techniques that lead to more efficient decoding and discuss possible future
directions in acoustic model research.
Comment: This is an updated version, with the latest literature up to ICASSP 2018, of
the paper: Dong Yu and Jinyu Li, "Recent Progresses in Deep Learning based
Acoustic Models," IEEE/CAA Journal of Automatica Sinica, vol. 4, no. 3, 2017.
A Survey on Resilient Machine Learning
Machine learning based systems are increasingly being used for sensitive tasks
such as security surveillance, guiding autonomous vehicles, making investment
decisions, and detecting and blocking network intrusions and malware. However,
recent research has shown that machine learning models are vulnerable to attacks
by adversaries at all phases of machine learning (e.g., training data collection,
training, operation). All model classes of machine learning systems can be
misled into wrongly classifying inputs by providing carefully crafted inputs.
Maliciously created input samples can affect the learning process of an
ML system by slowing down the learning process, degrading the
performance of the learned model, or causing the system to make errors only in
the attacker's planned scenario. Because of these developments, understanding the
security of machine learning algorithms and systems is emerging as an important
research area among computer security and machine learning researchers and
practitioners. We present a survey of this emerging area in machine learning.
Adversarial Learning in Statistical Classification: A Comprehensive Review of Defenses Against Attacks
There is great potential for damage from adversarial learning (AL) attacks on
machine-learning based systems. In this paper, we provide a contemporary survey
of AL, focused particularly on defenses against attacks on statistical
classifiers. After introducing relevant terminology and the goals and range of
possible knowledge of both attackers and defenders, we survey recent work on
test-time evasion (TTE), data poisoning (DP), and reverse engineering (RE)
attacks, and particularly defenses against them. In so doing, we distinguish
robust classification from anomaly detection (AD), unsupervised from
supervised, and statistical hypothesis-based defenses from ones that do not
have an explicit null (no attack) hypothesis; we identify the hyperparameters a
particular method requires, its computational complexity, as well as the
performance measures on which it was evaluated and the obtained quality. We
then dig deeper, providing novel insights that challenge conventional AL wisdom
and that target unresolved issues, including: 1) robust classification versus
AD as a defense strategy; 2) the belief that attack success increases with
attack strength, which ignores susceptibility to AD; 3) small perturbations for
test-time evasion attacks: a fallacy or a requirement?; 4) validity of the
universal assumption that a TTE attacker knows the ground-truth class for the
example to be attacked; 5) black, grey, or white box attacks as the standard
for defense evaluation; 6) susceptibility of query-based RE to an AD defense.
We also discuss attacks on the privacy of training data. We then present
benchmark comparisons of several defenses against TTE, RE, and backdoor DP
attacks on images. The paper concludes with a discussion of future work.
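As a concrete instance of the test-time evasion setting benchmarked above, a minimal FGSM-style white-box attack (Goodfellow et al.) perturbs the input along the sign of the loss gradient; the model interface and epsilon below are illustrative.
```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps: float = 0.03):
    """Return an adversarial copy of x nudged along the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0.0, 1.0).detach()
```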
Defense of Word-level Adversarial Attacks via Random Substitution Encoding
The adversarial attacks against deep neural networks on computer vision tasks
have spawned many new techniques that help protect models from making false
predictions. Recently, word-level adversarial attacks on deep models for Natural
Language Processing (NLP) tasks have also demonstrated strong power, e.g.,
fooling a sentiment classification neural network into making wrong decisions.
Unfortunately, little previous work has discussed defenses against such
word-level synonym-substitution attacks, since they are hard to be
perceived and detected. In this paper, we shed light on this problem and
propose a novel defense framework called Random Substitution Encoding (RSE),
which introduces a random substitution encoder into the training process of
original neural networks. Extensive experiments on text classification tasks
demonstrate the effectiveness of our framework on defense of word-level
adversarial attacks, under various base and attack models.
Comment: 12 pages, 2 figures, 4 tables. Accepted as a FULL paper at KSEM 2020
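The random-substitution idea can be sketched roughly as follows: during training, each token is replaced with a random member of its synonym set with some probability before being fed to the classifier, so the network does not rely on a single surface form. The synonym table, probability, and function name below are illustrative assumptions, not the RSE implementation.
```python
import random
from typing import Dict, List

def random_substitution_encode(tokens: List[str],
                               synonyms: Dict[str, List[str]],
                               p: float = 0.25) -> List[str]:
    """Randomly swap tokens with synonyms from a pre-built synonym table."""
    out = []
    for tok in tokens:
        candidates = synonyms.get(tok, [])
        if candidates and random.random() < p:
            out.append(random.choice(candidates))
        else:
            out.append(tok)
    return out

# Example:
# synonyms = {"good": ["great", "fine"], "movie": ["film"]}
# random_substitution_encode("a good movie".split(), synonyms)
```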