Supervised Classifiers for Audio Impairments with Noisy Labels
Voice-over-Internet-Protocol (VoIP) calls are prone to various speech
impairments caused by environmental and network conditions, resulting in a poor
user experience. A reliable audio impairment classifier helps identify the
cause of bad audio quality. User feedback collected after a call can serve as
ground-truth labels for training a supervised classifier on a large audio
dataset. However, these labels are noisy, as most users lack the expertise to
precisely articulate the impairment in the perceived speech. In this paper, we
analyze the effects of massive label noise on training dense networks and
Convolutional Neural Networks (CNNs) using engineered features, spectrograms,
and raw audio samples as inputs. We demonstrate that CNNs generalize better on
training data with a large number of noisy labels and give remarkably higher
test performance. The classifiers were trained on both randomly generated label
noise and label noise introduced by human errors. We also show that training
with noisy labels requires a significant increase in training dataset size, in
proportion to the amount of noise in the labels.
Comment: To appear in INTERSPEECH 201
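The "randomly generated label noise" setting described above can be simulated by flipping a fraction of the labels to a uniformly random different class. A minimal sketch of such symmetric noise injection (the function name and exact procedure are assumptions for illustration, not taken from the paper):

```python
import numpy as np

def inject_symmetric_label_noise(labels, noise_rate, num_classes, seed=0):
    """Flip a fraction `noise_rate` of labels to a uniformly random
    *different* class, mimicking randomly generated label noise."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    n = len(labels)
    # pick which examples get a corrupted label
    flip_idx = rng.choice(n, size=int(noise_rate * n), replace=False)
    for i in flip_idx:
        # choose any class except the current (true) one
        choices = [c for c in range(num_classes) if c != labels[i]]
        labels[i] = rng.choice(choices)
    return labels
```

Training the same classifier at several values of `noise_rate` is one way to measure how test performance degrades with the amount of label noise.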
DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors
Human subjective evaluation is the gold standard for evaluating speech quality
optimized for human perception, and perceptual objective metrics serve as a
proxy for subjective scores. The conventional and widely used metrics require a
clean reference speech signal, which is unavailable in real recordings, while
existing no-reference approaches correlate poorly with human ratings and are
not widely adopted by the research community. One of the biggest use cases for
these perceptual objective metrics is evaluating noise suppression algorithms.
This paper introduces a multi-stage, self-teaching-based perceptual objective
metric designed to evaluate noise suppressors. The proposed method generalizes
well in challenging test conditions, with a high correlation to human ratings.
Comment: Submitted to ICASSP 202
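Objective metrics of this kind are typically validated by how well they correlate with human mean opinion scores (MOS). A minimal sketch of that validation step using Pearson correlation (the function and variable names here are illustrative, not from DNSMOS):

```python
import numpy as np

def metric_correlation(objective_scores, human_mos):
    """Pearson correlation between an objective quality metric and
    human mean opinion scores (MOS) over the same set of clips."""
    x = np.asarray(objective_scores, dtype=float)
    y = np.asarray(human_mos, dtype=float)
    # center both series, then take the normalized dot product
    x = x - x.mean()
    y = y - y.mean()
    return float((x @ y) / np.sqrt((x @ x) * (y @ y)))
```

A value near 1.0 indicates the metric ranks clips much like human raters do; no-reference metrics with low correlation are the gap this paper targets.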
A Survey of Label-noise Representation Learning: Past, Present and Future
Classical machine learning implicitly assumes that the labels of the training
data are sampled from a clean distribution, which can be too restrictive for
real-world scenarios. However, statistical-learning-based methods may not train
deep learning models robustly with such noisy labels. It is therefore urgent to
design Label-Noise Representation Learning (LNRL) methods for robustly training
deep models with noisy labels. To fully understand LNRL, we conduct a survey
study. We first give a formal definition of LNRL from the perspective of
machine learning. Then, through the lens of learning theory and empirical
study, we explain why noisy labels degrade deep models' performance. Guided by
this theory, we categorize LNRL methods into three directions. Under this
unified taxonomy, we provide a thorough discussion of the pros and cons of each
category. More importantly, we summarize the essential components of robust
LNRL, which can spark new directions. Lastly, we propose possible research
directions within LNRL, such as new datasets, instance-dependent LNRL, and
adversarial LNRL. We also envision potential directions beyond LNRL, such as
learning with feature-noise, preference-noise, domain-noise, similarity-noise,
graph-noise, and demonstration-noise.
Comment: The draft is being kept updated; any comments and suggestions are welcome
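One common family of LNRL methods is robust loss design. As an illustrative sketch (not code from the survey), the generalized cross-entropy loss uses a parameter `q` to interpolate between standard cross-entropy (as `q` approaches 0) and the noise-tolerant mean absolute error (`q = 1`):

```python
import numpy as np

def generalized_cross_entropy(probs, labels, q=0.7):
    """Generalized cross-entropy loss L_q = (1 - p_y**q) / q, where
    p_y is the predicted probability of the given (possibly noisy)
    label. Larger q downweights confidently mislabeled examples."""
    p_y = np.asarray(probs)[np.arange(len(labels)), labels]
    return float(np.mean((1.0 - p_y ** q) / q))
```

The design choice is a trade-off: smaller `q` fits clean data faster, while larger `q` bounds the gradient contribution of examples whose labels disagree with the model, which is what lends robustness under label noise.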