Adversarial Meta Sampling for Multilingual Low-Resource Speech Recognition
Low-resource automatic speech recognition (ASR) is challenging, as the
low-resource target language data is insufficient to train an ASR model well. To solve this
issue, meta-learning formulates ASR for each source language into many small
ASR tasks and meta-learns a model initialization on all tasks from different
source languages to achieve fast adaptation on unseen target languages. However,
for different source languages, the quantity and difficulty vary greatly
because of their different data scales and diverse phonological systems, which
leads to task-quantity and task-difficulty imbalance issues and thus a failure
of multilingual meta-learning ASR (MML-ASR). In this work, we solve this
problem by developing a novel adversarial meta sampling (AMS) approach to
improve MML-ASR. When sampling tasks in MML-ASR, AMS adaptively determines the
task sampling probability for each source language. Specifically, for each
source language, a large query loss means that its tasks have not been
sampled well enough, in terms of quantity and difficulty, to train the ASR model, and
should therefore be sampled more frequently for extra learning. Inspired by this
observation, we feed the historical task query losses of all source language domains into
a network to learn a task sampling policy that adversarially increases the
current query loss of MML-ASR. The learnt task sampling policy can thus track
the learning situation of each language and predict a good task sampling
probability for each language for more effective learning. Finally, experimental
results on two multilingual datasets show significant performance improvement
when applying our AMS on MML-ASR, and also demonstrate the applicability of AMS
to other low-resource speech tasks and transfer learning ASR approaches. Comment: accepted in AAAI202
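The sampling idea in the abstract above (languages with a larger query loss get sampled more often) can be illustrated with a deliberately simplified sketch. Note the assumption: the abstract describes a learned policy network trained adversarially, whereas this sketch swaps in a plain softmax over each language's mean recent query loss. The function names and the `temperature` parameter are hypothetical and do not come from the paper.

```python
import math
import random


def sampling_probs(loss_history, temperature=1.0):
    """Map per-language query-loss histories to sampling probabilities.

    Simplification (assumption): a softmax over mean losses stands in for
    the learned, adversarially trained policy network of the abstract.
    """
    means = [sum(h) / len(h) for h in loss_history]
    peak = max(means)  # subtract the max for numerical stability
    weights = [math.exp((m - peak) / temperature) for m in means]
    total = sum(weights)
    return [w / total for w in weights]


def sample_language(loss_history, rng=random):
    """Draw one source-language index according to the probabilities."""
    probs = sampling_probs(loss_history)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With histories such as `[[0.5, 0.6], [2.0, 2.2], [1.0, 1.1]]`, the second language has the largest mean query loss and therefore receives the highest sampling probability.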
Robust Multilingual Part-of-Speech Tagging via Adversarial Training
Adversarial training (AT) is a powerful regularization method for neural
networks, aiming to achieve robustness to input perturbations. Yet, the
specific effects of the robustness obtained from AT are still unclear in the
context of natural language processing. In this paper, we propose and analyze a
neural POS tagging model that exploits AT. In our experiments on the Penn
Treebank WSJ corpus and the Universal Dependencies (UD) dataset (27 languages),
we find that AT not only improves the overall tagging accuracy, but also 1)
prevents over-fitting well in low resource languages and 2) boosts tagging
accuracy for rare / unseen words. We also demonstrate that 3) the improved
tagging performance by AT contributes to the downstream task of dependency
parsing, and that 4) AT helps the model to learn cleaner word representations.
5) The proposed AT model is generally effective in different sequence labeling
tasks. These positive results motivate further use of AT for natural language
tasks. Comment: NAACL 201
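The abstract above does not spell out how the input perturbation is built. A common formulation of adversarial training, assumed here purely for illustration, moves the input embedding by a fixed L2 norm `epsilon` in the direction of the loss gradient. The function name and arguments below are hypothetical.

```python
import math


def adversarial_perturbation(grad, epsilon):
    """Return epsilon * grad / ||grad||_2, the worst-case direction under
    a first-order approximation of the loss (assumed formulation)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # A zero gradient gives no preferred direction; leave input as is.
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]
```

In such a setup the model would be trained on both the clean embedding and the embedding plus this perturbation, with the gradient taken with respect to the embedding itself.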
Conditional Teacher-Student Learning
Teacher-student (T/S) learning has been shown to be effective for a
variety of problems such as domain adaptation and model compression. One
shortcoming of T/S learning is that a teacher model, not always perfect,
sporadically produces wrong guidance in the form of posterior probabilities that
mislead the student model towards suboptimal performance. To overcome this
problem, we propose a conditional T/S learning scheme, in which a "smart"
student model selectively chooses to learn from either the teacher model or the
ground truth labels conditioned on whether the teacher can correctly predict
the ground truth. Unlike a naive linear combination of the two knowledge
sources, the conditional learning is exclusively engaged with the teacher model
when the teacher model's prediction is correct, and otherwise backs off to the
ground truth. Thus, the student model is able to learn effectively from the
teacher and even potentially surpass the teacher. We examine the proposed
learning scheme on two tasks: domain adaptation on the CHiME-3 dataset and speaker
adaptation on the Microsoft short message dictation dataset. The proposed method
achieves 9.8% and 12.8% relative word error rate reductions, respectively, over
T/S learning for environment adaptation and over the speaker-independent model for
speaker adaptation. Comment: 5 pages, 1 figure, ICASSP 201
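The selection rule described in the abstract above can be sketched per training sample as follows. This is a minimal illustration, assuming a cross-entropy objective against whichever target is selected; the function name and the list-based representation of posteriors are hypothetical, not taken from the paper.

```python
import math


def conditional_ts_loss(student_probs, teacher_probs, label):
    """Cross-entropy of the student against a per-sample target.

    The target is the teacher's posteriors when the teacher's top class
    matches the ground-truth label, and the one-hot label otherwise.
    """
    n = len(teacher_probs)
    teacher_pred = max(range(n), key=teacher_probs.__getitem__)
    if teacher_pred == label:
        # Teacher is correct: learn from its soft posteriors.
        target = teacher_probs
    else:
        # Teacher is wrong: back off to the ground truth.
        target = [1.0 if k == label else 0.0 for k in range(n)]
    return -sum(t * math.log(s)
                for t, s in zip(target, student_probs) if t > 0.0)
```

Unlike a fixed linear interpolation of the two targets, each sample here uses exactly one knowledge source, which matches the "exclusively engaged" wording of the abstract.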
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e. audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified. Comment: 15 pages, 2 pdf figure