GIRNet: Interleaved Multi-Task Recurrent State Sequence Models
In several natural language tasks, labeled sequences are available in
separate domains (say, languages), but the goal is to label sequences with
mixed domain (such as code-switched text). Or, we may have available models for
labeling whole passages (say, with sentiments), which we would like to exploit
toward better position-specific label inference (say, target-dependent
sentiment annotation). A key characteristic shared across such tasks is that
different positions in a primary instance can benefit from different 'experts'
trained from auxiliary data, but labeled primary instances are scarce, and
labeling the best expert for each position entails unacceptable cognitive
burden. We propose GIRNet, a unified position-sensitive multi-task recurrent
neural network (RNN) architecture for such applications. Auxiliary and primary
tasks need not share training instances. Auxiliary RNNs are trained over
auxiliary instances. A primary instance is also submitted to each auxiliary
RNN, but their state sequences are gated and merged into a novel composite
state sequence tailored to the primary inference task. Our approach is in sharp
contrast to recent multi-task networks like the cross-stitch and sluice
network, which do not control state transfer at such fine granularity. We
demonstrate the superiority of GIRNet using three applications: sentiment
classification of code-switched passages, part-of-speech tagging of
code-switched text, and target position-sensitive annotation of sentiment in
monolingual passages. In all cases, we establish new state-of-the-art
performance beyond recent competitive baselines.
Comment: Accepted at AAAI 2019.
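To make the gating idea concrete, here is a minimal, hypothetical PyTorch sketch (module and variable names are ours, not the authors' released code) of how per-position gates could weight and merge auxiliary RNN hidden-state sequences into a composite sequence for the primary task.

```python
# Illustrative sketch only: position-wise gating over auxiliary RNN states,
# in the spirit of GIRNet. Shapes and names are assumptions for exposition.
import torch
import torch.nn as nn


class GatedStateMerge(nn.Module):
    """Merge hidden-state sequences from several auxiliary RNNs into one
    composite sequence, with a softmax gate computed at each position."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        # One scalar gate logit per auxiliary expert, per position.
        self.gate = nn.Linear(hidden_dim, 1)

    def forward(self, aux_states: torch.Tensor) -> torch.Tensor:
        # aux_states: (batch, num_aux, seq_len, hidden_dim)
        logits = self.gate(aux_states).squeeze(-1)   # (batch, num_aux, seq_len)
        weights = torch.softmax(logits, dim=1)       # normalise over experts per position
        # Weighted sum over the expert axis gives the composite state sequence.
        return (weights.unsqueeze(-1) * aux_states).sum(dim=1)  # (batch, seq_len, hidden_dim)


# Usage: stack the state sequences produced by each auxiliary RNN on the
# primary instance, then feed the composite sequence to the primary decoder.
merger = GatedStateMerge(hidden_dim=64)
states = torch.randn(2, 3, 10, 64)   # 2 sentences, 3 auxiliary experts, 10 tokens
composite = merger(states)           # -> (2, 10, 64)
```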
Language-Routing Mixture of Experts for Multilingual and Code-Switching Speech Recognition
Multilingual speech recognition for both monolingual and code-switching
speech is a challenging task. Recently, many works based on the Mixture of
Experts (MoE) have made good progress in multilingual and code-switching ASR,
but their computational complexity grows sharply as the number of supported
languages increases.
In this work, we propose a computation-efficient network named Language-Routing
Mixture of Experts (LR-MoE) for multilingual and code-switching ASR. LR-MoE
extracts language-specific representations through the Mixture of Language
Experts (MLE), whose learning is guided by a frame-wise language routing
mechanism. A weight-shared frame-level language identification (LID) network
is jointly trained as the shared pre-router of each MoE layer. Experiments show
that the proposed method significantly improves multilingual and code-switching
speech recognition performance over the baseline with comparable computational
efficiency.
Comment: To appear in Proc. INTERSPEECH 2023, August 20-24, 2023, Dublin, Ireland.
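As a rough illustration of frame-wise language routing, the following minimal PyTorch sketch (class names, the simple feed-forward experts, and the soft routing are assumptions for exposition, not the LR-MoE reference implementation) shows a shared frame-level LID head producing per-frame routing weights over language experts.

```python
# Illustrative sketch only: a Mixture-of-Language-Experts layer routed by a
# shared frame-level language-identification (LID) head.
import torch
import torch.nn as nn


class LanguageRoutedMoE(nn.Module):
    def __init__(self, dim: int, num_langs: int):
        super().__init__()
        # One feed-forward expert per supported language.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
             for _ in range(num_langs)]
        )
        # Shared frame-level LID head acting as the pre-router.
        self.lid_head = nn.Linear(dim, num_langs)

    def forward(self, frames: torch.Tensor):
        # frames: (batch, time, dim) acoustic encoder outputs
        lid_logits = self.lid_head(frames)            # (batch, time, num_langs)
        route = torch.softmax(lid_logits, dim=-1)     # frame-wise routing weights
        expert_out = torch.stack(
            [expert(frames) for expert in self.experts], dim=-1
        )                                             # (batch, time, dim, num_langs)
        mixed = (expert_out * route.unsqueeze(2)).sum(dim=-1)
        # Returning the LID logits lets an auxiliary LID loss supervise the router.
        return mixed, lid_logits
```

For efficiency, a hard top-1 route per frame (running only the selected expert) would keep per-frame computation roughly constant as languages are added; the dense form above is just easier to read.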
Adversarial Meta Sampling for Multilingual Low-Resource Speech Recognition
Low-resource automatic speech recognition (ASR) is challenging, as the
limited data in the low-resource target language cannot train an ASR model
well. To address this issue, meta-learning formulates ASR for each source
language as many small ASR tasks and meta-learns a model initialization over
all tasks from different source languages, enabling fast adaptation to unseen
target languages. However,
task quantity and difficulty vary greatly across source languages because of
their different data scales and diverse phonological systems, which leads to
task-quantity and task-difficulty imbalance and can thus cause multilingual
meta-learning ASR (MML-ASR) to fail. In this work, we solve this
problem by developing a novel adversarial meta sampling (AMS) approach to
improve MML-ASR. When sampling tasks in MML-ASR, AMS adaptively determines the
task sampling probability for each source language. Specifically, for each
source language, a large query loss indicates that its tasks have not been
sampled well enough, in terms of quantity and difficulty, to train the ASR
model, and thus should be sampled more frequently for extra learning. Inspired
by this observation, we feed the historical task query losses of all source
language domains into a network that learns a task sampling policy by
adversarially increasing the current query loss of MML-ASR. The learnt sampling
policy thus tracks the learning situation of each language and predicts a good
task sampling probability for each language, leading to more effective
learning. Finally, experimental results on two multilingual datasets show
significant performance improvements when applying our AMS to MML-ASR, and
also demonstrate the applicability of AMS
to other low-resource speech tasks and transfer-learning ASR approaches.
Comment: Accepted at AAAI 2021.
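To make the sampling-policy idea concrete, here is a minimal, hypothetical PyTorch sketch (network size, names, and the exact adversarial objective are assumptions, not the authors' code) of a policy that maps per-language query-loss histories to task-sampling probabilities.

```python
# Illustrative sketch only: a sampling-policy network that turns per-language
# query-loss histories into task-sampling probabilities for meta-learning.
import torch
import torch.nn as nn


class TaskSamplingPolicy(nn.Module):
    def __init__(self, history_len: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(history_len, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, loss_history: torch.Tensor) -> torch.Tensor:
        # loss_history: (num_langs, history_len) recent query losses per language
        logits = self.net(loss_history).squeeze(-1)   # (num_langs,)
        return torch.softmax(logits, dim=0)           # sampling probability per language


policy = TaskSamplingPolicy(history_len=5)
history = torch.rand(8, 5)               # 8 source languages, 5 past query losses each
probs = policy(history)                  # (8,), sums to 1
tasks = torch.multinomial(probs, num_samples=4, replacement=True)  # languages to sample next

# A possible adversarial signal: push probability mass toward languages whose
# tasks currently yield high query loss, e.g.
#   policy_loss = -(probs * current_query_losses.detach()).sum()
```

The policy and the meta-learner then play opposing roles: the policy favours languages where the meta-learner's query loss remains high, while the meta-learner adapts to drive those losses down.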