Language Modeling for Code-Switching: Evaluation, Integration of Monolingual Data, and Discriminative Training
We focus on the problem of language modeling for code-switched language, in
the context of automatic speech recognition (ASR). Language modeling for
code-switched language is challenging for (at least) three reasons: (1) lack of
available large-scale code-switched data for training; (2) lack of a replicable
evaluation setup that is ASR directed yet isolates language modeling
performance from the other intricacies of the ASR system; and (3) the reliance
on generative modeling. We tackle these three issues: we propose an
ASR-motivated evaluation setup which is decoupled from an ASR system and the
choice of vocabulary, and provide an evaluation dataset for English-Spanish
code-switching. This setup lends itself to a discriminative training approach,
which we demonstrate to work better than generative language modeling. Finally,
we explore a variety of training protocols and verify the effectiveness of
training with large amounts of monolingual data followed by fine-tuning with
small amounts of code-switched data, for both the generative and discriminative
cases.
Comment: EMNLP 201
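One way an ASR-motivated evaluation that is decoupled from an actual ASR system can be framed is as a ranking task: a language model must score the reference transcription above distractor candidates. The sketch below is hypothetical (a toy unigram LM and invented data, not necessarily the authors' exact protocol) and only illustrates the ranking-accuracy idea.

```python
# Hypothetical sketch of a ranking-style LM evaluation: the LM must score
# the reference transcription above its distractors; accuracy is the
# fraction of examples where the reference wins.
import math
from collections import Counter

def train_unigram(corpus):
    """Estimate a toy add-one-smoothed unigram LM from a token corpus."""
    counts = Counter(tok for sent in corpus for tok in sent)
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves mass for unseen tokens
    return lambda tok: (counts[tok] + 1) / (total + vocab)

def sentence_score(prob, sent):
    return sum(math.log(prob(tok)) for tok in sent)

def ranking_accuracy(prob, examples):
    """examples: list of (reference, [distractors]); reference must win."""
    wins = 0
    for ref, distractors in examples:
        ref_score = sentence_score(prob, ref)
        if all(ref_score > sentence_score(prob, d) for d in distractors):
            wins += 1
    return wins / len(examples)

corpus = [["i", "like", "tacos"], ["me", "gusta", "coffee"]]
prob = train_unigram(corpus)
examples = [(["i", "like", "tacos"], [["i", "like", "zzz"]])]
print(ranking_accuracy(prob, examples))  # 1.0: the reference wins
```

A discriminative model can then be trained directly on this objective (score the reference above each distractor), rather than on generative likelihood.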
GIRNet: Interleaved Multi-Task Recurrent State Sequence Models
In several natural language tasks, labeled sequences are available in
separate domains (say, languages), but the goal is to label sequences with
mixed domain (such as code-switched text). Or, we may have available models for
labeling whole passages (say, with sentiments), which we would like to exploit
toward better position-specific label inference (say, target-dependent
sentiment annotation). A key characteristic shared across such tasks is that
different positions in a primary instance can benefit from different `experts'
trained from auxiliary data, but labeled primary instances are scarce, and
labeling the best expert for each position entails unacceptable cognitive
burden. We propose GIRNet, a unified position-sensitive multi-task recurrent
neural network (RNN) architecture for such applications. Auxiliary and primary
tasks need not share training instances. Auxiliary RNNs are trained over
auxiliary instances. A primary instance is also submitted to each auxiliary
RNN, but their state sequences are gated and merged into a novel composite
state sequence tailored to the primary inference task. Our approach is in sharp
contrast to recent multi-task networks like the cross-stitch and sluice
network, which do not control state transfer at such fine granularity. We
demonstrate the superiority of GIRNet using three applications: sentiment
classification of code-switched passages, part-of-speech tagging of
code-switched text, and target position-sensitive annotation of sentiment in
monolingual passages. In all cases, we establish new state-of-the-art
performance beyond recent competitive baselines.
Comment: Accepted at AAAI 201
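The gating-and-merging step described above can be sketched in a few lines of numpy. This is an illustrative simplification (names and shapes are assumptions, and the gates here are given rather than learned): at each position, the auxiliary RNNs' hidden states are mixed into one composite state via a position-specific softmax gate.

```python
# Minimal numpy sketch (hypothetical names) of gating auxiliary RNN state
# sequences into a composite state sequence for the primary task.
import numpy as np

def composite_states(aux_states, gate_logits):
    """aux_states: (num_aux, T, d) hidden states from auxiliary RNNs run on
    the primary instance; gate_logits: (T, num_aux) unnormalized gates.
    Returns the (T, d) composite state sequence."""
    # softmax over auxiliaries at each position
    g = np.exp(gate_logits - gate_logits.max(axis=1, keepdims=True))
    g = g / g.sum(axis=1, keepdims=True)          # (T, num_aux)
    # position-wise weighted sum of auxiliary states
    return np.einsum("ta,atd->td", g, aux_states)

aux = np.stack([np.ones((4, 3)), np.zeros((4, 3))])  # two auxiliary RNNs
logits = np.array([[10.0, -10.0]] * 4)               # gate almost fully to aux 0
out = composite_states(aux, logits)
print(out.shape)  # (4, 3)
```

In the full model the gate logits would themselves be produced by a learned network conditioned on the primary instance, which is what gives the position-level control that cross-stitch and sluice networks lack.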
Meta-Transfer Learning for Code-Switched Speech Recognition
An increasing number of people in the world today speak a mixed language as a
result of being multilingual. However, building a speech recognition system for
code-switching remains difficult due to limited available resources and the
expense and significant effort required to collect mixed-language data.
We therefore propose a new learning method, meta-transfer learning, to transfer
learn on a code-switched speech recognition system in a low-resource setting by
judiciously extracting information from high-resource monolingual datasets. Our
model learns to recognize the individual languages and transfers that knowledge so as to
better recognize mixed-language speech by conditioning the optimization on the
code-switching data. Based on experimental results, our model outperforms
existing baselines on speech recognition and language modeling tasks, and is
faster to converge.
Comment: Accepted in ACL 2020. The first two authors contributed equally to this work
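The idea of conditioning the optimization on code-switched data can be illustrated with a first-order meta-learning sketch. Everything below is a toy illustration, not the authors' implementation: losses are simple quadratics, and each "language" is one task. The key structure is that adaptation happens on monolingual batches, while the meta-gradient is evaluated on a code-switched batch.

```python
# First-order meta-transfer sketch (illustrative only): adapt on monolingual
# "tasks", then update the shared initialization using the gradient of the
# code-switched loss at the adapted parameters.
import numpy as np

def loss_grad(theta, center):
    """Toy quadratic loss L = ||theta - center||^2 and its gradient."""
    return 2.0 * (theta - center)

def meta_transfer_step(theta, mono_centers, cs_center, alpha=0.1, beta=0.1):
    meta_grad = np.zeros_like(theta)
    for c in mono_centers:                               # one task per language
        adapted = theta - alpha * loss_grad(theta, c)    # inner adaptation
        meta_grad += loss_grad(adapted, cs_center)       # evaluated on CS data
    return theta - beta * meta_grad / len(mono_centers)

theta = np.array([0.0, 0.0])
mono = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]      # monolingual optima
cs = np.array([0.5, 0.5])                                # code-switched optimum
for _ in range(200):
    theta = meta_transfer_step(theta, mono, cs)
print(theta)  # converges toward the code-switched optimum [0.5, 0.5]
```

Because the outer gradient is taken on code-switched data, the monolingual tasks are exploited only insofar as they help the mixed-language objective.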
Multilingual Code-Switching for Zero-Shot Cross-Lingual Intent Prediction and Slot Filling
Predicting user intent and detecting the corresponding slots from text are
two key problems in Natural Language Understanding (NLU). In the context of
zero-shot learning, this task is typically approached by either using
representations from pre-trained multilingual transformers such as mBERT, or by
machine translating the source data into the known target language and then
fine-tuning. Our work focuses on a particular scenario where the target
language is unknown during training. To this goal, we propose a novel method to
augment the monolingual source data using multilingual code-switching via
random translations to enhance a transformer's language neutrality when
fine-tuning it for a downstream task. This method also yields novel
insights into how code-switching with different language families around the
world impacts performance on the target language. Experiments on the
benchmark MultiATIS++ dataset yielded an average improvement of +4.2% in
accuracy for the intent task and +1.8% in F1 for the slot task using our method
over the state-of-the-art across 8 different languages. Furthermore, we present an
application of our method for crisis informatics using a new human-annotated
tweet dataset for slot filling in English and Haitian Creole, collected during
the Haiti earthquake disaster.
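The augmentation step described above amounts to randomly replacing source-language tokens with translations drawn from several languages. The sketch below is hypothetical (the lexicon and function names are invented for illustration); in practice the translations would come from a machine translation system rather than a hand-written table.

```python
# Hypothetical sketch of multilingual code-switching augmentation: each
# translatable token is replaced, with some probability, by a random
# translation, pushing the encoder toward language-neutral representations.
import random

LEXICON = {
    "flight": {"es": "vuelo", "fr": "vol"},
    "book":   {"es": "reservar", "fr": "réserver"},
}

def code_switch(tokens, lexicon, p=0.5, rng=None):
    """Replace each translatable token with probability p by a translation
    into a randomly chosen language."""
    rng = rng or random.Random(0)
    out = []
    for tok in tokens:
        if tok in lexicon and rng.random() < p:
            lang = rng.choice(sorted(lexicon[tok]))
            out.append(lexicon[tok][lang])
        else:
            out.append(tok)
    return out

switched = code_switch(["book", "a", "flight"], LEXICON, p=1.0)
print(switched)
```

Fine-tuning the transformer on such mixed sentences exposes it to many language pairs at once, which is what enables zero-shot transfer to an unknown target language.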
Code-switched Language Models Using Dual RNNs and Same-Source Pretraining
This work focuses on building language models (LMs) for code-switched text.
We propose two techniques that significantly improve these LMs: 1) a novel
recurrent neural network unit with dual components that focus on each language
in the code-switched text separately, and 2) pretraining the LM using synthetic text
from a generative model estimated using the training data. We demonstrate the
effectiveness of our proposed techniques by reporting perplexities on a
Mandarin-English task, demonstrating significant reductions in perplexity.
Comment: Accepted at EMNLP 201
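The dual-component idea can be sketched as a recurrent unit that computes one candidate hidden update per language and mixes them per token. This is a simplification with invented names and random placeholder weights; in the actual unit the mixing weights would be learned rather than supplied.

```python
# Simplified numpy sketch of a dual-component recurrent unit: separate
# recurrent weights per language, mixed per token by given language weights.
import numpy as np

rng = np.random.default_rng(0)
d, v = 4, 6                        # hidden size, toy vocabulary size
W = {lang: rng.normal(size=(d, d)) for lang in ("en", "zh")}
U = {lang: rng.normal(size=(d, v)) for lang in ("en", "zh")}

def dual_rnn(token_ids, lang_weights):
    """token_ids: list of ints; lang_weights: per-token (w_en, w_zh) mixes
    that sum to 1. Returns the final hidden state."""
    h = np.zeros(d)
    for tok, (w_en, w_zh) in zip(token_ids, lang_weights):
        x = np.eye(v)[tok]                         # one-hot embedding
        h_en = np.tanh(W["en"] @ h + U["en"] @ x)  # English component
        h_zh = np.tanh(W["zh"] @ h + U["zh"] @ x)  # Mandarin component
        h = w_en * h_en + w_zh * h_zh              # per-token mixing
    return h

h = dual_rnn([0, 3, 5], [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)])
print(h.shape)  # (4,)
```

Because each component only has to model one language's regularities, the unit can specialize even though the input stream mixes both languages.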
Code-Switched Language Models Using Neural Based Synthetic Data from Parallel Sentences
Training code-switched language models is difficult due to a lack of data and
the complexity of the grammatical structure. Linguistic constraint theories have
been used for decades to generate artificial code-switching sentences to cope
with this issue. However, this requires external word alignments or constituency
parsers, which produce erroneous results for distant languages. We propose a
sequence-to-sequence model using a copy mechanism to generate code-switching
data by leveraging parallel monolingual translations from a limited source of
code-switching data. The model learns how to combine words from parallel
sentences and identifies when to switch from one language to the other. Moreover, it
captures code-switching constraints by attending and aligning the words in
inputs, without requiring any external knowledge. Based on experimental
results, the language model trained with the generated sentences achieves
state-of-the-art performance and improves end-to-end automatic speech
recognition.
Comment: Accepted in CoNLL 201
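One common formulation of a copy mechanism (shown here as a hedged sketch, not necessarily this paper's exact parameterization) mixes a generation distribution over the vocabulary with attention weights scattered onto the source tokens, so the decoder can either generate a word or copy one from the parallel sentences.

```python
# Minimal numpy sketch of a pointer/copy-style output distribution:
# p(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention on source
# positions whose token is w.
import numpy as np

def copy_mixture(p_gen, p_vocab, attn, src_ids, vocab_size):
    """p_gen: scalar in [0,1]; p_vocab: (V,) generation distribution;
    attn: (S,) attention over source tokens; src_ids: (S,) vocab ids of
    the source tokens. Returns the mixed (V,) output distribution."""
    p_copy = np.zeros(vocab_size)
    np.add.at(p_copy, src_ids, attn)   # scatter attention onto vocab ids
    return p_gen * p_vocab + (1.0 - p_gen) * p_copy

p_vocab = np.full(5, 0.2)              # uniform generation distribution
attn = np.array([0.7, 0.3])            # attend mostly to first source word
out = copy_mixture(0.5, p_vocab, attn, np.array([2, 4]), vocab_size=5)
print(out)  # [0.1  0.1  0.45 0.1  0.25], sums to 1
```

Because copying is driven by attention over the parallel sentences, switch points emerge from the learned alignments rather than from external constraint theories.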
CodeSwitch-Reddit: Exploration of Written Multilingual Discourse in Online Discussion Forums
In contrast to many decades of research on oral code-switching, the study of
written multilingual productions has only recently enjoyed a surge of interest.
Many open questions remain regarding the sociolinguistic underpinnings of
written code-switching, and progress has been limited by a lack of suitable
resources. We introduce a novel, large, and diverse dataset of written
code-switched productions, curated from topical threads of multiple bilingual
communities on the Reddit discussion platform, and explore questions that were
mainly addressed in the context of spoken language thus far. We investigate
whether findings in oral code-switching concerning content and style, as well
as speaker proficiency, are carried over into written code-switching in
discussion forums. The released dataset can further facilitate a range of
research and practical activities.
Comment: EMNLP 2019, 11 pages
Code-Switching Detection Using ASR-Generated Language Posteriors
Code-switching (CS) detection refers to the automatic detection of language
switches in code-mixed utterances. This task can be achieved by using a CS
automatic speech recognition (ASR) system that can handle such language
switches. In our previous work, we have investigated the code-switching
detection performance of the Frisian-Dutch CS ASR system by using the time
alignment of the most likely hypothesis and found that this technique suffers
from over-switching due to numerous very short spurious language switches. In
this paper, we propose a novel method for CS detection aiming to remedy this
shortcoming by using language posteriors, which are the sum of the
frame-level posteriors of phones belonging to the same language. The CS
ASR-generated language posteriors contain more complete language-specific
information at the frame level than the time alignment of the ASR output,
and are hence expected to yield more accurate and robust CS detection. The CS
detection experiments demonstrate that the proposed language posterior-based
approach provides higher detection accuracy than the baseline system in terms
of equal error rate. Moreover, a detailed CS detection error analysis reveals
that using language posteriors reduces the false alarms and results in more
robust CS detection.
Comment: Accepted for publication at Interspeech 201
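The language-posterior computation described above is simple to state: sum the frame-level posteriors of all phones belonging to each language. The sketch below uses an invented phone-to-language assignment and toy posteriors purely for illustration.

```python
# Numpy sketch of ASR-generated language posteriors: per frame, sum the
# posteriors of the phones belonging to each language.
import numpy as np

def language_posteriors(phone_post, phone_lang, languages=("frisian", "dutch")):
    """phone_post: (T, P) frame-level phone posteriors; phone_lang: length-P
    list giving each phone's language. Returns (T, L) language posteriors."""
    cols = {lang: [i for i, l in enumerate(phone_lang) if l == lang]
            for lang in languages}
    return np.stack([phone_post[:, cols[lang]].sum(axis=1)
                     for lang in languages], axis=1)

phone_post = np.array([[0.6, 0.3, 0.1],    # frame 1
                       [0.1, 0.2, 0.7]])   # frame 2
phone_lang = ["frisian", "frisian", "dutch"]
lp = language_posteriors(phone_post, phone_lang)
print(lp)  # [[0.9 0.1] [0.3 0.7]] -> a language switch between the frames
```

Thresholding or smoothing these per-frame language scores then yields switch decisions that are less brittle than reading switches off the 1-best alignment.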
Semi-supervised acoustic modelling for five-lingual code-switched ASR using automatically-segmented soap opera speech
This paper considers the impact of automatic segmentation on the
fully-automatic, semi-supervised training of automatic speech recognition (ASR)
systems for five-lingual code-switched (CS) speech. Four automatic segmentation
techniques were evaluated in terms of the recognition performance of an ASR
system trained on the resulting segments in a semi-supervised manner. The
system's output was compared with the recognition rates achieved by a
semi-supervised system trained on manually assigned segments. Three of the
automatic techniques use a newly proposed convolutional neural network (CNN)
model for framewise classification, and include a novel form of HMM smoothing
of the CNN outputs. Automatic segmentation was applied in combination with
automatic speaker diarization. The best-performing segmentation technique was
also tested without speaker diarization. An evaluation based on 248 unsegmented
soap opera episodes indicated that voice activity detection (VAD) based on a
CNN followed by Gaussian mixture model-hidden Markov model smoothing
(CNN-GMM-HMM) yields the best ASR performance. The semi-supervised system
trained with the resulting segments achieved an overall WER improvement of 1.1%
absolute over the system trained with manually created segments. Furthermore,
we found that system performance improved even further when the automatic
segmentation was used in conjunction with speaker diarization.
Comment: SLTU 2020. arXiv admin note: text overlap with arXiv:2003.0313
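The effect of HMM smoothing on framewise classifier outputs can be illustrated with a small Viterbi decode over "sticky" transition probabilities: staying in a state is much cheaper than switching, so one-frame blips are smoothed away. This is a generic sketch with invented values, not the paper's CNN-GMM-HMM pipeline.

```python
# Illustrative sketch of HMM smoothing of framewise posteriors: a sticky
# transition matrix penalizes rapid state flips, removing short spurious
# segments from the CNN's frame-level decisions.
import numpy as np

def viterbi_smooth(post, stay=0.9):
    """post: (T, S) framewise state posteriors; returns the (T,) smoothed
    state path under a simple sticky-transition HMM."""
    T, S = post.shape
    trans = np.full((S, S), (1.0 - stay) / (S - 1))
    np.fill_diagonal(trans, stay)
    log_trans = np.log(trans)
    log_post = np.log(np.clip(post, 1e-10, None))
    score = log_post[0].copy()
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans      # (prev_state, cur_state)
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_post[t]
    path = np.zeros(T, dtype=int)
    path[-1] = score.argmax()
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path

# a one-frame blip inside otherwise-confident frames is smoothed away
post = np.array([[0.9, 0.1], [0.9, 0.1], [0.4, 0.6], [0.9, 0.1]])
print(viterbi_smooth(post))  # [0 0 0 0]
```

Raising `stay` trades responsiveness for robustness: longer genuine segments survive, while ever-shorter spurious ones are suppressed.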