Code-switching Sentence Generation by Generative Adversarial Networks and its Application to Data Augmentation
Code-switching refers to alternating between languages within speech or text.
The phenomenon is partly speaker-dependent and domain-related, so explaining it
completely with linguistic rules is challenging. Compared to most monolingual
tasks, code-switching suffers from insufficient data. To mitigate this issue
without expensive human annotation, we propose an unsupervised method for
code-switching data augmentation. Using a generative adversarial network, we
generate intra-sentential code-switching sentences from monolingual sentences.
We apply the proposed method to two corpora, and the results show that the
generated code-switching sentences improve the performance of code-switching
language models.
Comment: Accepted by Interspeech 2018
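The abstract gives no implementation detail, so the following is only a minimal PyTorch sketch of the general idea: a generator tags each token of a monolingual sentence with a switch/keep decision, and a discriminator scores whether the resulting mixed sentence resembles real code-switched text. The class names, the GRU-based architectures, and the per-token switch-tagging formulation are illustrative assumptions, not the paper's actual model.

import torch
import torch.nn as nn

class SwitchPointGenerator(nn.Module):
    """Predicts, per token, the probability of switching to the other language."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.switch = nn.Linear(hidden_dim, 1)

    def forward(self, tokens):                         # tokens: (batch, seq_len)
        h, _ = self.rnn(self.embed(tokens))            # (batch, seq_len, hidden)
        return torch.sigmoid(self.switch(h)).squeeze(-1)  # per-token switch probs

class Discriminator(nn.Module):
    """Scores whether a (mixed) token sequence looks like real code-switched text."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, tokens):
        _, h = self.rnn(self.embed(tokens))            # final hidden state
        return torch.sigmoid(self.out(h[-1])).squeeze(-1)  # real/fake score

# Toy forward pass: sample switch decisions for a monolingual batch.
vocab_size = 1000
gen = SwitchPointGenerator(vocab_size)
tokens = torch.randint(0, vocab_size, (2, 7))          # hypothetical token IDs
switch = torch.bernoulli(gen(tokens))                  # 1 = switch this token
# In a real system, switched positions would be replaced with translations
# from the other language before being scored by the Discriminator.

In an adversarial setup along these lines, the generator would be trained to make its mixed sentences indistinguishable from real code-switched text, and the resulting sentences would then serve as augmented training data for the language model.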
Code-switched Language Models Using Dual RNNs and Same-Source Pretraining
This work focuses on building language models (LMs) for code-switched text.
We propose two techniques that significantly improve these LMs: 1) a novel
recurrent neural network unit with dual components that focus separately on
each language in the code-switched text, and 2) pretraining the LM on synthetic
text from a generative model estimated from the training data. We demonstrate
the effectiveness of the proposed techniques by reporting perplexities on a
Mandarin-English task, achieving significant reductions in perplexity.
Comment: Accepted at EMNLP 2018
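As a rough illustration of the first technique, here is a minimal PyTorch sketch of a recurrent unit with dual per-language components: two GRU cells share one hidden state, and each token is routed to the cell matching its language tag. The routing scheme, the shared hidden state, and all names and sizes are assumptions made for illustration; the paper's actual unit may differ.

import torch
import torch.nn as nn

class DualRNNCell(nn.Module):
    """One recurrent step with a separate component per language."""
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.cell_l1 = nn.GRUCell(input_dim, hidden_dim)  # e.g. Mandarin tokens
        self.cell_l2 = nn.GRUCell(input_dim, hidden_dim)  # e.g. English tokens

    def forward(self, x, lang_id, h):
        # x: (batch, input_dim), lang_id: (batch,) in {0, 1}, h: (batch, hidden)
        h1 = self.cell_l1(x, h)
        h2 = self.cell_l2(x, h)
        mask = lang_id.unsqueeze(-1).float()      # 1 where the token is language 2
        return (1 - mask) * h1 + mask * h2        # keep the matching component

# Toy step: route a batch of two tokens, one per language.
cell = DualRNNCell(input_dim=32, hidden_dim=64)
x = torch.randn(2, 32)
h = torch.zeros(2, 64)
h = cell(x, torch.tensor([0, 1]), h)

The second technique, same-source pretraining, would correspond to first training a generative model on the code-switched training text, sampling synthetic sentences from it, and pretraining the LM on those samples before fine-tuning on the real data.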