Although Singing Voice Synthesis (SVS) has made great strides with
Text-to-Speech (TTS) techniques, multilingual singing voice modeling remains
relatively unexplored. This paper presents BiSinger, a bilingual pop SVS system
for English and Mandarin Chinese. Current systems require separate models per
language and cannot accurately represent both Chinese and English, hindering
code-switch SVS. To address this gap, we design a representation shared by
Chinese and English singing voices, built from the CMU dictionary with
mapping rules. We fuse monolingual singing datasets with open-source singing
voice conversion techniques to generate bilingual singing voices while also
exploring the potential use of bilingual speech data. Experiments affirm that
our language-independent representation and incorporation of related datasets
enable a single model with enhanced performance in English and code-switch SVS
while maintaining Chinese song performance. Audio samples are available at
https://bisinger-svs.github.io.
Comment: Accepted by ASRU202