3 research outputs found
Cross-lingual Contextualized Topic Models with Zero-shot Learning
Many data sets (e.g., reviews, forums, news, etc.) exist parallelly in
multiple languages. They all cover the same content, but the linguistic
differences make it impossible to use traditional, bag-of-word-based topic
models. Models have to be either single-language or suffer from a huge, but
extremely sparse vocabulary. Both issues can be addressed by transfer learning.
In this paper, we introduce a zero-shot cross-lingual topic model. Our model
learns topics on one language (here, English), and predicts them for unseen
documents in different languages (here, Italian, French, German, and
Portuguese). We evaluate the quality of the topic predictions for the same
document in different languages. Our results show that the transferred topics
are coherent and stable across languages, which suggests exciting future
research directions.Comment: Updated version. Published as a conference paper at EACL202