Article thumbnail
Location of Repository

HM-BiTAM: Bilingual topic exploration, word alignment, and translation

By Bing Zhao and Eric P. Xing

Abstract

We present a novel paradigm for statistical machine translation (SMT), based on a joint modeling of word alignment and the topical aspects underlying bilingual document-pairs, via a hidden Markov Bilingual Topic AdMixture (HM-BiTAM). In this paradigm, parallel sentence-pairs from a parallel document-pair are coupled via a certain semantic-flow, to ensure coherence of topical context in the alignment of mapping words between languages, likelihood-based training of topic-dependent translational lexicons, as well as in the inference of topic representations in each language. The learned HM-BiTAM can not only display topic patterns like methods such as LDA [1], but now for bilingual corpora; it also offers a principled way of inferring optimal translation using document context. Our method integrates the conventional model of HMM — a key component for most of the state-of-the-art SMT systems, with the recently proposed BiTAM model [10]; we report an extensive empirical analysis (in many ways complementary to the description-oriented [10]) of our method in three aspects: bilingual topic representation, word alignment, and translation.

Year: 2008
OAI identifier: oai:CiteSeerX.psu:10.1.1.304.7834
Provided by: CiteSeerX
Download PDF:
Sorry, we are unable to provide the full text but you may find it at the following location(s):
  • http://citeseerx.ist.psu.edu/v... (external link)
  • http://books.nips.cc/papers/fi... (external link)
  • Suggested articles


    To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.