
Translating Transliterations

By Jörg Tiedemann and Peter Nabende

Abstract

Translating new entity names is important for improving performance in Natural Language Processing (NLP) applications such as Machine Translation (MT) and Cross-Language Information Retrieval (CLIR). Transliteration is commonly used to obtain phonetic equivalents in a target language for a given source-language word. However, transliteration across different writing systems often yields different representations of the same source-language entity name. In this paper, we address the problem of automatically translating transliterated entity names that originally come from a different writing system. Such names are often spelled differently even in languages that share the same writing system. We train and evaluate various models based on finite state technology and Statistical Machine Translation (SMT) for character-based translation of the transliterated entity names. In particular, we evaluate the models for translating Russian person names between Dutch and English and between English and French. In our experiments, the SMT models perform best, with consistent improvements over a baseline that simply copies strings.
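To make "character-based translation" and the string-copying baseline concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes names are split into space-separated characters so that a phrase-based SMT toolkit can treat each character as a token, with a hypothetical "_" marker for original spaces. The name pairs are illustrative Dutch/English spellings of Russian names, not the paper's data, and exact-match accuracy and Levenshtein distance are example metrics chosen here, not necessarily those used in the paper. All function names are hypothetical.

```python
"""Sketch: character-level name preparation and a copy baseline.

Assumptions (illustrative only, not the paper's exact setup):
- names are split into space-separated characters so a phrase-based
  SMT toolkit can treat each character as a token;
- "_" is a hypothetical marker for an original space;
- exact-match accuracy and Levenshtein distance are example metrics.
"""

# Hypothetical toy data: common Dutch -> English spellings of Russian
# names, NOT taken from the paper's data sets.
PAIRS = [
    ("Gorbatsjov", "Gorbachev"),
    ("Tsjaikovski", "Tchaikovsky"),
    ("Poetin", "Putin"),
    ("Lenin", "Lenin"),
]


def to_char_tokens(name: str, space_marker: str = "_") -> str:
    """'Pjotr Tsj' -> 'P j o t r _ T s j' (one token per character)."""
    return " ".join(space_marker if ch == " " else ch for ch in name)


def from_char_tokens(tokens: str, space_marker: str = "_") -> str:
    """Invert to_char_tokens: glue characters and restore spaces."""
    return "".join(" " if t == space_marker else t for t in tokens.split())


def copy_baseline(source_name: str) -> str:
    """Baseline system: return the source spelling unchanged."""
    return source_name


def levenshtein(a: str, b: str) -> int:
    """Character edit distance (unit-cost insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def accuracy(system, pairs) -> float:
    """Share of source names the system maps exactly to the reference."""
    return sum(system(src) == ref for src, ref in pairs) / len(pairs)


def mean_edit_distance(system, pairs) -> float:
    """Average character edit distance between output and reference."""
    return sum(levenshtein(system(src), ref) for src, ref in pairs) / len(pairs)


if __name__ == "__main__":
    print(to_char_tokens("Pjotr Tsjaikovski"))
    print(accuracy(copy_baseline, PAIRS))
    print(mean_edit_distance(copy_baseline, PAIRS))
```

On these toy pairs the copy baseline is exactly right only where the two spellings coincide ("Lenin"), which illustrates why it is a natural lower bound. A trained character-level model (finite state or SMT) could be scored the same way by passing its translation function in place of `copy_baseline`.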

Topics: Categories and Subject Descriptors: I.2.7 [Artificial Intelligence]: Natural Language Processing: Machine Translation; Key Words and Phrases: transliteration, machine translation, weighted finite state transducers, phrase-based statistical machine translation, character-based machine translation
Year: 2013
OAI identifier: oai:CiteSeerX.psu:10.1.1.372.8099
Provided by: CiteSeerX
Full text available at:
  • http://citeseerx.ist.psu.edu/v...
  • http://ijcir.org/specialissue2...