1,518 research outputs found
Hybrid Approach to English-Hindi Name Entity Transliteration
Machine translation (MT) research in Indian languages is still in its
infancy. Not much work has been done in proper transliteration of name entities
in this domain. In this paper we address this issue. We have used English-Hindi
language pair for our experiments and have used a hybrid approach. At first we
have processed English words using a rule based approach which extracts
individual phonemes from the words and then we have applied statistical
approach which converts the English into its equivalent Hindi phoneme and in
turn the corresponding Hindi word. Through this approach we have attained
83.40% accuracy.Comment: Proceedings of IEEE Students' Conference on Electrical, Electronics
and Computer Sciences 201
Japanese/English Cross-Language Information Retrieval: Exploration of Query Translation and Transliteration
Cross-language information retrieval (CLIR), where queries and documents are
in different languages, has of late become one of the major topics within the
information retrieval community. This paper proposes a Japanese/English CLIR
system, where we combine a query translation and retrieval modules. We
currently target the retrieval of technical documents, and therefore the
performance of our system is highly dependent on the quality of the translation
of technical terms. However, the technical term translation is still
problematic in that technical terms are often compound words, and thus new
terms are progressively created by combining existing base words. In addition,
Japanese often represents loanwords based on its special phonogram.
Consequently, existing dictionaries find it difficult to achieve sufficient
coverage. To counter the first problem, we produce a Japanese/English
dictionary for base words, and translate compound words on a word-by-word
basis. We also use a probabilistic method to resolve translation ambiguity. For
the second problem, we use a transliteration method, which corresponds words
unlisted in the base word dictionary to their phonetic equivalents in the
target language. We evaluate our system using a test collection for CLIR, and
show that both the compound word translation and transliteration methods
improve the system performance
A Comparison of Different Machine Transliteration Models
Machine transliteration is a method for automatically converting words in one
language into phonetically equivalent ones in another language. Machine
transliteration plays an important role in natural language applications such
as information retrieval and machine translation, especially for handling
proper nouns and technical terms. Four machine transliteration models --
grapheme-based transliteration model, phoneme-based transliteration model,
hybrid transliteration model, and correspondence-based transliteration model --
have been proposed by several researchers. To date, however, there has been
little research on a framework in which multiple transliteration models can
operate simultaneously. Furthermore, there has been no comparison of the four
models within the same framework and using the same data. We addressed these
problems by 1) modeling the four models within the same framework, 2) comparing
them under the same conditions, and 3) developing a way to improve machine
transliteration through this comparison. Our comparison showed that the hybrid
and correspondence-based models were the most effective and that the four
models can be used in a complementary manner to improve machine transliteration
performance
MIRACLE Retrieval Experiments with East Asian Languages
This paper describes the participation of MIRACLE in NTCIR 2005 CLIR task. Although our group has a strong background and long expertise in Computational Linguistics and Information Retrieval applied to European languages and using Latin and Cyrillic alphabets, this was our first attempt on East Asian languages. Our main goal was to study the particularities and distinctive characteristics of Japanese, Chinese and Korean, specially focusing on the similarities and differences with European languages, and carry out research on CLIR tasks which include those languages. The basic idea behind our participation in NTCIR is to test if the same familiar linguisticbased techniques may also applicable to East Asian languages, and study the necessary adaptations
Are You Finding the Right Person? A Name Translation System Towards Web 2.0
In a multilingual world, information available in global information systems is increasing rapidly. Searching for proper names in foreign language becomes an important task in multilingual search and knowledge discovery. However, these names are the most difficult to handle because they are often unknown words that cannot be found in a translation dictionary and even human experts cannot handle the variation generated during translation. Furthermore, existing research on name translation have focused on translation algorithms. However, user experience during name translation and name search are often ignored. With the Web technology moving towards Web 2.0, creating a platform that allow easier distributed collaboration and information sharing, we seek methods to incorporate Web 2.0 technologies into a name translation system. In this research, we review challenges in name translation and propose an interactive name translation and search system: NameTran. This system takes English names and translates them into Chinese using a combined hybrid Hidden Markov Model-based (HMM-based) transliteration approach and a web mining approach. Evaluation results showed that web mining consistently boosted the performance of a pure HMM approach. Our system achieved top-1 accuracy of 0.64 and top-8 accuracy of 0.96. To cope with changing popularity and variation in name translations, we demonstrated the feasibility of allowing users to rank translations and the new ranking serves as feedback to the original trained HMM model. We believe that such user input will significantly improve system usability
How much hybridisation does machine translation need?
This is the peer reviewed version of the following article: [Costa-jussà , M. R. (2015), How much hybridization does machine translation Need?. J Assn Inf Sci Tec, 66: 2160–2165. doi:10.1002/asi.23517], which has been published in final form at [10.1002/asi.23517]. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving.Rule-based and corpus-based machine translation (MT)have coexisted for more than 20 years. Recently, bound-aries between the two paradigms have narrowed andhybrid approaches are gaining interest from bothacademia and businesses. However, since hybridapproaches involve the multidisciplinary interaction oflinguists, computer scientists, engineers, and informa-tion specialists, understandably a number of issuesexist.While statistical methods currently dominate researchwork in MT, most commercial MT systems are techni-cally hybrid systems. The research community shouldinvestigate the bene¿ts and questions surrounding thehybridization of MT systems more actively. This paperdiscusses various issues related to hybrid MT includingits origins, architectures, achievements, and frustra-tions experienced in the community. It can be said thatboth rule-based and corpus- based MT systems havebene¿ted from hybridization when effectively integrated.In fact, many of the current rule/corpus-based MTapproaches are already hybridized since they do includestatistics/rules at some point.Peer ReviewedPostprint (author's final draft
- …