4 research outputs found
Getting Past the Language Gap: Innovations in Machine Translation
In this chapter, we will be reviewing state of the art machine translation systems, and will discuss innovative methods for machine translation, highlighting the most promising techniques and applications. Machine translation (MT) has benefited from a revitalization in the last 10 years or so, after a period of relatively slow activity. In 2005 the field received a jumpstart when a powerful complete experimental package for building MT systems from scratch became freely available as a result of the unified efforts of the MOSES international consortium. Around the same time, hierarchical methods had been introduced by Chinese researchers, which allowed the introduction and use of syntactic information in translation modeling. Furthermore, the advances in the related field of computational linguistics, making off-the-shelf taggers and parsers readily available, helped give MT an additional boost. Yet there is still more progress to be made. For example, MT will be enhanced greatly when both syntax and semantics are on board: this still presents a major challenge though many advanced research groups are currently pursuing ways to meet this challenge head-on. The next generation of MT will consist of a collection of hybrid systems. It also augurs well for the mobile environment, as we look forward to more advanced and improved technologies that enable the working of Speech-To-Speech machine translation on hand-held devices, i.e. speech recognition and speech synthesis. We review all of these developments and point out in the final section some of the most promising research avenues for the future of MT
Getting Past the Language Gap: Innovations in Machine Translation
In this chapter, we will be reviewing state of the art machine translation systems, and will discuss innovative methods for machine translation, highlighting the most promising techniques and applications. Machine translation (MT) has benefited from a revitalization in the last 10 years or so, after a period of relatively slow activity. In 2005 the field received a jumpstart when a powerful complete experimental package for building MT systems from scratch became freely available as a result of the unified efforts of the MOSES international consortium. Around the same time, hierarchical methods had been introduced by Chinese researchers, which allowed the introduction and use of syntactic information in translation modeling. Furthermore, the advances in the related field of computational linguistics, making off-the-shelf taggers and parsers readily available, helped give MT an additional boost. Yet there is still more progress to be made. For example, MT will be enhanced greatly when both syntax and semantics are on board: this still presents a major challenge though many advanced research groups are currently pursuing ways to meet this challenge head-on. The next generation of MT will consist of a collection of hybrid systems. It also augurs well for the mobile environment, as we look forward to more advanced and improved technologies that enable the working of Speech-To-Speech machine translation on hand-held devices, i.e. speech recognition and speech synthesis. We review all of these developments and point out in the final section some of the most promising research avenues for the future of MT
Semantic Mapping Using Automatic Word Alignment and Semantic Role Labeling
To facilitate the application of semantics in statistical machine translation, we propose a broad-coverage predicate-argument structure mapping technique using automated resources. Our approach utilizes automatic syntactic and semantic parsers to generate Chinese-English predicate-argument structures. The system produced a many-to-many argument mapping for all PropBank argument types by computing argument similarity based on automatic word alignment, achieving 80.5 % F-score on numbered argument mapping and 64.6 % F-score on all arguments. By measuring predicate-argument structure similarity based on the argument mapping, and formulating the predicate-argument structure mapping problem as a linear-assignment problem, the system achieved 84.9 % F-score using automatic SRL, only 3.7 % F-score lower than using gold standard SRL. The mapping output covered 49.6 % of the annotated Chinese predicates (which contains predicateadjectives that often have no parallel annotations in English) and 80.7 % of annotated English predicates, suggesting its potential as a valuable resource for improving word alignment and reranking MT output.
Recommended from our members
Hybrid System Combination for Machine Translation: An Integration of Phrase-level and Sentences-level Combination Approaches
Given the wide range of successful statistical MT approaches that have emerged recently, it would be beneficial to take advantage of their individual strengths and avoid their individual weaknesses. Multi-Engine Machine Translation (MEMT) attempts to do so by either fusing the output of multiple translation engines or selecting the best translation among them, aiming to improve the overall translation quality. In this thesis, we propose to use the phrase or the sentence as our combination unit instead of the word; three new phrase-level models and one sentence-level model with novel features are proposed. This contrasts with the most popular system combination technique to date which relies on word-level confusion network decoding.
Among the three new phrase-level models, the first one utilizes source sentences and target translation hypotheses to learn hierarchical phrases -- phrases that contain subphrases (Chiang 2007). It then re-decodes the source sentences using the hierarchical phrases to combine the results of multiple MT systems. The other two models we propose view combination as a paraphrasing process and use paraphrasing rules. The paraphrasing rules are composed of either string-to-string paraphrases or hierarchical paraphrases, learned from monolingual word alignments between a selected best translation hypothesis and other hypotheses. Our experimental results show that all of the three phrase-level models give superior performance in BLEU compared with the best single translation engine. The two paraphrasing models outperform the re-decoding model and the confusion network baseline model.
The sentence-level model exploits more complex syntactic and semantic information than the phrase-level models. It uses consensus, argument alignment, a supertag-based structural language model and a syntactic error detector. We use our sentence-level model in two ways: the first selects a translated sentence from multiple MT systems as the best translation to serve as a backbone for paraphrasing process; the second makes the final decision among all fused translations generated by the phrase-level models and all translated sentences of multiple MT systems. We proposed two novel hybrid combination structures for the integration of phrase-level and sentence-level combination frameworks in order to utilize the advantages of both frameworks and provide a more diverse set of plausible fused translations to consider