2 research outputs found

Web-based query translation for English-Chinese CLIR

Author: Geva Shlomo
Lu Chengye
Xu Yue
Publication venue: Association for Computational Linguistics and Chinese Language Processing
Publication date: 01/01/2008

Dictionary-based translation is a traditional approach used by cross-language information retrieval systems. However, significant performance degradation is often observed when queries contain words that do not appear in the dictionary. This is called the Out-of-Vocabulary (OOV) problem. In recent years, Web mining has been shown to be an effective approach to solving this problem. However, how to extract Multiword Lexical Units (MLUs) from Web content and how to select the correct translations from the extracted candidate MLUs remain two difficult problems in Web-mining-based automated translation. Most statistical approaches to MLU extraction rely on statistical information extracted from huge corpora. When Web mining techniques are used for automated translation, these approaches do not perform well because the corpus is usually too small, and statistical approaches that rely on a large sample can become unreliable. In this paper, we present a new Chinese term measurement and a new Chinese MLU extraction process that work well on small corpora. We also present our approach to selecting MLUs more accurately. Our experiments show marked improvement in translation accuracy over other commonly used approaches.

Queensland University of Technology ePrints Archive

Web-Based Query Translation for English-Chinese CLIR

Author: Geva Shlomo
Lu Chengye
Xu Yue
Publication venue: Association for Computational Linguistics and Chinese Language Processing
Publication date: 01/01/2008

Dictionary-based translation is a traditional approach used by cross-language information retrieval systems. However, significant performance degradation is often observed when queries contain words that do not appear in the dictionary. This is called the Out-of-Vocabulary (OOV) problem. In recent years, Web mining has been shown to be an effective approach to solving this problem. However, how to extract Multiword Lexical Units (MLUs) from Web content and how to select the correct translations from the extracted candidate MLUs remain two difficult problems in Web-mining-based automated translation. Most statistical approaches to MLU extraction rely on statistical information extracted from huge corpora. When Web mining techniques are used for automated translation, these approaches do not perform well because the corpus is usually too small, and statistical approaches that rely on a large sample can become unreliable. In this paper, we present a new Chinese term measurement and a new Chinese MLU extraction process that work well on small corpora. We also present our approach to selecting MLUs more accurately. Our experiments show marked improvement in translation accuracy over other commonly used approaches.

Queensland University of Technology ePrints Archive