
Key Technologies for Multilingual Information Processing on WWW

By Akira Maeda and Shunsuke Uemura

Abstract

This paper discusses the key technologies required to realize two things: a document database consisting of a multilingual collection of documents, such as those typically found on the WWW, and a system that supports easy access to such multilingual information. Specifically, we focus on two techniques: 1) cross-language information retrieval (CLIR) that supports the conversion of cultural factors such as units, era names, and color names, and 2) an algorithm for automatically identifying the language and coding system of documents. The goal of our research is to develop a system that supports end-user access to multilingual information by integrating these techniques.

1 Introduction

With the growth of the Internet and the WWW in recent years, documents written in various languages are increasingly being made available. Although 80% of current Web pages are written in English, it is estimated that over half of Web documents will be non-English in 2003 [1]. Therefore, the WWW can be regarded as a huge document database that contains..
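The abstract names the techniques but does not describe the identification algorithm or the conversion rules. Purely as an illustration, the Python sketch below shows two simplified stand-ins that assume Japanese text: a heuristic that guesses the coding system (ISO-2022-JP, EUC-JP, Shift_JIS, UTF-8, or plain ASCII) of a byte string, and a conversion from a Japanese era name and year to a Gregorian year. The function names `guess_japanese_encoding` and `era_to_gregorian` are hypothetical, and neither function is the authors' method.

```python
"""Hypothetical sketches of two kinds of processing named in the abstract.

Neither function reproduces the authors' algorithms; both are simplified
stand-ins that assume Japanese-language input.
"""

# Escape sequences that switch ISO-2022-JP into JIS X 0208 (1978 and 1983).
ISO2022_JP_ESCAPES = (b"\x1b$@", b"\x1b$B")

# Gregorian start year of each modern Japanese era (year 1 of the era).
ERA_START = {"Meiji": 1868, "Taisho": 1912, "Showa": 1926, "Heisei": 1989}


def guess_japanese_encoding(data: bytes) -> str:
    """Guess the coding system of a byte string (heuristic, not exhaustive)."""
    # ISO-2022-JP is a 7-bit encoding, so look for its escape sequences
    # before concluding that the text is plain ASCII.
    if any(esc in data for esc in ISO2022_JP_ESCAPES):
        return "iso-2022-jp"
    if all(b < 0x80 for b in data):
        return "ascii"
    # Otherwise accept the first 8-bit encoding that decodes without error.
    # UTF-8 is tried first because EUC-JP and Shift_JIS text rarely
    # happens to form valid UTF-8 byte sequences.
    for enc in ("utf-8", "euc-jp", "shift_jis"):
        try:
            data.decode(enc)
        except UnicodeDecodeError:
            continue
        return enc
    return "unknown"


def era_to_gregorian(era: str, year: int) -> int:
    """Convert an era-based year such as Heisei 11 to a Gregorian year."""
    if year < 1:
        raise ValueError("era years start at 1")
    # Year 1 of an era is the Gregorian year in which the era began.
    # Months are ignored, so an era's last year and the next era's first
    # year (e.g. Showa 64 and Heisei 1, both 1989) share a Gregorian year.
    return ERA_START[era] + year - 1


if __name__ == "__main__":
    sample = "日本語"
    for codec in ("euc-jp", "shift_jis", "iso-2022-jp", "utf-8"):
        print(codec, "->", guess_japanese_encoding(sample.encode(codec)))
    print("Heisei 11 ->", era_to_gregorian("Heisei", 11))
```

Decoding success is only a weak signal: short byte strings can be valid in more than one encoding (two Shift_JIS half-width katakana bytes, for instance, also form a valid EUC-JP character), so a practical identifier would more likely score byte-frequency statistics. Likewise, era changes fall within a calendar year (Heisei began on 8 January 1989), which is why the era sketch maps whole years only.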

Year: 1999
OAI identifier: oai:CiteSeerX.psu:10.1.1.32.9390
Provided by: CiteSeerX
Full text available at:
  • http://citeseerx.ist.psu.edu/v...
  • http://db-www.aist-nara.ac.jp/...