This paper discusses the key technologies required to realize a document database consisting of multilingual documents of the kind typically found on the WWW, and to realize a system that supports easy access to such multilingual information. Specifically, we focus on two techniques: (1) cross-language information retrieval (CLIR) that supports conversion of cultural factors such as units, era names, and color names, and (2) an algorithm for automatically identifying the language and coding system of documents. The goal of our research is to develop a system that supports end-user access to multilingual information by integrating these techniques.

1 Introduction

With the growth of the Internet and the WWW in recent years, documents written in many different languages are now being provided online. Although 80% of current Web pages are written in English, it is estimated that more than half of Web documents will be non-English by 2003¹. Therefore, the WWW can be regarded as a huge document database which contains..