An English-translated parallel corpus for the CJK Wikipedia collections

Authors: Geva, Shlomo; Tang, Eric; Trotman, Andrew
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2012

In this paper, we describe a machine-translated parallel English corpus for the NTCIR Chinese, Japanese and Korean (CJK) Wikipedia collections. This document collection is named the CJK2E Wikipedia XML corpus. The corpus could serve the information retrieval research community, and knowledge sharing in Wikipedia, in many ways. For example, it could be used for experiments in cross-lingual information retrieval, cross-lingual link discovery, or omni-lingual information retrieval. Furthermore, the translated CJK articles could be used to expand the current coverage of the English Wikipedia.

Sources: Crossref; Queensland University of Technology ePrints Archive