Learning the lexicon from raw texts for open-vocabulary Korean word recognition

Authors: Jin Hyung Kim, Sungho Ryu
Publication date: 15/04/2010

In this paper, we propose a novel method of building a language model for open-vocabulary Korean word recognition. Because of the complex morphology of Korean, lexicons based on linguistic entities such as words and morphemes are inappropriate for open-vocabulary domains. Instead, we build the lexicon by collecting variable-length character sequences from raw texts using a dynamic Bayesian network model of the language. In simulated word recognition experiments, the proposed language model found the correct words from lattices of character candidates in 94.3% of cases, increasing the word recognition rate by 20.9%.
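
To make the idea of searching a lattice of character candidates with a lexicon of variable-length character sequences more concrete, here is a minimal sketch. It is not the paper's dynamic Bayesian network model: it swaps in a much simpler unigram model over lexicon entries and a plain dynamic-programming search. The function name `decode_lattice`, the example lattice, and all probabilities are assumptions made up for illustration.

```python
import math


def decode_lattice(lattice, lexicon):
    """Return (best_string, best_log_score) for a lattice of character candidates.

    lattice: list of dicts, one per character position, mapping a candidate
             character to a hypothetical recognizer log-probability.
    lexicon: dict mapping a variable-length character sequence to a
             hypothetical log-probability (a stand-in for a learned model).
    """
    n = len(lattice)
    best_score = [-math.inf] * (n + 1)
    best_text = [None] * (n + 1)
    best_score[0], best_text[0] = 0.0, ""

    # All edges point forward, so best_score[start] is final when we reach it.
    for start in range(n):
        if best_text[start] is None:
            continue  # no lexicon path reaches this position
        for seq, seq_logp in lexicon.items():
            end = start + len(seq)
            if end > n:
                continue
            # Every character of the entry must be a candidate at its position.
            if any(ch not in lattice[start + k] for k, ch in enumerate(seq)):
                continue
            score = best_score[start] + seq_logp + sum(
                lattice[start + k][ch] for k, ch in enumerate(seq)
            )
            if score > best_score[end]:
                best_score[end] = score
                best_text[end] = best_text[start] + seq
    return best_text[n], best_score[n]


# Illustrative three-position lattice; all numbers are invented.
lattice = [
    {"학": math.log(0.6), "확": math.log(0.4)},
    {"고": math.log(0.55), "교": math.log(0.45)},
    {"에": math.log(0.7), "애": math.log(0.3)},
]
# Illustrative lexicon of variable-length character sequences.
lexicon = {
    "학교": math.log(0.02),
    "에": math.log(0.05),
    "학": math.log(0.005),
    "고": math.log(0.01),
    "애": math.log(0.002),
    "확": math.log(0.001),
}

# Picking the top candidate at each position independently.
top1 = "".join(max(cands, key=cands.get) for cands in lattice)
decoded, score = decode_lattice(lattice, lexicon)
print(top1)     # 학고에
print(decoded)  # 학교에
```

In this toy case the per-position best guess picks the wrong second character, while the lexicon entry "학교" pulls the search toward the intended word. The sequences are arbitrary character strings rather than words or morphemes, which is the property the abstract emphasizes.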

CiteSeerX