
    Learning the lexicon from raw texts for open-vocabulary Korean word recognition

    In this paper, we propose a novel method of building a language model for open-vocabulary Korean word recognition. Because of the complex morphology of Korean, lexicons based on linguistic entities such as words and morphemes are inappropriate in open-vocabulary domains. Instead, we build the lexicon by collecting variable-length character sequences from raw texts using a dynamic Bayesian network model of the language. In simulated word recognition experiments, the proposed language model found the correct words from lattices of character candidates in 94.3% of cases, increasing the word recognition rate by 20.9%.
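The two ideas in the abstract, collecting variable-length character sequences from raw text and picking a word out of a lattice of character candidates, can be illustrated with a minimal sketch. This is not the paper's method: the dynamic Bayesian network is replaced here by plain frequency counts and a unigram score over sequences, and the function names (`collect_sequences`, `best_path`), the parameters (`max_len`, `min_count`), and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def collect_sequences(texts, max_len=3, min_count=2):
    """Count character sequences of length 1..max_len inside each
    space-delimited token (hypothetical stand-in for lexicon learning)."""
    counts = Counter()
    for line in texts:
        for word in line.split():
            for i in range(len(word)):
                for j in range(i + 1, min(i + max_len, len(word)) + 1):
                    counts[word[i:j]] += 1
    # Keep every single character so unseen words stay coverable,
    # plus longer sequences seen at least min_count times.
    return {s: c for s, c in counts.items() if len(s) == 1 or c >= min_count}


def best_path(lattice, lexicon):
    """Viterbi-style search over a lattice (one list of candidate
    characters per position) for the string whose segmentation into
    lexicon sequences has the highest sum of log relative frequencies.
    Returns None if no segmentation covers the lattice."""
    if not lexicon:
        return None
    total = sum(lexicon.values())
    max_len = max(len(s) for s in lexicon)
    n = len(lattice)
    best = [(-math.inf, "")] * (n + 1)
    best[0] = (0.0, "")
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            prev_score, prev_text = best[start]
            if prev_score == -math.inf:
                continue
            for unit, count in lexicon.items():
                if len(unit) != end - start:
                    continue
                # The unit fits if each of its characters is a candidate
                # at the corresponding lattice position.
                if all(ch in lattice[start + k] for k, ch in enumerate(unit)):
                    score = prev_score + math.log(count / total)
                    if score > best[end][0]:
                        best[end] = (score, prev_text + unit)
    score, text = best[n]
    return text if score > -math.inf else None


# Toy raw text (assumed for illustration only).
corpus = ["학교에 갔다", "학교에서 왔다", "학교는 크다"]
lexicon = collect_sequences(corpus)

# Each position lists the recognizer's candidate characters.
lattice = [["학", "할"], ["교", "고"], ["에", "애"]]
recognized = best_path(lattice, lexicon)
```

Because longer sequences that recur in the text are scored as single units, a path built from one frequent sequence tends to beat a path assembled from several shorter ones, which is the intuition behind using variable-length units instead of a fixed word or morpheme list.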