Language models for agglutinative languages have long been hindered by the
myriad of agglutinations possible for any given word through various
affixes. We propose a method to mitigate the problem of out-of-vocabulary words
by introducing an embedding derived from syllables and morphemes which
leverages the agglutinative property. Our model outperforms character-level
embedding by 16.87 in perplexity with 9.50M parameters. The proposed method
achieves state-of-the-art performance over existing input prediction methods in
terms of Key Stroke Saving and has been commercialized.

Comment: Accepted at EMNLP 2017 workshop on Subword and Character level models
in NLP (SCLeM)