Chinese lexical analysis consists of word segmentation and part-of-speech tagging. Most previous studies treat them as two separate tasks. In this paper, we formalize the two processes as a single chunking task on a sequence of morphemes and present an integrated lexical analysis system for Chinese based on lexicalized hidden Markov models. In this way, both contextual lexical information and word-internal morphological features can be statistically explored and combined for disambiguation and unknown word resolution. Experimental results show that the proposed system outperforms several baselines, illustrating the benefits of the unified lexical chunking method with morphemes as the basic units.

Keywords: Chinese lexical analysis; Lexical chunking; Wor
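
To make the chunking formulation concrete, the following minimal sketch shows one way segmentation and part-of-speech tagging can be folded into a single tag per morpheme. The tag scheme (a position label `S`, `B`, `M`, or `E` joined to the word's part of speech), the function names, and the example sentence are illustrative assumptions, not the tag set or implementation used in the paper. The sketch covers only the encoding and decoding of chunk tags, not the lexicalized hidden Markov model that would predict them.

```python
from typing import List, Tuple

# A word is given as its list of morphemes plus one part-of-speech label.
Word = Tuple[List[str], str]


def words_to_chunk_tags(words: List[Word]) -> List[Tuple[str, str]]:
    """Encode each word as one (morpheme, tag) pair per morpheme.

    Assumed tag scheme: S = single-morpheme word, B = word-initial,
    M = word-internal, E = word-final, followed by "-" and the POS.
    """
    tagged = []
    for morphemes, pos in words:
        last = len(morphemes) - 1
        for i, morpheme in enumerate(morphemes):
            if last == 0:
                position = "S"
            elif i == 0:
                position = "B"
            elif i == last:
                position = "E"
            else:
                position = "M"
            tagged.append((morpheme, f"{position}-{pos}"))
    return tagged


def chunk_tags_to_words(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Decode a chunk-tagged morpheme sequence back into (word, POS) pairs.

    A word is closed whenever an S or E tag is seen; an unfinished
    trailing chunk is silently dropped in this sketch.
    """
    words = []
    buffer: List[str] = []
    for morpheme, tag in tagged:
        position, pos = tag.split("-", 1)
        buffer.append(morpheme)
        if position in ("S", "E"):
            words.append(("".join(buffer), pos))
            buffer = []
    return words


# Hypothetical example: "他 是 科学家" with 科学家 split into two morphemes.
sentence = [(["他"], "r"), (["是"], "v"), (["科学", "家"], "n")]
tags = words_to_chunk_tags(sentence)
recovered = chunk_tags_to_words(tags)
```

Under this encoding, a single sequence labeler over morphemes yields both word boundaries and part-of-speech tags at once, which is the sense in which the two tasks become one chunking task.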