Automatic Semantic Classification for Chinese Unknown Compound Nouns


This paper describes a similarity-based model for representing the morphological rules of Chinese compound nouns. The representation model serves three functions: 1) as the morphological rules of the compounds, 2) as a means of evaluating the properness of a compound construction, and 3) as a means of resolving the semantic ambiguity of the morphological head of a compound noun. An automatic semantic classification system for Chinese unknown compounds is implemented based on the model. Experiments and error analyses are also presented.

1. Introduction

The occurrence of unknown words causes difficulties in natural language processing. The word set of a natural language is open-ended. There is no way of collecting every word of a language, since new words are continually created to express new concepts and new inventions. Identifying new words in a text is therefore among the most challenging tasks in natural language processing. This is especially true for Chinese. Each Chinese morpheme (usuall..

