We explore a novel approach to automatically predicting noun number in Chinese using a word-aligned Chinese-English parallel corpus. We first map number information from English onto Chinese to create a dataset labeled with a POS tagset enhanced with number information, and then train a model to predict noun number from a combination of lexical and syntactic features. We evaluate the quality of the automatically mapped data and show that the mapping is largely adequate despite a small percentage of errors. Trained on a relatively small dataset, our model achieves a 4% absolute improvement in accuracy over a majority baseline that treats all nouns as singular.
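The mapping step and the majority baseline can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes Penn Treebank noun tags on the English side, a Chinese `NN` tag, alignments given as (Chinese index, English index) pairs, and hypothetical enhanced tags `NN-SG` and `NN-PL`; the function names and the tie-breaking rule are likewise illustrative assumptions.

```python
from collections import Counter

# Penn Treebank noun tags and the grammatical number they signal.
EN_NUMBER = {"NN": "SG", "NNP": "SG", "NNS": "PL", "NNPS": "PL"}


def project_number(zh_tags, en_tags, alignment):
    """Copy number from aligned English nouns onto Chinese nouns.

    zh_tags, en_tags: POS tags for one sentence pair.
    alignment: list of (zh_index, en_index) word-alignment pairs.
    Returns Chinese tags, with "NN" replaced by the hypothetical
    "NN-SG" / "NN-PL" wherever an aligned English noun is found.
    """
    votes = {}
    for zh_i, en_i in alignment:
        number = EN_NUMBER.get(en_tags[en_i])
        if number is not None:
            votes.setdefault(zh_i, Counter())[number] += 1

    projected = []
    for i, tag in enumerate(zh_tags):
        if tag == "NN" and i in votes:
            counts = votes[i]
            # Assumption: ties between SG and PL votes resolve to singular.
            number = "PL" if counts["PL"] > counts["SG"] else "SG"
            projected.append(f"{tag}-{number}")
        else:
            # Non-nouns and nouns without an aligned English noun are left as-is.
            projected.append(tag)
    return projected


def majority_baseline_accuracy(gold_tags):
    """Accuracy of labeling every number-annotated noun as singular."""
    nouns = [t for t in gold_tags if t in ("NN-SG", "NN-PL")]
    if not nouns:
        return 0.0
    return sum(t == "NN-SG" for t in nouns) / len(nouns)
```

For example, for a Chinese sentence tagged `["CD", "M", "NN"]` (numeral, classifier, noun) aligned to English `["CD", "NNS"]` with `alignment = [(0, 0), (2, 1)]`, `project_number` relabels the Chinese noun as `NN-PL`.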