We present an automatic animacy classifier for Dutch that determines the animacy status of nouns, i.e. how alive the noun's referent is (human, inanimate, etc.). Animacy is a semantic property that has been shown to play a role in human sentence processing, felicity and grammaticality ("the spoon *who is on the table fell"). We expect knowledge about animacy to be helpful for parsing, translation and other NLP tasks, although animacy is not marked explicitly in Dutch. Only a few animacy classifiers and animacy-annotated corpora exist internationally. For Dutch, animacy information is only available in the Cornetto lexical-semantic database. We augment this lexical information with context information from the Dutch Lassy Large treebank to create training data for an animacy classifier that uses context features. An existing Swedish animacy classifier (Øvrelid, 2009) uses the k-nearest neighbour algorithm with morphosyntactic distributional features, e.g. how frequently the noun occurs as a sentence subject in a corpus, to decide on the (predominant) animacy class. For Dutch we use the same algorithm, but with distributional lexical features, e.g. how frequently the noun occurs as a subject of the verb 'to think' in a corpus. The size of the Lassy Large corpus makes this possible, and the higher level of detail that these word-association features provide increases classifier accuracy, giving us accurate Dutch-language animacy classification. These results allow (semi-)automatic corpus animacy annotation for creating animacy training resources, which can help other Dutch NLP tools to incorporate the animacy property of nouns.
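To make the idea of k-nearest-neighbour classification over distributional lexical features concrete, the following is a minimal sketch and not the implementation described in the paper. It assumes that each noun is represented by relative frequencies of (grammatical relation, verb) contexts and that neighbours are ranked by cosine similarity; the abstract specifies neither the distance measure nor the value of k. The counts are invented toy values, and the names `to_vector`, `cosine`, `knn_animacy` and `TRAINING` are hypothetical.

```python
import math
from collections import Counter


def to_vector(context_counts):
    """Turn raw (relation, verb) counts into relative frequencies."""
    total = sum(context_counts.values())
    if total == 0:
        return {}
    return {ctx: n / total for ctx, n in context_counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(ctx, 0.0) for ctx, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_animacy(target, training, k=3):
    """Majority vote over the k training nouns most similar to the target."""
    ranked = sorted(training, key=lambda item: cosine(target, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set (invented counts, for illustration only).
# Each entry pairs a noun's context vector with its animacy class.
TRAINING = [
    (to_vector({("subj", "denken"): 12, ("subj", "zeggen"): 20, ("obj", "kopen"): 1}), "human"),
    (to_vector({("subj", "denken"): 8, ("subj", "zeggen"): 15, ("subj", "vallen"): 2}), "human"),
    (to_vector({("subj", "vallen"): 9, ("obj", "kopen"): 14, ("subj", "liggen"): 11}), "inanimate"),
    (to_vector({("obj", "kopen"): 20, ("subj", "liggen"): 6}), "inanimate"),
]

# A noun seen mostly as the subject of 'denken' (to think) and 'zeggen' (to say).
thinker = to_vector({("subj", "denken"): 3, ("subj", "zeggen"): 5})
# A noun seen mostly as the object of 'kopen' (to buy) and subject of 'liggen' (to lie).
thing = to_vector({("obj", "kopen"): 4, ("subj", "liggen"): 3})

print(knn_animacy(thinker, TRAINING))
print(knn_animacy(thing, TRAINING))
```

In this toy setting the first noun shares its contexts only with the training nouns labelled human, and the second almost only with those labelled inanimate, so the majority vote follows the class of the most similar nouns. In a realistic setting the context counts would be extracted from a parsed corpus and the class labels from a lexical resource, as the abstract describes.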