A clustering approach to automatic verb classification incorporating selectional preferences: model, implementation, and user manual

Abstract

This report presents two variations of an innovative, complex approach to semantic verb classes that relies on selectional preferences as verb properties. The underlying linguistic assumption for this verb class model is that verbs which agree on their selectional preferences belong to a common semantic class. The model is implemented as a soft-clustering approach, in order to capture the polysemy of the verbs. The training procedure uses the Expectation-Maximisation (EM) algorithm (Baum, 1972) to iteratively improve the probabilistic parameters of the model, and applies the Minimum Description Length (MDL) principle (Rissanen, 1978) to induce WordNet-based selectional preferences for arguments within subcategorisation frames. One variation of the MDL principle replicates a standard MDL approach by Li and Abe (1998), the other variation presents an improved pruning strategy that outperforms the standard implementation considerably. Our model is potentially useful for lexical induction (e.g., verb senses, subcategorisation and selectional preferences, collocations, and verb alternations), and for NLP applications in sparse data situations. We demonstrate the usefulness of the model by a standard evaluation (pseudo-word disambiguation), and three applications (selectional preference induction, verb sense disambiguation, and semi-supervised sense labelling)

    Similar works