Research into the automatic acquisition of lexical information from corpora
is starting to produce large-scale computational lexicons containing data on
the relative frequencies of subcategorisation alternatives for individual
verbal predicates. However, the empirical question of whether this type of
frequency information can in practice improve the accuracy of a statistical
parser has not yet been answered. In this paper we describe an experiment with
a wide-coverage statistical grammar and parser for English and
subcategorisation frequencies acquired from ten million words of text which
shows that this information can significantly improve parse accuracy.Comment: 9 pages, uses colacl.st