Supertagging the Long Tail with Tree-Structured Decoding of Complex Categories
Although current CCG supertaggers achieve high accuracy on the standard WSJ
test set, few systems make use of the categories' internal structure that will
drive the syntactic derivation during parsing. The tagset is traditionally
truncated, discarding the many rare and complex category types in the long
tail. However, supertags are themselves trees. Rather than give up on rare
tags, we investigate constructive models that account for their internal
structure, including novel methods for tree-structured prediction. Our best
tagger is capable of recovering a sizeable fraction of the long-tail supertags
and even generates CCG categories that have never been seen in training, while
approximating the prior state of the art in overall tag accuracy with fewer
parameters. We further investigate how well different approaches generalize to
out-of-domain evaluation sets.
Comment: Accepted to appear in TACL; Authors' final version, pre-MIT Press publication
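The abstract notes that supertags are themselves trees. A small sketch makes this concrete. It assumes CCGbank-style category strings in which slashes associate to the left, so S\NP/NP means (S\NP)/NP. The Python below parses a category into a binary tree whose leaves are atomic categories and whose internal nodes are slashes. The names `Atom`, `Functor`, `parse`, `to_string` and `atoms` are hypothetical and purely illustrative. This is not the paper's tagger or decoder, only the kind of structure a tree-structured decoder could predict.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Atom:
    """Leaf of a category tree, e.g. S, NP, N or a featured atom like S[dcl]."""
    label: str


@dataclass(frozen=True)
class Functor:
    """Internal node: a slash joining a result subtree and an argument subtree."""
    slash: str  # "/" (argument expected to the right) or "\\" (to the left)
    result: "Category"
    argument: "Category"


Category = Union[Atom, Functor]


def parse(text: str) -> Category:
    """Parse a CCGbank-style category string (hypothetical helper)."""
    pos = 0

    def primary() -> Category:
        nonlocal pos
        if pos >= len(text):
            raise ValueError(f"unexpected end of category: {text!r}")
        if text[pos] == "(":
            pos += 1
            node = expr()
            if pos >= len(text) or text[pos] != ")":
                raise ValueError(f"missing ')' in {text!r}")
            pos += 1
            return node
        start = pos
        # An atom runs until the next slash or bracket; feature brackets like [dcl] are kept.
        while pos < len(text) and text[pos] not in "/\\()":
            pos += 1
        if start == pos:
            raise ValueError(f"expected an atom at position {start} in {text!r}")
        return Atom(text[start:pos])

    def expr() -> Category:
        nonlocal pos
        node = primary()
        # Left fold, so that S\NP/NP is read as (S\NP)/NP.
        while pos < len(text) and text[pos] in "/\\":
            slash = text[pos]
            pos += 1
            node = Functor(slash, node, primary())
        return node

    tree = expr()
    if pos != len(text):
        raise ValueError(f"trailing input at position {pos} in {text!r}")
    return tree


def to_string(cat: Category) -> str:
    """Render a tree back to a string, bracketing every complex subtree."""
    if isinstance(cat, Atom):
        return cat.label

    def wrap(sub: Category) -> str:
        return f"({to_string(sub)})" if isinstance(sub, Functor) else to_string(sub)

    return wrap(cat.result) + cat.slash + wrap(cat.argument)


def atoms(cat: Category) -> List[str]:
    """Leaves in left-to-right order: the atomic categories a tree decoder would emit."""
    if isinstance(cat, Atom):
        return [cat.label]
    return atoms(cat.result) + atoms(cat.argument)


transitive = parse("(S[dcl]\\NP)/NP")
print(to_string(transitive))  # (S[dcl]\NP)/NP
print(atoms(transitive))      # ['S[dcl]', 'NP', 'NP']
# Composing known pieces by hand yields a category that need not be in a truncated tagset.
novel = Functor("/", transitive, Atom("PP"))
print(to_string(novel))       # ((S[dcl]\NP)/NP)/PP
```

Any category is assembled from a modest inventory of atoms and two slash directions. A model that predicts these pieces, rather than choosing from a fixed list of whole tags, can therefore in principle output categories that a truncated tagset would never contain. The last line illustrates this with a category built by hand.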