Grammar Induction from Text Using Small Syntactic Prototypes

Abstract

We present an efficient technique for incorporating a small number of cross-linguistic parameter settings, which define default word orders, into otherwise unsupervised grammar induction. A syntactic prototype, represented by a model that integrates Categorial Grammar with dependency structure and generated from the language parameters, is used to prune the search space. We also propose heuristics that prefer less complex syntactic categories over more complex ones during parse decoding. The system reduces the errors of state-of-the-art baselines on WSJ10 (1% error reduction in F1 score for the model trained on Sections 2–22 and tested on Section 23), Chinese10 (26% error reduction in F1), German10 (9% error reduction in F1), and Japanese10 (8% error reduction in F1), and is not significantly different from the baseline on Czech10.
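As an illustration of the decoding heuristic described above, the following sketch shows one plausible way to prefer less complex syntactic categories. It is not the paper's implementation: the complexity measure (counting slash operators in a Categorial Grammar category) and all function names are assumptions made for this example.

```python
# Illustrative sketch, NOT the paper's code: a heuristic preferring
# less complex Categorial Grammar categories during parse decoding.
# Complexity is approximated here by the number of slash operators.

def category_complexity(cat: str) -> int:
    """Count slash operators; e.g. 'N' -> 0, '(S\\N)/N' -> 2."""
    return cat.count("/") + cat.count("\\")

def parse_score(categories: list[str]) -> int:
    """Total complexity of a candidate parse's category assignments."""
    return sum(category_complexity(c) for c in categories)

# Among candidate parses for the same sentence, prefer the one whose
# category assignments have the lowest total complexity.
candidates = [
    ["N", "(S\\N)/N", "N"],          # simple transitive-verb analysis
    ["N/N", "(S\\N)/(S\\N)", "N"],   # a more complex alternative
]
best = min(candidates, key=parse_score)  # -> the first candidate
```

In practice such a tie-breaking preference would be combined with the model's probability scores rather than used alone; the sketch only shows the complexity ordering itself.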