We study the impact of big models (in terms of the degree of lexicalization)
and big data (in terms of the training corpus size) on dependency grammar
induction. We experiment with L-DMV, a lexicalized version of the Dependency
Model with Valence, and L-NDMV, our lexicalized extension of the Neural
Dependency Model with Valence. We find that L-DMV benefits only from very small
degrees of lexicalization and moderate training corpus sizes. In contrast,
L-NDMV can benefit from big training data and greater degrees of lexicalization,
especially when enhanced with good model initialization, and it achieves a
result competitive with the current state of the art.