Punctuation is a strong indicator of syntactic structure, and parsers trained
on text with punctuation often rely heavily on this signal. Punctuation is a
diversion, however: human language processing does not rely on
punctuation to the same extent, and in informal texts we therefore often leave
it out. We also use punctuation ungrammatically for emphatic or
creative purposes, or simply by mistake. We show that (a) dependency parsers
are sensitive both to the absence of punctuation and to alternative uses; (b)
neural parsers tend to be more sensitive than vintage parsers; and (c) neural
parsers trained without punctuation outperform all out-of-the-box parsers
across all scenarios where punctuation departs from standard usage. Our
main experiments are on synthetically corrupted data to study the effect of
punctuation in isolation and avoid potential confounds, but we also show
effects on out-of-domain data.

Comment: Analyzing and interpreting neural networks for NLP, EMNLP 2018 workshop
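
As an illustration of what a synthetic "absence of punctuation" corruption could look like, the following minimal sketch removes punctuation tokens from a dependency-annotated sentence and renumbers the remaining heads. It is not the procedure used in the experiments. The `Token` type, the `strip_punctuation` function, and the choice to identify punctuation by a Universal Dependencies style `PUNCT` tag are all assumptions made for this example.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """Hypothetical token record, loosely modelled on CoNLL-U columns."""
    idx: int      # 1-based position in the sentence
    form: str     # surface form
    upos: str     # coarse part-of-speech tag (assumed UD-style)
    head: int     # 1-based index of the head, 0 for the root
    deprel: str   # dependency relation label


def strip_punctuation(sentence: List[Token]) -> List[Token]:
    """Drop PUNCT tokens and renumber the remaining tree."""
    by_idx = {t.idx: t for t in sentence}

    def resolve_head(head: int) -> int:
        # If a token's head is punctuation, climb to the nearest
        # non-punctuation ancestor (or the root) so it is not orphaned.
        while head != 0 and by_idx[head].upos == "PUNCT":
            head = by_idx[head].head
        return head

    kept = [t for t in sentence if t.upos != "PUNCT"]
    # Map old positions to new, contiguous positions; 0 stays the root.
    new_idx = {t.idx: i for i, t in enumerate(kept, start=1)}
    new_idx[0] = 0
    return [
        Token(new_idx[t.idx], t.form, t.upos,
              new_idx[resolve_head(t.head)], t.deprel)
        for t in kept
    ]


# Toy sentence: "Well , it works ."
sentence = [
    Token(1, "Well", "INTJ", 4, "discourse"),
    Token(2, ",", "PUNCT", 4, "punct"),
    Token(3, "it", "PRON", 4, "nsubj"),
    Token(4, "works", "VERB", 0, "root"),
    Token(5, ".", "PUNCT", 4, "punct"),
]
stripped = strip_punctuation(sentence)
for tok in stripped:
    print(tok.idx, tok.form, tok.head, tok.deprel)
```

Applying such a function to both training and test trees would give a punctuation-free version of a treebank. Corruptions that model alternative uses of punctuation would need a different transformation and are not covered by this sketch.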