Word ordering is a constrained language generation task that takes unordered
words as input. Existing work uses linear models and neural networks for the
task, yet pre-trained language models have not been studied for word ordering,
let alone why they help. We use BART as an instance and show its effectiveness
in the task. To explain why BART helps word ordering, we extend the analysis
with probing and empirically identify that syntactic dependency knowledge in BART is
a reliable explanation. We also report performance gains with BART in the
related partial tree linearization task, which readily extends our analysis.