Abstract
Inferring the information structure of scientific documents is useful for many NLP applications. Existing approaches to this task require substantial human effort. We propose a framework for constraint learning that reduces human involvement considerably. Our model uses topic models to identify latent topics and their key linguistic features in input documents, induces constraints from this information, and maps sentences to their dominant information structure categories through a constrained unsupervised model. When the induced constraints are combined with a fully unsupervised model, the resulting model challenges existing lightly supervised feature-based models as well as unsupervised models that use manually constructed declarative knowledge. Our results demonstrate that useful declarative knowledge can be learned from data with very limited human involvement.