Cue phrases may be used in a discourse sense to explicitly signal discourse
structure, but also in a sentential sense to convey semantic rather than
structural information. Correctly classifying cue phrases as discourse or
sentential is critical in natural language processing systems that exploit
discourse structure, e.g., for performing tasks such as anaphora resolution and
plan recognition. This paper explores the use of machine learning for
classifying cue phrases as discourse or sentential. Two machine learning
programs (Cgrendel and C4.5) are used to induce classification models from sets
of pre-classified cue phrases and their features in text and speech. Machine
learning is shown to be an effective technique for not only automating the
generation of classification models, but also for improving upon previous
results. When compared to manually derived classification models already in the
literature, the learned models often perform with higher accuracy and contain
new linguistic insights into the data. In addition, the ability to
automatically construct classification models makes it easier to comparatively
analyze the utility of alternative feature representations of the data.
Finally, the ease of retraining makes the learning approach more scalable and
flexible than manual methods.Comment: 42 pages, uses jair.sty, theapa.bst, theapa.st