Artificial Intelligence

Abstract

Discourse connectives can be ambiguous between senses, in that a single connective can signal more than one rhetorical relation. The aim of this study is to discover how such connectives can be disambiguated using a statistical model. Six discourse connectives (after, as soon as, before, once, since and while) are considered, each of which is ambiguous with respect to the SDRT (Segmented Discourse Representation Theory; Asher & Lascarides, 2003) relation it signals. Maximum entropy models using different combinations of linguistic features derived from the connective's context are trained and tested on a corpus of examples containing these connectives, each annotated with the correct rhetorical relation. The best-performing model achieves an average accuracy of 70.4% across all the connectives, compared with a most-common-sense baseline of 57.2%. Performance varies widely between connectives: the models for since and while score 30 percentage points above the baseline, whereas the models for after and as soon as fail to beat the baseline by a statistically significant margin. The most informative features were those derived from the main verbs of the text spans connected by the rhetorical relation, and the words and parts of speech collocated with the connective.

Acknowledgements

I would like to thank my supervisor, Alex Lascarides, for introducing me to the study of discourse and for all her help, encouragement and timely feedback throughout this project. I would also like to thank Mirella Lapata for extracting the examples which formed the corpus for this study, and the annotators; Alex
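The abstract describes maximum entropy classifiers built on features from the connective's context and compared against a most-common-sense baseline. The Python sketch below shows that setup at toy scale. It is not the study's implementation. The following are all assumptions made for illustration:

- the feature names (`verb1`, `verb2`, `next_word`, `next_pos`);
- the placeholder labels `temporal` and `causal`, which stand in for SDRT relations;
- the made-up examples;
- the training procedure (batch gradient ascent with L2 regularisation).

A conditional maximum entropy model over binary indicator features is equivalent to multinomial logistic regression, and that is what the `MaxEnt` class implements.

```python
import math
from collections import Counter, defaultdict


def extract_features(ex):
    """Map a hypothetical example record to binary indicator features.
    Field names are illustrative assumptions, not the study's feature set."""
    return [
        "conn=" + ex["connective"],
        "verb1=" + ex["verb1"],          # main verb (lemma) of the first text span
        "verb2=" + ex["verb2"],          # main verb (lemma) of the second text span
        "next_word=" + ex["next_word"],  # word collocated with the connective
        "next_pos=" + ex["next_pos"],    # part of speech of that word
    ]


class MaxEnt:
    """Conditional maximum entropy model (multinomial logistic regression):
    p(y | x) is proportional to exp(sum of w[(y, f)] over active features f)."""

    def __init__(self, labels):
        self.labels = sorted(labels)
        self.w = defaultdict(float)  # (label, feature) -> weight

    def prob(self, feats):
        scores = {y: sum(self.w[(y, f)] for f in feats) for y in self.labels}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def predict(self, feats):
        p = self.prob(feats)
        return max(self.labels, key=p.get)

    def train(self, data, epochs=300, lr=0.5, l2=0.01):
        """Batch gradient ascent on the L2-regularised mean conditional log-likelihood."""
        n = len(data)
        for _ in range(epochs):
            grad = defaultdict(float)
            for feats, gold in data:
                p = self.prob(feats)
                for f in feats:
                    grad[(gold, f)] += 1.0    # observed feature count
                    for y in self.labels:
                        grad[(y, f)] -= p[y]  # expected feature count under the model
            for key in set(grad) | set(self.w):
                self.w[key] += lr * (grad[key] / n - l2 * self.w[key])


def accuracy(model, data):
    return sum(model.predict(f) == y for f, y in data) / len(data)


def most_common_baseline(train_labels, test_labels):
    """Accuracy of always predicting the most frequent label seen in training."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return sum(y == majority for y in test_labels) / len(test_labels)


def ex(conn, v1, v2, word, pos):
    return {"connective": conn, "verb1": v1, "verb2": v2,
            "next_word": word, "next_pos": pos}


# Made-up examples for "since"; labels are placeholders, not the study's SDRT relations.
raw_train = [
    (ex("since", "work", "graduate", "she", "PRP"), "temporal"),
    (ex("since", "live", "graduate", "they", "PRP"), "temporal"),
    (ex("since", "know", "meet", "she", "PRP"), "temporal"),
    (ex("since", "stay", "be", "he", "PRP"), "causal"),
    (ex("since", "leave", "be", "it", "PRP"), "causal"),
    (ex("since", "cancel", "rain", "it", "PRP"), "causal"),
    (ex("since", "sell", "need", "you", "PRP"), "causal"),
]
train = [(extract_features(x), y) for x, y in raw_train]
held_out = [
    (extract_features(ex("since", "rest", "be", "it", "PRP")), "causal"),
    (extract_features(ex("since", "teach", "graduate", "she", "PRP")), "temporal"),
]

model = MaxEnt({y for _, y in train})
model.train(train)
print("model accuracy:", accuracy(model, held_out))
print("baseline accuracy:",
      most_common_baseline([y for _, y in train], [y for _, y in held_out]))
```

The baseline always predicts the majority training label (here `causal`, 4 of 7 examples). Any useful feature set therefore has to beat that figure, mirroring the abstract's comparison against the 57.2% most-common-sense baseline.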
