
Extracting and Attributing Quotes in Text and Assessing them as Opinions

Abstract

News articles often report the opinions that salient people hold on important issues. Although an opinion can be inferred from a person's actions, it is far more common to show that a person holds an opinion by reporting what they have said. Such instances are called reported speech. In this thesis we set out to detect instances of reported speech, attribute them to their speakers, and identify which of them provide evidence of an opinion.

We first focus on extracting reported speech, which involves finding every act of communication reported in an article. Previous work has approached this task with rule-based methods; however, several factors confound these approaches. To demonstrate this, we build a corpus of 965 news articles in which we mark all instances of speech. We then show that a supervised token-based approach outperforms all of our rule-based alternatives, even for extracting direct quotes.

Next, we examine the problem of finding the speaker of each quote. For this task we annotate the same 965 news articles with links from each quote to its speaker. Using this corpus and three others, we develop new methods and features for quote attribution that achieve state-of-the-art accuracy on our corpus and strong results on the others.

Having extracted quotes and determined who spoke them, we turn to the opinion mining part of our work. Most existing task definitions in opinion mining do not transfer easily to opinions in news, so we define a new task whose aim is to classify whether a quote demonstrates support for, neutrality towards, or opposition to a given position statement. This formulation improved annotator agreement compared with our earlier annotation schemes. Using it, we build an opinion corpus of 700 news documents covering 7 topics. In this thesis we do not attempt this full task, but we do present preliminary results
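To make the contrast between rule-based and supervised token-based extraction concrete, here is a minimal sketch. It is our own illustration under stated assumptions, not the method used in the thesis. A naive rule that takes text between straight double quotation marks can only ever find direct quotes. Framing extraction as per-token labelling instead (here using the common BIO scheme, which is an assumption) produces training targets from which a classifier could also learn indirect speech. The function names, the `B-QUOTE`/`I-QUOTE` labels, the tokenizer, and the example sentence are all hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical rule-based baseline: any text between straight double
# quotation marks is treated as a direct quote.
QUOTE_RE = re.compile(r'"([^"]+)"')


def rule_based_direct_quotes(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets of the text inside each quote pair."""
    return [(m.start(1), m.end(1)) for m in QUOTE_RE.finditer(text)]


def tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Simple tokenizer: runs of word characters or single punctuation marks,
    returned as (token, start, end) with end exclusive."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def bio_labels(text: str, spans: List[Tuple[int, int]]) -> List[Tuple[str, str]]:
    """Convert gold quote spans (character offsets) into per-token BIO labels,
    the kind of target a supervised token classifier would be trained on."""
    labelled: List[Tuple[str, str]] = []
    prev_span: Optional[int] = None
    for tok, start, end in tokenize(text):
        # Find the gold span (if any) that fully contains this token.
        current: Optional[int] = None
        for i, (s, e) in enumerate(spans):
            if start >= s and end <= e:
                current = i
                break
        if current is None:
            label = "O"
        elif current != prev_span:
            label = "B-QUOTE"  # first token of a new quote span
        else:
            label = "I-QUOTE"  # continuation of the same quote span
        labelled.append((tok, label))
        prev_span = current
    return labelled


if __name__ == "__main__":
    text = 'The minister said the plan was "too costly". Critics argued it would fail.'
    # The rule finds only the direct quote.
    print([text[s:e] for s, e in rule_based_direct_quotes(text)])  # ['too costly']
    # Hypothetical gold annotation marking both the direct and the indirect quote.
    gold = [(32, 42), (60, 73)]
    for tok, label in bio_labels(text, gold):
        if label != "O":
            print(tok, label)
```

In this example the rule misses the indirect quote "it would fail", whereas the BIO targets cover both quotes, so a classifier trained on such labels at least has the opportunity to learn cues like reporting verbs rather than relying on quotation marks alone.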
