Experiments with Theme Extraction in Explanatory Texts
Abstract
The notion of theme plays a crucial role in topic-based Information Retrieval. We discuss how topics relate to work in linguistic and discourse theories, and the rules by which they can be derived from texts. Two experiments were devised: the first to validate those rules with users, and the second to implement them on a collection of structured documents. In the first experiment, participants were asked either to choose between possible themes for an expression or to find relevant themes in a short text. In the second experiment, a prototype Information Retrieval System (IRS) was built to index the TEI Guidelines, a collection of SGML documents. Results from both experiments show the need for a new, more flexible measure of theme representativity that allows different rankings according to user or query type.

Keywords: Information Retrieval, Theme extraction, Indexing, Structured documents.

1 Introduction

Natural Language Processing techniques have been used for some time in Information Retrie..