Investigating the Extractive Summarization of Literary Novels

Ceylan, Hakan

Authors: Hakan Ceylan
Publication date: 1 December 2011
Publisher: University of North Texas Libraries

Abstract

Due to the vast amount of information we are faced with, summarization has become a critical necessity of everyday human life. Given that a large fraction of the electronic documents available online and elsewhere consist of short texts such as Web pages, news articles, and scientific reports, the focus of natural language processing techniques to date has been on the automation of methods targeting short documents. We are, however, witnessing a change: an increasingly large number of books are becoming available in electronic format. This means that the need for language processing techniques able to handle very large documents such as books is becoming increasingly important. This thesis addresses the problem of summarization of novels, which are long and complex literary narratives. While a significant body of research has been carried out on the task of automatic text summarization, most of this work has been concerned with the summarization of short documents, with a particular focus on news stories. Novels, however, differ from news stories in both length and genre, and consequently different summarization techniques are required. This thesis attempts to close this gap by analyzing a new domain for summarization, and by building unsupervised and supervised systems that effectively take into account the properties of long documents and outperform the traditional extractive summarization systems that typically address the news genre.

Available Versions

UNT Digital Library

info:ark/67531/metadc103298

Last updated on 21/11/2016