3 research outputs found
Investigating the Extractive Summarization of Literary Novels
Abstract
Due to the vast amount of information we are faced with, summarization has become a critical necessity of everyday human life. Given that a large fraction of the electronic documents available online and elsewhere consist of short texts such as Web pages, news articles, scientific reports, and others, the focus of natural language processing techniques to date has been on the automation of methods targeting short documents. We are, however, witnessing a change: an increasing number of books are becoming available in electronic format. This means that language processing techniques able to handle very large documents such as books are becoming increasingly important. This thesis addresses the problem of summarizing novels, which are long and complex literary narratives. While a significant body of research has been carried out on the task of automatic text summarization, most of this work has been concerned with the summarization of short documents, with a particular focus on news stories. However, novels differ in both length and genre, and consequently different summarization techniques are required. This thesis attempts to close this gap by analyzing a new domain for summarization, and by building unsupervised and supervised systems that effectively take into account the properties of long documents and outperform the traditional extractive summarization systems that typically address the news genre.
Automatic summarization of conversational multi-party speech
This proposal addresses the problem of automatically summarizing conversational speech, in particular meeting recordings. The problem is divided into two main steps: utterance selection, the task of identifying a set of utterances representative of the important elements of a meeting, and utterance revision, the task of creating fluent and concise utterances from the ones produced by a speech recognizer. I propose a discourse-based approach to utterance selection that incorporates two processing stages. The first stage segments the meeting transcription by topic, a process that provides a high-level structure for the summary to be generated. The second stage analyzes each topical segment and attempts to predict the communicative goal (dialog act) of each utterance in order to determine, given a pragmatic context defined by preceding and succeeding dialog acts, whether the utterance should be included in the summary. The second stage is realized using dynamic Bayesian networks, a computational framework that here combines surface features known to be good predictors in the summarization task with inter-sentential discourse dependencies. This enables selected utterances to fit their summaries in coherent discourse situations.
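The two-stage shape of utterance selection described in this abstract (segment by topic, then decide per segment which utterances to keep) can be illustrated with a minimal Python sketch. It is not the proposal's method. The segmentation here is a simple word-overlap threshold, and the dialog-act prediction with dynamic Bayesian networks is replaced by a placeholder score (similarity to the segment's overall vocabulary). All function names, the `threshold` value, and the parameter `k` are hypothetical and chosen only for illustration.

```python
from collections import Counter
import math


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bag(utterance):
    """Lowercased whitespace tokens as a bag of words."""
    return Counter(utterance.lower().split())


def segment_by_topic(utterances, threshold=0.1):
    """Stage 1 (assumed heuristic): start a new segment whenever two
    adjacent utterances share too little vocabulary."""
    if not utterances:
        return []
    segments = [[utterances[0]]]
    for prev, cur in zip(utterances, utterances[1:]):
        if cosine(bag(prev), bag(cur)) < threshold:
            segments.append([cur])
        else:
            segments[-1].append(cur)
    return segments


def select_utterances(segment, k=1):
    """Stage 2 (placeholder scorer, not a dynamic Bayesian network):
    keep the k utterances closest to the segment's word distribution,
    returned in their original order."""
    centroid = Counter(w for u in segment for w in u.lower().split())
    ranked = sorted(
        range(len(segment)),
        key=lambda i: cosine(bag(segment[i]), centroid),
        reverse=True,
    )
    return [segment[i] for i in sorted(ranked[:k])]


def summarize(utterances, threshold=0.1, k=1):
    """Run both stages and concatenate the per-segment selections."""
    summary = []
    for segment in segment_by_topic(utterances, threshold):
        summary.extend(select_utterances(segment, k))
    return summary


meeting = [
    "the budget for the project is tight",
    "we must cut the budget this quarter",
    "lunch will arrive at noon",
    "pizza for lunch sounds good",
]
print(summarize(meeting))
```

With these four toy utterances, the overlap threshold splits the transcript into a budget segment and a lunch segment, and one utterance is kept from each. A system along the lines of the abstract would replace the placeholder scorer with a model that conditions on the predicted dialog acts of neighboring utterances.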
Automatic Summarization of Conversational Multi-Party Speech
Document summarization has proven to be a desirable component in many information management systems, complementing core information retrieval and browsing functionalities. The use of document summarization techniques i