A New Algorithm for Document Aboutness

PONZA, MARCO

Publication date: 22 July 2015
Publisher: Pisa University Press

Abstract

The thesis investigates the document aboutness task and presents the design, implementation, and testing of a system that identifies the main focus of a text by detecting entities, drawn from Wikipedia, that are salient to its discourse. To build this system, we deploy several Natural Language Processing tools, such as an entity annotator, a text summarizer, and a dependency parser. From the output of these tools we derive a large set of features, on which we build a binary classifier that distinguishes salient from non-salient entities. The efficiency and effectiveness of the system are evaluated through a large-scale experiment on the well-known annotated New York Times dataset.
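The abstract describes a pipeline in which NLP tools produce per-entity features and a binary classifier labels each entity as salient or non-salient. The sketch below illustrates only that final classification step, under explicit assumptions. The abstract does not name the learning algorithm, so plain logistic regression trained by gradient descent is used as a stand-in. The four features (`mention_freq`, `first_position`, `in_summary`, `subject_ratio`) and the names `EntityFeatures` and `SalienceClassifier` are hypothetical, chosen only to echo the three tools the abstract lists. The training values are invented toy inputs, not data from the New York Times dataset.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class EntityFeatures:
    # Hypothetical features, one per tool named in the abstract; the
    # thesis's real (much larger) feature set is not reproduced here.
    mention_freq: float    # entity mentions / all entity mentions (entity annotator)
    first_position: float  # relative offset of first mention: 0 = start, 1 = end
    in_summary: float      # 1.0 if mentioned in a summary sentence (text summarizer)
    subject_ratio: float   # share of mentions that are grammatical subjects (dependency parser)

    def vector(self) -> List[float]:
        return [self.mention_freq, self.first_position, self.in_summary, self.subject_ratio]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class SalienceClassifier:
    """Logistic regression via batch gradient descent: 1 = salient, 0 = non-salient."""

    def __init__(self, n_features: int, lr: float = 0.5, epochs: int = 2000) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr
        self.epochs = epochs

    def score(self, x: Sequence[float]) -> float:
        # Estimated probability that the entity is salient.
        return sigmoid(sum(wj * xj for wj, xj in zip(self.w, x)) + self.b)

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "SalienceClassifier":
        n = len(X)
        for _ in range(self.epochs):
            grad_w = [0.0] * len(self.w)
            grad_b = 0.0
            for x, label in zip(X, y):
                err = self.score(x) - label  # gradient of the log-loss w.r.t. the logit
                for j, xj in enumerate(x):
                    grad_w[j] += err * xj
                grad_b += err
            # Average the gradient over the batch and take one descent step.
            self.w = [wj - self.lr * gj / n for wj, gj in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self

    def predict(self, x: Sequence[float], threshold: float = 0.5) -> int:
        return int(self.score(x) >= threshold)


# Toy, made-up training examples (not drawn from the New York Times dataset).
TRAIN = [
    (EntityFeatures(0.40, 0.00, 1.0, 0.60), 1),
    (EntityFeatures(0.30, 0.05, 1.0, 0.50), 1),
    (EntityFeatures(0.25, 0.10, 1.0, 0.40), 1),
    (EntityFeatures(0.05, 0.70, 0.0, 0.00), 0),
    (EntityFeatures(0.02, 0.90, 0.0, 0.10), 0),
    (EntityFeatures(0.08, 0.50, 0.0, 0.00), 0),
]

clf = SalienceClassifier(n_features=4).fit(
    [f.vector() for f, _ in TRAIN], [label for _, label in TRAIN]
)

if __name__ == "__main__":
    new_entities = [
        EntityFeatures(0.35, 0.025, 1.0, 0.55),
        EntityFeatures(0.035, 0.80, 0.0, 0.05),
    ]
    for ent in new_entities:
        print(ent, "->", "salient" if clf.predict(ent.vector()) else "non-salient")
```

Logistic regression appears here only because it is the simplest binary classifier to write without external dependencies; the model and features actually used in the thesis may differ.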


Available Versions

Electronic Thesis and Dissertation Archive - Università di Pisa

oai:etd.adm.unipi.it:etd-07032...