thesis

A Ranking Approach to Summarising Twitter Home Timelines

Abstract

The rise of social media services has changed the ways in which users can communicate and consume content online. Whilst online social networks allow for fast and convenient delivery of knowledge, users are prone to information overload when too much information is presented for them to read and process. Automatic text summarisation is a tool to help mitigate information overload. In automatic text summarisation, short summaries are generated algorithmically from extended text, such as news articles or scientific papers. This thesis addresses the challenges in applying text summarisation to the Twitter social network. It also goes beyond text, exploiting additional information that is unique to social networks to create summaries which are personal to an intended reader. Unlike previous work in tweet summarisation, the experiments here address the home timelines of readers, which contain the incoming posts from authors to whom they have explicitly subscribed. A novel contribution is made in this work the form of a large gold standard (19,35019,350 tweets), the majority of which will be shared with the research community. The gold standard is a collection of timelines that have been subjectively annotated by the readers to whom they belong, allowing fair evaluation of summaries which are not limited to tweets of general interest, but which are specific to the reader. Where the home timeline is used by professional users for social media analysis, automatic text summarisation can be applied to give results which beat all baselines. In the general case, where no limitation is placed on the types of readers, personalisation features which exploit the relationship between author and reader and the reader's own previous posts, were shown to outperform both automatic text summarisation and all baselines

    Similar works