3 research outputs found

    Improving single document summarization in a multi-document environment

    Get PDF
    Most automatic document summarization tools produce summaries from single or multiple document environments. Recent works have shown that there are possibilities to combine both systems: when summarising a single document, its related documents can be found. These documents might have similar knowledge and contain beneficial information in regard to the topic of the single document. Therefore, the summary produced will have sentences extracted from the local (single) document and make use of the additional knowledge from its surrounding (multi-) documents. This thesis will discuss the methodology and experiments to build a generic and extractive summary for a single document that includes information from its neighbourhood documents. We also examine the evaluation and configuration of such systems. There are three contributions of our work. First, we explore the robustness of the Affinity Graph algorithm to generate a summary for a local document. This experiment focused on two main tasks: using different means to identify the related documents, and to summarize the local document by including the information from the related documents. We showed that our findings supported the previous work on document summarization using the Affinity Graph. However, contrary to past suggestions that one configuration of settings was best, we found no particular settings gave better improvements over another. Second, we applied the Affinity Graph algorithm in a social media environment. Recent work in social media suggests that information from blogs and tweets contain parts of the web document that are considered interesting to the user. We assumed that this information could be used to select important sentences from the web document, and hypothesized that the information would improve the summary of a single document. Third, we compare the summaries generated using the Affinity Graph algorithm in two types of evaluation. The first evaluation is by using ROUGE, a commonly used evaluation tools that measure the number of overlapping words between automated summaries and human-generated summaries. In the second evaluation, we studied the judgement of human users using a crowdsourcing platform. Here, we asked people to choose their judgement and explained their reasons to prefer one summary to another. The results from the ROUGE evaluation did not give significant results due to the small tweet-document dataset used in our experiments. However, our findings on the human judgement evaluation showed that the users are more likely to choose the summaries generated using the expanded tweets compared to summaries generated from the local documents only. We conclude the thesis with a study of the user comments, and discussion on the use of Affinity Graph to improve single document summarization. We also include the discussion of the lessons learnt from the user preference evaluation using crowdsourcing platform
    corecore