
    A Proposition-based Abstractive Summarizer

    Abstractive summarisation is not yet common amongst today's deployed and research systems. Most existing systems either extract sentences or compress individual sentences. In this paper, we present a summariser that works under a different paradigm. It is a further development of an existing summariser that has an incremental, proposition-based content selection process but lacks a natural language (NL) generator for the final output. Using an NL generator, we can now produce summary text that directly reflects the selected propositions. Our evaluation compares the textual quality of our system's output to that of the earlier preliminary output method, and also uses ROUGE to compare against various summarisers that follow the traditional method of sentence extraction followed by compression. Our results suggest that cutting out the middleman of sentence extraction can lead to better abstractive summaries.
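    Several entries in this listing evaluate with ROUGE. As a reference point, the sketch below computes ROUGE-N recall from n-gram overlap; unlike the full toolkit, it uses plain whitespace tokenisation and omits stemming and stopword removal, so it is an illustration of the metric's core idea, not a drop-in replacement.

    ```python
    from collections import Counter

    def rouge_n_recall(candidate, reference, n=1):
        """ROUGE-N recall: fraction of reference n-grams covered by the candidate."""
        def ngrams(text):
            tokens = text.lower().split()
            return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        cand, ref = ngrams(candidate), ngrams(reference)
        if not ref:
            return 0.0
        # clipped overlap: each reference n-gram counts at most as often as it occurs
        overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
        return overlap / sum(ref.values())
    ```

    With n=1 this is ROUGE-1 recall; higher n penalises word-order differences more strongly.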

    Content summarisation of conversation in the context of virtual meetings: An enhanced TextRank approach

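    The enhancements of this paper's approach are not described in this listing, but the baseline it builds on is well known. A minimal sketch of classic TextRank sentence extraction (Mihalcea & Tarau, 2004), using the original overlap-based sentence similarity and a PageRank-style power iteration, might look like:

    ```python
    import math
    import re

    def textrank_summary(text, top_k=2, damping=0.85, iterations=50):
        """Rank sentences by TextRank: build a similarity graph, then run a
        PageRank-style power iteration and keep the top-scoring sentences."""
        sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
        words = [set(re.findall(r'\w+', s.lower())) for s in sentences]
        n = len(sentences)

        # Similarity: word overlap normalised by sentence lengths, as in the paper.
        def sim(i, j):
            overlap = len(words[i] & words[j])
            denom = math.log(len(words[i]) + 1) + math.log(len(words[j]) + 1)
            return overlap / denom if denom > 0 else 0.0

        weights = [[sim(i, j) if i != j else 0.0 for j in range(n)] for i in range(n)]
        scores = [1.0] * n
        for _ in range(iterations):
            new = []
            for i in range(n):
                rank = 0.0
                for j in range(n):
                    total = sum(weights[j])
                    if weights[j][i] > 0 and total > 0:
                        # sentence j distributes its score over its neighbours
                        rank += weights[j][i] / total * scores[j]
                new.append((1 - damping) + damping * rank)
            scores = new

        top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
        return [sentences[i] for i in sorted(top)]  # restore document order
    ```

    Sentences with little lexical overlap with the rest of the document receive only the (1 - damping) base score and are unlikely to be selected.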

    Bringing Structure into Summaries: Crowdsourcing a Benchmark Corpus of Concept Maps

    Concept maps can be used to concisely represent important information and bring structure into large document collections. Therefore, we study a variant of multi-document summarization that produces summaries in the form of concept maps. However, suitable evaluation datasets for this task are currently missing. To close this gap, we present a newly created corpus of concept maps that summarize heterogeneous collections of web documents on educational topics. It was created using a novel crowdsourcing approach that allows us to efficiently determine important elements in large document collections. We release the corpus along with a baseline system and proposed evaluation protocol to enable further research on this variant of summarization. Comment: Published at EMNLP 201

    Towards Personalized and Human-in-the-Loop Document Summarization

    The ubiquitous availability of computing devices and the widespread use of the internet continuously generate large amounts of data. As a result, the amount of available information on any given topic is far beyond humans' capacity to process, causing what is known as information overload. To cope efficiently with large amounts of information and generate content of significant value to users, we need to identify, merge and summarise information. Data summaries can help gather related information and collect it into a shorter format that enables answering complicated questions, gaining new insight and discovering conceptual boundaries. This thesis focuses on three main challenges to alleviate information overload using novel summarisation techniques. It further intends to facilitate the analysis of documents to support personalised information extraction. This thesis separates the research issues into four areas, covering (i) feature engineering in document summarisation, (ii) traditional static and inflexible summaries, (iii) traditional generic summarisation approaches, and (iv) the need for reference summaries. We propose novel approaches to tackle these challenges by: (i) enabling automatic intelligent feature engineering, (ii) enabling flexible and interactive summarisation, and (iii) utilising intelligent and personalised summarisation approaches. The experimental results demonstrate the efficiency of the proposed approaches compared to other state-of-the-art models. We further propose solutions to the information overload problem in different domains through summarisation, covering network traffic data, health data and business process data. Comment: PhD thesis

    Computational Argumentation Approaches to Improve Sensemaking and Evidence-based Reasoning in Online Deliberation Systems

    Deliberation is the process through which communities identify potential solutions for a problem and, through dialogic communication, select the solution that most effectively meets their diverse requirements. Online deliberation is nowadays implemented by means of social media and online discussion platforms; however, these media present significant challenges and issues that can be traced to inadequate support for sensemaking processes and poor endorsement of the quality characteristics of deliberation. This thesis investigates the integration of computational argumentation methods into online deliberation platforms as an effective way to improve participants' perception of the quality of the deliberation process, help them make sense of the overall process, and produce healthier social dynamics. To that end, two computational artefacts are proposed: (i) a synoptical summariser of long discussions and (ii) a Scientific Argument Recommender System (SciArgRecSys). The two artefacts are designed and developed with state-of-the-art methods (including Large Language Models, LLMs) and evaluated both intrinsically and extrinsically when deployed on a real, live platform (BCause). Through extensive evaluation, the positive effect of both artefacts on human sensemaking and on essential quality characteristics of deliberation, such as reciprocal engagement, mutual understanding and social dynamics, is illustrated. In addition, it is demonstrated that these interventions effectively reduce polarisation and the formation of sub-communities, while significantly enhancing the quality of the discussion by making it more coherent and diverse.

    Cross-Language Text Summarization using Sentence and Multi-Sentence Compression

    Cross-Language Automatic Text Summarization produces a summary in a language different from the language of the source documents. In this paper, we propose a French-to-English cross-lingual summarization framework that analyzes the information in both languages to identify the most relevant sentences. In order to generate more informative cross-lingual summaries, we introduce the use of chunks and two compression methods at the sentence and multi-sentence levels. Experimental results on the MultiLing 2011 dataset show that our framework improves the results obtained by state-of-the-art approaches according to ROUGE metrics.
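    The paper's own compression methods are not detailed in this abstract. As a general illustration of multi-sentence compression, the sketch below fuses related sentences through a shared word graph and returns the shortest start-to-end path, a heavily simplified take in the spirit of Filippova's (2010) word-graph approach; the node naming and tie-breaking here are illustrative choices, not the paper's.

    ```python
    from collections import deque

    def compress(sentences):
        """Fuse related sentences into one via a shared word graph: identical
        words map to shared nodes, then the fewest-edges path from start to
        end is read off as the compression."""
        succ = {"<s>": [], "</s>": []}  # adjacency lists, including boundary nodes
        for sent in sentences:
            seen = set()  # a node may be visited at most once per sentence
            path = ["<s>"]
            for tok in sent.lower().split():
                node, i = None, 0
                # reuse an existing node for this word if not yet used in this sentence
                while f"{tok}#{i}" in succ:
                    if f"{tok}#{i}" not in seen:
                        node = f"{tok}#{i}"
                        break
                    i += 1
                if node is None:
                    node = f"{tok}#{i}"
                    succ[node] = []
                seen.add(node)
                path.append(node)
            path.append("</s>")
            for a, b in zip(path, path[1:]):
                if b not in succ[a]:
                    succ[a].append(b)
        # BFS for the fewest-edges path from <s> to </s>
        parent = {"<s>": None}
        queue = deque(["<s>"])
        while queue:
            node = queue.popleft()
            if node == "</s>":
                break
            for nxt in succ[node]:
                if nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        words, node = [], parent.get("</s>")
        while node and node != "<s>":
            words.append(node.split("#")[0])
            node = parent[node]
        return " ".join(reversed(words))
    ```

    The full word-graph method additionally weights edges by word frequency and collocation strength and filters paths for grammaticality; this sketch keeps only the graph-fusion core.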

    V-ROOM: a virtual meeting system with intelligent structured summarisation

    With the growth of virtual organisations and multinational companies, virtual collaboration tasks are becoming more important for employees. This paper describes the development of a virtual meeting system called V-ROOM. An exploration of the facilities required in such a system has been conducted. The findings highlighted that intelligent systems are needed, especially since the information that individuals must know and process is vast. The survey results showed that meeting summarisation is one of the most important new features to add to virtual meeting systems for enterprises. This paper highlights the innovative methods employed in V-ROOM to produce relevant meeting summaries. V-ROOM's approach is compared to other methods from the literature, and it is shown how the use of meta-data provided by parts of the V-ROOM system can improve the quality of the summaries produced.

    Automatic text summarisation using linguistic knowledge-based semantics

    Text summarisation is the process of reducing a text document to a short substitute summary. Since the field's inception, almost all summarisation research to date has involved identifying and extracting the most important document or cluster segments, an approach known as extraction. This typically involves scoring each document sentence according to a composite scoring function consisting of surface-level and semantic features. Enabling machines to analyse text features and understand their meaning potentially requires both textual semantic analysis and equipping computers with external semantic knowledge. This thesis addresses extractive text summarisation by proposing a number of semantic and knowledge-based approaches. The work combines the high-quality semantic information in WordNet, the crowdsourced encyclopaedic knowledge in Wikipedia, and the manually crafted categorial variations in CatVar to improve summary quality. These improvements are accomplished through sentence-level morphological analysis and the incorporation of Wikipedia-based named-entity semantic relatedness within heuristic algorithms. The study also investigates how sentence-level semantic analysis based on semantic role labelling (SRL), leveraged with background world knowledge, influences sentence textual similarity and text summarisation. The proposed sentence similarity and summarisation methods were evaluated on standard publicly available datasets, including the Microsoft Research Paraphrase Corpus (MSRPC), TREC-9 Question Variants, and the Document Understanding Conference 2002, 2005 and 2006 (DUC 2002, DUC 2005, DUC 2006) corpora. The project also uses Recall-Oriented Understudy for Gisting Evaluation (ROUGE) for the quantitative assessment of the proposed summarisers' performance. Results showed the effectiveness of our systems compared to related state-of-the-art summarisation methods and baselines. Of the proposed summarisers, the SRL Wikipedia-based system demonstrated the best performance.
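    The composite scoring function described above can be illustrated with a minimal extractive baseline. The two features below (sentence position and average term frequency) and their weights are illustrative stand-ins, not the thesis's actual feature set, which additionally draws on WordNet, Wikipedia and SRL-based semantics.

    ```python
    import re
    from collections import Counter

    def summarise(text, top_k=2, w_pos=0.3, w_tf=0.7):
        """Extractive summary via a composite score: a weighted sum of a
        position feature and an average term-frequency feature."""
        sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
        tokens = [re.findall(r'\w+', s.lower()) for s in sentences]
        tf = Counter(t for toks in tokens for t in toks)
        max_tf = max(tf.values())

        def score(i):
            position = 1.0 - i / len(sentences)  # earlier sentences score higher
            avg_tf = sum(tf[t] for t in tokens[i]) / (len(tokens[i]) * max_tf)
            return w_pos * position + w_tf * avg_tf

        ranked = sorted(range(len(sentences)), key=score, reverse=True)[:top_k]
        return [sentences[i] for i in sorted(ranked)]  # restore document order
    ```

    Semantic features, such as the named-entity relatedness or SRL-based similarity the thesis investigates, would slot in as further weighted terms of the same composite score.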