250 research outputs found

    MultiGBS: A multi-layer graph approach to biomedical summarization

    Automatic text summarization methods generate a shorter version of the input text to assist the reader in gaining a quick yet informative gist. Existing text summarization methods generally focus on a single aspect of text when selecting sentences, causing the potential loss of essential information. In this study, we propose a domain-specific method that models a document as a multi-layer graph to enable multiple features of the text to be processed at the same time. The features we used in this paper are word similarity, semantic similarity, and co-reference similarity, which are modelled as three different layers. The unsupervised method selects sentences from the multi-layer graph based on the MultiRank algorithm and the number of concepts. The proposed MultiGBS algorithm employs UMLS and extracts the concepts and relationships using different tools such as SemRep, MetaMap, and OGER. Extensive evaluation by ROUGE and BERTScore shows increased F-measure values.
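The multi-layer idea in this abstract can be illustrated with a rough sketch. It is not the MultiRank algorithm and not the authors' implementation: the three layers are simply averaged and ranked with plain PageRank as a simplified stand-in, the concept-count criterion is omitted, and the similarity values and the function names `combine_layers` and `pagerank` are hypothetical.

```python
# Simplified stand-in for multi-layer sentence ranking.
# ASSUMPTION: layers are averaged and ranked with plain PageRank;
# the abstract's MultiRank algorithm and concept counts are NOT implemented.

def combine_layers(layers, weights=None):
    """Collapse several n x n similarity layers into one weighted adjacency matrix."""
    n = len(layers[0])
    weights = weights or [1.0 / len(layers)] * len(layers)
    return [
        [sum(w * layer[i][j] for w, layer in zip(weights, layers)) for j in range(n)]
        for i in range(n)
    ]

def pagerank(adj, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration; returns one score per sentence."""
    n = len(adj)
    out_strength = [sum(row) for row in adj]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new_rank = [(1.0 - damping) / n] * n
        for i in range(n):
            for j in range(n):
                if out_strength[i] == 0:
                    # Dangling sentence: spread its score uniformly.
                    new_rank[j] += damping * rank[i] / n
                else:
                    new_rank[j] += damping * rank[i] * adj[i][j] / out_strength[i]
        rank = new_rank
    return rank

# Hypothetical similarity layers for three sentences (made-up values).
word_layer = [[0, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0]]
semantic_layer = [[0, 0.8, 0.1], [0.8, 0, 0.6], [0.1, 0.6, 0]]
coreference_layer = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]

scores = pagerank(combine_layers([word_layer, semantic_layer, coreference_layer]))
best_sentence = max(range(len(scores)), key=lambda i: scores[i])
```

Averaging the layers discards the per-layer influence that a multiplex ranking method is designed to capture, so this only shows how several similarity signals can feed one sentence ranking.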

    Extractive Conversation Summarization Driven by Textual Entailment Prediction

    Summarizing conversations such as meetings, email threads, or discussion forums poses relevant challenges on how to model the dialogue structure. Existing approaches mainly focus on premise-claim entailment relationships while neglecting contrasting or uncertain assertions. Furthermore, existing techniques are abstractive, thus requiring a training set consisting of human-generated summaries. With the twofold aim of enriching the dialogue representation and addressing conversation summarization in the absence of training data, we present an extractive conversation summarization pipeline. We explore the use of contradictions and neutral premise-claim relations, either in the same document or in different documents. The results achieved on four datasets covering different domains show that applying unsupervised methods on top of a refined premise-claim selection achieves competitive performance in most domains.

    SemPCA-Summarizer: Exploiting Semantic Principal Component Analysis for Automatic Summary Generation

    Text summarization is the task of condensing a document while keeping the relevant information. Integrated into wider information systems, it can help users access key information without having to read everything, allowing for higher efficiency. In this research work, we have developed and evaluated a single-document extractive summarization approach, named SemPCA-Summarizer, which reduces the dimension of a document using the Principal Component Analysis (PCA) technique enriched with semantic information. A concept-sentence matrix is built from the textual input document, and PCA is then used to identify and rank the relevant concepts, which are used for selecting the most important sentences through different heuristics, thus leading to various types of summaries. The results obtained show that the generated summaries are very competitive, from both a quantitative and a qualitative viewpoint, indicating that our proposed approach is appropriate for briefly providing key information and thus helps to cope with the huge amount of information available in a quicker and more efficient manner.
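The pipeline this abstract describes (concept-sentence matrix, PCA-based concept ranking, sentence selection) might be sketched roughly as follows. This is an illustration under stated assumptions, not the SemPCA-Summarizer implementation: concepts are plain lowercase tokens with no semantic enrichment, only the first principal component is used, a single made-up selection heuristic is applied, and all function names are hypothetical.

```python
import math

def concept_sentence_matrix(sentences, concepts):
    """Rows are sentences, columns are concepts; entries are token counts."""
    return [[s.lower().split().count(c) for c in concepts] for s in sentences]

def first_principal_component(matrix, iterations=200):
    """Leading eigenvector of the concept covariance matrix, by power iteration."""
    n, m = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in matrix]
    cov = [
        [sum(r[a] * r[b] for r in centered) / max(n - 1, 1) for b in range(m)]
        for a in range(m)
    ]
    # Non-uniform start vector (an arbitrary choice for this sketch).
    v = [1.0 / (j + 1) for j in range(m)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v

def summarize(sentences, concepts, k=2):
    """ASSUMED heuristic: score a sentence by the summed absolute loadings
    of the concepts it contains, then keep the top k in document order."""
    matrix = concept_sentence_matrix(sentences, concepts)
    weights = [abs(x) for x in first_principal_component(matrix)]
    scores = [sum(c * w for c, w in zip(row, weights)) for row in matrix]
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]

# Toy input; both the sentences and the concept list are invented.
sentences = [
    "pca reduces dimension of the concept matrix",
    "the weather was nice",
    "pca ranks each concept by variance",
    "sentences with top concept terms are selected",
]
concepts = ["pca", "concept", "dimension", "sentences"]
summary = summarize(sentences, concepts, k=2)
```

The abstract mentions several selection heuristics that lead to different summary types; swapping the scoring rule in `summarize` is where such variants would plug in.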

    Self-Supervised and Controlled Multi-Document Opinion Summarization

    We address the problem of unsupervised abstractive summarization of collections of user-generated reviews with self-supervision and control. We propose a self-supervised setup that considers an individual document as a target summary for a set of similar documents. This setting makes training simpler than previous approaches by relying only on standard log-likelihood loss. We address the problem of hallucinations through the use of control codes, to steer the generation towards more coherent and relevant summaries. Finally, we extend the Transformer architecture to allow for multiple reviews as input. Our benchmarks on two datasets against graph-based and recent neural abstractive unsupervised models show that our proposed method generates summaries with superior quality and relevance. This is confirmed in our human evaluation, which focuses explicitly on the faithfulness of generated summaries. We also provide an ablation study, which shows the importance of the control setup in controlling hallucinations and achieving high sentiment and topic alignment of the summaries with the input reviews. Comment: 18 pages including 5 pages appendi