3 research outputs found

    Subtopic-oriented biomedical summarization using pretrained language models

    The ever-growing number of publications in the biomedical field makes it difficult to find insightful knowledge. In this work, we propose a subtopic-oriented summarization framework that aims to provide an overview of the state of the art on a given subject. The proposed method clusters the papers retrieved from a query and then, for each cluster, extracts the subtopics and summarizes the abstracts. We conducted various experiments to select the most appropriate clustering approach and concluded that the best choices are MiniLM for text embedding, UMAP for dimensionality reduction, and OPTICS as the clustering algorithm. For summarization, we fine-tuned both general-domain and biomedical pretrained language models for extractive summarization and selected Longformer as the best-suited model. Experimental results on multi-document summarization datasets show that the proposed framework improves the overall recall of the generated summary with a small decrease in precision, corresponding to slightly longer summaries that are closer to the ground truth.
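    The clustering stage described above (embed, reduce, cluster) can be sketched as follows. This is a minimal, dependency-light illustration rather than the authors' code: random vectors stand in for MiniLM abstract embeddings, and scikit-learn's PCA stands in for UMAP (the paper's actual choice); only OPTICS matches the pipeline as described.

```python
import numpy as np
from sklearn.decomposition import PCA  # stand-in for UMAP, to stay dependency-light
from sklearn.cluster import OPTICS

rng = np.random.default_rng(0)

# Toy "abstract embeddings": two well-separated groups of 384-dim vectors,
# standing in for MiniLM sentence embeddings of retrieved abstracts.
group_a = rng.normal(loc=0.0, scale=0.1, size=(8, 384))
group_b = rng.normal(loc=5.0, scale=0.1, size=(8, 384))
embeddings = np.vstack([group_a, group_b])

# Dimensionality reduction before density-based clustering
# (the framework uses UMAP; PCA is used here as an illustrative substitute).
reduced = PCA(n_components=5, random_state=0).fit_transform(embeddings)

# Density-based clustering; points OPTICS cannot assign get the label -1.
labels = OPTICS(min_samples=4).fit_predict(reduced)
print(labels)
```

    Each resulting cluster would then be passed to the subtopic-extraction and extractive-summarization steps.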

    Combining state-of-the-art models for multi-document summarization using maximal marginal relevance

    In Natural Language Processing, multi-document summarization (MDS) poses many challenges to researchers. While advances in deep learning have produced several language models capable of summarization, the variety of approaches specific to multi-document summarization remains relatively limited. Current state-of-the-art models produce impressive results on multi-document datasets, but whether further improvements can be made by combining these models remains an open question. This question is particularly relevant in few-shot and zero-shot applications, in which models have little or no familiarity, respectively, with the expected output. To explore one potential method, we implement a query-relevance-focused approach that combines the pretrained models' outputs using maximal marginal relevance (MMR). Our MMR-based approach improves on some aspects of the current state-of-the-art results while preserving overall state-of-the-art performance, with larger improvements in fewer-shot contexts.
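    MMR itself is a simple greedy selection. The sketch below (plain NumPy, with illustrative vectors and a λ value of 0.7, not the authors' implementation) shows the core trade-off: relevance of each candidate to the query versus redundancy with candidates already selected.

```python
import numpy as np

def mmr(query_vec, cand_vecs, k, lam=0.7):
    """Greedily select k candidate indices via maximal marginal relevance.

    Score = lam * sim(candidate, query) - (1 - lam) * max sim to any
    already-selected candidate; lam = 1 reduces to pure relevance ranking.
    """
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    selected, remaining = [], list(range(len(cand_vecs)))
    while remaining and len(selected) < k:
        scores = []
        for i in remaining:
            rel = cos(cand_vecs[i], query_vec)
            red = max((cos(cand_vecs[i], cand_vecs[j]) for j in selected),
                      default=0.0)
            scores.append(lam * rel - (1 - lam) * red)
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy example: candidates 0 and 1 are near-duplicates; after picking the
# most relevant one, MMR prefers the dissimilar candidate 2 over the duplicate.
query = np.array([1.0, 1.0])
cands = np.array([[1.0, 0.05], [1.0, 0.0], [0.0, 1.0]])
print(mmr(query, cands, k=2))  # → [0, 2]
```

    In the combination setting described above, the candidates would be sentences drawn from the outputs of several pretrained summarizers, with the query representing the input documents.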