MultiGBS: A multi-layer graph approach to biomedical summarization
Automatic text summarization methods generate a shorter version of the input
text to assist the reader in gaining a quick yet informative gist. Existing
text summarization methods generally focus on a single aspect of text when
selecting sentences, causing the potential loss of essential information. In
this study, we propose a domain-specific method that models a document as a
multi-layer graph to enable multiple features of the text to be processed at
the same time. The features we used in this paper are word similarity, semantic
similarity, and co-reference similarity, which are modelled as three different
layers. The unsupervised method selects sentences from the multi-layer graph
based on the MultiRank algorithm and the number of concepts. The proposed
MultiGBS algorithm employs UMLS and extracts the concepts and relationships
using different tools such as SemRep, MetaMap, and OGER. Extensive evaluation
by ROUGE and BERTScore shows increased F-measure values
Extractive Conversation Summarization Driven by Textual Entailment Prediction
Summarizing conversations such as meetings, email threads, or discussion forums poses relevant challenges in how to model the dialogue structure. Existing approaches mainly focus on premise-claim entailment relationships while neglecting contrasting or uncertain assertions. Furthermore, existing techniques are abstractive, thus requiring a training set of human-generated summaries. With the twofold aim of enriching the dialogue representation and addressing conversation summarization in the absence of training data, we present an extractive conversation summarization pipeline. We explore the use of contradictions and neutral premise-claim relations, either within the same document or across different documents. The results achieved on four datasets covering different domains show that applying unsupervised methods on top of a refined premise-claim selection achieves competitive performance in most domains.
SemPCA-Summarizer: Exploiting Semantic Principal Component Analysis for Automatic Summary Generation
Text summarization is the task of condensing a document while keeping the relevant information. Integrated into wider information systems, this task can help users access key information without having to read everything, allowing for higher efficiency. In this research work, we have developed and evaluated a single-document extractive summarization approach, named SemPCA-Summarizer, which reduces the dimension of a document using the Principal Component Analysis (PCA) technique enriched with semantic information. A concept-sentence matrix is built from the textual input document, and PCA is then used to identify and rank the relevant concepts, which are used for selecting the most important sentences through different heuristics, thus leading to various types of summaries. The results obtained show that the generated summaries are very competitive, from both a quantitative and a qualitative viewpoint, indicating that our proposed approach is appropriate for briefly providing key information and thus helps to cope with the huge amount of information available in a quicker and more efficient manner.
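The pipeline described in this abstract (concept-sentence matrix, PCA to rank concepts, heuristic sentence selection) can be illustrated with a minimal sketch. It is not the authors' implementation: the concept extractor below is a hypothetical stand-in that treats lowercase content words as concepts, only the first principal component is used, and the selection heuristic (sum of absolute loadings of the concepts a sentence contains) is an assumption chosen for brevity. All function names are illustrative.

```python
import math
import re

# Assumption: a tiny stopword list; the paper uses semantic concepts instead.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "on", "for", "it"}

def concepts_of(sentence):
    """Hypothetical concept extractor: lowercase content words."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

def first_principal_component(rows, iterations=200):
    """Leading principal direction of the columns of `rows`, by power iteration."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Deterministic, non-uniform start vector, normalised to unit length.
    v = [1.0 + 0.01 * j for j in range(m)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        # Apply X^T X to v without building the covariance matrix.
        proj = [sum(c[j] * v[j] for j in range(m)) for c in centered]
        w = [sum(centered[i][j] * proj[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance left to follow; keep the current vector
        v = [x / norm for x in w]
    return v

def summarize(sentences, k=1):
    """Return the k highest-scoring sentences in their original order."""
    vocab = sorted({c for s in sentences for c in concepts_of(s)})
    if not vocab:
        return sentences[:k]
    index = {c: j for j, c in enumerate(vocab)}
    # Sentence-by-concept count matrix (the transpose of a concept-sentence matrix).
    rows = []
    for s in sentences:
        row = [0.0] * len(vocab)
        for c in concepts_of(s):
            row[index[c]] += 1.0
        rows.append(row)
    weights = [abs(x) for x in first_principal_component(rows)]
    # Assumed heuristic: a sentence scores the summed weight of its concepts.
    scores = [sum(weights[j] for j, x in enumerate(row) if x > 0) for row in rows]
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, k=2)` on a list of sentence strings returns two of them in document order. A fuller version would keep several components and compare the alternative selection heuristics the abstract mentions.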
Self-Supervised and Controlled Multi-Document Opinion Summarization
We address the problem of unsupervised abstractive summarization of
collections of user generated reviews with self-supervision and control. We
propose a self-supervised setup that considers an individual document as a
target summary for a set of similar documents. This setting makes training
simpler than previous approaches by relying only on standard log-likelihood
loss. We address the problem of hallucinations through the use of control
codes, to steer the generation towards more coherent and relevant
summaries. Finally, we extend the Transformer architecture to allow for multiple
reviews as input. Our benchmarks on two datasets against graph-based and recent
neural abstractive unsupervised models show that our proposed method generates
summaries with superior quality and relevance. This is confirmed in our human
evaluation, which focuses explicitly on the faithfulness of generated summaries.
We also provide an ablation study, which shows the importance of the control
setup in controlling hallucinations and achieving high sentiment and topic
alignment of the summaries with the input reviews.
Comment: 18 pages including 5 pages appendi
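The self-supervised setup in this abstract, where an individual review serves as the target summary for a set of similar reviews, can be sketched as a data-construction step. The abstract does not say how similar reviews are chosen, so the Jaccard word-overlap measure, the parameter `k`, and the function names below are assumptions for illustration only. Control codes and the model itself are not covered.

```python
import re

def tokens(text):
    """Set of lowercase word tokens in a review."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Jaccard overlap of two token sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_training_pairs(reviews, k=2):
    """Pair each review, as pseudo-summary, with its k most similar reviews."""
    bags = [tokens(r) for r in reviews]
    pairs = []
    for t, target in enumerate(reviews):
        others = [i for i in range(len(reviews)) if i != t]
        # Most similar first; ties keep their original order (stable sort).
        others.sort(key=lambda i: jaccard(bags[t], bags[i]), reverse=True)
        sources = [reviews[i] for i in others[:k]]
        pairs.append((sources, target))
    return pairs
```

Each `(sources, target)` pair could then be fed to a sequence-to-sequence model trained with a standard log-likelihood loss, with `sources` as the multi-review input and `target` as the output.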