223 research outputs found

    A Novel ILP Framework for Summarizing Content with High Lexical Variety

    Full text link
    Summarizing content contributed by individuals can be challenging, because people make different lexical choices even when describing the same events. However, there remains a significant need to summarize such content. Examples include the student responses to post-class reflective questions, product reviews, and news articles published by different news agencies related to the same events. High lexical diversity of these documents hinders the system's ability to effectively identify salient content and reduce summary redundancy. In this paper, we overcome this issue by introducing an integer linear programming-based summarization framework. It incorporates a low-rank approximation to the sentence-word co-occurrence matrix to intrinsically group semantically-similar lexical items. We conduct extensive experiments on datasets of student responses, product reviews, and news documents. Our approach compares favorably to a number of extractive baselines as well as a neural abstractive summarization system. The paper finally sheds light on when and why the proposed framework is effective at summarizing content with high lexical variety.Comment: Accepted for publication in the journal of Natural Language Engineering, 201

    Graphical Representation of Text Semantics

    Get PDF
    A text is a set of words conveying a particular semantic based on their order, representation and structure. Those elements can be associated through a different set of interpretations, based on frequency and proportionality. The problem with context is that numbers do not help understand the semantics and fall short to convey the message of the text. The graphical representation of text semantics focuses on the conversion of text to images. Contrarily to word clouds that simply produce frequency mapping of words within the text and topic models that essentially give context to word frequencies and proportionalities, images keep intact the semantic and the context of the words in the text. They provide a deeper understanding and can be better interpreted. Models such as AttnGAN already exist to convert text into images with a certain level of success, but there has not been work done concerning the conversion of long and complex texts in an image or a set of images. The goal of this analysis is to first, provide an understanding of how we divide the text in bits that improve the resulting image and how does the summarization methodology affect the image result

    Evaluation Measures for Text Summarization

    Get PDF
    We explain the ideas of automatic text summarization approaches and the taxonomy of summary evaluation methods. Moreover, we propose a new evaluation measure for assessing the quality of a summary. The core of the measure is covered by Latent Semantic Analysis (LSA) which can capture the main topics of a document. The summarization systems are ranked according to the similarity of the main topics of their summaries and their reference documents. Results show a high correlation between human rankings and the LSA-based evaluation measure. The measure is designed to compare a summary with its full text. It can compare a summary with a human written abstract as well; however, in this case using a standard ROUGE measure gives more precise results. Nevertheless, if abstracts are not available for a given corpus, using the LSA-based measure is an appropriate choice
    • …
    corecore