Test Model for Text Categorization and Text Summarization
Text Categorization is the task of automatically sorting a set of documents
into categories from a predefined set, and Text Summarization is the task of
producing a brief and accurate representation of an input text such that the
output covers the most important concepts of the source in a condensed manner.
Document Summarization is an emerging technique for understanding the main
purpose of any kind of document. This paper presents a model that uses text
categorization and text summarization for searching a document based on a user query.
Comment: 7 pages, 0 figures
An Introduction to the Summarization of Evolving Events: Linear and Non-linear Evolution
This paper examines the summarization of events that evolve through time. It
discusses different types of evolution taking into account the time in which
the incidents of an event are happening and the different sources reporting on
the specific event. It proposes an approach for multi-document summarization
which employs ``messages'' for representing the incidents of an event and
cross-document relations that hold between messages according to certain
conditions. The paper also outlines the current version of the summarization
system we are implementing to realize this approach.
Comment: 10 pages, 3 figures. To be published in the Natural Language
Understanding and Cognitive Science (NLUCS 2005) conference.
A Simple Theoretical Model of Importance for Summarization
Research on summarization has mainly been driven by empirical approaches,
crafting systems to perform well on standard datasets with the notion of
information Importance remaining latent. We argue that establishing theoretical
models of Importance will advance our understanding of the task and help to
further improve summarization systems. To this end, we propose simple but
rigorous definitions of several concepts that were previously used only
intuitively in summarization: Redundancy, Relevance, and Informativeness.
Importance arises as a single quantity naturally unifying these concepts.
Additionally, we provide intuitions to interpret the proposed quantities and
experiments to demonstrate the potential of the framework to inform and guide
subsequent work.
Comment: Accepted at ACL19 (outstanding paper award).
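The three quantities named in this abstract can be made concrete. As an illustrative sketch only (not the paper's exact information-theoretic formulation), one can treat a text as a unigram word distribution and instantiate Redundancy via entropy, Relevance via cross-entropy with the source, and Informativeness via divergence from background knowledge:

```python
from collections import Counter
from math import log

def unigram_dist(text):
    """Normalized unigram distribution over whitespace tokens."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def entropy(p):
    return -sum(q * log(q) for q in p.values())

def cross_entropy(p, q, eps=1e-12):
    """H(p, q); eps smooths words absent from q."""
    return -sum(pw * log(q.get(w, eps)) for w, pw in p.items())

def redundancy(summary):
    # A repetitive summary has low entropy, hence high -H(S).
    return -entropy(unigram_dist(summary))

def relevance(summary, source):
    # High when the summary's words are likely under the source distribution.
    return -cross_entropy(unigram_dist(summary), unigram_dist(source))

def informativeness(summary, background):
    # KL(S || K): high when the summary diverges from background knowledge K.
    s = unigram_dist(summary)
    k = unigram_dist(background)
    return sum(pw * log(pw / k.get(w, 1e-12)) for w, pw in s.items())

def importance(summary, source, background):
    # One simple unification: reward relevance and informativeness,
    # penalize redundancy. The weighting here is an assumption.
    return (relevance(summary, source)
            + informativeness(summary, background)
            - redundancy(summary))
```

The unigram instantiation is only one choice; the framework itself is agnostic to how the distributions are estimated.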
ElasticPlay: Interactive Video Summarization with Dynamic Time Budgets
Video consumption is being shifted from sit-and-watch to selective skimming.
Existing video player interfaces, however, only provide indirect manipulation
to support this emerging behavior. Video summarization alleviates this issue to
some extent, shortening a video based on the desired length of a summary as an
input variable. But an optimal length of a summarized video is often not
available in advance. Moreover, the user cannot edit the summary once it is
produced, limiting its practical applications. We argue that video
summarization should be an interactive, mixed-initiative process in which users
have control over the summarization procedure while algorithms help users
achieve their goal via video understanding. In this paper, we introduce
ElasticPlay, a mixed-initiative approach that combines an advanced video
summarization technique with direct interface manipulation to help users
control the video summarization process. Users can specify a time budget for
the remaining content while watching a video; our system then immediately
updates the playback plan using our proposed cut-and-forward algorithm,
determining which parts to skip or to fast-forward. This interactive process
allows users to fine-tune the summarization result with immediate feedback. We
show that our system outperforms existing video summarization techniques on the
TVSum50 dataset. We also report two lab studies (22 participants) and a
Mechanical Turk deployment study (60 participants), and show that the
participants responded favorably to ElasticPlay.
Comment: ACM Multimedia 2017 preprint.
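The abstract does not spell out the cut-and-forward algorithm, but the idea of fitting a playback plan to a user-specified time budget can be sketched with a simple greedy stand-in: play the highest-importance segments at normal speed, fast-forward what still fits, skip the rest. Segment importance scores and the fast-forward speed are assumptions here, not the paper's method:

```python
def plan_playback(segments, budget, ff_speed=4.0):
    """
    Illustrative budget-fitting planner (not the paper's actual algorithm).
    segments: list of (duration_sec, importance) tuples in video order.
    Returns one action per segment: 'play', 'ffwd', or 'skip', such that
    total playback time fits within `budget` seconds.
    """
    order = sorted(range(len(segments)), key=lambda i: -segments[i][1])
    actions = ["skip"] * len(segments)
    remaining = budget
    # First pass: play the most important segments at normal speed.
    for i in order:
        dur = segments[i][0]
        if dur <= remaining:
            actions[i] = "play"
            remaining -= dur
    # Second pass: fast-forward skipped segments that still fit at ff_speed.
    for i in order:
        if actions[i] == "skip":
            cost = segments[i][0] / ff_speed
            if cost <= remaining:
                actions[i] = "ffwd"
                remaining -= cost
    return actions

def playback_time(segments, actions, ff_speed=4.0):
    """Total watch time implied by a plan."""
    t = 0.0
    for (dur, _), a in zip(segments, actions):
        if a == "play":
            t += dur
        elif a == "ffwd":
            t += dur / ff_speed
    return t
```

Because the plan is recomputed from the current budget, shrinking the budget mid-playback simply yields a new plan, which matches the interactive loop the abstract describes.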
Graph-based Ontology Summarization: A Survey
Ontologies have been widely used in numerous and varied applications, e.g.,
to support data modeling, information integration, and knowledge management.
With the increasing size of ontologies, ontology understanding, which plays
an important role in many tasks, is becoming more difficult.
Consequently, ontology summarization, as a way to distill key information from
an ontology and generate an abridged version to facilitate a better
understanding, is getting growing attention. In this survey paper, we review
existing ontology summarization techniques and focus mainly on graph-based
methods, which represent an ontology as a graph and apply centrality-based and
other measures to identify the most important elements of an ontology as its
summary. After analyzing their strengths and weaknesses, we highlight a few
potential directions for future research.
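The centrality-based approach the survey focuses on can be sketched in a few lines: represent the ontology as a graph, score each concept by a centrality measure, and keep the top-scoring concepts as the summary. Degree centrality is used here purely for illustration; surveyed systems also use measures such as betweenness or PageRank:

```python
from collections import defaultdict

def degree_centrality(edges):
    """Degree of each node in an undirected concept graph given as (u, v) pairs."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def summarize_ontology(edges, k=3):
    """Return the k most central concepts as a crude graph summary.
    Ties are broken alphabetically for determinism."""
    deg = degree_centrality(edges)
    return sorted(deg, key=lambda n: (-deg[n], n))[:k]
```

A real ontology summarizer would also weight relation types (e.g., subclass vs. annotation edges) rather than treating all edges equally.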
Understanding Data Science Lifecycle Provenance via Graph Segmentation and Summarization
Increasingly, modern data science platforms have non-intrusive and
extensible provenance ingestion mechanisms to collect rich provenance and
context information, handle modifications to the same file using
distinguishable versions, and use graph data models (e.g., property graphs) and
query languages (e.g., Cypher) to represent and manipulate the stored
provenance/context information. Due to the schema-later nature of the metadata,
multiple versions of the same files, and unfamiliar artifacts introduced by
team members, the "provenance graph" is verbose and evolving, and hard to
understand; with a standard graph query model, it is difficult to compose
queries and utilize this valuable information.
In this paper, we propose two high-level graph query operators to address the
verboseness and evolving nature of such provenance graphs. First, we introduce
a graph segmentation operator, which queries the retrospective provenance
between a set of source vertices and a set of destination vertices via flexible
boundary criteria to help users get insight about the derivation relationships
among those vertices. We show the semantics of such a query in terms of a
context-free grammar, and develop efficient algorithms that run orders of
magnitude faster than the state of the art. Second, we propose a graph
summarization operator that combines similar segments together to query
prospective provenance of the underlying project. The operator allows tuning
the summary by ignoring vertex details and characterizing local structures, and
ensures the provenance meaning using path constraints. We show the optimal
summary problem is PSPACE-complete and develop effective approximation
algorithms. The operators are implemented on top of a property graph backend.
We evaluate our query methods extensively and show the effectiveness and
efficiency of the proposed methods.
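The core of the segmentation operator, stripped of the boundary criteria and path constraints the paper adds, is finding the subgraph between a set of source vertices and a set of destination vertices. A toy version intersects forward reachability from the sources with backward reachability from the destinations; everything here (the file-name vertices included) is illustrative, not the paper's implementation:

```python
from collections import defaultdict, deque

def reachable(adj, starts):
    """Set of vertices reachable from `starts` following edges in `adj`."""
    seen = set(starts)
    q = deque(starts)
    while q:
        u = q.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                q.append(v)
    return seen

def segment(edges, sources, destinations):
    """
    Toy segmentation: the induced subgraph of vertices lying on some
    derivation path from a source to a destination.
    """
    fwd = defaultdict(list)
    bwd = defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        bwd[v].append(u)
    on_path = reachable(fwd, sources) & reachable(bwd, destinations)
    return {(u, v) for u, v in edges if u in on_path and v in on_path}
```

Artifacts that feed into the pipeline but do not lie between the chosen endpoints fall outside the segment, which is exactly the noise reduction the operator is after.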
Dimensionality on Summarization
Summarization is one of the key features of human intelligence. It plays an
important role in understanding and representation. With the rapid and
continual expansion of texts, pictures, and videos in cyberspace, automatic summarization
becomes more and more desirable. Text summarization has been studied for over
half a century, but it is still hard to automatically generate a satisfactory
summary. Traditional methods process texts empirically and neglect the
fundamental characteristics and principles of language use and understanding.
This paper summarizes previous text summarization approaches in a
multi-dimensional classification space, introduces a multi-dimensional
methodology for research and development, unveils the basic characteristics and
principles of language use and understanding, investigates some fundamental
mechanisms of summarization, studies the dimensions and forms of
representations, and proposes a multi-dimensional evaluation mechanism.
Investigation extends to the incorporation of pictures into summary and to the
summarization of videos, graphs and pictures, and then reaches a general
summarization framework.
Robust Neural Abstractive Summarization Systems and Evaluation against Adversarial Information
Sequence-to-sequence (seq2seq) neural models have been actively investigated
for abstractive summarization. Nevertheless, existing neural abstractive
systems frequently generate factually incorrect summaries and are vulnerable to
adversarial information, suggesting a crucial lack of semantic understanding.
In this paper, we propose a novel semantic-aware neural abstractive
summarization model that learns to generate high quality summaries through
semantic interpretation over salient content. A novel evaluation scheme with
adversarial samples is introduced to measure how well a model identifies
off-topic information, where our model yields significantly better performance
than the popular pointer-generator summarizer. Human evaluation also confirms
that our system summaries are uniformly more informative and faithful as well
as less redundant than the seq2seq model.
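The evaluation idea in this abstract, measuring whether a summarizer picks up injected off-topic content, can be sketched independently of any particular model: perturb the input with adversarial sentences and measure how many of their tokens leak into the summary. The token-overlap metric and the helper name are assumptions, not the paper's scheme:

```python
def adversarial_leakage(summarize, document, off_topic_sentences):
    """
    Measure how much off-topic (adversarial) content leaks into a summary.
    `summarize` is any function str -> str. Returns the fraction of
    adversarial tokens appearing in the summary (lower is better).
    """
    perturbed = document + " " + " ".join(off_topic_sentences)
    summary_tokens = set(summarize(perturbed).lower().split())
    adv_tokens = set(" ".join(off_topic_sentences).lower().split())
    if not adv_tokens:
        return 0.0
    return len(adv_tokens & summary_tokens) / len(adv_tokens)
```

Any summarizer can be plugged in; a model with better semantic understanding should score closer to zero on such probes.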
Text Summarization using Abstract Meaning Representation
With an ever increasing size of text present on the Internet, automatic
summary generation remains an important problem for natural language
understanding. In this work we explore a novel full-fledged pipeline for text
summarization with an intermediate step of Abstract Meaning Representation
(AMR). Our proposed pipeline first generates an AMR graph of an input
story, then extracts a summary graph from it, and finally generates summary
sentences from this summary graph. Our proposed method achieves
state-of-the-art results compared to the other text summarization routines
based on AMR. We also point out some significant problems in the existing
evaluation methods, which make them unsuitable for evaluating summary quality.
Comment: 10 pages, 4 figures. Update: added more results, corrected figures
and table.
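The middle stage of the pipeline (extracting a summary subgraph from a full AMR graph) can be illustrated with stubs. A real system would use an AMR parser and an AMR-to-text generator; here the graph is a hand-written toy, and "keep edges among the best-connected concepts" is an assumed stand-in for the paper's extraction step:

```python
from collections import defaultdict

def extract_summary_graph(amr_edges, k=2):
    """Keep edges whose endpoints are both among the k best-connected
    concepts (illustrative stand-in for summary-graph extraction).
    amr_edges: list of (source_concept, relation, target_concept)."""
    deg = defaultdict(int)
    for src, _rel, tgt in amr_edges:
        deg[src] += 1
        deg[tgt] += 1
    keep = set(sorted(deg, key=lambda n: -deg[n])[:k])
    return [(s, r, t) for s, r, t in amr_edges if s in keep and t in keep]

def verbalize(edges):
    """Trivial linearization; a real generator would produce fluent text."""
    return "; ".join(f"{s} {r} {t}" for s, r, t in edges)
```

The point of the intermediate AMR step is exactly this: summarization becomes a graph-selection problem rather than a sentence-selection one.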
Pre-trained Language Model Representations for Language Generation
Pre-trained language model representations have been successful in a wide
range of language understanding tasks. In this paper, we examine different
strategies to integrate pre-trained representations into sequence-to-sequence
models and apply them to neural machine translation and abstractive
summarization. We find that pre-trained representations are most effective when
added to the encoder network, which slows inference by only 14%. Our experiments
in machine translation show gains of up to 5.3 BLEU in a simulated
resource-poor setup. While returns diminish with more labeled data, we still
observe improvements when millions of sentence-pairs are available. Finally, on
abstractive summarization we achieve a new state of the art on the full text
version of CNN/DailyMail.
Comment: NAACL 2019.
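The finding that pre-trained representations help most on the encoder side can be illustrated with a minimal sketch: sum a learned token embedding with a frozen feature vector before the encoder consumes it. The vocabulary, dimensions, and random "pre-trained" features below are toy assumptions; real systems would project ELMo/BERT-style vectors to the encoder size:

```python
import random

random.seed(0)
DIM = 4
VOCAB = ["the", "cat", "sat"]

# Trainable embeddings (would be updated during fine-tuning).
learned_emb = {w: [random.uniform(-1, 1) for _ in range(DIM)] for w in VOCAB}

# Frozen features standing in for a pre-trained language model.
pretrained_feat = {w: [random.uniform(-1, 1) for _ in range(DIM)] for w in VOCAB}

def encoder_inputs(tokens):
    """Element-wise sum of learned and frozen pre-trained embeddings,
    i.e. pre-trained representations added on the encoder side."""
    return [[a + b for a, b in zip(learned_emb[t], pretrained_feat[t])]
            for t in tokens]
```

Because the pre-trained features are only consumed once per source token, this placement adds little inference cost, consistent with the modest slowdown the abstract reports.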