Test Model for Text Categorization and Text Summarization
Text Categorization is the task of automatically sorting a set of documents
into categories from a predefined set, and Text Summarization is the task of
producing a brief and accurate representation of an input text such that the
output covers the most important concepts of the source in a condensed manner.
Document Summarization is an emerging technique for understanding the main
purpose of any kind of document. This paper presents a model that uses text
categorization and text summarization for searching a document based on a user query.
Comment: 7 pages, 0 figures
An Introduction to the Summarization of Evolving Events: Linear and Non-linear Evolution
This paper examines the summarization of events that evolve through time. It
discusses different types of evolution taking into account the time in which
the incidents of an event are happening and the different sources reporting on
the specific event. It proposes an approach for multi-document summarization
which employs ``messages'' for representing the incidents of an event and
cross-document relations that hold between messages according to certain
conditions. The paper also outlines the current version of the summarization
system we are implementing to realize this approach.
Comment: 10 pages, 3 figures. To be published in the Natural Language
Understanding and Cognitive Science (NLUCS 2005) conference.
A Simple Theoretical Model of Importance for Summarization
Research on summarization has mainly been driven by empirical approaches,
crafting systems to perform well on standard datasets with the notion of
information Importance remaining latent. We argue that establishing theoretical
models of Importance will advance our understanding of the task and help to
further improve summarization systems. To this end, we propose simple but
rigorous definitions of several concepts that were previously used only
intuitively in summarization: Redundancy, Relevance, and Informativeness.
Importance arises as a single quantity naturally unifying these concepts.
Additionally, we provide intuitions to interpret the proposed quantities and
experiments to demonstrate the potential of the framework to inform and guide
subsequent work.
Comment: Accepted at ACL19 (outstanding paper award).
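The three quantities named in this abstract can be made concrete. As an illustrative sketch only (not the paper's exact information-theoretic formulation), one can treat a text as a unigram word distribution and instantiate Redundancy via entropy, Relevance via cross-entropy with the source, and Informativeness via divergence from background knowledge:

```python
from collections import Counter
from math import log

def unigram_dist(text):
    """Normalized unigram distribution over whitespace tokens."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def entropy(p):
    return -sum(q * log(q) for q in p.values())

def cross_entropy(p, q, eps=1e-12):
    """H(p, q); eps smooths words absent from q."""
    return -sum(pw * log(q.get(w, eps)) for w, pw in p.items())

def redundancy(summary):
    # A repetitive summary has low entropy, hence high -H(S).
    return -entropy(unigram_dist(summary))

def relevance(summary, source):
    # High when the summary's words are likely under the source distribution.
    return -cross_entropy(unigram_dist(summary), unigram_dist(source))

def informativeness(summary, background):
    # KL(S || K): high when the summary diverges from background knowledge K.
    s = unigram_dist(summary)
    k = unigram_dist(background)
    return sum(pw * log(pw / k.get(w, 1e-12)) for w, pw in s.items())

def importance(summary, source, background):
    # One simple unification: reward relevance and informativeness,
    # penalize redundancy. The weighting here is an assumption.
    return (relevance(summary, source)
            + informativeness(summary, background)
            - redundancy(summary))
```

The unigram instantiation is only one choice; the framework itself is agnostic to how the distributions are estimated.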
ElasticPlay: Interactive Video Summarization with Dynamic Time Budgets
Video consumption is being shifted from sit-and-watch to selective skimming.
Existing video player interfaces, however, only provide indirect manipulation
to support this emerging behavior. Video summarization alleviates this issue to
some extent, shortening a video based on the desired length of a summary as an
input variable. But an optimal length of a summarized video is often not
available in advance. Moreover, the user cannot edit the summary once it is
produced, limiting its practical applications. We argue that video
summarization should be an interactive, mixed-initiative process in which users
have control over the summarization procedure while algorithms help users
achieve their goal via video understanding. In this paper, we introduce
ElasticPlay, a mixed-initiative approach that combines an advanced video
summarization technique with direct interface manipulation to help users
control the video summarization process. Users can specify a time budget for
the remaining content while watching a video; our system then immediately
updates the playback plan using our proposed cut-and-forward algorithm,
determining which parts to skip or to fast-forward. This interactive process
allows users to fine-tune the summarization result with immediate feedback. We
show that our system outperforms existing video summarization techniques on the
TVSum50 dataset. We also report two lab studies (22 participants) and a
Mechanical Turk deployment study (60 participants), and show that the
participants responded favorably to ElasticPlay.
Comment: ACM Multimedia 2017 preprint.
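The abstract does not spell out the cut-and-forward algorithm, but the idea of fitting a playback plan to a user-specified time budget can be sketched with a simple greedy stand-in: play the highest-importance segments at normal speed, fast-forward what still fits, skip the rest. Segment importance scores and the fast-forward speed are assumptions here, not the paper's method:

```python
def plan_playback(segments, budget, ff_speed=4.0):
    """
    Illustrative budget-fitting planner (not the paper's actual algorithm).
    segments: list of (duration_sec, importance) tuples in video order.
    Returns one action per segment: 'play', 'ffwd', or 'skip', such that
    total playback time fits within `budget` seconds.
    """
    order = sorted(range(len(segments)), key=lambda i: -segments[i][1])
    actions = ["skip"] * len(segments)
    remaining = budget
    # First pass: play the most important segments at normal speed.
    for i in order:
        dur = segments[i][0]
        if dur <= remaining:
            actions[i] = "play"
            remaining -= dur
    # Second pass: fast-forward skipped segments that still fit at ff_speed.
    for i in order:
        if actions[i] == "skip":
            cost = segments[i][0] / ff_speed
            if cost <= remaining:
                actions[i] = "ffwd"
                remaining -= cost
    return actions

def playback_time(segments, actions, ff_speed=4.0):
    """Total watch time implied by a plan."""
    t = 0.0
    for (dur, _), a in zip(segments, actions):
        if a == "play":
            t += dur
        elif a == "ffwd":
            t += dur / ff_speed
    return t
```

Because the plan is recomputed from the current budget, shrinking the budget mid-playback simply yields a new plan, which matches the interactive loop the abstract describes.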
Graph-based Ontology Summarization: A Survey
Ontologies have been widely used in numerous and varied applications, e.g.,
to support data modeling, information integration, and knowledge management.
With the increasing size of ontologies, ontology understanding, which plays
an important role in many tasks, is becoming more difficult.
Consequently, ontology summarization, as a way to distill key information from
an ontology and generate an abridged version to facilitate a better
understanding, is getting growing attention. In this survey paper, we review
existing ontology summarization techniques and focus mainly on graph-based
methods, which represent an ontology as a graph and apply centrality-based and
other measures to identify the most important elements of an ontology as its
summary. After analyzing their strengths and weaknesses, we highlight a few
potential directions for future research.
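The centrality-based approach the survey focuses on can be sketched in a few lines: represent the ontology as a graph, score each concept by a centrality measure, and keep the top-scoring concepts as the summary. Degree centrality is used here purely for illustration; surveyed systems also use measures such as betweenness or PageRank:

```python
from collections import defaultdict

def degree_centrality(edges):
    """Degree of each node in an undirected concept graph given as (u, v) pairs."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def summarize_ontology(edges, k=3):
    """Return the k most central concepts as a crude graph summary.
    Ties are broken alphabetically for determinism."""
    deg = degree_centrality(edges)
    return sorted(deg, key=lambda n: (-deg[n], n))[:k]
```

A real ontology summarizer would also weight relation types (e.g., subclass vs. annotation edges) rather than treating all edges equally.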
Understanding Data Science Lifecycle Provenance via Graph Segmentation and Summarization
Increasingly, modern data science platforms have non-intrusive and
extensible provenance ingestion mechanisms to collect rich provenance and
context information, handle modifications to the same file using
distinguishable versions, and use graph data models (e.g., property graphs) and
query languages (e.g., Cypher) to represent and manipulate the stored
provenance/context information. Due to the schema-later nature of the metadata,
multiple versions of the same files, and unfamiliar artifacts introduced by
team members, the "provenance graph" is verbose and evolving, and hard to
understand; with a standard graph query model, it is difficult to compose
queries and utilize this valuable information.
In this paper, we propose two high-level graph query operators to address the
verboseness and evolving nature of such provenance graphs. First, we introduce
a graph segmentation operator, which queries the retrospective provenance
between a set of source vertices and a set of destination vertices via flexible
boundary criteria to help users get insight about the derivation relationships
among those vertices. We show the semantics of such a query in terms of a
context-free grammar, and develop efficient algorithms that run orders of
magnitude faster than the state of the art. Second, we propose a graph
summarization operator that combines similar segments together to query
prospective provenance of the underlying project. The operator allows tuning
the summary by ignoring vertex details and characterizing local structures, and
ensures the provenance meaning using path constraints. We show the optimal
summary problem is PSPACE-complete and develop effective approximation
algorithms. The operators are implemented on top of a property graph backend.
We evaluate our query methods extensively and show the effectiveness and
efficiency of the proposed methods.
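The core of the segmentation operator, stripped of the boundary criteria and path constraints the paper adds, is finding the subgraph between a set of source vertices and a set of destination vertices. A toy version intersects forward reachability from the sources with backward reachability from the destinations; everything here (the file-name vertices included) is illustrative, not the paper's implementation:

```python
from collections import defaultdict, deque

def reachable(adj, starts):
    """Set of vertices reachable from `starts` following edges in `adj`."""
    seen = set(starts)
    q = deque(starts)
    while q:
        u = q.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                q.append(v)
    return seen

def segment(edges, sources, destinations):
    """
    Toy segmentation: the induced subgraph of vertices lying on some
    derivation path from a source to a destination.
    """
    fwd = defaultdict(list)
    bwd = defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        bwd[v].append(u)
    on_path = reachable(fwd, sources) & reachable(bwd, destinations)
    return {(u, v) for u, v in edges if u in on_path and v in on_path}
```

Artifacts that feed into the pipeline but do not lie between the chosen endpoints fall outside the segment, which is exactly the noise reduction the operator is after.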
Dimensionality on Summarization
Summarization is one of the key features of human intelligence. It plays an
important role in understanding and representation. With the rapid and
continual expansion of texts, pictures, and videos in cyberspace, automatic summarization
becomes more and more desirable. Text summarization has been studied for over
half a century, but it is still hard to automatically generate a satisfactory
summary. Traditional methods process texts empirically and neglect the
fundamental characteristics and principles of language use and understanding.
This paper summarizes previous text summarization approaches in a
multi-dimensional classification space, introduces a multi-dimensional
methodology for research and development, unveils the basic characteristics and
principles of language use and understanding, investigates some fundamental
mechanisms of summarization, studies the dimensions and forms of
representations, and proposes a multi-dimensional evaluation mechanism.
Investigation extends to the incorporation of pictures into summary and to the
summarization of videos, graphs and pictures, and then reaches a general
summarization framework.
Robust Neural Abstractive Summarization Systems and Evaluation against Adversarial Information
Sequence-to-sequence (seq2seq) neural models have been actively investigated
for abstractive summarization. Nevertheless, existing neural abstractive
systems frequently generate factually incorrect summaries and are vulnerable to
adversarial information, suggesting a crucial lack of semantic understanding.
In this paper, we propose a novel semantic-aware neural abstractive
summarization model that learns to generate high quality summaries through
semantic interpretation over salient content. A novel evaluation scheme with
adversarial samples is introduced to measure how well a model identifies
off-topic information, where our model yields significantly better performance
than the popular pointer-generator summarizer. Human evaluation also confirms
that our system summaries are uniformly more informative and faithful as well
as less redundant than the seq2seq model.
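The evaluation idea in this abstract, measuring whether a summarizer picks up injected off-topic content, can be sketched independently of any particular model: perturb the input with adversarial sentences and measure how many of their tokens leak into the summary. The token-overlap metric and the helper name are assumptions, not the paper's scheme:

```python
def adversarial_leakage(summarize, document, off_topic_sentences):
    """
    Measure how much off-topic (adversarial) content leaks into a summary.
    `summarize` is any function str -> str. Returns the fraction of
    adversarial tokens appearing in the summary (lower is better).
    """
    perturbed = document + " " + " ".join(off_topic_sentences)
    summary_tokens = set(summarize(perturbed).lower().split())
    adv_tokens = set(" ".join(off_topic_sentences).lower().split())
    if not adv_tokens:
        return 0.0
    return len(adv_tokens & summary_tokens) / len(adv_tokens)
```

Any summarizer can be plugged in; a model with better semantic understanding should score closer to zero on such probes.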
Text Summarization using Abstract Meaning Representation
With an ever increasing size of text present on the Internet, automatic
summary generation remains an important problem for natural language
understanding. In this work we explore a novel full-fledged pipeline for text
summarization with an intermediate step of Abstract Meaning Representation
(AMR). Our proposed pipeline first generates an AMR graph of an input
story, then extracts a summary graph from it, and finally generates summary
sentences from this summary graph. Our proposed method achieves
state-of-the-art results compared to the other text summarization routines
based on AMR. We also point out some significant problems in the existing
evaluation methods, which make them unsuitable for evaluating summary quality.
Comment: 10 pages, 4 figures. Update: added more results, corrected figures
and table.
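The middle stage of the pipeline (extracting a summary subgraph from a full AMR graph) can be illustrated with stubs. A real system would use an AMR parser and an AMR-to-text generator; here the graph is a hand-written toy, and "keep edges among the best-connected concepts" is an assumed stand-in for the paper's extraction step:

```python
from collections import defaultdict

def extract_summary_graph(amr_edges, k=2):
    """Keep edges whose endpoints are both among the k best-connected
    concepts (illustrative stand-in for summary-graph extraction).
    amr_edges: list of (source_concept, relation, target_concept)."""
    deg = defaultdict(int)
    for src, _rel, tgt in amr_edges:
        deg[src] += 1
        deg[tgt] += 1
    keep = set(sorted(deg, key=lambda n: -deg[n])[:k])
    return [(s, r, t) for s, r, t in amr_edges if s in keep and t in keep]

def verbalize(edges):
    """Trivial linearization; a real generator would produce fluent text."""
    return "; ".join(f"{s} {r} {t}" for s, r, t in edges)
```

The point of the intermediate AMR step is exactly this: summarization becomes a graph-selection problem rather than a sentence-selection one.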
Pre-trained Language Model Representations for Language Generation
Pre-trained language model representations have been successful in a wide
range of language understanding tasks. In this paper, we examine different
strategies to integrate pre-trained representations into sequence-to-sequence
models and apply them to neural machine translation and abstractive
summarization. We find that pre-trained representations are most effective when
added to the encoder network, which slows inference by only 14%. Our experiments
in machine translation show gains of up to 5.3 BLEU in a simulated
resource-poor setup. While returns diminish with more labeled data, we still
observe improvements when millions of sentence-pairs are available. Finally, on
abstractive summarization we achieve a new state of the art on the full text
version of CNN/DailyMail.
Comment: NAACL 2019.
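The finding that pre-trained representations help most on the encoder side can be illustrated with a minimal sketch: sum a learned token embedding with a frozen feature vector before the encoder consumes it. The vocabulary, dimensions, and random "pre-trained" features below are toy assumptions; real systems would project ELMo/BERT-style vectors to the encoder size:

```python
import random

random.seed(0)
DIM = 4
VOCAB = ["the", "cat", "sat"]

# Trainable embeddings (would be updated during fine-tuning).
learned_emb = {w: [random.uniform(-1, 1) for _ in range(DIM)] for w in VOCAB}

# Frozen features standing in for a pre-trained language model.
pretrained_feat = {w: [random.uniform(-1, 1) for _ in range(DIM)] for w in VOCAB}

def encoder_inputs(tokens):
    """Element-wise sum of learned and frozen pre-trained embeddings,
    i.e. pre-trained representations added on the encoder side."""
    return [[a + b for a, b in zip(learned_emb[t], pretrained_feat[t])]
            for t in tokens]
```

Because the pre-trained features are only consumed once per source token, this placement adds little inference cost, consistent with the modest slowdown the abstract reports.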