Unsupervised Summarization by Jointly Extracting Sentences and Keywords
We present RepRank, an unsupervised graph-based ranking model for extractive
multi-document summarization, in which the similarities between words, between
sentences, and between words and sentences are estimated by the distances
between their vector representations in a unified vector space. To obtain
desirable representations, we propose a self-attention-based learning method
that represents a sentence by the weighted sum of its word embeddings, with the
weights concentrated on the words that better reflect the content of the
document. We show that salient sentences and keywords can be extracted in a
joint, mutually reinforcing process using the learned representations, and
prove that this process always converges to a unique solution, leading to
improved performance. A variant of the absorbing random walk and a
corresponding sampling-based algorithm are also described to avoid redundancy
and increase diversity in the summaries. Experimental results on multiple
benchmark datasets show that RepRank achieves the best or comparable
performance in terms of ROUGE scores.
Comment: 10 pages (including 2 pages of references), 1 figure
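The joint extraction idea can be sketched as a mutual-reinforcement power iteration over a word-to-sentence similarity matrix: salient sentences reinforce keyword scores and vice versa. The function below is an illustrative toy, not the paper's exact algorithm; `S_ws` is assumed to hold non-negative similarities between word and sentence embeddings.

```python
import numpy as np

def mutual_reinforce(S_ws, iters=50, tol=1e-9):
    """Jointly score words and sentences by mutual reinforcement:
    a sentence is salient if it is similar to high-scoring words,
    and a word is a keyword if it is similar to salient sentences.
    S_ws is an (n_words x n_sents) non-negative similarity matrix
    (e.g. clipped cosine similarities of embeddings)."""
    n_w, n_s = S_ws.shape
    w = np.full(n_w, 1.0 / n_w)   # word scores
    s = np.full(n_s, 1.0 / n_s)   # sentence scores
    for _ in range(iters):
        s_new = S_ws.T @ w        # sentences reinforced by words
        w_new = S_ws @ s          # words reinforced by sentences
        s_new /= s_new.sum()      # normalise to keep scores bounded
        w_new /= w_new.sum()
        converged = (np.abs(s_new - s).max() < tol
                     and np.abs(w_new - w).max() < tol)
        w, s = w_new, s_new
        if converged:
            break
    return w, s
```

For a strictly positive similarity matrix this toy iteration converges to the dominant singular directions of `S_ws`, mirroring the unique-fixed-point behaviour the abstract claims for the full method.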
A Survey on Event-based News Narrative Extraction
Narratives are fundamental to our understanding of the world, providing us
with a natural structure for knowledge representation over time. Computational
narrative extraction is a subfield of artificial intelligence that makes heavy
use of information retrieval and natural language processing techniques.
Despite the importance of computational narrative extraction, relatively little
scholarly work exists on synthesizing previous research and strategizing future
research in the area. In particular, this article focuses on extracting news
narratives from an event-centric perspective. Extracting narratives from news
data has multiple applications in understanding the evolving information
landscape. This survey presents an extensive study of research in the area of
event-based news narrative extraction. In particular, we screened over 900
articles, yielding 54 relevant ones. These are synthesized and organized by
representation model, extraction criteria, and evaluation approach. Based on
the reviewed studies, we identify recent trends, open challenges, and
potential research lines.
Comment: 37 pages, 3 figures, to be published in the journal ACM CSU
Query and Output: Generating Words by Querying Distributed Word Representations for Paraphrase Generation
Most recent approaches to paraphrase generation use the sequence-to-sequence
model. The existing sequence-to-sequence model tends to memorize the words and
patterns in the training dataset instead of learning the meaning of the words.
Therefore, the generated sentences are often grammatically correct but
semantically improper. In this work, we introduce a novel model based on the
encoder-decoder framework, called the Word Embedding Attention Network (WEAN).
Our proposed model generates words by querying distributed word
representations (i.e., neural word embeddings), aiming to capture the meaning
of the corresponding words. Following previous work, we evaluate our model on
two paraphrase-oriented tasks, namely text simplification and short-text
abstractive summarization. Experimental results show that our model
outperforms the sequence-to-sequence baseline by BLEU scores of 6.3 and 5.5 on
two English text simplification datasets, and by a ROUGE-2 F1 score of 5.7 on
a Chinese summarization dataset. Moreover, our model achieves state-of-the-art
performance on all three benchmark datasets.
Comment: arXiv admin note: text overlap with arXiv:1710.0231
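The querying step described above can be sketched as follows: instead of projecting the decoder state through a separate output layer, each word embedding is scored against a query vector and the best-matching word is emitted. This is an illustrative sketch using plain dot-product scoring; the paper's actual scoring function and training details may differ.

```python
import numpy as np

def query_word(query, embeddings):
    """Generate the next word by querying the embedding table:
    dot-product each word embedding with the decoder's query
    vector, softmax the scores, and emit the argmax word id.
    embeddings has shape (vocab_size, dim); query has shape (dim,)."""
    scores = embeddings @ query               # one score per vocab word
    scores = scores - scores.max()            # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return int(np.argmax(probs)), probs
```

Because the output layer reuses the word embeddings rather than a separate projection matrix, the decoder is pushed to produce queries that live in the same semantic space as the words it emits.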
Learning from Multiple Sources for Video Summarisation
Many visual surveillance tasks, e.g. video summarisation, are conventionally
accomplished by analysing imagery-based features. Relying solely on visual
cues for public surveillance video understanding is unreliable, since visual
observations obtained from public-space CCTV video data are often not
sufficiently trustworthy and events of interest can be subtle. On the other
hand, non-visual data sources such as weather reports and traffic sensory
signals are readily accessible but have not been explored jointly to
complement visual data for video content analysis and summarisation. In this
paper, we present a novel unsupervised framework to learn jointly from both
visual and independently-drawn non-visual data sources for discovering
meaningful latent structure in surveillance video data. In particular, we
investigate ways to cope with discrepant dimensions and representations whilst
associating these heterogeneous data sources, and derive an effective
mechanism to tolerate missing and incomplete data from different sources. We
show that the proposed multi-source learning framework not only achieves
better video content clustering than state-of-the-art methods, but is also
capable of accurately inferring missing non-visual semantics from previously
unseen videos. In addition, a comprehensive user study is conducted to
validate the quality of video summarisation generated using the proposed
multi-source model.
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic
due to the recent proliferation of network datasets (e.g., social, biological,
information networks) and a corresponding increase in the application of
statistical relational learning (SRL) algorithms to these domains. In this
article, we examine a range of representation issues for graph-based relational
data. Since the choice of relational data representation for the nodes, links,
and features can dramatically affect the capabilities of SRL algorithms, we
survey approaches and opportunities for relational representation
transformation designed to improve the performance of these algorithms. This
leads us to introduce an intuitive taxonomy for data representation
transformations in relational domains that incorporates link transformation and
node transformation as symmetric representation tasks. In particular, the
transformation tasks for both nodes and links include (i) predicting their
existence, (ii) predicting their label or type, (iii) estimating their weight
or importance, and (iv) systematically constructing their relevant features. We
motivate our taxonomy through detailed examples and use it to survey and
compare competing approaches for each of these tasks. We also discuss general
conditions for transforming links, nodes, and features. Finally, we highlight
challenges that remain to be addressed.
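Task (i), predicting link existence, is commonly approached with neighbourhood heuristics before any learned transformation is applied. The snippet below shows the classic common-neighbours score on an adjacency-set representation; it is an illustrative baseline, not a method proposed in the article.

```python
def common_neighbors(adj, u, v):
    """Score a candidate link (u, v) by the number of neighbours
    the two nodes share; a higher score suggests the link is more
    likely to exist. adj maps each node to its set of neighbours."""
    return len(adj[u] & adj[v])

# Toy undirected graph: a-b, a-c, b-c, b-d, c-d
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}
```

Here the unobserved pair (a, d) shares two neighbours, so the heuristic would rank the link a-d as a likely addition to the graph.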
Selecting and Generating Computational Meaning Representations for Short Texts
Language conveys meaning, so natural language processing (NLP) requires representations of meaning. This work addresses two broad questions: (1) What meaning representation should we use? and (2) How can we transform text to our chosen meaning representation? In the first part, we explore different meaning representations (MRs) of short texts, ranging from surface forms to deep-learning-based models. We show the advantages and disadvantages of a variety of MRs for summarization, paraphrase detection, and clustering. In the second part, we use SQL as a running example for an in-depth look at how we can parse text into our chosen MR. We examine the text-to-SQL problem from three perspectives (methodology, systems, and applications) and show how each contributes to a fuller understanding of the task.
PhD dissertation, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/143967/1/cfdollak_1.pd
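The text-to-SQL task discussed in the second part can be made concrete with a toy input-output pair: a natural-language question and the executable query a parser should produce for it. The schema, data, and query below are hypothetical, invented purely for illustration.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollment (student TEXT, course TEXT)")
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?)",
    [("ann", "CS 101"), ("bob", "CS 101"), ("eve", "MATH 200")],
)

# The parsing task: map the question to the query below.
question = "How many students are enrolled in CS 101?"
sql = "SELECT COUNT(*) FROM enrollment WHERE course = 'CS 101'"

count = conn.execute(sql).fetchone()[0]   # -> 2
```

The difficulty of the task lies in producing the right query from the question alone, given only the schema; executing the query, as above, is the easy part.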