Entropy and Graph Based Modelling of Document Coherence using Discourse Entities: An Application
We present two novel models of document coherence and their application to
information retrieval (IR). Both models approximate document coherence using
discourse entities, e.g. the subject or object of a sentence. Our first model
views text as a Markov process generating sequences of discourse entities
(entity n-grams); we use the entropy of these entity n-grams to approximate the
rate at which new information appears in text, reasoning that as more new words
appear, the topic increasingly drifts and text coherence decreases. Our second
model extends the work of Guinaudeau & Strube [28], which represents text as a
graph of discourse entities, linked by different relations, such as their
distance or adjacency in text. We use several graph topology metrics to
approximate different aspects of the discourse flow that can indicate
coherence, such as the average clustering or betweenness of discourse entities
in text. Experiments with several instantiations of these models show that: (i)
our models perform on a par with two other well-known models of text coherence
even without any parameter tuning, and (ii) reranking retrieval results
according to their coherence scores gives notable performance gains, confirming
a relation between document coherence and relevance. This work contributes two
novel models of document coherence, the application of which to IR complements
recent work on integrating document cohesiveness or comprehensibility into
ranking [5, 56].
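The two ingredients of the abstract above, entity n-gram entropy and a graph metric over discourse entities, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: discourse entities are assumed to be already extracted into a flat list in text order, entropy is taken as the plain Shannon entropy (in bits) of the empirical entity n-gram distribution, and the graph links entities that are adjacent in that list, which is only one of the relations the abstract mentions. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def entity_ngram_entropy(entities, n=2):
    """Shannon entropy (bits) of the empirical distribution of entity n-grams."""
    grams = [tuple(entities[i:i + n]) for i in range(len(entities) - n + 1)]
    if not grams:
        return 0.0
    counts = Counter(grams)
    total = len(grams)
    # p * log2(1/p) summed over distinct n-grams; every term is non-negative.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def entity_adjacency_graph(entities):
    """Hypothetical entity graph: link entities that appear next to each other."""
    adj = {e: set() for e in entities}
    for a, b in zip(entities, entities[1:]):
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    if not adj:
        return 0.0
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges among this node's neighbours.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

entities = ["court", "judge", "court", "verdict", "judge", "court"]
print(entity_ngram_entropy(entities))
print(average_clustering(entity_adjacency_graph(entities)))
```

Following the abstract's reasoning, a lower entity n-gram entropy indicates that fewer new entities are entering the text, and hence higher coherence; average clustering is just one of the several topology metrics the abstract lists (betweenness is another).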
Graph-based Neural Multi-Document Summarization
We propose a neural multi-document summarization (MDS) system that
incorporates sentence relation graphs. We employ a Graph Convolutional Network
(GCN) on the relation graphs, with sentence embeddings obtained from Recurrent
Neural Networks as input node features. Through multiple layer-wise
propagation, the GCN generates high-level hidden sentence features for salience
estimation. We then use a greedy heuristic to extract salient sentences while
avoiding redundancy. In our experiments on DUC 2004, we consider three types of
sentence relation graphs and demonstrate the advantage of combining sentence
relations in graphs with the representation power of deep neural networks. Our
model improves upon traditional graph-based extractive approaches and the
vanilla GRU sequence model with no graph, and it achieves competitive results
against other state-of-the-art multi-document summarization systems.
Comment: In CoNLL 201
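The two stages of the pipeline described above, layer-wise GCN propagation over a sentence relation graph followed by greedy, redundancy-aware extraction, can be sketched in plain Python. This is not the authors' system. The propagation rule assumed here is the widely used symmetrically normalised form ReLU(D^-1/2 (A + I) D^-1/2 H W) of Kipf and Welling. The redundancy test is a hypothetical cosine-similarity threshold `max_sim`, the length limit is a hypothetical word `budget`, and all function names are illustrative. In a real system, `h` would hold the RNN sentence embeddings and the salience scores would be derived from the final GCN layer.

```python
import math

def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square, non-negative adjacency matrix."""
    n = len(adj)
    # Add self-loops so each sentence keeps part of its own features.
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]  # every degree is >= 1 thanks to the self-loop
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(x, y):
    """Plain list-of-lists matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def gcn_layer(a_hat, h, w):
    """One propagation step: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def greedy_extract(scores, vectors, lengths, budget, max_sim=0.5):
    """Take sentences by descending salience, skipping redundant or over-budget ones."""
    chosen, used = [], 0
    for i in sorted(range(len(scores)), key=lambda k: scores[k], reverse=True):
        if used + lengths[i] > budget:
            continue  # would exceed the (hypothetical) length budget
        if any(cosine(vectors[i], vectors[j]) > max_sim for j in chosen):
            continue  # too similar to a sentence already in the summary
        chosen.append(i)
        used += lengths[i]
    return chosen
```

Stacking several `gcn_layer` calls, each with its own weight matrix, lets a sentence's representation absorb information from neighbours further away in the graph (k layers reach k hops). That is the intuition behind the multiple layer-wise propagation the abstract mentions.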