Entropy and Graph Based Modelling of Document Coherence using Discourse Entities: An Application
We present two novel models of document coherence and their application to
information retrieval (IR). Both models approximate document coherence using
discourse entities, e.g. the subject or object of a sentence. Our first model
views text as a Markov process generating sequences of discourse entities
(entity n-grams); we use the entropy of these entity n-grams to approximate the
rate at which new information appears in text, reasoning that as more new words
appear, the topic increasingly drifts and text coherence decreases. Our second
model extends the work of Guinaudeau & Strube [28] that represents text as a
graph of discourse entities, linked by different relations, such as their
distance or adjacency in text. We use several graph topology metrics to
approximate different aspects of the discourse flow that can indicate
coherence, such as the average clustering or betweenness of discourse entities
in text. Experiments with several instantiations of these models show that: (i)
our models perform on a par with two other well-known models of text coherence
even without any parameter tuning, and (ii) reranking retrieval results
according to their coherence scores gives notable performance gains, confirming
a relation between document coherence and relevance. This work contributes two
novel models of document coherence, the application of which to IR complements
recent work in the integration of document cohesiveness or comprehensibility to
ranking [5, 56].
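The entropy-based idea in the abstract above can be illustrated with a minimal sketch. It assumes the discourse entities have already been extracted into an ordered list, and it uses the plain Shannon entropy of the empirical entity n-gram distribution. The function name `entity_ngram_entropy` and the toy entity lists are hypothetical, and the paper's actual estimator may differ.

```python
import math
from collections import Counter


def entity_ngram_entropy(entities, n=2):
    """Shannon entropy (in bits) of the empirical entity n-gram distribution.

    Assumption: `entities` is an ordered list of discourse entities already
    extracted from a text (e.g. sentence subjects and objects).
    """
    ngrams = [tuple(entities[i:i + n]) for i in range(len(entities) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    total = len(ngrams)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy inputs (hypothetical): one text keeps returning to the same entities,
# the other introduces a new entity at every step.
repetitive = ["cat", "mat", "cat", "mat", "cat", "mat", "cat"]
drifting = ["cat", "mat", "dog", "park", "rain", "tax", "moon"]

h_repetitive = entity_ngram_entropy(repetitive)
h_drifting = entity_ngram_entropy(drifting)
```

Under this sketch, the repetitive sequence yields lower entropy than the drifting one, matching the abstract's reasoning that a faster rate of new information corresponds to lower coherence.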
Ranking for Relevance and Display Preferences in Complex Presentation Layouts
Learning to Rank has traditionally considered settings where, given the
relevance information of objects, the desired order in which to rank the
objects is clear. However, with today's large variety of users and layouts this
is not always the case. In this paper, we consider so-called complex ranking
settings where it is not clear what should be displayed, that is, what the
relevant items are, and how they should be displayed, that is, where the most
relevant items should be placed. These ranking settings are complex as they
involve both traditional ranking and inferring the best display order. Existing
learning to rank methods cannot handle such complex ranking settings as they
assume that the display order is known beforehand. To address this gap we
introduce a novel Deep Reinforcement Learning method that is capable of
learning complex rankings, both the layout and the best ranking given the
layout, from weak reward signals. Our proposed method does so by selecting
documents and positions sequentially, hence it ranks both the documents and
positions, which is why we call it the Double-Rank Model (DRM). Our experiments
show that DRM outperforms all existing methods in complex ranking settings,
leading to substantial ranking improvements in cases where the display
order is not known a priori.
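The sequential selection of documents and positions described above can be sketched as a simple greedy loop. This is only an illustration of the selection pattern: the learned Deep Reinforcement Learning policy is replaced here by a hand-written score (relevance times position attention), and the names `greedy_double_rank`, `relevance`, and `attention` are hypothetical rather than taken from the paper.

```python
def greedy_double_rank(relevance, attention):
    """Place documents by repeatedly picking the best (document, position) pair.

    Assumptions: `relevance` maps document ids to a relevance estimate and
    `attention` maps display positions to an attention weight. A learned
    policy would replace the fixed product score used here.
    """
    remaining_docs = set(relevance)
    free_positions = set(attention)
    layout = {}
    while remaining_docs and free_positions:
        # Score every remaining pair and commit to the highest-scoring one.
        doc, pos = max(
            ((d, p) for d in remaining_docs for p in free_positions),
            key=lambda pair: relevance[pair[0]] * attention[pair[1]],
        )
        layout[pos] = doc
        remaining_docs.remove(doc)
        free_positions.remove(pos)
    return layout


# Toy example (hypothetical values): the most attended slot is not the first.
relevance = {"d1": 0.9, "d2": 0.5, "d3": 0.1}
attention = {"top_left": 0.2, "center": 1.0, "right": 0.6}
layout = greedy_double_rank(relevance, attention)
```

Because each step chooses a document and a position together, the most relevant document ends up in the most attended slot even when that slot is not first in reading order.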
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Laplacian mixture models identify overlapping regions of influence in
unlabeled graph and network data in a scalable and computationally efficient
way, yielding useful low-dimensional representations. By combining Laplacian
eigenspace and finite mixture modeling methods, they provide probabilistic or
fuzzy dimensionality reductions or domain decompositions for a variety of input
data types, including mixture distributions, feature vectors, and graphs or
networks. Provably optimal recovery by the algorithm is shown analytically
for a nontrivial class of cluster graphs. Heuristic approximations for scalable
high-performance implementations are described and empirically tested.
Connections to PageRank and community detection in network analysis demonstrate
the wide applicability of this approach. The origins of fuzzy spectral methods,
beginning with generalized heat or diffusion equations in physics, are reviewed
and summarized. Comparisons to other dimensionality reduction and clustering
methods for challenging unsupervised machine learning problems are also
discussed.
Comment: 13 figures, 35 references
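The link between graph Laplacians, diffusion, and fuzzy memberships mentioned in the abstract can be illustrated with a small sketch. It is not the Laplacian mixture model algorithm itself: it runs explicit-Euler heat diffusion from hand-picked seed nodes and normalizes the responses into soft memberships. All function names, the seed choice, and the step size `alpha` are assumptions made for illustration.

```python
def laplacian(adj):
    """Combinatorial graph Laplacian L = D - A for a dense adjacency matrix."""
    n = len(adj)
    return [
        [(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
        for i in range(n)
    ]


def diffuse(adj, seed, steps=20, alpha=0.1):
    """Explicit Euler steps of the heat equation dx/dt = -L x from one seed.

    Assumption: alpha * (maximum degree) <= 1, so values stay non-negative.
    """
    lap = laplacian(adj)
    n = len(adj)
    x = [1.0 if i == seed else 0.0 for i in range(n)]
    for _ in range(steps):
        x = [x[i] - alpha * sum(lap[i][j] * x[j] for j in range(n))
             for i in range(n)]
    return x


def soft_memberships(adj, seeds, steps=20, alpha=0.1):
    """Normalize per-seed heat at each node into fuzzy region memberships."""
    heat = [diffuse(adj, s, steps, alpha) for s in seeds]
    memberships = []
    for i in range(len(adj)):
        total = sum(h[i] for h in heat)
        memberships.append([h[i] / total for h in heat])
    return memberships


# Toy graph (hypothetical): two triangles {0,1,2} and {3,4,5} joined by 2-3.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
memberships = soft_memberships(adj, seeds=[0, 5])
```

Each row of `memberships` sums to one, and nodes in the first triangle lean toward seed 0 while nodes in the second lean toward seed 5, giving overlapping rather than hard regions.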