LlamaFur: Learning Latent Category Matrix to Find Unexpected Relations in Wikipedia
Besides finding trends and unveiling typical patterns, modern information
retrieval is increasingly interested in the discovery of surprising
information in textual datasets. In this work we focus on finding "unexpected
links" in hyperlinked document corpora when documents are assigned to
categories. To this end, we model the hyperlink graph through node
categories: the presence of an arc is fostered or discouraged by the categories
of the head and the tail of the arc. Specifically, we determine a latent
category matrix that explains common links. The matrix is built using a
margin-based online learning algorithm (Passive-Aggressive), which allows us
to process graphs with … links in less than … minutes. We show
that our method provides better accuracy than most existing text-based
techniques, with higher efficiency and relying on a much smaller amount of
information. It also provides higher precision than standard link prediction,
especially at low recall levels; the two methods are in fact shown to be
orthogonal to each other and can therefore be fruitfully combined.
Comment: Short version appeared in Proc. WebSci '16, May 22-25, 2016, Hannover, Germany.
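The abstract does not spell out the model, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that an arc x -> y is scored as the sum of W[a][b] over every category a of x and every category b of y, that existing arcs are positive examples and non-arcs are negative examples, and that W is learned with PA-I updates (hinge loss, step size capped by an aggressiveness parameter C). Arcs that receive the lowest score under the learned W are treated as the most "unexpected". The function names (arc_score, pa_update, train_category_matrix, rank_unexpected) and the toy graph are hypothetical.

```python
from collections import defaultdict


def arc_score(W, head_cats, tail_cats):
    """Assumed arc score for head -> tail: sum of W[a][b] over all category pairs."""
    return sum(W[a][b] for a in head_cats for b in tail_cats)


def pa_update(W, head_cats, tail_cats, label, C=1.0):
    """One PA-I step; label is +1 for an existing arc, -1 for a non-arc."""
    loss = max(0.0, 1.0 - label * arc_score(W, head_cats, tail_cats))  # hinge loss
    sq_norm = len(head_cats) * len(tail_cats)  # squared norm of the 0/1 pair-feature vector
    if loss == 0.0 or sq_norm == 0:
        return  # passive: margin already satisfied (or a node has no categories)
    tau = min(C, loss / sq_norm)  # aggressive step, capped by C (PA-I variant)
    for a in head_cats:
        for b in tail_cats:
            W[a][b] += tau * label


def train_category_matrix(categories, arcs, epochs=3, C=1.0):
    """Learn a latent category matrix W (hypothetical helper).

    Toy-scale assumption: every ordered non-arc (u, v), u != v, is used as a
    negative example; a graph of realistic size would need negative sampling.
    """
    W = defaultdict(lambda: defaultdict(float))
    arc_set = set(arcs)
    nodes = sorted(categories)
    non_arcs = [(u, v) for u in nodes for v in nodes if u != v and (u, v) not in arc_set]
    for _ in range(epochs):
        for head, tail in arcs:
            pa_update(W, categories[head], categories[tail], +1, C)
        for head, tail in non_arcs:
            pa_update(W, categories[head], categories[tail], -1, C)
    return W


def rank_unexpected(W, categories, arcs):
    """Existing arcs sorted from least to most expected (lowest score first)."""
    return sorted(arcs, key=lambda arc: arc_score(W, categories[arc[0]], categories[arc[1]]))


if __name__ == "__main__":
    # Hypothetical toy graph: two categories and a single cross-category link.
    categories = {"a1": {"A"}, "a2": {"A"}, "b1": {"B"}, "b2": {"B"}}
    arcs = [("a1", "a2"), ("a2", "a1"), ("b1", "b2"), ("b2", "b1"), ("a1", "b1")]
    W = train_category_matrix(categories, arcs)
    print(rank_unexpected(W, categories, arcs)[0])  # ('a1', 'b1')
```

In this toy graph the within-category pairs (A, A) and (B, B) only ever appear as arcs, so their entries settle at 1, while (A, B) is mostly seen in non-arcs and ends at -1. The single A -> B link is therefore ranked as the least expected. Because W has one entry per category pair rather than per node pair, its size depends only on the number of categories, which is consistent with the abstract's emphasis on efficiency.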