A Novel Document Generation Process for Topic Detection based on Hierarchical Latent Tree Models
We propose a novel document generation process based on hierarchical latent
tree models (HLTMs) learned from data. An HLTM has a layer of observed word
variables at the bottom and multiple layers of latent variables on top. For
each document, we first sample values for the latent variables layer by layer
via logic sampling, then draw relative frequencies for the words conditioned on
the values of the latent variables, and finally generate words for the document
using the relative word frequencies. The motivation for the work is to incorporate
word counts into HLTMs. In comparison with LDA-based
hierarchical document generation processes, the new process achieves
drastically better model fit with far fewer parameters. It also yields more
meaningful topics and topic hierarchies, and it sets a new state of the art for
hierarchical topic detection.
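The three sampling steps described above (latent values layer by layer, then relative word frequencies, then words) can be illustrated with a minimal Python sketch. The tree, the probabilities, the Beta-distributed word weights and all names (`LAYERS`, `WORD_BETA`, `generate_document`, etc.) are hypothetical; the abstract does not say which distribution the relative frequencies are drawn from, so this is an assumed stand-in rather than the authors' exact model.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy HLTM (structure and numbers are illustrative assumptions).
# Latent layers listed top to bottom; every latent variable is binary.
LAYERS: List[List[str]] = [["Z1"], ["Z2", "Z3"]]
LATENT_PARENT: Dict[str, str] = {"Z2": "Z1", "Z3": "Z1"}
ROOT_PRIOR: Dict[str, float] = {"Z1": 0.5}  # P(root = 1)
# P(child = 1 | parent value)
CPT: Dict[str, Dict[int, float]] = {"Z2": {0: 0.2, 1: 0.9}, "Z3": {0: 0.7, 1: 0.1}}
# Each observed word hangs off one latent variable in the lowest latent layer.
WORD_PARENT: Dict[str, str] = {
    "neural": "Z2", "network": "Z2", "protein": "Z3", "gene": "Z3",
}
# Assumption: a word's unnormalized weight ~ Beta(a, b), keyed by its parent's value.
WORD_BETA: Dict[int, Tuple[float, float]] = {0: (1.0, 9.0), 1: (5.0, 5.0)}


def sample_latents(rng: random.Random) -> Dict[str, int]:
    """Logic (forward) sampling: fill in latent values layer by layer, top down."""
    values: Dict[str, int] = {}
    for layer in LAYERS:
        for z in layer:
            if z in ROOT_PRIOR:
                p_one = ROOT_PRIOR[z]
            else:
                # Parent is in an earlier layer, so its value is already set.
                p_one = CPT[z][values[LATENT_PARENT[z]]]
            values[z] = 1 if rng.random() < p_one else 0
    return values


def sample_word_frequencies(latents: Dict[str, int], rng: random.Random) -> Dict[str, float]:
    """Draw one weight per word given its parent's value, then normalize to sum to 1."""
    weights = {w: rng.betavariate(*WORD_BETA[latents[p]]) for w, p in WORD_PARENT.items()}
    total = sum(weights.values())
    return {w: x / total for w, x in weights.items()}


def generate_document(length: int, rng: random.Random):
    """Sample latents, then relative word frequencies, then the document's words."""
    latents = sample_latents(rng)
    freqs = sample_word_frequencies(latents, rng)
    words = rng.choices(list(freqs), weights=list(freqs.values()), k=length)
    return latents, freqs, words


rng = random.Random(0)
latents, freqs, words = generate_document(12, rng)
print(latents)
print(" ".join(words))
```

Because word counts come from a multinomial draw over the sampled frequencies, a word can appear many times in one document, which is the kind of count information the abstract says plain HLTMs do not model.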
Conformative Filtering for Implicit Feedback Data
Implicit feedback is the simplest form of user feedback that can be used for
item recommendation. It is easy to collect and is domain independent. However,
there is a lack of negative examples. Previous work tackles this problem by
assuming that users are not interested, or are less interested, in the
unconsumed items. Those assumptions are often severely violated since
non-consumption can be due to factors like unawareness or lack of resources.
Therefore, non-consumption by a user does not always mean disinterest or
irrelevance. In this paper, we propose a novel method called Conformative
Filtering (CoF) to address the issue. The motivating observation is that if
there is a large group of users who share the same taste and none of them have
consumed an item before, then it is likely that the item is not of interest to
the group. We perform multidimensional clustering on implicit feedback data
using hierarchical latent tree analysis (HLTA) to identify user 'taste' groups
and make recommendations for a user based on her memberships in the groups and
on the past behavior of the groups. Experiments on two real-world datasets from
different domains show that CoF outperforms several common baselines.
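A minimal sketch of the recommendation step, assuming the HLTA clustering has already produced soft group memberships for every user (the clustering itself is not implemented here). The scoring rule (membership-weighted group consumption rates) and the names `group_item_rates` and `recommend` are illustrative assumptions, not the paper's exact formulation.

```python
from typing import List, Set


def group_item_rates(consumed: List[Set[int]], memberships: List[List[float]],
                     n_items: int) -> List[List[float]]:
    """Membership-weighted fraction of each group that has consumed each item.

    consumed[u] is the set of item ids user u consumed;
    memberships[u][g] is the (assumed, precomputed) probability that u is in group g.
    """
    n_groups = len(memberships[0])
    rates = []
    for g in range(n_groups):
        mass = sum(m[g] for m in memberships)  # soft size of group g
        row = [0.0] * n_items
        for u, items in enumerate(consumed):
            for i in items:
                row[i] += memberships[u][g]
        rates.append([x / mass if mass > 0 else 0.0 for x in row])
    return rates


def recommend(user: int, consumed: List[Set[int]], memberships: List[List[float]],
              n_items: int, top_k: int) -> List[int]:
    """Rank unconsumed items by the past behavior of the user's groups."""
    rates = group_item_rates(consumed, memberships, n_items)
    m = memberships[user]
    scores = {
        i: sum(m[g] * rates[g][i] for g in range(len(m)))
        for i in range(n_items)
        if i not in consumed[user]
    }
    # Highest score first; ties broken by item id for determinism.
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    return ranked[:top_k]


# Toy data: users 0-1 lean toward group 0, users 2-3 toward group 1.
consumed = [{0, 1}, {0}, {2, 3}, {3}]
memberships = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
print(recommend(1, consumed, memberships, n_items=4, top_k=2))
```

Because a user's score for an item is built only from how often her groups consumed it, an item that almost nobody in her groups has consumed scores close to zero, which mirrors the motivating observation above.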