
    A Novel Document Generation Process for Topic Detection based on Hierarchical Latent Tree Models

    We propose a novel document generation process based on hierarchical latent tree models (HLTMs) learned from data. An HLTM has a layer of observed word variables at the bottom and multiple layers of latent variables on top. For each document, we first sample values for the latent variables layer by layer via logic sampling, then draw relative frequencies for the words conditioned on the values of the latent variables, and finally generate words for the document using the relative word frequencies. The motivation for the work is to take word counts into consideration with HLTMs. In comparison with LDA-based hierarchical document generation processes, the new process achieves drastically better model fit with far fewer parameters. It also yields more meaningful topics and topic hierarchies. It is the new state of the art for hierarchical topic detection.
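
The three generation steps described in this abstract can be sketched on a toy tree. This is a minimal illustration, not the authors' implementation: the tree shape, all probabilities, the variable names, and the use of normalized gamma draws (a Dirichlet) for the relative word frequencies are assumptions made here for the example. The abstract does not state which distribution the paper uses.

```python
import random

# Hypothetical toy tree: one top-level latent, two second-level latents,
# and four observed words. Every number below is made up for illustration.
PARENT = {"z_top": None, "z_a": "z_top", "z_b": "z_top"}
ORDER = ["z_top", "z_a", "z_b"]  # parents listed before their children
P_ON = {  # P(latent = 1 | parent state); None marks the root
    "z_top": {None: 0.5},
    "z_a": {0: 0.1, 1: 0.8},
    "z_b": {0: 0.7, 1: 0.2},
}
WORD_PARENT = {"neural": "z_a", "network": "z_a", "market": "z_b", "stock": "z_b"}
WEIGHT = {0: 0.5, 1: 5.0}  # assumed Dirichlet weight for a word, by parent state


def sample_latents(rng):
    """Step 1: sample latent values top-down, each conditioned on its parent."""
    state = {}
    for name in ORDER:
        parent = PARENT[name]
        key = None if parent is None else state[parent]
        state[name] = 1 if rng.random() < P_ON[name][key] else 0
    return state


def sample_word_frequencies(state, rng):
    """Step 2: draw relative word frequencies given the latent values.

    Assumption: a Dirichlet draw, implemented as normalized gamma variates.
    """
    raw = {w: rng.gammavariate(WEIGHT[state[p]], 1.0) for w, p in WORD_PARENT.items()}
    total = sum(raw.values())
    return {w: v / total for w, v in raw.items()}


def generate_document(length, rng):
    """Step 3: generate the words of one document from the drawn frequencies."""
    state = sample_latents(rng)
    freqs = sample_word_frequencies(state, rng)
    words = rng.choices(list(freqs), weights=list(freqs.values()), k=length)
    return state, freqs, words
```

Calling `generate_document(20, random.Random(0))` returns the sampled latent states, the per-document word frequencies, and a list of 20 words. Because the frequencies are redrawn for every document, repeated words carry information, which is the sense in which word counts enter the model.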

    Conformative Filtering for Implicit Feedback Data

    Implicit feedback is the simplest form of user feedback that can be used for item recommendation. It is easy to collect and is domain independent. However, there is a lack of negative examples. Previous work tackles this problem by assuming that users are not interested, or are less interested, in the unconsumed items. Those assumptions are often severely violated, since non-consumption can be due to factors like unawareness or lack of resources. Therefore, non-consumption by a user does not always mean disinterest or irrelevance. In this paper, we propose a novel method called Conformative Filtering (CoF) to address the issue. The motivating observation is that if there is a large group of users who share the same taste and none of them have consumed an item before, then it is likely that the item is not of interest to the group. We perform multidimensional clustering on implicit feedback data using hierarchical latent tree analysis (HLTA) to identify user "taste" groups and make recommendations for a user based on her memberships in the groups and on the past behavior of the groups. Experiments on two real-world datasets from different domains show that CoF has superior performance compared to several common baselines.
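
The group-based idea in this abstract can be illustrated with a small scoring sketch. It assumes the taste groups and each user's membership weights are already given (the HLTA clustering step is not reproduced here), and the scoring rule, membership weight times the fraction of group members who consumed the item, is a hypothetical stand-in rather than the formula used in the paper. The function and parameter names are likewise invented for the example.

```python
from collections import defaultdict


def group_item_rates(consumed, groups):
    """For each group, the fraction of its members that consumed each item."""
    rates = {}
    for group, members in groups.items():
        counts = defaultdict(int)
        for user in members:
            for item in consumed.get(user, set()):
                counts[item] += 1
        rates[group] = {item: c / len(members) for item, c in counts.items()}
    return rates


def recommend(user, consumed, groups, membership, top_n=3):
    """Rank unconsumed items by membership-weighted group consumption rates.

    An item that nobody in the user's groups has consumed gets no score,
    mirroring the observation that such items are likely of no interest.
    """
    rates = group_item_rates(consumed, groups)
    already = consumed.get(user, set())
    scores = defaultdict(float)
    for group, weight in membership[user].items():
        for item, rate in rates[group].items():
            if item not in already:
                scores[item] += weight * rate
    # Sort by descending score, breaking ties by item name for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]
```

Here `consumed` maps each user to a set of items, `groups` maps each group to its member list, and `membership` maps each user to a dictionary of group weights.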