Analysis and tuning of hierarchical topic models based on Renyi entropy approach
Hierarchical topic modeling is a potentially powerful tool for determining the
topical structure of text collections: it allows one to construct a topical
hierarchy that represents levels of topical abstraction. However, tuning the
parameters of hierarchical models, including the number of topics at each level
of the hierarchy, remains a challenging task and an open issue. In this paper,
we propose a Renyi entropy-based approach that provides a partial solution to
this problem. First, we propose a Renyi entropy-based quality metric for
hierarchical models. Second, we propose a practical procedure for tuning
hierarchical topic models, which we test on datasets with human mark-up. In the numerical
experiments, we consider three different hierarchical models, namely,
the hierarchical latent Dirichlet allocation (hLDA) model, the hierarchical
Pachinko allocation model (hPAM), and hierarchical additive regularization of
topic models (hARTM). We demonstrate that the hLDA model exhibits a significant
level of instability and, moreover, that the numbers of topics it derives are
far from the true numbers for the labeled datasets. For the hPAM model, the
Renyi entropy approach allows us to determine only one level of the data
structure. For the hARTM model, the proposed approach allows us to estimate the
number of topics for two hierarchical levels.
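As a rough illustration of the kind of computation involved, the sketch below applies the standard order-q Renyi entropy to candidate topic counts for a single hierarchical level. The fit_level callable and the use of the entropy minimum as the selection rule are assumptions made for illustration; they stand in for, and do not reproduce, the paper's exact hierarchical metric.

```python
import numpy as np

def renyi_entropy(p, q=2.0):
    """Order-q Renyi entropy of a discrete distribution p (q != 1);
    the Shannon entropy is recovered in the limit q -> 1."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    p = p / p.sum()
    return float(np.log(np.sum(p ** q)) / (1.0 - q))

def select_num_topics(fit_level, candidate_counts, q=2.0):
    """Hypothetical tuning loop for one hierarchical level.

    fit_level(T) is an assumed callable that trains the level with T topics
    and returns its topic-word matrix Phi of shape (num_words, T). The topic
    count with the smallest mean per-topic Renyi entropy is returned, treating
    the entropy minimum as a proxy for the paper's selection rule.
    """
    scores = {}
    for T in candidate_counts:
        phi = fit_level(T)
        scores[T] = float(np.mean([renyi_entropy(phi[:, t], q)
                                   for t in range(T)]))
    return min(scores, key=scores.get), scores
```

In practice, fit_level would wrap whichever hierarchical model is being tuned (for example, one level of an hARTM hierarchy); only the entropy scoring is sketched here.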