Although fully generative models have been successfully used to model the
contents of text documents, they are often awkward to apply to combinations of
text data and document metadata. In this paper we propose a
Dirichlet-multinomial regression (DMR) topic model that includes a log-linear
prior on document-topic distributions that is a function of observed features
of the document, such as author, publication venue, references, and dates. We
show that by selecting appropriate features, DMR topic models can meet or
exceed the performance of several previously published topic models designed
for specific data.

Comment: Appears in Proceedings of the Twenty-Fourth Conference on Uncertainty
in Artificial Intelligence (UAI2008)
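
The log-linear prior described in the abstract can be illustrated with a minimal sketch. It assumes the usual log-linear form, in which each topic's Dirichlet parameter is the exponential of a dot product between the document's feature vector and a per-topic weight vector. The abstract does not spell out this parameterization, so the form, the function names (`dmr_alpha`, `sample_topic_distribution`), the feature encoding, and the weight values are all illustrative assumptions, not the authors' implementation.

```python
import math
import random


def dmr_alpha(features, weights):
    """Per-document Dirichlet parameters: alpha_k = exp(x . lambda_k).

    features: list of floats describing one document (hypothetical encoding).
    weights:  one weight vector per topic, each the same length as features.
    """
    return [
        math.exp(sum(x * w for x, w in zip(features, topic_weights)))
        for topic_weights in weights
    ]


def sample_topic_distribution(alpha, rng):
    """Draw theta ~ Dirichlet(alpha) by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


# Hypothetical toy setup: 3 features (bias, venue indicator, scaled year)
# and 2 topics. The numbers are made up for illustration only.
weights = [
    [0.0, 1.5, 0.0],   # topic 0 is favoured when the venue indicator is on
    [0.0, -1.5, 0.0],  # topic 1 is favoured when it is off
]
doc_features = [1.0, 1.0, 0.3]

alpha = dmr_alpha(doc_features, weights)
theta = sample_topic_distribution(alpha, random.Random(0))
```

Here `alpha` gives the document-specific prior over topics and `theta` is one draw of the document-topic distribution from it. Changing a metadata feature changes `alpha`, and so shifts which topics the document is expected to use before any of its words are observed.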