Institute of Philosophy of the Czech Academy of Sciences
Abstract
We present the Visual Topic Model (VTM), a model able to generate a topic distribution for an image without using any text during inference. The model is applied to an image-text matching task at MediaEval 2021. Although the results for this specific task are negative (the model performs worse than a baseline), we demonstrate that VTM produces meaningful topic distributions and can be used in other applications.