Time-sync comments (TSCs) are a new form of user-interaction review associated
with real-time video content. They capture a user's preferences for videos
and are therefore well suited as a data source for video recommendation.
However, existing review-based recommendation methods ignore the
context-dependent (generated by user interaction), real-time, and
time-sensitive properties of TSC data. To bridge these gaps, we use video
images and users' TSCs to design an Image-Text Fusion model with a novel
Herding Effect Attention mechanism (called ITF-HEA), which predicts users'
favorite videos with model-based collaborative filtering. Specifically,
in the HEA mechanism, we weight the context information based on the semantic
similarities and time intervals between each TSC and its context, thereby
accounting for the influence of the herding effect in the model. Experiments
show that the F1-score of ITF-HEA is on average 3.78\% higher than that of the
state-of-the-art method among the baselines.

Comment: Accepted for oral presentation at IEEE ICME 201
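
As a rough illustration of the HEA idea described above (weighting each context TSC by its semantic similarity to, and time interval from, the target TSC), the following is a minimal sketch and not the formulation used in the paper. The function names, the cosine similarity, the linear time penalty with a `decay` parameter, and the softmax normalization are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def herding_attention(target_vec, target_time, context, decay=0.1):
    """Hypothetical herding-effect attention over neighbouring TSCs.

    context: list of (embedding, timestamp) pairs for the surrounding TSCs.
    Returns (weights, fused), where fused is the weighted sum of the
    context embeddings.
    """
    dim = len(target_vec)
    if not context:
        return [], [0.0] * dim

    # Assumed scoring rule: reward semantic similarity, penalize the time gap.
    scores = [
        cosine(target_vec, vec) - decay * abs(target_time - t)
        for vec, t in context
    ]

    # Numerically stable softmax turns the scores into attention weights.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the context embeddings.
    fused = [
        sum(w * vec[i] for w, (vec, _) in zip(weights, context))
        for i in range(dim)
    ]
    return weights, fused


# Toy example: the first context comment is both closer in time and more
# similar to the target, so it receives the larger weight.
context = [([1.0, 0.0], 9.0), ([0.0, 1.0], 2.0)]
weights, fused = herding_attention([1.0, 0.0], 10.0, context)
```

Under these assumptions, a context comment that is semantically close to the target and posted shortly before or after it dominates the fused context representation, which is one simple way to model the herding effect.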