Previous research on multimodal fake news detection has introduced a series of
complex feature extraction and fusion networks to gather useful information
from the news. However, how cross-modal consistency relates to the fidelity of
the news and how features from different modalities affect decision-making
remain open questions. This paper presents a novel scheme of Bootstrapping
Multi-view Representations (BMR) for fake news detection. Given a piece of
multimodal news, we extract representations from three views: the text, the
image pattern, and the image semantics. Improved Multi-gate Mixture-of-Expert
networks (iMMoE) are proposed for feature refinement and fusion.
The representation from each view is used separately to coarsely predict the
fidelity of the whole news, while the multimodal representation predicts the
cross-modal consistency. With these prediction scores, we reweight each view of
the representations and bootstrap them for fake news detection.
Extensive experiments conducted on typical fake news detection datasets
demonstrate that the proposed BMR outperforms state-of-the-art schemes.
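
For illustration only, below is a minimal PyTorch sketch of the kind of pipeline the abstract describes: gated mixture-of-expert refinement of each view, per-view coarse fidelity scores, a cross-modal consistency score from the fused representation, and score-based reweighting before the final prediction. All names (GatedExpertFusion, BMRSketch), the number of experts, and the feature dimension are assumptions made for the sketch; the paper's actual iMMoE design, encoders, and training losses are not reproduced here.

```python
import torch
import torch.nn as nn


class GatedExpertFusion(nn.Module):
    """Mixture-of-expert style block: several expert MLPs whose outputs are
    mixed by a softmax gate (a simplified stand-in for the paper's iMMoE)."""

    def __init__(self, dim, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU()) for _ in range(n_experts)
        )
        self.gate = nn.Sequential(nn.Linear(dim, n_experts), nn.Softmax(dim=-1))

    def forward(self, x):                                        # x: (B, dim)
        w = self.gate(x)                                         # (B, n_experts)
        e = torch.stack([ex(x) for ex in self.experts], dim=1)   # (B, n_experts, dim)
        return (w.unsqueeze(-1) * e).sum(dim=1)                  # gated mixture


class BMRSketch(nn.Module):
    """Toy BMR-style pipeline over three views: text, image pattern, image
    semantics. Each view is refined, scored, reweighted by its score, and the
    reweighted views are fused for the final real/fake prediction."""

    VIEWS = ("text", "pattern", "semantics")

    def __init__(self, dim=256):
        super().__init__()
        self.refine = nn.ModuleDict({v: GatedExpertFusion(dim) for v in self.VIEWS})
        self.view_heads = nn.ModuleDict({v: nn.Linear(dim, 1) for v in self.VIEWS})
        self.fuse = GatedExpertFusion(dim)
        self.consistency_head = nn.Linear(dim, 1)  # cross-modal consistency score
        self.classifier = nn.Linear(dim, 1)        # final fidelity prediction

    def forward(self, text, pattern, semantics):
        views = dict(zip(self.VIEWS, (text, pattern, semantics)))
        refined = {v: self.refine[v](x) for v, x in views.items()}
        # Coarse per-view fidelity scores in (0, 1).
        scores = {v: torch.sigmoid(self.view_heads[v](h)) for v, h in refined.items()}
        fused = self.fuse(sum(refined.values()))
        consistency = torch.sigmoid(self.consistency_head(fused))
        # Reweight ("bootstrap") each view by its own score before final fusion.
        boosted = sum(scores[v] * refined[v] for v in self.VIEWS)
        logit = self.classifier(self.fuse(boosted))
        return logit, scores, consistency


# Usage with random features standing in for real encoder outputs.
model = BMRSketch(dim=256)
t, p, s = (torch.randn(8, 256) for _ in range(3))
logit, scores, consistency = model(t, p, s)
print(logit.shape, consistency.shape)  # torch.Size([8, 1]) torch.Size([8, 1])
```

In the actual method, the single-view features would come from dedicated text and image encoders, and the per-view and consistency heads would be trained with their own supervision signals; those details are beyond the scope of this sketch.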