Large-scale dissemination of disinformation online intended to mislead or
deceive the general population is a major societal problem. Rapid progression
in image, video, and natural language generative models has only exacerbated
this situation and intensified our need for an effective defense mechanism.
While existing approaches have been proposed to defend against neural fake
news, they are generally constrained to the very limited setting where articles
only have text and metadata such as the title and authors. In this paper, we
introduce the more realistic and challenging task of defending against
machine-generated news that also includes images and captions. To identify the
possible weaknesses that adversaries can exploit, we create a NeuralNews
dataset composed of 4 different types of generated articles as well as conduct
a series of human user study experiments based on this dataset. In addition to
the valuable insights gleaned from our user study experiments, we provide a
relatively effective approach based on detecting visual-semantic
inconsistencies, which will serve as an effective first line of defense and a
useful reference for future work in defending against machine-generated
disinformation.Comment: Accepted at EMNLP 202