7,677 research outputs found
Knowledge-Enhanced Hierarchical Information Correlation Learning for Multi-Modal Rumor Detection
The explosive growth of rumors with text and images on social media platforms has drawn great attention. Existing studies have made significant contributions to cross-modal information interaction and fusion, but they fail to fully explore the hierarchical and complex semantic correlations across content of different modalities, severely limiting their performance in detecting multi-modal rumors. In this work, we propose a novel knowledge-enhanced hierarchical information correlation learning approach (KhiCL) for multi-modal rumor detection that jointly models basic semantic correlation and high-order knowledge-enhanced entity correlation. Specifically, KhiCL exploits a cross-modal joint dictionary to transfer heterogeneous unimodal features into a common feature space and captures basic cross-modal semantic consistency and inconsistency with a cross-modal fusion layer. Moreover, considering that descriptions of multi-modal content are narrated around entities, KhiCL extracts visual and textual entities from images and text, designs a knowledge relevance reasoning strategy to find the shortest semantically relevant path between each pair of entities in an external knowledge graph, and absorbs the complementary contextual knowledge of the other entities connected along this path to learn knowledge-enhanced entity representations. Furthermore, KhiCL utilizes a signed attention mechanism to model the knowledge-enhanced entity consistency and inconsistency of intra-modality and inter-modality entity pairs by measuring their corresponding semantic relevance distances. Extensive experiments have demonstrated the effectiveness of the proposed method.
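The knowledge relevance reasoning step described in the abstract, finding the shortest semantically relevant path between entity pairs in an external knowledge graph, can be sketched as a breadth-first search over an entity adjacency structure. The graph, entity names, and `shortest_relevance_path` helper below are illustrative assumptions, not the authors' implementation.

```python
from collections import deque

def shortest_relevance_path(graph, src, dst):
    """BFS over an entity knowledge graph (adjacency dict) returning the
    shortest path of entities connecting src to dst, or None if unconnected."""
    if src == dst:
        return [src]
    visited = {src}
    queue = deque([[src]])
    while queue:
        path = queue.popleft()
        for nbr in graph.get(path[-1], ()):
            if nbr in visited:
                continue
            if nbr == dst:
                return path + [nbr]  # first hit is a shortest path under BFS
            visited.add(nbr)
            queue.append(path + [nbr])
    return None

# Toy knowledge graph with hypothetical entities and relations
kg = {
    "Eiffel Tower": ["Paris"],
    "Paris": ["Eiffel Tower", "France"],
    "France": ["Paris", "Europe"],
    "Europe": ["France"],
}
print(shortest_relevance_path(kg, "Eiffel Tower", "France"))
# -> ['Eiffel Tower', 'Paris', 'France']
```

In the paper's framing, the intermediate entities on such a path ("Paris" above) would supply the complementary contextual knowledge absorbed into the entity representations.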
Visual Entity Linking: A Preliminary Study
In this paper, we describe a system that jointly extracts entities appearing in images and mentioned in their accompanying captions. As input, the entity linking program takes a segmented image together with its caption. It consists of a sequence of processing steps: part-of-speech tagging, dependency parsing, and coreference resolution, which enable us to identify the entities as well as possible textual relations from the captions. The program uses the image regions labelled with a set of predefined categories and computes WordNet similarities between these labels and the entity names. Finally, the program links the entities it detected across the text and the images. We applied our system to the Segmented and Annotated IAPR TC-12 dataset, which we enriched with entity annotations, and we obtained a correct assignment rate of 55.48%.
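The WordNet-similarity step between region labels and entity names can be illustrated with a toy path-based similarity over hypernym chains (sim = 1 / (1 + shortest path length through a common ancestor)), which mirrors how WordNet path similarity behaves. The hypernym map and `path_similarity` helper here are hypothetical stand-ins for the real WordNet hierarchy, not the paper's code.

```python
def path_similarity(hypernyms, a, b):
    """WordNet-style path similarity on a toy hypernym map:
    sim = 1 / (1 + number of edges on the shortest path between a and b
    going through their lowest common ancestor)."""
    def ancestor_depths(word):
        # Map each ancestor (including the word itself) to its edge distance.
        chain, depth = {}, 0
        while word is not None:
            chain[word] = depth
            word, depth = hypernyms.get(word), depth + 1
        return chain

    anc_a, anc_b = ancestor_depths(a), ancestor_depths(b)
    common = set(anc_a) & set(anc_b)
    if not common:
        return 0.0
    dist = min(anc_a[c] + anc_b[c] for c in common)
    return 1.0 / (1.0 + dist)

# Toy hypernym chains (hypothetical, standing in for WordNet)
hyper = {"dog": "canine", "canine": "carnivore",
         "cat": "feline", "feline": "carnivore",
         "carnivore": "animal", "animal": None}
print(path_similarity(hyper, "dog", "cat"))  # -> 0.2 (4 edges via "carnivore")
```

In the system described above, a region label such as "dog" would be scored against each candidate entity name this way, and the entity with the highest similarity would be linked to the region.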