Simple to Complex Cross-modal Learning to Rank
The heterogeneity gap between different modalities poses a significant
challenge to multimedia information retrieval. Some studies formalize
cross-modal retrieval as a ranking problem and learn a shared multi-modal
embedding space to measure cross-modality similarity. However, previous
methods often establish the shared embedding space with linear mapping
functions, which may not be sophisticated enough to reveal more complicated
inter-modal correspondences. Additionally, current studies assume that the
rankings are of equal importance, and thus all rankings are used
simultaneously, or a small number of rankings are selected randomly to train
the embedding space at each iteration. Such strategies, however, often suffer
from outliers and reduced generalization capability because they ignore how
human cognition proceeds gradually from easy to complex examples. In this
paper, we incorporate self-paced learning with diversity into cross-modal
learning to rank and learn an optimal multi-modal embedding space based on
non-linear mapping functions. This strategy enhances the model's robustness to
outliers and achieves better generalization by training the model gradually
from easy rankings drawn from diverse queries to more complex ones. An
efficient alternating optimization algorithm is employed to solve the proposed
challenging problem, with fast convergence in practice. Extensive experimental
results on several benchmark datasets indicate that the proposed method
achieves significant improvements over state-of-the-art methods in the
literature.
Comment: 14 pages; Accepted by Computer Vision and Image Understanding
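The easy-to-hard training schedule described above can be illustrated with a self-paced selection rule with diversity. The following is a minimal sketch under assumed names (spld_select, losses_by_query, lam, and gamma are illustrative, not from the paper): within each query group, rankings are admitted easiest-first, and the diversity term discourages selecting too many rankings from any single query.
```python
# Minimal sketch of self-paced sample selection with diversity (SPLD-style).
# Assumes per-ranking losses are precomputed and grouped by query; all names
# here are illustrative, not taken from the paper.
import numpy as np

def spld_select(losses_by_query, lam, gamma):
    """Return binary selection weights per ranking, admitting easy rankings first."""
    weights = {}
    for query, losses in losses_by_query.items():
        order = np.argsort(losses)  # easiest rankings first within this query group
        v = np.zeros(len(losses))
        for rank, idx in enumerate(order, start=1):
            # The diversity term shrinks the threshold as more rankings from the
            # same query are admitted, spreading selection across diverse queries.
            threshold = lam + gamma / (np.sqrt(rank) + np.sqrt(rank - 1))
            if losses[idx] < threshold:
                v[idx] = 1.0
        weights[query] = v
    return weights

# Toy usage: larger lam/gamma admits more (harder) rankings as training proceeds.
losses = {"q1": np.array([0.2, 1.5, 0.4]), "q2": np.array([0.9, 0.1])}
print(spld_select(losses, lam=0.5, gamma=0.3))
```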
Noisy Correspondence Learning with Meta Similarity Correction
Despite the success of multimodal learning in cross-modal retrieval tasks,
this remarkable progress relies on correct correspondences among multimedia
data. However, collecting such ideal data is expensive and time-consuming. In
practice, most widely used datasets are harvested from the Internet and
inevitably contain mismatched pairs. Training on such noisy correspondence
datasets causes performance degradation because the cross-modal retrieval
methods can wrongly enforce the mismatched data to be similar. To tackle this
problem, we propose a Meta Similarity Correction Network (MSCN) to provide
reliable similarity scores. We view a binary classification task as the
meta-process that encourages the MSCN to learn to discriminate between positive and
negative meta-data. To further alleviate the influence of noise, we design an
effective data purification strategy using meta-data as prior knowledge to
remove the noisy samples. Extensive experiments on Flickr30K, MS-COCO, and
Conceptual Captions demonstrate the strengths of our method under both
synthetic and real-world noise.
Comment: Accepted at CVPR 2023
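As a rough illustration of the similarity-correction idea, a small network can map raw cross-modal similarities to corrected scores, be trained with a binary classification loss on a clean meta set, and then be reused to purify noisy pairs. This is a simplified sketch, not the authors' exact MSCN; SimilarityCorrector and the toy tensors are assumptions.
```python
# Simplified sketch of meta similarity correction (illustrative, not the exact MSCN).
import torch
import torch.nn as nn

class SimilarityCorrector(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, raw_sim):                  # raw_sim: (batch, 1) cosine similarities
        return torch.sigmoid(self.net(raw_sim))  # corrected similarity in [0, 1]

corrector = SimilarityCorrector()
optimizer = torch.optim.Adam(corrector.parameters(), lr=1e-3)
bce = nn.BCELoss()

# Meta step on clean pairs: matched pairs labeled 1, deliberately mismatched pairs 0.
raw_sim = torch.tensor([[0.8], [0.1], [0.6], [0.2]])
labels = torch.tensor([[1.0], [0.0], [1.0], [0.0]])
optimizer.zero_grad()
loss = bce(corrector(raw_sim), labels)
loss.backward()
optimizer.step()

# Purification (illustrative): drop training pairs whose corrected similarity stays low.
keep_mask = corrector(raw_sim).detach().squeeze(1) > 0.5
```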
KCD: Knowledge Walks and Textual Cues Enhanced Political Perspective Detection in News Media
Political perspective detection has become an increasingly important task
that can help combat echo chambers and political polarization. Previous
approaches generally focus on leveraging textual content to identify stances,
but fail to reason with background knowledge or to leverage the rich semantic
and syntactic cues in news articles. In light of these limitations, we propose
KCD, a political perspective detection approach that enables multi-hop
knowledge reasoning and incorporates textual cues as paragraph-level labels.
Specifically, we first generate random walks on
external knowledge graphs and infuse them with news text representations. We
then construct a heterogeneous information network to jointly model news
content as well as semantic, syntactic and entity cues in news articles.
Finally, we adopt relational graph neural networks for graph-level
representation learning and conduct political perspective detection. Extensive
experiments demonstrate that our approach outperforms state-of-the-art methods
on two benchmark datasets. We further examine the effect of knowledge walks and
textual cues and how they contribute to our approach's data efficiency.
Comment: Accepted at NAACL 2022 main conference
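The knowledge-walk component can be pictured as sampling multi-hop random walks over an external knowledge graph so that the resulting entity sequences can be infused into text representations. Below is a minimal sketch under that reading; the toy graph and the knowledge_walks function are illustrative, not taken from the paper.
```python
# Minimal sketch of sampling multi-hop "knowledge walks" from a toy knowledge graph.
import random

def knowledge_walks(adjacency, start, num_walks=3, walk_len=4, seed=0):
    """Sample fixed-length random walks starting from `start` over an adjacency dict."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        node, walk = start, [start]
        for _ in range(walk_len - 1):
            neighbors = adjacency.get(node, [])
            if not neighbors:
                break
            node = rng.choice(neighbors)
            walk.append(node)
        walks.append(walk)
    return walks

# Toy knowledge graph around entities that might be mentioned in a news article.
kg = {
    "Senator_A": ["Party_X", "Bill_42"],
    "Party_X": ["Ideology_Y", "Senator_A"],
    "Bill_42": ["Topic_Healthcare"],
}
print(knowledge_walks(kg, "Senator_A"))
```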
GADY: Unsupervised Anomaly Detection on Dynamic Graphs
Anomaly detection on dynamic graphs refers to detecting entities whose
behaviors deviate markedly from the norms observed in the graph structure and
its temporal information. This field has drawn increasing attention due to its
application in finance, network security, social networks, and more. However,
existing methods face two challenges: the dynamic structure construction
challenge, namely the difficulty of capturing graph structure together with
complex temporal information, and the negative sampling challenge, namely the
inability to construct high-quality negative samples for unsupervised
learning. To address these challenges, we propose Unsupervised
Generative Anomaly Detection on Dynamic Graphs (GADY). To tackle the first
challenge, we propose a continuous dynamic graph model to capture the
fine-grained information, which breaks the limit of existing discrete methods.
Specifically, we employ a message-passing framework combined with positional
features to get edge embeddings, which are decoded to identify anomalies. For
the second challenge, we pioneer the use of Generative Adversarial Networks to
generate negative interactions. Moreover, we design a loss function to alter
the training goal of the generator while ensuring the diversity and quality of
generated samples. Extensive experiments demonstrate that our proposed GADY
significantly outperforms the previous state-of-the-art method on three
real-world datasets. Supplementary experiments further validate the
effectiveness of our model design and the necessity of each module.
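To make the edge-embedding-and-decoding step concrete, here is a simplified sketch (assumptions, not GADY's exact architecture): each interaction is embedded from its endpoint features plus a sinusoidal encoding of its continuous timestamp and decoded into an anomaly score, while a toy feature perturbation stands in for the GAN-generated negative interactions.
```python
# Simplified sketch of continuous-time edge scoring (illustrative, not GADY's exact model).
import torch
import torch.nn as nn

def time_encoding(t, dim=8):
    """Sinusoidal encoding of continuous timestamps, shape (batch, dim)."""
    freqs = torch.pow(10.0, -torch.arange(dim // 2).float())
    angles = t.unsqueeze(1) * freqs
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=1)

class EdgeAnomalyScorer(nn.Module):
    def __init__(self, node_dim=16, time_dim=8):
        super().__init__()
        self.decoder = nn.Sequential(
            nn.Linear(2 * node_dim + time_dim, 32), nn.ReLU(), nn.Linear(32, 1)
        )

    def forward(self, src_feat, dst_feat, t):
        edge_emb = torch.cat([src_feat, dst_feat, time_encoding(t)], dim=1)
        return torch.sigmoid(self.decoder(edge_emb))  # anomaly probability per edge

scorer = EdgeAnomalyScorer()
src, dst = torch.randn(4, 16), torch.randn(4, 16)
timestamps = torch.tensor([1.0, 2.5, 3.0, 7.2])
real_scores = scorer(src, dst, timestamps)
# Toy stand-in for generated negatives: perturb destination features of real edges.
fake_scores = scorer(src, dst + 0.5 * torch.randn_like(dst), timestamps)
```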