Adversarial Attack and Defense on Graph Data: A Survey
Deep neural networks (DNNs) have been widely applied to various applications
including image classification, text generation, audio recognition, and graph
data analysis. However, recent studies have shown that DNNs are vulnerable to
adversarial attacks. Though there are several works studying adversarial attack
and defense strategies on domains such as images and natural language
processing, it is still difficult to directly transfer the learned knowledge to
graph structure data due to its representation challenges. Given the importance
of graph analysis, an increasing number of works start to analyze the
robustness of machine learning models on graph data. Nevertheless, current
studies considering adversarial behaviors on graph data usually focus on
specific types of attacks with certain assumptions. In addition, each work
proposes its own mathematical formulation which makes the comparison among
different methods difficult. Therefore, in this paper, we survey existing
adversarial learning strategies on graph data and first provide a unified
formulation for adversarial learning on graphs that covers most existing
studies. Moreover, we compare different attacks and defenses on graph data and
discuss their respective contributions and limitations. We systematically
organize the surveyed works by the features of each topic. This survey not only
serves as a reference for the research community, but also gives a clear
picture to researchers outside this domain. In addition, we maintain an online
resource that we keep updated with relevant papers. Detailed comparisons of
the surveyed studies are open-sourced at
https://github.com/YingtongDou/graph-adversarial-learning-literature.
Comment: In submission to journal.
Beyond Classification: Latent User Interests Profiling from Visual Contents Analysis
User preference profiling is an important task in modern online social
networks (OSN). With the proliferation of image-centric social platforms, such
as Pinterest, visual contents have become one of the most informative data
streams for understanding user preferences. Traditional approaches usually
treat visual content analysis as a general classification problem where one or
more labels are assigned to each image. Although such an approach simplifies
the process of image analysis, it misses the rich context and visual cues that
play an important role in people's perception of images. In this paper, we
explore the possibilities of learning a user's latent visual preferences
directly from image contents. We propose a distance metric learning method
based on Deep Convolutional Neural Networks (CNN) to directly extract
similarity information from visual contents and use the derived distance metric
to mine individual users' fine-grained visual preferences. Through our
preliminary experiments using data from 5,790 Pinterest users, we show that
even for the images within the same category, each user possesses distinct and
individually-identifiable visual preferences that are consistent over their
lifetime. Our results underscore the untapped potential of finer-grained visual
preference profiling in understanding users' preferences.
Comment: 2015 IEEE 15th International Conference on Data Mining Workshop
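The abstract does not spell out the metric-learning objective; a common choice for CNN-based distance metric learning is a triplet-style hinge loss over embedding vectors. A minimal NumPy sketch, with all names and values illustrative rather than the paper's:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on embedding vectors: pull the anchor
    toward the positive example and push it away from the negative one."""
    d_pos = np.sum((anchor - positive) ** 2)  # squared distance to positive
    d_neg = np.sum((anchor - negative) ** 2)  # squared distance to negative
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-d embeddings: anchor is close to positive, far from negative.
a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])
n = np.array([3.0, 0.0])
print(triplet_loss(a, p, n))  # well-separated triplet -> loss 0.0
```

In practice the embeddings would come from the CNN, and the loss gradient would be backpropagated through it; here the vectors are fixed toys.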
Towards a Principled Integration of Multi-Camera Re-Identification and Tracking through Optimal Bayes Filters
With the rise of end-to-end learning through deep learning, person detectors
and re-identification (ReID) models have recently become very strong.
Multi-camera multi-target (MCMT) tracking has not fully gone through this
transformation yet. We intend to take another step in this direction by
presenting a theoretically principled way of integrating ReID with tracking
formulated as an optimal Bayes filter. This conveniently side-steps the need
for data-association and opens up a direct path from full images to the core of
the tracker. While the results are still sub-par, we believe that this new,
tight integration opens many interesting research opportunities and leads the
way towards full end-to-end tracking from raw pixels.
Comment: First two authors contributed equally. This is initial work in a new direction, not a benchmark-beating method. v2 only adds acknowledgements and fixes a typo in an e-mail address.
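The abstract's core idea is to feed ReID scores into an optimal Bayes filter instead of a hard data-association step. A minimal discrete predict/update sketch (the two-state example and all names are hypothetical, not the paper's formulation):

```python
import numpy as np

def bayes_filter_step(belief, transition, likelihood):
    """One predict/update step of a discrete Bayes filter.
    belief: (S,) prior over states; transition: (S, S) with P(s'|s);
    likelihood: (S,) P(observation | s'), e.g. derived from a ReID score."""
    predicted = transition.T @ belief   # predict: propagate through dynamics
    posterior = predicted * likelihood  # update: weight by the observation
    return posterior / posterior.sum()  # normalize to a distribution

# Toy two-state example: target is in camera A or camera B.
belief = np.array([0.5, 0.5])
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
reid_likelihood = np.array([0.7, 0.1])  # ReID observation favors camera A
print(bayes_filter_step(belief, T, reid_likelihood))
```

Because the ReID likelihood enters multiplicatively, no discrete assignment of detections to tracks is needed, which is what lets the pipeline stay differentiable from images to tracker.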
Multi-modal joint embedding for fashion product retrieval
Finding a product in the fashion world can be a daunting task. Every day, e-commerce sites are updated with thousands of images and their associated metadata (textual information), deepening the problem, akin to finding a needle in a haystack. In this paper, we leverage both the images and the textual metadata and propose a joint multi-modal embedding that maps both text and images into a common latent space. Distances in the latent space correspond to similarity between products, allowing us to effectively perform retrieval in this latent space, which is both efficient and accurate. We train this embedding using large-scale real-world e-commerce data by both minimizing the distance between related products and using auxiliary classification networks that encourage the embedding to have semantic meaning. We compare against existing approaches and show significant improvements in retrieval tasks on a large-scale e-commerce dataset. We also provide an analysis of the different metadata.
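The retrieval mechanism described here, projecting image and text features into a shared space and ranking by distance, can be sketched with simple linear projections. The dimensions, projection matrices, and function names below are illustrative assumptions, not the paper's trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical projections mapping image features (e.g. 2048-d CNN output)
# and text features (e.g. 300-d bag-of-words) into a shared 64-d latent space.
W_img = rng.normal(size=(2048, 64)) / np.sqrt(2048)
W_txt = rng.normal(size=(300, 64)) / np.sqrt(300)

def embed(x, W):
    z = x @ W
    return z / np.linalg.norm(z)  # unit-normalize so dot product = cosine

def retrieve(query_txt, catalog_imgs, k=3):
    """Rank catalog images by cosine similarity to a text query
    in the shared latent space; return the top-k indices."""
    q = embed(query_txt, W_txt)
    Z = np.stack([embed(x, W_img) for x in catalog_imgs])
    scores = Z @ q                 # cosine similarity of unit vectors
    return np.argsort(-scores)[:k]

# Toy usage with random features standing in for real products.
catalog = [rng.normal(size=2048) for _ in range(10)]
query = rng.normal(size=300)
print(retrieve(query, catalog))
```

With random projections the ranking is meaningless; training (e.g. the distance and auxiliary classification objectives the abstract mentions) is what makes nearby points correspond to related products.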
Learning Social Image Embedding with Deep Multimodal Attention Networks
Learning embeddings of social media data with deep models has attracted
extensive research interest and enabled a wide range of applications, such as
link prediction, classification, and cross-modal search. However, for social images
which contain both link information and multimodal contents (e.g., text
description, and visual content), simply employing the embedding learnt from
network structure or data content results in sub-optimal social image
representation. In this paper, we propose a novel social image embedding
approach called Deep Multimodal Attention Networks (DMAN), which employs a deep
model to jointly embed multimodal contents and link information. Specifically,
to effectively capture the correlations between multimodal contents, we propose
a multimodal attention network to encode the fine-granularity relation between
image regions and textual words. To leverage the network structure for
embedding learning, a novel Siamese-Triplet neural network is proposed to model
the links among images. With the joint deep model, the learnt embedding can
capture both the multimodal contents and the nonlinear network information.
Extensive experiments are conducted to investigate the effectiveness of our
approach in the applications of multi-label classification and cross-modal
search. Compared to state-of-the-art image embeddings, our proposed DMAN
achieves significant improvement in the tasks of multi-label classification and
cross-modal search.
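The "fine-granularity relation between image regions and textual words" suggests a cross-modal attention step: each word attends over region features and receives a word-conditioned region summary. A minimal NumPy sketch under that reading (shapes and names are illustrative, not the DMAN architecture):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # stable softmax
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(regions, words):
    """Attend each textual word over image regions.
    regions: (R, d) region features; words: (T, d) word features.
    Returns (T, d) word-conditioned summaries of the regions."""
    scores = words @ regions.T         # (T, R) word-region affinities
    weights = softmax(scores, axis=1)  # attention distribution per word
    return weights @ regions           # weighted sum of region features

rng = np.random.default_rng(1)
out = cross_modal_attention(rng.normal(size=(5, 16)), rng.normal(size=(3, 16)))
print(out.shape)  # (3, 16)
```

The Siamese-triplet component for link structure would then supply a separate loss over pairs/triplets of image embeddings, analogous to a standard triplet objective.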