A Comprehensive Survey of Graph Embedding: Problems, Techniques and Applications
Graphs are an important data representation that appears in a wide variety
of real-world scenarios. Effective graph analytics provides users a deeper
understanding of what is behind the data, and thus can benefit a lot of useful
applications such as node classification, node recommendation, link prediction,
etc. However, most graph analytics methods suffer from high computation and
space costs. Graph embedding is an effective yet efficient way to solve the
graph analytics problem. It converts the graph data into a low dimensional
space in which the graph structural information and graph properties are
maximally preserved. In this survey, we conduct a comprehensive review of the
literature in graph embedding. We first introduce the formal definition of
graph embedding as well as the related concepts. After that, we propose two
taxonomies of graph embedding which correspond to what challenges exist in
different graph embedding problem settings and how the existing work addresses
these challenges in their solutions. Finally, we summarize the applications
that graph embedding enables and suggest four promising future research
directions in terms of computation efficiency, problem settings, techniques and
application scenarios.
Comment: A 20-page comprehensive survey of graph/network embedding covering
150+ papers up to 2018. It provides a systematic categorization of
problems, techniques and applications. Accepted by IEEE Transactions on
Knowledge and Data Engineering (TKDE). Comments and suggestions are welcome
for continuously improving this survey.
Deep Clustering with a Dynamic Autoencoder: From Reconstruction towards Centroids Construction
In unsupervised learning, there is no apparent straightforward cost function
that can capture the significant factors of variations and similarities. Since
natural systems have smooth dynamics, an opportunity is lost if an unsupervised
objective function remains static during the training process. The absence of
concrete supervision suggests that smooth dynamics should be integrated.
Compared to classical static cost functions, dynamic objective functions make
better use of the gradual and uncertain knowledge acquired through
pseudo-supervision. In this paper, we propose Dynamic Autoencoder (DynAE), a
novel model for deep clustering that overcomes a clustering-reconstruction
trade-off, by gradually and smoothly eliminating the reconstruction objective
function in favor of a construction one. Experimental evaluations on benchmark
datasets show that our approach achieves state-of-the-art results compared to
the most relevant deep clustering methods.
Kernelized LRR on Grassmann Manifolds for Subspace Clustering
Low rank representation (LRR) has recently attracted great interest due to
its pleasing efficacy in exploring low-dimensional subspace structures
embedded in data. One of its successful applications is subspace clustering, by
which data are clustered according to the subspaces they belong to. In this
paper, at a higher level, we intend to cluster subspaces into classes of
subspaces. This is naturally described as a clustering problem on the Grassmann
manifold. The novelty of this paper is to generalize LRR from Euclidean space
to an LRR model on the Grassmann manifold in a uniform kernelized LRR framework.
The new method has many applications in data analysis in computer vision tasks.
The proposed models have been evaluated on a number of practical data analysis
applications. The experimental results show that the proposed models outperform
a number of state-of-the-art subspace clustering methods.
Visualizing Natural Language Descriptions: A Survey
A natural language interface exploits the conceptual simplicity and
naturalness of the language to create a high-level user-friendly communication
channel between humans and machines. One of the promising applications of such
interfaces is generating visual interpretations of the semantic content of a given
natural language description that can then be visualized either as a static scene or a
dynamic animation. This survey discusses the requirements and challenges of
developing such systems, reports on 26 graphical systems that exploit natural
language interfaces, and addresses both artificial intelligence and
visualization aspects. This work serves as a frame of reference for researchers
and aims to enable further advances in the field.
Comment: Due to copyright most of the figures only appear in the journal
version.
Transductive Zero-Shot Learning with a Self-training dictionary approach
As an important and challenging problem in computer vision, zero-shot
learning (ZSL) aims at automatically recognizing the instances from unseen
object classes without training data. To address this problem, ZSL is usually
carried out in the following two aspects: 1) capturing the domain distribution
connections between seen classes data and unseen classes data; and 2) modeling
the semantic interactions between the image feature space and the label
embedding space. Motivated by these observations, we propose a bidirectional
mapping-based semantic relationship modeling scheme that seeks cross-modal
knowledge transfer by simultaneously projecting the image features and label
embeddings into a common latent space. Namely, we have a bidirectional
connection relationship that takes place from the image feature space to the
latent space as well as from the label embedding space to the latent space. To
deal with the domain shift problem, we further present a transductive learning
approach that formulates the class prediction problem in an iterative refining
process, where the object classification capacity is progressively reinforced
through bootstrapping-based model updating over highly reliable instances.
Experimental results on three benchmark datasets (AwA, CUB and SUN) demonstrate
the effectiveness of the proposed approach against the state-of-the-art
approaches.
Incorporating External Knowledge to Answer Open-Domain Visual Questions with Dynamic Memory Networks
Visual Question Answering (VQA) has attracted much attention since it offers
insight into the relationships between the multi-modal analysis of images and
natural language. Most of the current algorithms are incapable of answering
open-domain questions that require reasoning beyond the image
contents. To address this issue, we propose a novel framework that endows the
model with the capability to answer more complex questions by leveraging massive
external knowledge with dynamic memory networks. Specifically, the questions
along with the corresponding images trigger a process to retrieve the relevant
information in external knowledge bases, which are embedded into a continuous
vector space by preserving the entity-relation structures. Afterwards, we
employ dynamic memory networks to attend to the large body of facts in the
knowledge graph and images, and then perform reasoning over these facts to
generate corresponding answers. Extensive experiments demonstrate that our
model not only achieves the state-of-the-art performance in the visual question
answering task, but can also answer open-domain questions effectively by
leveraging the external knowledge.
Images Don't Lie: Transferring Deep Visual Semantic Features to Large-Scale Multimodal Learning to Rank
Search is at the heart of modern e-commerce. As a result, the task of ranking
search results automatically (learning to rank) is a multibillion dollar
machine learning problem. Traditional models optimize over a few
hand-constructed features based on the item's text. In this paper, we introduce
a multimodal learning to rank model that combines these traditional features
with visual semantic features transferred from a deep convolutional neural
network. In a large scale experiment using data from the online marketplace
Etsy, we verify that moving to a multimodal representation significantly
improves ranking quality. We show how image features can capture fine-grained
style information not available in a text-only representation. In addition, we
show concrete examples of how image information can successfully disentangle
pairs of highly different items that are ranked similarly by a text-only model.
Comment: 9 pages, 6 figures
Knowledge Graph Embeddings and Explainable AI
Knowledge graph embeddings are now a widely adopted approach to knowledge
representation in which entities and relationships are embedded in vector
spaces. In this chapter, we introduce the reader to the concept of knowledge
graph embeddings by explaining what they are, how they can be generated and how
they can be evaluated. We summarize the state-of-the-art in this field by
describing the approaches that have been introduced to represent knowledge in
the vector space. In relation to knowledge representation, we consider the
problem of explainability, and discuss models and methods for explaining
predictions obtained via knowledge graph embeddings.
Comment: Federico Bianchi, Gaetano Rossiello, Luca Costabello, Matteo
Palmonari, Pasquale Minervini, Knowledge Graph Embeddings and Explainable AI.
In: Ilaria Tiddi, Freddy Lecue, Pascal Hitzler (eds.), Knowledge Graphs for
eXplainable AI -- Foundations, Applications and Challenges. Studies on the
Semantic Web, IOS Press, Amsterdam, 202
Face Recognition: A Novel Multi-Level Taxonomy based Survey
In a world where security issues have been gaining growing importance, face
recognition systems have attracted increasing attention in multiple application
areas, ranging from forensics and surveillance to commerce and entertainment.
To help understand the landscape and abstraction levels relevant for face
recognition systems, face recognition taxonomies allow a deeper dissection and
comparison of the existing solutions. This paper proposes a new, more
encompassing and richer multi-level face recognition taxonomy, facilitating the
organization and categorization of available and emerging face recognition
solutions; this taxonomy may also guide researchers in the development of more
efficient face recognition solutions. The proposed multi-level taxonomy
considers levels related to the face structure, feature support and feature
extraction approach. Following the proposed taxonomy, a comprehensive survey of
representative face recognition solutions is presented. The paper concludes
with a discussion on current algorithmic and application related challenges
which may define future research directions for face recognition.
Comment: This paper is a preprint of a paper submitted to IET Biometrics. If
accepted, the copy of record will be available at the IET Digital Library.
JECL: Joint Embedding and Cluster Learning for Image-Text Pairs
We propose JECL, a method for clustering image-caption pairs by training
parallel encoders with regularized clustering and alignment objectives,
simultaneously learning both representations and cluster assignments. These
image-caption pairs arise frequently in high-value applications where
structured training data is expensive to produce, but free-text descriptions
are common. JECL trains by minimizing the Kullback-Leibler divergence between
the distribution of the images and text and that of a combined joint target
distribution, and by optimizing the Jensen-Shannon divergence between the soft
cluster assignments of the images and text. Regularizers are also applied to
JECL to prevent trivial solutions. Experiments show that JECL outperforms both
single-view and multi-view methods on large benchmark image-caption datasets,
and is remarkably robust to missing captions and varying data sizes.
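The two divergences named in the JECL abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact loss, so the function names, the smoothing constant `eps`, and the toy soft cluster assignments below are all assumptions made for illustration.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as probability lists.

    eps is an assumed smoothing constant that avoids log(0).
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical soft cluster assignments for one image-caption pair
# over three clusters, plus a hypothetical joint target distribution.
image_assign = [0.7, 0.2, 0.1]
text_assign = [0.6, 0.3, 0.1]
joint_target = [0.8, 0.15, 0.05]

# Each view is pulled towards the joint target (KL terms), while the
# two views are pulled towards each other (JS term).
kl_image = kl_divergence(joint_target, image_assign)
kl_text = kl_divergence(joint_target, text_assign)
js_views = js_divergence(image_assign, text_assign)
print(kl_image, kl_text, js_views)
```

The Jensen-Shannon term is symmetric and bounded by ln 2, which makes it a natural choice for aligning two views where neither one is treated as the reference.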