28,972 research outputs found
KBGAN: Adversarial Learning for Knowledge Graph Embeddings
We introduce KBGAN, an adversarial learning framework to improve the
performance of a wide range of existing knowledge graph embedding models.
Because knowledge graphs typically only contain positive facts, sampling useful
negative training examples is a non-trivial task. Replacing the head or tail
entity of a fact with a uniformly randomly selected entity is a conventional
method for generating negative facts, but the majority of the generated
negative facts can be easily discriminated from positive facts, and will
contribute little towards the training. Inspired by generative adversarial
networks (GANs), we use one knowledge graph embedding model as a negative
sample generator to assist the training of our desired model, which acts as the
discriminator in GANs. This framework is independent of the concrete form of
generator and discriminator, and therefore can utilize a wide variety of
knowledge graph embedding models as its building blocks. In experiments, we
adversarially train two translation-based models, TransE and TransD, each with
assistance from one of the two probability-based models, DistMult and ComplEx.
We evaluate the performance of KBGAN on the link prediction task, using three
knowledge base completion datasets: FB15k-237, WN18 and WN18RR. Experimental
results show that adversarial training substantially improves the performance
of target embedding models under various settings.
Comment: To appear at NAACL HLT 2018
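A minimal sketch of the adversarial sampling loop described above, in Python with NumPy. The embeddings, dimensions, and the kbgan_step helper are illustrative assumptions rather than the authors' code; the generator is DistMult-style, the discriminator is TransE-style, and the discriminator's score of the sampled negative serves as a REINFORCE-style reward for the generator:

    import numpy as np

    rng = np.random.default_rng(0)
    n_ent, n_rel, dim = 1000, 20, 50

    # Toy random embeddings; names, sizes, and this whole loop are illustrative only.
    E_d = rng.normal(size=(n_ent, dim)); R_d = rng.normal(size=(n_rel, dim))
    E_g = rng.normal(size=(n_ent, dim)); R_g = rng.normal(size=(n_rel, dim))

    def transe_score(h, r, t):
        # Discriminator (TransE-style): higher = more plausible (negated L1 distance).
        return -np.abs(E_d[h] + R_d[r] - E_d[t]).sum(-1)

    def distmult_score(h, r, t):
        # Generator (DistMult-style) scores used to pick "hard" negatives.
        return (E_g[h] * R_g[r] * E_g[t]).sum(-1)

    def kbgan_step(h, r, t, n_candidates=64, margin=1.0):
        # 1) Generator proposes candidate negative tails and samples one via softmax.
        cands = rng.integers(0, n_ent, size=n_candidates)
        logits = distmult_score(h, r, cands)
        probs = np.exp(logits - logits.max()); probs /= probs.sum()
        t_neg = int(rng.choice(cands, p=probs))
        # 2) Discriminator trains with a margin loss on (positive, sampled negative).
        d_loss = max(0.0, margin - transe_score(h, r, t) + transe_score(h, r, t_neg))
        # 3) Generator reward is the discriminator's score of the sampled negative.
        reward = transe_score(h, r, t_neg)
        return t_neg, d_loss, reward

    print(kbgan_step(h=1, r=2, t=3))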
Sampled in Pairs and Driven by Text: A New Graph Embedding Framework
In graphs with rich texts, incorporating textual information with structural
information would benefit constructing expressive graph embeddings. Among
various graph embedding models, the random walk (RW)-based family is one of the
most popular and successful. However, it faces two issues when applied to graphs
with rich texts: (i) sampling efficiency: starting from the training objective of
RW-based models (e.g., DeepWalk and node2vec), we show that they are likely to
generate large amounts of redundant training samples due to three main
drawbacks; (ii) text utilization: these models have
difficulty in dealing with zero-shot scenarios where graph embedding models
have to infer graph structures directly from texts. To solve these problems, we
propose a novel framework, namely Text-driven Graph Embedding with Pairs
Sampling (TGE-PS). TGE-PS uses Pairs Sampling (PS) to improve the sampling
strategy of RW, reducing training samples by ~99% while preserving
competitive performance. TGE-PS uses Text-driven Graph Embedding (TGE), an
inductive graph embedding approach, to generate node embeddings from texts.
Since each node contains rich texts, TGE is able to generate high-quality
embeddings and provide reasonable predictions on the existence of links to unseen
nodes. We evaluate TGE-PS on several real-world datasets, and experimental
results demonstrate that TGE-PS produces state-of-the-art results on both
traditional and zero-shot link prediction tasks.
Comment: Accepted by WWW 2019 (The World Wide Web Conference. ACM, 2019)
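To make the sampling-efficiency point concrete, here is a toy Python sketch contrasting DeepWalk-style pair generation with a bounded per-node pair sampler in the spirit of PS. The graph and both helpers are illustrative assumptions and heavily simplified relative to the actual TGE-PS procedure:

    import random

    # Tiny toy graph as adjacency lists (hypothetical, not the paper's data).
    graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}

    def rw_pairs(graph, walk_len=40, window=5, walks_per_node=10):
        # DeepWalk-style: random walks sliced into (center, context) pairs,
        # which yields many duplicate pairs on small neighbourhoods.
        pairs = []
        for start in graph:
            for _ in range(walks_per_node):
                walk = [start]
                for _ in range(walk_len - 1):
                    walk.append(random.choice(graph[walk[-1]]))
                for i, u in enumerate(walk):
                    for v in walk[max(0, i - window):i + window + 1]:
                        if u != v:
                            pairs.append((u, v))
        return pairs

    def ps_pairs(graph, pairs_per_node=10):
        # PS-style (simplified): sample a bounded number of pairs per node
        # directly, avoiding redundant walk slices.
        return [(u, random.choice(graph[u]))
                for u in graph for _ in range(pairs_per_node)]

    print(len(rw_pairs(graph)), "RW pairs vs", len(ps_pairs(graph)), "PS pairs")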
Graph Convolutional Neural Networks for Web-Scale Recommender Systems
Recent advancements in deep neural networks for graph-structured data have
led to state-of-the-art performance on recommender system benchmarks. However,
making these methods practical and scalable to web-scale recommendation tasks
with billions of items and hundreds of millions of users remains a challenge.
Here we describe a large-scale deep recommendation engine that we developed and
deployed at Pinterest. We develop a data-efficient Graph Convolutional Network
(GCN) algorithm PinSage, which combines efficient random walks and graph
convolutions to generate embeddings of nodes (i.e., items) that incorporate
both graph structure and node feature information. Compared to prior GCN
approaches, we develop a novel method based on highly efficient random walks to
structure the convolutions and design a novel training strategy that relies on
harder-and-harder training examples to improve robustness and convergence of
the model. We also develop an efficient MapReduce model inference algorithm to
generate embeddings using a trained model. We deploy PinSage at Pinterest and
train it on 7.5 billion examples on a graph with 3 billion nodes representing
pins and boards, and 18 billion edges. According to offline metrics, user
studies and A/B tests, PinSage generates higher-quality recommendations than
comparable deep learning and graph-based alternatives. To our knowledge, this
is the largest application of deep graph embeddings to date and paves the way
for a new generation of web-scale recommender systems based on graph
convolutional architectures.
Comment: KDD 2018
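The following Python/NumPy sketch illustrates the two ideas highlighted above: defining a node's neighborhood via short random walks (the most-visited nodes, importance-weighted) and applying one graph convolution over that neighborhood. The toy graph, shapes, and the single-layer convolve function are assumptions for illustration, not the deployed PinSage implementation:

    import numpy as np

    rng = np.random.default_rng(0)
    n_nodes, feat_dim, emb_dim = 100, 16, 8
    # Toy graph, features, and weights; the real system is a trained multi-layer GCN.
    adj = {i: [int(j) for j in rng.integers(0, n_nodes, size=5)] for i in range(n_nodes)}
    features = rng.normal(size=(n_nodes, feat_dim))
    W_self = rng.normal(size=(feat_dim, emb_dim)) * 0.1
    W_neigh = rng.normal(size=(feat_dim, emb_dim)) * 0.1

    def importance_neighbors(node, n_walks=50, walk_len=3, top_k=3):
        # Short random walks from `node`; the most-visited nodes become its
        # importance-weighted neighborhood (the spirit of PinSage's sampling).
        counts = {}
        for _ in range(n_walks):
            cur = node
            for _ in range(walk_len):
                cur = int(rng.choice(adj[cur]))
                counts[cur] = counts.get(cur, 0) + 1
        top = sorted(counts, key=counts.get, reverse=True)[:top_k]
        weights = np.array([counts[v] for v in top], dtype=float)
        return top, weights / weights.sum()

    def convolve(node):
        # One convolution: weighted neighbor average combined with the node's
        # own features, followed by a ReLU nonlinearity.
        neigh, w = importance_neighbors(node)
        neigh_feat = (features[neigh] * w[:, None]).sum(axis=0)
        return np.maximum(features[node] @ W_self + neigh_feat @ W_neigh, 0.0)

    print(convolve(0))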
Interaction Embeddings for Prediction and Explanation in Knowledge Graphs
Knowledge graph embedding aims to learn distributed representations for
entities and relations, and is proven to be effective in many applications.
Crossover interactions, that is, bi-directional effects between entities and
relations, help select related information when predicting a new triple, but
have not been formally discussed before. In this paper, we propose CrossE, a
novel knowledge graph embedding model which explicitly simulates crossover
interactions. It not only learns one general embedding for each entity and
relation, as most previous methods do, but also generates multiple
triple-specific embeddings for both of them, named interaction embeddings. We evaluate
embeddings on typical link prediction tasks and find that CrossE achieves
state-of-the-art results on complex and more challenging datasets. Furthermore,
we evaluate embeddings from a new perspective: giving explanations for
predicted triples, which is important for real applications. In this work, an
explanation for a triple is regarded as a reliable closed path between the head
and tail entities. Compared to other baselines, we show experimentally that
CrossE, benefiting from interaction embeddings, is more capable of generating
reliable explanations to support its predictions.
Comment: This paper is accepted by WSDM 2019
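A simplified NumPy sketch of the interaction-embedding idea: a per-relation interaction vector modulates the general head embedding into a triple-specific representation before matching against the tail. The exact scoring function, dimensions, and variable names are assumptions based on the description above, not the paper's implementation:

    import numpy as np

    rng = np.random.default_rng(0)
    n_ent, n_rel, dim = 500, 20, 32
    E = rng.normal(size=(n_ent, dim))     # general entity embeddings
    R = rng.normal(size=(n_rel, dim))     # general relation embeddings
    C = rng.normal(size=(n_rel, dim))     # per-relation interaction vectors
    b = np.zeros(dim)

    def crosse_score(h, r, t):
        # The interaction vector C[r] turns the general head embedding into a
        # triple-specific representation, which is then matched against the tail.
        h_i = C[r] * E[h]            # interaction embedding of the head
        hr_i = h_i * R[r]            # interaction embedding of the relation
        q = np.tanh(h_i + hr_i + b)
        return 1.0 / (1.0 + np.exp(-(q @ E[t])))   # sigmoid match score

    print(crosse_score(3, 1, 7))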
edge2vec: Representation learning using edge semantics for biomedical knowledge discovery
Representation learning provides new and powerful graph analytical approaches
and tools for the highly valued data science challenge of mining knowledge
graphs. Since previous graph analytical methods have mostly focused on
homogeneous graphs, an important current challenge is extending this
methodology for richly heterogeneous graphs and knowledge domains. The
biomedical sciences are such a domain, reflecting the complexity of biology,
with entities such as genes, proteins, drugs, diseases, and phenotypes, and
relationships such as gene co-expression, biochemical regulation, and
biomolecular inhibition or activation. Therefore, the semantics of edges and
nodes are critical for representation learning and knowledge discovery in real
world biomedical problems. In this paper, we propose the edge2vec model, which
represents graphs considering edge semantics. An edge-type transition matrix is
trained by an Expectation-Maximization approach, and a stochastic gradient
descent model is employed to learn node embedding on a heterogeneous graph via
the trained transition matrix. edge2vec is validated on three biomedical domain
tasks: biomedical entity classification, compound-gene bioactivity prediction,
and biomedical information retrieval. Results show that by incorporating
edge types into node embedding learning on heterogeneous graphs,
edge2vec significantly outperforms state-of-the-art models on all
three tasks. We propose this method for its added value relative to existing
graph analytical methodology, and for its applicability in the real-world
context of biomedical knowledge discovery.
Comment: 10 pages
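A small Python sketch of the walk component described above: each step of a random walk on a heterogeneous graph is biased by an edge-type transition matrix, which edge2vec fits with EM (here it is simply left uniform for brevity). The toy graph, edge types, and edge2vec_walk helper are hypothetical:

    import random

    # Tiny heterogeneous toy graph: adjacency lists of (neighbor, edge_type).
    graph = {
        0: [(1, "binds"), (2, "regulates")],
        1: [(0, "binds"), (3, "co-expressed")],
        2: [(0, "regulates"), (3, "binds")],
        3: [(1, "co-expressed"), (2, "binds")],
    }
    edge_types = ["binds", "regulates", "co-expressed"]
    # Edge-type transition matrix; edge2vec fits this by EM, here left uniform.
    M = {a: {b: 1.0 for b in edge_types} for a in edge_types}

    def edge2vec_walk(start, length=10):
        # Random walk whose steps are biased by the edge-type transition matrix.
        walk, node, prev_type = [start], start, None
        for _ in range(length - 1):
            nbrs = graph[node]
            if prev_type is None:
                node, prev_type = random.choice(nbrs)
            else:
                weights = [M[prev_type][t] for _, t in nbrs]
                node, prev_type = random.choices(nbrs, weights=weights, k=1)[0]
            walk.append(node)
        return walk

    # Walks like these would then be fed to a skip-gram model to learn node embeddings.
    print(edge2vec_walk(0))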
An Interpretable Knowledge Transfer Model for Knowledge Base Completion
Knowledge bases are important resources for a variety of natural language
processing tasks but suffer from incompleteness. We propose a novel embedding
model, ITransF, to perform knowledge base completion. Equipped with a
sparse attention mechanism, ITransF discovers hidden concepts of relations and
transfers statistical strength through the sharing of concepts. Moreover, the
learned associations between relations and concepts, which are represented by
sparse attention vectors, can be interpreted easily. We evaluate ITransF on two
benchmark datasets, WN18 and FB15k, for knowledge base completion, and obtain
improvements on both the mean rank and Hits@10 metrics over all baselines that
do not use additional information.
Comment: Accepted by ACL 2017. Minor update
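A compact NumPy sketch of the concept-sharing idea: each relation's head and tail projections are attention-weighted mixtures of a small set of shared concept matrices (sparse in ITransF), so relations attending to the same concepts share statistical strength. The Dirichlet-initialized attention, dimensions, and TransE-style distance are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    n_ent, n_rel, dim, n_concepts = 400, 18, 20, 6
    E = rng.normal(size=(n_ent, dim))
    R = rng.normal(size=(n_rel, dim))
    D = rng.normal(size=(n_concepts, dim, dim)) * 0.1      # shared concept matrices
    # Per-relation attention over concepts; ITransF learns these and sparsifies them.
    A_h = rng.dirichlet(np.ones(n_concepts), size=n_rel)
    A_t = rng.dirichlet(np.ones(n_concepts), size=n_rel)

    def itransf_score(h, r, t):
        # Relation-specific projections are mixtures of the shared concept
        # matrices, so relations sharing concepts transfer strength to each other.
        P_h = np.tensordot(A_h[r], D, axes=1)   # (dim, dim)
        P_t = np.tensordot(A_t[r], D, axes=1)
        return -np.abs(P_h @ E[h] + R[r] - P_t @ E[t]).sum()

    print(itransf_score(5, 2, 9))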
Interpreting Embedding Models of Knowledge Bases: A Pedagogical Approach
Knowledge bases are employed in a variety of applications from natural
language processing to semantic web search; alas, in practice their usefulness
is hurt by their incompleteness. Embedding models attain state-of-the-art
accuracy in knowledge base completion, but their predictions are notoriously
hard to interpret. In this paper, we adapt "pedagogical approaches" (from the
literature on neural networks) so as to interpret embedding models by
extracting weighted Horn rules from them. We show how pedagogical approaches
have to be adapted to take into account the large-scale relational aspects of
knowledge bases, and we show experimentally their strengths and weaknesses.
Comment: Presented at the 2018 ICML Workshop on Human Interpretability in Machine
Learning (WHI 2018), Stockholm, Sweden
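A rough Python sketch of the pedagogical idea: treat the trained embedding model as a black-box oracle and estimate a confidence weight for a candidate Horn rule by checking how often the oracle's predictions for the rule body also satisfy the rule head. The oracle, threshold, and rule_confidence helper are simplified assumptions, not the extraction procedure used in the paper:

    import numpy as np

    rng = np.random.default_rng(0)
    n_ent, n_rel, dim = 200, 10, 16
    E = rng.normal(size=(n_ent, dim))
    R = rng.normal(size=(n_rel, dim))

    def model_predicts(h, r, t, threshold=-20.0):
        # Black-box oracle: query the (here untrained, TransE-style) embedding model.
        return -np.abs(E[h] + R[r] - E[t]).sum() > threshold

    def rule_confidence(body, head, n_samples=500):
        # Estimate the weight of the Horn rule body(x, y) -> head(x, y) by sampling
        # entity pairs and checking how often the model's predictions for the body
        # also hold for the head.
        support = total = 0
        for _ in range(n_samples):
            x, y = rng.integers(0, n_ent, size=2)
            if model_predicts(x, body, y):
                total += 1
                support += int(model_predicts(x, head, y))
        return support / max(total, 1)

    print(rule_confidence(body=3, head=7))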