edge2vec: Representation learning using edge semantics for biomedical knowledge discovery
Representation learning provides new and powerful graph analytical approaches
and tools for the highly valued data science challenge of mining knowledge
graphs. Since previous graph analytical methods have mostly focused on
homogeneous graphs, an important current challenge is extending this
methodology for richly heterogeneous graphs and knowledge domains. The
biomedical sciences are such a domain, reflecting the complexity of biology,
with entities such as genes, proteins, drugs, diseases, and phenotypes, and
relationships such as gene co-expression, biochemical regulation, and
biomolecular inhibition or activation. Therefore, the semantics of edges and
nodes are critical for representation learning and knowledge discovery in
real-world biomedical problems. In this paper, we propose the edge2vec model, which
represents graphs considering edge semantics. An edge-type transition matrix is
trained by an Expectation-Maximization approach, and a stochastic gradient
descent model is employed to learn node embeddings on a heterogeneous graph via
the trained transition matrix. edge2vec is validated on three biomedical domain
tasks: biomedical entity classification, compound-gene bioactivity prediction,
and biomedical information retrieval. Results show that by incorporating
edge types into node embedding learning in heterogeneous graphs,
edge2vec significantly outperforms state-of-the-art models on all
three tasks. We propose this method for its added value relative to existing
graph analytical methodology, and for its applicability in the real-world
context of biomedical knowledge discovery.
Comment: 10 pages
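As a rough illustration of the idea in the edge2vec abstract above, the sketch below shows how an edge-type transition matrix could bias a random walk on a heterogeneous graph. It is not the authors' implementation: the function names (`build_adjacency`, `edge_type_walk`), the toy graph, and the transition weights are all hypothetical, and both the Expectation-Maximization training of the matrix and the stochastic gradient descent embedding step are omitted. The weights here are hand-picked only to make the example easy to follow.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency list from (u, v, edge_type) triples."""
    adj = defaultdict(list)
    for u, v, edge_type in edges:
        adj[u].append((v, edge_type))
        adj[v].append((u, edge_type))
    return adj


def edge_type_walk(adj, start, length, transition, rng):
    """Walk up to `length` nodes, weighting each step by
    transition[previous_edge_type][candidate_edge_type]."""
    walk = [start]
    current, prev_type = start, None
    while len(walk) < length:
        neighbors = adj.get(current, [])
        if not neighbors:
            break
        if prev_type is None:
            # First step: no previous edge type, so choose uniformly.
            weights = [1.0] * len(neighbors)
        else:
            weights = [transition[prev_type][t] for _, t in neighbors]
        if sum(weights) <= 0:
            break  # every continuation has zero weight
        current, prev_type = rng.choices(neighbors, weights=weights, k=1)[0]
        walk.append(current)
    return walk


# Toy graph with three hypothetical edge types.
edges = [
    ("geneA", "geneB", "coexpression"),
    ("geneB", "drugX", "inhibition"),
    ("drugX", "diseaseY", "treats"),
]
adj = build_adjacency(edges)

# Hand-picked (not learned) weights: each type only hands over to the next one.
transition = {
    "coexpression": {"coexpression": 0.0, "inhibition": 1.0, "treats": 0.0},
    "inhibition": {"coexpression": 0.0, "inhibition": 0.0, "treats": 1.0},
    "treats": {"coexpression": 0.0, "inhibition": 0.0, "treats": 0.0},
}

walk = edge_type_walk(adj, "geneA", 10, transition, random.Random(0))
# Every step has exactly one positively weighted neighbor, so the walk is
# deterministic here: ['geneA', 'geneB', 'drugX', 'diseaseY']
print(walk)
```

In a full pipeline, walks like this would typically be fed to a skip-gram style model to produce node embeddings; that stage is left out to keep the sketch short.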
Multilayer Networks
In most natural and engineered systems, a set of entities interact with each
other in complicated patterns that can encompass multiple types of
relationships, change in time, and include other types of complications. Such
systems include multiple subsystems and layers of connectivity, and it is
important to take such "multilayer" features into account to try to improve our
understanding of complex systems. Consequently, it is necessary to generalize
"traditional" network theory by developing (and validating) a framework and
associated tools to study multilayer systems in a comprehensive fashion. The
origins of such efforts date back several decades and arose in multiple
disciplines, and now the study of multilayer networks has become one of the
most important directions in network science. In this paper, we discuss the
history of multilayer networks (and related concepts) and review the exploding
body of work on such networks. To unify the disparate terminology in the large
body of recent work, we discuss a general framework for multilayer networks,
construct a dictionary of terminology to relate the numerous existing concepts
to each other, and provide a thorough discussion that compares, contrasts, and
translates between related notions such as multilayer networks, multiplex
networks, interdependent networks, networks of networks, and many others. We
also survey and discuss existing data sets that can be represented as
multilayer networks. We review attempts to generalize single-layer-network
diagnostics to multilayer networks. We also discuss the rapidly expanding
research on multilayer-network models and notions like community structure,
connected components, tensor decompositions, and various types of dynamical
processes on multilayer networks. We conclude with a summary and an outlook.
Comment: Working paper; 59 pages, 8 figures
Representation Learning for Attributed Multiplex Heterogeneous Network
Network embedding (or graph embedding) has been widely used in many
real-world applications. However, existing methods mainly focus on networks
with single-typed nodes/edges and cannot scale well to handle large networks.
Many real-world networks consist of billions of nodes and edges of multiple
types, and each node is associated with different attributes. In this paper, we
formalize the problem of embedding learning for the Attributed Multiplex
Heterogeneous Network and propose a unified framework to address this problem.
The framework supports both transductive and inductive learning. We also give
a theoretical analysis of the proposed framework, showing its connection with
previous works and proving its greater expressiveness. We conduct systematic
evaluations of the proposed framework on four different genres of challenging
datasets: Amazon, YouTube, Twitter, and Alibaba. Experimental results
demonstrate that with the learned embeddings from the proposed framework, we
can achieve statistically significant improvements (e.g., 5.99-28.23% lift by
F1 scores; p<<0.01, t-test) over previous state-of-the-art methods for link
prediction. The framework has also been successfully deployed in the
recommendation system of a leading worldwide e-commerce company, Alibaba Group.
Results of the offline A/B tests on product recommendation further confirm the
effectiveness and efficiency of the framework in practice.
Comment: Accepted to KDD 2019. Website: https://sites.google.com/view/gatn