edge2vec: Representation learning using edge semantics for biomedical knowledge discovery
Representation learning provides new and powerful graph analytical approaches
and tools for the highly valued data science challenge of mining knowledge
graphs. Since previous graph analytical methods have mostly focused on
homogeneous graphs, an important current challenge is extending this
methodology for richly heterogeneous graphs and knowledge domains. The
biomedical sciences are such a domain, reflecting the complexity of biology,
with entities such as genes, proteins, drugs, diseases, and phenotypes, and
relationships such as gene co-expression, biochemical regulation, and
biomolecular inhibition or activation. Therefore, the semantics of edges and
nodes are critical for representation learning and knowledge discovery in
real-world biomedical problems. In this paper, we propose the edge2vec model, which
represents graphs considering edge semantics. An edge-type transition matrix is
trained by an Expectation-Maximization approach, and a stochastic gradient
descent model is employed to learn node embeddings on a heterogeneous graph via
the trained transition matrix. edge2vec is validated on three biomedical domain
tasks: biomedical entity classification, compound-gene bioactivity prediction,
and biomedical information retrieval. Results show that by incorporating
edge types into node embedding learning in heterogeneous graphs,
edge2vec significantly outperforms state-of-the-art models on all
three tasks. We propose this method for its added value relative to existing
graph analytical methodology, and for its applicability in the real-world
context of biomedical knowledge discovery.
Comment: 10 page
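The edge-type transition matrix described in the abstract above can be illustrated with a small random-walk sketch. This is only a minimal illustration under stated assumptions, not the edge2vec implementation: the function name `edge_type_walk`, the toy graph, and the hand-set matrix `trans` are hypothetical, the matrix is fixed by hand rather than trained by Expectation-Maximization, and the embedding step is omitted.

```python
import random

def edge_type_walk(adj, trans, start, length, rng):
    """Random walk whose next edge is weighted by the previous edge's type.

    adj   : node -> list of (neighbour, edge_type) pairs (hypothetical layout)
    trans : trans[prev_type][next_type] is a non-negative weight
    """
    walk = [start]
    prev_type = None
    while len(walk) < length:
        options = adj.get(walk[-1], [])
        if not options:
            break  # dead end: no outgoing edges
        if prev_type is None:
            # First step has no previous edge type, so choose uniformly.
            weights = [1.0] * len(options)
        else:
            weights = [trans[prev_type][t] for _, t in options]
        if sum(weights) <= 0:
            break  # every continuation is ruled out by the matrix
        nxt, prev_type = rng.choices(options, weights=weights, k=1)[0]
        walk.append(nxt)
    return walk

# Toy graph with two edge types (0 and 1); all names are illustrative.
toy_adj = {
    "a": [("b", 0)],
    "b": [("a", 0), ("c", 1)],
    "c": [("b", 1), ("d", 0)],
    "d": [("c", 0)],
}
# Hand-set matrix that only allows alternating edge types.
toy_trans = [[0.0, 1.0],
             [1.0, 0.0]]

toy_walk = edge_type_walk(toy_adj, toy_trans, "a", 10, random.Random(0))
```

In a full pipeline, walks generated this way would be passed to a skip-gram style model to obtain node embeddings; that step is left out here.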
Adaptive Partitioning for Large-Scale Dynamic Graphs
In recent years, large-scale graph processing has gained increasing attention, with most recent systems placing particular emphasis on latency. One possible technique to improve runtime performance in a distributed graph processing system is to reduce network communication. The most notable way to achieve this goal is to partition the graph so as to minimize the number of edges that connect vertices assigned to different machines, while keeping the load balanced. However, real-world graphs are highly dynamic, with vertices and edges being constantly added and removed. Carefully updating the partitioning of the graph to reflect these changes is necessary to avoid introducing a large number of cut edges, which would gradually degrade computation performance. In this paper we show that performance degradation in dynamic graph processing systems can be avoided by continuously adapting the graph partitions as the graph changes. We present a novel, highly scalable adaptive partitioning strategy, and show a number of refinements that make it work under the constraints of a large-scale distributed system. The partitioning strategy is based on iterative vertex migrations, relying only on local information. We have implemented the technique in a graph processing system, and we show through three real-world scenarios how adapting graph partitioning reduces execution time by over 50% when compared to commonly used hash partitioning.
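The idea of iterative vertex migration based on local information can be sketched as follows. This is a simplified, single-machine illustration and not the system described in the abstract: the function names, the capacity rule used for load balance, and the toy graph are assumptions made for the example. Each vertex looks only at the partitions of its own neighbours and moves to the partition holding most of them, provided that partition still has room.

```python
from collections import Counter

def cut_edges(edges, part):
    """Count edges whose endpoints are assigned to different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])

def migrate_once(adj, part, capacity):
    """One sweep of greedy migrations; returns how many vertices moved.

    adj      : vertex -> list of neighbouring vertices
    part     : vertex -> partition id (updated in place)
    capacity : maximum vertices per partition (assumed balance rule)
    """
    sizes = Counter(part.values())
    moved = 0
    for v in sorted(adj):
        counts = Counter(part[n] for n in adj[v])
        if not counts:
            continue  # isolated vertex: nothing to gain from moving
        current = part[v]
        # Partition holding most neighbours; ties favour staying put.
        best = max(counts, key=lambda p: (counts[p], p == current))
        if (best != current and counts[best] > counts[current]
                and sizes[best] < capacity):
            sizes[current] -= 1
            sizes[best] += 1
            part[v] = best
            moved += 1
    return moved

# Toy graph: two triangles joined by one edge.
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
toy_adj = {v: [] for v in range(6)}
for u, v in toy_edges:
    toy_adj[u].append(v)
    toy_adj[v].append(u)

hash_part = {v: v % 2 for v in toy_adj}   # hash partitioning baseline
cut_before = cut_edges(toy_edges, hash_part)

adaptive_part = dict(hash_part)
while migrate_once(toy_adj, adaptive_part, capacity=4):
    pass                                   # repeat until no vertex moves
cut_after = cut_edges(toy_edges, adaptive_part)
```

On this toy graph the hash baseline cuts five edges, while the migrated assignment cuts only the single edge joining the two triangles. A distributed implementation would additionally need to coordinate concurrent moves, which this sequential sweep sidesteps.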
StructMatrix: large-scale visualization of graphs by means of structure detection and dense matrices
Given a large-scale graph with millions of nodes and edges, how can we reveal
macro patterns of interest, like cliques, bipartite cores, stars, and chains?
Furthermore, how can we visualize such patterns together, gaining insights from
the graph to support sound decision-making? Although there are many algorithmic
and visual techniques to analyze graphs, none of the existing approaches is
able to present the structural information of graphs at large scale. Hence,
this paper describes StructMatrix, a methodology aimed at highly scalable visual
inspection of graph structures with the goal of revealing macro patterns of
interest. StructMatrix combines algorithmic structure detection and adjacency
matrix visualization to present cardinality, distribution, and relationship
features of the structures found in a given graph. We performed experiments on
real, large-scale graphs with up to one million nodes and millions of edges.
StructMatrix revealed that graphs of high relevance (e.g., Web, Wikipedia and
DBLP) have characterizations that reflect the nature of their corresponding
domains; our findings have not been seen in the literature so far. We expect
that our technique will bring deeper insights into large graph mining,
leveraging their use for decision making.
Comment: To appear: 8 pages, paper to be published at the Fifth IEEE ICDM
Workshop on Data Mining in Networks, 2015 as Hugo Gualdron, Robson Cordeiro,
Jose Rodrigues (2015) StructMatrix: Large-scale visualization of graphs by
means of structure detection and dense matrices. In: The Fifth IEEE ICDM
Workshop on Data Mining in Networks 1--8, IEE