Link communities reveal multiscale complexity in networks
Networks have become a key approach to understanding systems of interacting
objects, unifying the study of diverse phenomena including biological organisms
and human society. One crucial step when studying the structure and dynamics of
networks is to identify communities: groups of related nodes that correspond to
functional subunits such as protein complexes or social spheres. Communities in
networks often overlap such that nodes simultaneously belong to several groups.
Meanwhile, many networks are known to possess hierarchical organization, where
communities are recursively grouped into a hierarchical structure. However, the
fact that many real networks have communities with pervasive overlap, where
each and every node belongs to more than one group, has the consequence that a
global hierarchy of nodes cannot capture the relationships between overlapping
groups. Here we reinvent communities as groups of links rather than nodes and
show that this unorthodox approach successfully reconciles the antagonistic
organizing principles of overlapping communities and hierarchy. In contrast to
the existing literature, which has entirely focused on grouping nodes, link
communities naturally incorporate overlap while revealing hierarchical
organization. We find relevant link communities in many networks, including
major biological networks such as protein-protein interaction and metabolic
networks, and show that a large social network contains hierarchically
organized community structures spanning inner-city to regional scales while
maintaining pervasive overlap. Our results imply that link communities are
fundamental building blocks that reveal overlap and hierarchical organization
in networks to be two aspects of the same phenomenon.
Comment: Main text and supplementary informatio
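As a minimal sketch of why grouping links yields overlapping node groups: if every link is assigned to exactly one community, a node inherits the community of each link it touches and can therefore belong to several at once. The toy network, the labels, and the helper `node_memberships` are hypothetical illustrations; this is not the clustering procedure used in the paper.

```python
from collections import defaultdict


def node_memberships(link_communities):
    """Map each node to the set of link-community labels it touches."""
    memberships = defaultdict(set)
    for label, links in link_communities.items():
        for u, v in links:
            memberships[u].add(label)
            memberships[v].add(label)
    return dict(memberships)


# Hypothetical toy partition of links: each link carries exactly one label.
link_communities = {
    "family": [("ann", "bob"), ("ann", "cat"), ("bob", "cat")],
    "work": [("ann", "dan"), ("ann", "eve"), ("dan", "eve")],
}

memberships = node_memberships(link_communities)

# Nodes that belong to more than one community, although no link does.
overlapping = sorted(n for n, labels in memberships.items() if len(labels) > 1)
print(overlapping)  # ['ann']
```

The links are partitioned without overlap, yet the node `ann` ends up in both groups.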
Learning Edge Representations via Low-Rank Asymmetric Projections
We propose a new method for embedding graphs while preserving directed edge
information. Learning such continuous-space vector representations (or
embeddings) of nodes in a graph is an important first step for using network
information (from social networks, user-item graphs, knowledge bases, etc.) in
many machine learning tasks.
Unlike previous work, we (1) explicitly model an edge as a function of node
embeddings, and we (2) propose a novel objective, the "graph likelihood", which
contrasts information from sampled random walks with non-existent edges.
Individually, both of these contributions improve the learned representations,
especially when there are memory constraints on the total size of the
embeddings. When combined, our contributions enable us to significantly improve
the state-of-the-art by learning more concise representations that better
preserve the graph structure.
We evaluate our method on a variety of link-prediction tasks, including social
networks, collaboration networks, and protein interactions, showing that our
proposed method learns representations with error reductions of up to 76% and
55% on directed and undirected graphs, respectively. In addition, we show that the
representations learned by our method are quite space efficient, producing
embeddings which have higher structure-preserving accuracy but are 10 times
smaller
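One way to read "an edge as a function of node embeddings" with a low-rank asymmetric projection is a score of the form (L y_u) . (R y_v), where L and R are separate projection matrices with few rows. The sketch below assumes that form; the names `edge_score`, `left`, `right` and all numeric values are illustrative and not taken from the abstract.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def edge_score(y_u, y_v, left, right):
    """Score the directed edge u -> v as (left @ y_u) . (right @ y_v).

    Using two different projections makes the score asymmetric, so
    u -> v and v -> u can receive different values.
    """
    projected_u = matvec(left, y_u)
    projected_v = matvec(right, y_v)
    return sum(a * b for a, b in zip(projected_u, projected_v))


# Assumed toy setup: embedding size 3, projection rank 1.
left = [[1.0, 0.0, 0.0]]
right = [[0.0, 1.0, 0.0]]
y_a = [2.0, 0.0, 1.0]
y_b = [0.0, 3.0, 1.0]

forward = edge_score(y_a, y_b, left, right)   # a -> b
backward = edge_score(y_b, y_a, left, right)  # b -> a
print(forward, backward)  # 6.0 0.0
```

A symmetric score such as a plain dot product could not separate the two directions, which is why the sketch uses two projections.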
Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks
Graph convolutional networks (GCNs) have been successfully applied to many
graph-based applications; however, training a large-scale GCN remains
challenging. Current SGD-based algorithms suffer from either a high
computational cost that grows exponentially with the number of GCN layers, or a
large space requirement for keeping the entire graph and the embedding of each
node in memory. In this paper, we propose Cluster-GCN, a novel GCN algorithm
that is suitable for SGD-based training by exploiting the graph clustering
structure. Cluster-GCN works as follows: at each step, it samples a block
of nodes associated with a dense subgraph identified by a graph clustering
algorithm, and restricts the neighborhood search to this subgraph. This
simple but effective strategy leads to significantly improved memory and
computational efficiency while being able to achieve comparable test accuracy
with previous algorithms. To test the scalability of our algorithm, we create a
new Amazon2M dataset with 2 million nodes and 61 million edges, which is more than
5 times larger than the previous largest publicly available dataset (Reddit).
For training a 3-layer GCN on this data, Cluster-GCN is faster than the
previous state-of-the-art VR-GCN (1523 seconds vs 1961 seconds) and uses much
less memory (2.2GB vs 11.2GB). Furthermore, for training a 4-layer GCN on this
data, our algorithm can finish in around 36 minutes while all the existing GCN
training algorithms fail to train due to the out-of-memory issue. In addition,
Cluster-GCN allows us to train a much deeper GCN without much time and memory
overhead, which leads to improved prediction accuracy: using a 5-layer
Cluster-GCN, we achieve a state-of-the-art test F1 score of 99.36 on the PPI
dataset, while the previous best result was 98.71 by [16]. Our code is
publicly available at
https://github.com/google-research/google-research/tree/master/cluster_gcn.
Comment: In Proceedings of the 25th ACM SIGKDD International Conference on
Knowledge Discovery & Data Mining (KDD'19
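The batching idea in the Cluster-GCN abstract (pick a block of nodes from one cluster and keep only the neighbors inside that block) can be sketched as follows. The cluster assignment is assumed to be precomputed by some graph clustering step, the GCN layers themselves are omitted, and the function names and toy graph are hypothetical rather than taken from the released code.

```python
import random


def induced_subgraph(adjacency, nodes):
    """Keep only the edges whose two endpoints both lie in `nodes`."""
    keep = set(nodes)
    return {u: [v for v in adjacency[u] if v in keep] for u in nodes}


def cluster_batches(adjacency, clusters, seed=0):
    """Yield (nodes, subgraph) pairs, one per cluster, in random order."""
    rng = random.Random(seed)
    order = list(range(len(clusters)))
    rng.shuffle(order)
    for index in order:
        nodes = clusters[index]
        yield nodes, induced_subgraph(adjacency, nodes)


# Toy graph: two triangles joined by the single edge 2-3.
adjacency = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
clusters = [[0, 1, 2], [3, 4, 5]]  # assumed output of a clustering step

batches = list(cluster_batches(adjacency, clusters))
for nodes, subgraph in batches:
    print(nodes, subgraph)
```

Each batch only ever touches its own block, so the between-cluster edge 2-3 is dropped and the memory needed per step is bounded by the cluster size rather than by the full graph.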
Deep learning for extracting protein-protein interactions from biomedical literature
State-of-the-art methods for protein-protein interaction (PPI) extraction are
primarily feature-based or kernel-based, leveraging lexical and syntactic
information. How to incorporate such knowledge into recent deep learning
methods remains an open question. In this paper, we propose a multichannel
dependency-based convolutional neural network model (McDepCNN). It applies one
channel to the embedding vector of each word in the sentence, and another
channel to the embedding vector of the head of the corresponding word.
Therefore, the model can use richer information obtained from different
channels. Experiments on two public benchmarking datasets, AIMed and BioInfer,
demonstrate that McDepCNN compares favorably to the state-of-the-art
rich-feature and single-kernel based methods. In addition, McDepCNN achieves
24.4% relative improvement in F1-score over the state-of-the-art methods on
cross-corpus evaluation and 12% improvement in F1-score over kernel-based
methods on "difficult" instances. These results suggest that McDepCNN
generalizes more easily over different corpora, and is capable of capturing
long-distance features in the sentences.
Comment: Accepted for publication in Proceedings of the 2017 Workshop on
Biomedical Natural Language Processing, 10 pages, 2 figures, 6 table
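The two input channels described in the McDepCNN abstract (one built from each word's embedding, the other from the embedding of that word's syntactic head) can be sketched as below. The toy sentence, the head indices, the embedding values, and the choice of a zero vector for the root's head are assumptions made for illustration; the convolutional layers are not shown.

```python
def two_channel_input(tokens, heads, embeddings):
    """Build the word channel and the head-word channel for one sentence.

    heads[i] is the index of the syntactic head of token i, or -1 for
    the root. Unknown words and the root's head map to a zero vector
    (an assumption of this sketch).
    """
    dim = len(next(iter(embeddings.values())))
    zero = [0.0] * dim
    word_channel = [embeddings.get(token, zero) for token in tokens]
    head_channel = [
        embeddings.get(tokens[h], zero) if h >= 0 else zero for h in heads
    ]
    return word_channel, head_channel


# Hypothetical parsed sentence: "binds" is the root and heads both arguments.
tokens = ["PROT1", "binds", "PROT2"]
heads = [1, -1, 1]
embeddings = {"PROT1": [1.0, 0.0], "binds": [0.0, 1.0], "PROT2": [1.0, 1.0]}

word_channel, head_channel = two_channel_input(tokens, heads, embeddings)
print(word_channel)  # [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(head_channel)  # [[0.0, 1.0], [0.0, 0.0], [0.0, 1.0]]
```

Both channels have one vector per token, so they can be stacked and fed to the same convolution filters.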