Search CORE

32,102 research outputs found

Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks

Author: Bengio Samy
Chiang Wei-Lin
Hsieh Cho-Jui
Li Yang
Liu Xuanqing
Si Si
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 08/08/2019
Field of study

Graph convolutional network (GCN) has been successfully applied to many graph-based applications; however, training a large-scale GCN remains challenging. Current SGD-based algorithms suffer from either a high computational cost that exponentially grows with number of GCN layers, or a large space requirement for keeping the entire graph and the embedding of each node in memory. In this paper, we propose Cluster-GCN, a novel GCN algorithm that is suitable for SGD-based training by exploiting the graph clustering structure. Cluster-GCN works as the following: at each step, it samples a block of nodes that associate with a dense subgraph identified by a graph clustering algorithm, and restricts the neighborhood search within this subgraph. This simple but effective strategy leads to significantly improved memory and computational efficiency while being able to achieve comparable test accuracy with previous algorithms. To test the scalability of our algorithm, we create a new Amazon2M data with 2 million nodes and 61 million edges which is more than 5 times larger than the previous largest publicly available dataset (Reddit). For training a 3-layer GCN on this data, Cluster-GCN is faster than the previous state-of-the-art VR-GCN (1523 seconds vs 1961 seconds) and using much less memory (2.2GB vs 11.2GB). Furthermore, for training 4 layer GCN on this data, our algorithm can finish in around 36 minutes while all the existing GCN training algorithms fail to train due to the out-of-memory issue. Furthermore, Cluster-GCN allows us to train much deeper GCN without much time and memory overhead, which leads to improved prediction accuracy---using a 5-layer Cluster-GCN, we achieve state-of-the-art test F1 score 99.36 on the PPI dataset, while the previous best result was 98.71 by [16]. Our codes are publicly available at https://github.com/google-research/google-research/tree/master/cluster_gcn.Comment: In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD'19

arXiv.org e-Print Archive

Fast Deterministic Selection

Author: Alexandrescu Andrei
Publication venue
Publication date: 04/08/2016
Field of study

The Median of Medians (also known as BFPRT) algorithm, although a landmark theoretical achievement, is seldom used in practice because it and its variants are slower than simple approaches based on sampling. The main contribution of this paper is a fast linear-time deterministic selection algorithm QuickselectAdaptive based on a refined definition of MedianOfMedians. The algorithm's performance brings deterministic selection---along with its desirable properties of reproducible runs, predictable run times, and immunity to pathological inputs---in the range of practicality. We demonstrate results on independent and identically distributed random inputs and on normally-distributed inputs. Measurements show that QuickselectAdaptive is faster than state-of-the-art baselines.Comment: Pre-publication draf

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

On the Pitman\u2013Yor process with spike and slab base measure

Author: Canale Antonio
Lijoi Antonio
Nipoti Bernardo
Pr\ufcnster Igor
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2017
Field of study

Archivio istituzionale della ricerca - Università di Padova

Comparing Community Structure to Characteristics in Online Collegiate Social Networks

Author: A. Porter
Amanda L. Traud
Eric D. Kelsic
Mason
Peter J. Mucha
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2010
Field of study

We study the structure of social networks of students by examining the graphs of Facebook "friendships" at five American universities at a single point in time. We investigate each single-institution network's community structure and employ graphical and quantitative tools, including standardized pair-counting methods, to measure the correlations between the network communities and a set of self-identified user characteristics (residence, class year, major, and high school). We review the basic properties and statistics of the pair-counting indices employed and recall, in simplified notation, a useful analytical formula for the z-score of the Rand coefficient. Our study illustrates how to examine different instances of social networks constructed in similar environments, emphasizes the array of social forces that combine to form "communities," and leads to comparative observations about online social lives that can be used to infer comparisons about offline social structures. In our illustration of this methodology, we calculate the relative contributions of different characteristics to the community structure of individual universities and subsequently compare these relative contributions at different universities, measuring for example the importance of common high school affiliation to large state universities and the varying degrees of influence common major can have on the social structure at different universities. The heterogeneity of communities that we observe indicates that these networks typically have multiple organizing factors rather than a single dominant one.Comment: Version 3 (17 pages, 5 multi-part figures), accepted in SIAM Revie

arXiv.org e-Print Archive

CiteSeerX

Carolina Digital Repository

Oxford University Research Archive

Recommended from our members

Choosing a testing method to deliver reliability

Author: Frankl P. G.
Hamlet D.
Littlewood B.
Strigini L.
Publication venue
Publication date
Field of study

City Research Online