Search CORE

3,129 research outputs found

Learning process models in IoT Edge

Author: Cheng Long
Duan Yucong
Liu Cong
Liu Qingzhi
Murphy John
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 29/08/2019
Field of study

Crossref

Irish Universities

DCU Online Research Access Service

Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks

Author: Bengio Samy
Chiang Wei-Lin
Hsieh Cho-Jui
Li Yang
Liu Xuanqing
Si Si
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 08/08/2019
Field of study

Graph convolutional network (GCN) has been successfully applied to many graph-based applications; however, training a large-scale GCN remains challenging. Current SGD-based algorithms suffer from either a high computational cost that exponentially grows with number of GCN layers, or a large space requirement for keeping the entire graph and the embedding of each node in memory. In this paper, we propose Cluster-GCN, a novel GCN algorithm that is suitable for SGD-based training by exploiting the graph clustering structure. Cluster-GCN works as the following: at each step, it samples a block of nodes that associate with a dense subgraph identified by a graph clustering algorithm, and restricts the neighborhood search within this subgraph. This simple but effective strategy leads to significantly improved memory and computational efficiency while being able to achieve comparable test accuracy with previous algorithms. To test the scalability of our algorithm, we create a new Amazon2M data with 2 million nodes and 61 million edges which is more than 5 times larger than the previous largest publicly available dataset (Reddit). For training a 3-layer GCN on this data, Cluster-GCN is faster than the previous state-of-the-art VR-GCN (1523 seconds vs 1961 seconds) and using much less memory (2.2GB vs 11.2GB). Furthermore, for training 4 layer GCN on this data, our algorithm can finish in around 36 minutes while all the existing GCN training algorithms fail to train due to the out-of-memory issue. Furthermore, Cluster-GCN allows us to train much deeper GCN without much time and memory overhead, which leads to improved prediction accuracy---using a 5-layer Cluster-GCN, we achieve state-of-the-art test F1 score 99.36 on the PPI dataset, while the previous best result was 98.71 by [16]. Our codes are publicly available at https://github.com/google-research/google-research/tree/master/cluster_gcn.Comment: In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD'19

arXiv.org e-Print Archive

Crossref

A Survey of Parallel Data Mining

Author: Freitas Alex A.
Publication venue
Publication date
Field of study

With the fast, continuous increase in the number and size of databases, parallel data mining is a natural and cost-effective approach to tackle the problem of scalability in data mining. Recently there has been a considerable research on parallel data mining. However, most projects focus on the parallelization of a single kind of data mining algorithm/paradigm. This paper surveys parallel data mining with a broader perspective. More precisely, we discuss the parallelization of data mining algorithms of four knowledge discovery paradigms, namely rule induction, instance-based learning, genetic algorithms and neural networks. Using the lessons learned from this discussion, we also derive a set of heuristic principles for designing efficient parallel data mining algorithms

Kent Academic Repository

Can’t program, won’t program, will program!

Author: Bennett Alex
Peart Amanda
Publication venue: 'The Higher Education Academy'
Publication date: 01/01/2005
Field of study

Portsmouth University Research Portal (Pure)

Efficient algorithms for decision tree cross-validation

Author: Blockeel Hendrik
Struyf Jan
Publication venue
Publication date: 01/01/2001
Field of study

Cross-validation is a useful and generally applicable technique often employed in machine learning, including decision tree induction. An important disadvantage of straightforward implementation of the technique is its computational overhead. In this paper we show that, for decision trees, the computational overhead of cross-validation can be reduced significantly by integrating the cross-validation with the normal decision tree induction process. We discuss how existing decision tree algorithms can be adapted to this aim, and provide an analysis of the speedups these adaptations may yield. The analysis is supported by experimental results.Comment: 9 pages, 6 figures. http://www.cs.kuleuven.ac.be/cgi-bin-dtai/publ_info.pl?id=3478

arXiv.org e-Print Archive

Lirias

CiteSeerX