2,578 research outputs found

    Computing graph neural networks: A survey from algorithms to accelerators

    Graph Neural Networks (GNNs) have exploded onto the machine learning scene in recent years owing to their capability to model and learn from graph-structured data. Such an ability has strong implications in a wide variety of fields whose data are inherently relational, for which conventional neural networks do not perform well. Indeed, as recent reviews can attest, research in the area of GNNs has grown rapidly and has led to the development of a variety of GNN algorithm variants as well as to the exploration of ground-breaking applications in chemistry, neurology, electronics, or communication networks, among others. At the current stage of research, however, the efficient processing of GNNs is still an open challenge for several reasons. Besides their novelty, GNNs are hard to compute due to their dependence on the input graph, their combination of dense and very sparse operations, or the need to scale to huge graphs in some applications. In this context, this article aims to make two main contributions. On the one hand, a review of the field of GNNs is presented from the perspective of computing. This includes a brief tutorial on the GNN fundamentals, an overview of the evolution of the field in the last decade, and a summary of operations carried out in the multiple phases of different GNN algorithm variants. On the other hand, an in-depth analysis of current software and hardware acceleration schemes is provided, from which a hardware-software, graph-aware, and communication-centric vision for GNN accelerators is distilled. This work is possible thanks to funding from the European Union’s Horizon 2020 research and innovation programme under Grant No. 863337 (WiPLASH project) and the Spanish Ministry of Economy and Competitiveness under contract TEC2017-90034-C2-1-R (ALLIANCE project), which receives funding from FEDER. Peer Reviewed. Postprint (published version).
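
    The dense/sparse split mentioned in this abstract can be made concrete with a minimal sketch (not taken from the paper; the helper gcn_layer and the toy graph are illustrative): a single GCN-style layer, in which neighborhood aggregation is a very sparse matrix product and the feature update is a dense one.

        import numpy as np
        import scipy.sparse as sp

        def gcn_layer(adj, features, weight):
            """One message-passing step: sparse aggregation, then a dense transform."""
            n = adj.shape[0]
            a_hat = adj + sp.identity(n, format="csr")        # add self-loops
            deg = np.asarray(a_hat.sum(axis=1)).ravel()
            d_inv_sqrt = sp.diags(1.0 / np.sqrt(deg))
            a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt          # D^-1/2 (A + I) D^-1/2
            aggregated = a_norm @ features                    # sparse SpMM: irregular memory access
            updated = aggregated @ weight                     # dense GEMM: regular, compute-bound
            return np.maximum(updated, 0.0)                   # ReLU

        # Toy graph: 4 nodes, 8-dim input features, 16-dim output
        adj = sp.csr_matrix(np.array([[0, 1, 0, 0],
                                      [1, 0, 1, 1],
                                      [0, 1, 0, 0],
                                      [0, 1, 0, 0]], dtype=float))
        x = np.random.randn(4, 8)
        w = np.random.randn(8, 16)
        print(gcn_layer(adj, x, w).shape)                     # (4, 16)

    The sparse step is dominated by irregular memory traffic while the dense step is compute-bound, which is the split that graph-aware acceleration schemes target.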

    Breaking On-device Training Memory Wall: A Systematic Survey

    On-device training has become an increasingly popular approach to machine learning, enabling models to be trained directly on mobile and edge devices. However, a major challenge in this area is the limited memory available on these devices, which can severely restrict the size and complexity of the models that can be trained. In this systematic survey, we aim to explore the current state-of-the-art techniques for breaking on-device training memory walls, focusing on methods that can enable larger and more complex models to be trained on resource-constrained devices. Specifically, we first analyze the key factors that contribute to the memory walls encountered during on-device training. Then, we present a comprehensive literature review of on-device training methods that address the issue of memory limitations. Finally, we summarize on-device training and highlight the open problems for future research. By providing a comprehensive overview of these techniques and their effectiveness in breaking memory walls, we hope to help researchers and practitioners in this field navigate the rapidly evolving landscape of on-device training. Comment: 8 pages, 3 figures.
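
    As one concrete illustration of trading compute for memory (an assumption about the kind of technique such surveys cover, not a method taken from this paper), gradient checkpointing drops intermediate activations during the forward pass and recomputes them in the backward pass, reducing peak activation memory:

        import torch
        import torch.nn as nn
        from torch.utils.checkpoint import checkpoint_sequential

        # Eight small blocks stand in for a deeper on-device model.
        model = nn.Sequential(*[nn.Sequential(nn.Linear(256, 256), nn.ReLU()) for _ in range(8)])
        x = torch.randn(32, 256, requires_grad=True)

        # Split the model into 4 segments: only segment-boundary activations are kept,
        # the rest are recomputed during the backward pass.
        out = checkpoint_sequential(model, 4, x, use_reentrant=False)
        loss = out.sum()
        loss.backward()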

    Efficient Implementation of Neural Networks for Real-Time Applications

    Neural networks are currently one of the most common methods in machine learning. They have established a new scientific discipline known as "deep learning" and have been successfully applied in many research fields, such as computer vision, speech recognition, or machine translation. In most fields, the primary and sometimes only concern is good accuracy, which can be achieved by training on large amounts of human-labeled data. However, real-time applications, such as autonomous driving, demand both good accuracy and fast, efficient inference. This thesis provides an overview of known methods for improving neural network performance, with a primary focus on convolutional neural networks. It also presents a series of experiments that measure the efficiency of these methods when applied to various neural network architectures and run on different platforms, and discusses the results.
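
    A minimal sketch of the kind of measurement such efficiency experiments rely on (illustrative only, not code from the thesis; the helper measure_latency and the small network are hypothetical): timing repeated forward passes of a convolutional model after a warm-up phase.

        import time
        import torch
        import torch.nn as nn

        def measure_latency(model, inp, warmup=10, iters=50):
            """Average forward-pass time in milliseconds."""
            model.eval()
            with torch.no_grad():
                for _ in range(warmup):                 # warm up caches and lazy init
                    model(inp)
                start = time.perf_counter()
                for _ in range(iters):
                    model(inp)
                return (time.perf_counter() - start) / iters * 1e3

        # A small convolutional network stands in for the architectures under test.
        net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10))
        print(f"{measure_latency(net, torch.randn(1, 3, 224, 224)):.2f} ms / image")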

    Physics and Computing Performance of the Exa.TrkX TrackML Pipeline

    The Exa.TrkX project has applied geometric learning concepts such as metric learning and graph neural networks to HEP particle tracking. The Exa.TrkX tracking pipeline clusters detector measurements to form track candidates and filters them. The pipeline, originally developed using the TrackML dataset (a simulation of an LHC-like tracking detector), has been demonstrated on various detectors, including the DUNE LArTPC and the CMS High-Granularity Calorimeter. This paper documents new developments needed to study the physics and computing performance of the Exa.TrkX pipeline on the full TrackML dataset, a first step towards validating the pipeline using ATLAS and CMS data. The pipeline achieves tracking efficiency and purity similar to production tracking algorithms. Crucially for future HEP applications, the pipeline benefits significantly from GPU acceleration, and its computational requirements scale close to linearly with the number of particles in the event.
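
    A hedged sketch of the final stage of such a pipeline (illustrative, not the Exa.TrkX code; build_track_candidates and the toy event are hypothetical): once a GNN has scored edges between hits, weak edges are filtered out and the remaining connected components become track candidates.

        import numpy as np
        import scipy.sparse as sp
        from scipy.sparse.csgraph import connected_components

        def build_track_candidates(num_hits, edges, scores, threshold=0.5):
            """Label each hit with a track-candidate id from the thresholded edge graph."""
            keep = scores > threshold                              # filter spurious edges
            rows, cols = edges[keep, 0], edges[keep, 1]
            adj = sp.coo_matrix((np.ones(keep.sum()), (rows, cols)),
                                shape=(num_hits, num_hits))
            _, labels = connected_components(adj, directed=False)
            return labels

        # Toy event: 6 hits; the weak 2-3 edge splits them into two track candidates.
        edges = np.array([[0, 1], [1, 2], [2, 3], [3, 4], [4, 5]])
        scores = np.array([0.9, 0.8, 0.1, 0.95, 0.7])
        print(build_track_candidates(6, edges, scores))            # [0 0 0 1 1 1]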

    Tango: rethinking quantization for graph neural network training on GPUs

    Graph Neural Networks (GNNs) are becoming increasingly popular due to their superior performance in critical graph-related tasks. While quantization is widely used to accelerate GNN computation, quantized training faces unprecedented challenges. Current quantized GNN training systems often have longer training times than their full-precision counterparts for two reasons: (i) addressing the accuracy challenge leads to excessive overhead, and (ii) the optimization potential exposed by quantization is not adequately leveraged. This paper introduces Tango, which rethinks quantization challenges and opportunities for graph neural network training on GPUs with three contributions: Firstly, we introduce efficient rules to maintain accuracy during quantized GNN training. Secondly, we design and implement quantization-aware primitives and inter-primitive optimizations that can speed up GNN training. Finally, we integrate Tango with the popular Deep Graph Library (DGL) system and demonstrate its superior performance over state-of-the-art approaches on various GNN models and datasets.
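
    A minimal sketch of the quantize/dequantize primitive that quantized GNN training builds on (illustrative only, not Tango's implementation; fake_quantize_int8 and quantized_aggregate are hypothetical helpers): node features are quantized to INT8, aggregated with a wide accumulator, and rescaled back to floating point.

        import numpy as np
        import scipy.sparse as sp

        def fake_quantize_int8(x):
            """Symmetric per-tensor INT8 quantization; returns int8 values and the scale."""
            scale = np.abs(x).max() / 127.0 + 1e-12
            return np.clip(np.round(x / scale), -127, 127).astype(np.int8), scale

        def quantized_aggregate(adj, features):
            """Aggregate neighbor features in INT8 with an int32 accumulator, then rescale."""
            q, scale = fake_quantize_int8(features)
            acc = adj.astype(np.int32) @ q.astype(np.int32)        # low-precision SpMM
            return acc.astype(np.float32) * scale                  # dequantize to float

        adj = sp.csr_matrix(np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]]))
        x = np.random.randn(3, 4).astype(np.float32)
        err = np.abs(quantized_aggregate(adj, x) - adj @ x).max()
        print(f"max quantization error: {err:.4f}")                # small but nonzero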