2 research outputs found
Deep Graph Library Optimizations for Intel(R) x86 Architecture
The Deep Graph Library (DGL) was designed as a tool to enable structure
learning from graphs, by supporting a core abstraction for graphs, including
the popular Graph Neural Networks (GNNs). DGL contains implementations of all
core graph operations for both the CPU and GPU. In this paper, we focus
specifically on CPU implementations and present performance analysis,
optimizations and results across a set of GNN applications using the latest
version of DGL (0.4.3). Across 7 applications, we achieve speed-ups ranging
from 1.5x to 13x over the baseline CPU implementations.
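Most of DGL's core graph operations for GNN layers boil down to sparse-dense products such as SpMM for neighbor aggregation. The sketch below is only an illustrative reduction of that aggregation step using SciPy/NumPy, not DGL's internal kernel or the paper's optimized code; the toy graph and feature sizes are made up.

```python
# Illustrative sketch (not DGL's internal kernel): the neighbor-aggregation
# step of a GNN layer expressed as a sparse-dense matrix product (SpMM),
# the kind of CPU operation the paper's optimizations target.
import numpy as np
import scipy.sparse as sp

num_nodes, feat_dim = 5, 4
# Toy edge list (src -> dst); in DGL this would come from a graph object.
src = np.array([0, 1, 2, 3, 4, 0])
dst = np.array([1, 2, 3, 4, 0, 2])

# Adjacency matrix in CSR form: adj[d, s] = 1 means s sends a message to d.
adj = sp.csr_matrix((np.ones_like(src, dtype=np.float32), (dst, src)),
                    shape=(num_nodes, num_nodes))

# Node features.
h = np.random.rand(num_nodes, feat_dim).astype(np.float32)

# Sum-aggregation of neighbor features: one SpMM.
h_agg = adj @ h
print(h_agg.shape)  # (5, 4)
```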
Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning & HPC Workloads
During the past decade, novel Deep Learning (DL) algorithms, workloads and
hardware have been developed to tackle a wide range of problems. Despite the
advances in workload and hardware ecosystems, the programming methodology of DL
systems is stagnant. DL workloads leverage either highly optimized yet
platform-specific and inflexible kernels from DL libraries, or, in the case of
novel operators, reference implementations built via DL framework primitives
that suffer from underwhelming performance. This work introduces the Tensor
Processing Primitives (TPP), a programming abstraction striving for efficient,
portable implementation of DL workloads with high productivity. TPPs define a
compact, yet versatile set of 2D-tensor operators (or a virtual Tensor ISA),
which subsequently can be utilized as building-blocks to construct complex
operators on high-dimensional tensors. The TPP specification is
platform-agnostic, thus code expressed via TPPs is portable, whereas the TPP
implementation is highly-optimized and platform-specific. We demonstrate the
efficacy and viability of our approach using standalone kernels and end-to-end
DL & HPC workloads expressed entirely via TPPs that outperform state-of-the-art
implementations on multiple platforms.
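The core TPP idea, a compact set of 2D-tensor operators composed via outer loops into operators on higher-dimensional tensors, can be sketched as follows. This is a hypothetical Python illustration of the composition pattern only; the names `tpp_gemm`, `tpp_relu`, and `batched_mlp_layer` are invented here and are not the actual TPP specification or LIBXSMM API.

```python
# Hypothetical illustration of the TPP composition idea (not the real
# TPP / LIBXSMM API): a few 2D-tensor "primitives" are combined, via loops
# over the outer dimensions, into an operator on a 3D tensor.
import numpy as np

def tpp_gemm(a, b):
    """2D primitive: dense matrix multiply on two 2D tiles."""
    return a @ b

def tpp_relu(x):
    """2D primitive: elementwise ReLU on a 2D tile."""
    return np.maximum(x, 0.0)

def batched_mlp_layer(x, w):
    """Higher-dimensional operator built from 2D primitives:
    applies GEMM + ReLU to every 2D slice of a batched 3D input."""
    out = np.empty((x.shape[0], x.shape[1], w.shape[1]), dtype=x.dtype)
    for b in range(x.shape[0]):          # loop over the outer (batch) dim
        out[b] = tpp_relu(tpp_gemm(x[b], w))
    return out

x = np.random.rand(8, 16, 32).astype(np.float32)   # batch of 2D tiles
w = np.random.rand(32, 64).astype(np.float32)
y = batched_mlp_layer(x, w)
print(y.shape)  # (8, 16, 64)
```

In the paper's setting, the platform-specific optimization lives inside the 2D primitives, while code written against the platform-agnostic TPP interface remains portable across architectures.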