Search CORE

62 research outputs found

Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code

Author: Akkas Abdurrahman
Amarasinghe Saman
Baghdadi Riyadh
Del Sozzo Emanuele
Kamil Shoaib
Ray Jessica
Romdhane Malek Ben
Suriana Patricia
Zhang Yunming
Publication venue
Publication date: 20/12/2018
Field of study

This paper introduces Tiramisu, a polyhedral framework designed to generate high performance code for multiple platforms including multicores, GPUs, and distributed machines. Tiramisu introduces a scheduling language with novel extensions to explicitly manage the complexities that arise when targeting these systems. The framework is designed for the areas of image processing, stencils, linear algebra and deep learning. Tiramisu has two main features: it relies on a flexible representation based on the polyhedral model and it has a rich scheduling language allowing fine-grained control of optimizations. Tiramisu uses a four-level intermediate representation that allows full separation between the algorithms, loop transformations, data layouts, and communication. This separation simplifies targeting multiple hardware architectures with the same algorithm. We evaluate Tiramisu by writing a set of image processing, deep learning, and linear algebra benchmarks and compare them with state-of-the-art compilers and hand-tuned libraries. We show that Tiramisu matches or outperforms existing compilers and libraries on different hardware architectures, including multicore CPUs, GPUs, and distributed machines.Comment: arXiv admin note: substantial text overlap with arXiv:1803.0041

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Politecnico di Milano

An Exploration of FPGAs as Accelerators for Graph Analysis via High-Level Synthesis

Author: Pedro Filipe Vilhena de Campos Oliveira e Silva
Publication venue
Publication date: 14/01/2021
Field of study

Repositório Aberto da Universidade do Porto

Taming the Expressiveness and Programmability of Graph Analytical Queries.

Author: Han Y
Hao K
Lai L
Lin X
Qian Z
Qin L
Zhao Y
Zhou J
Zhou Z
Publication venue
Publication date: 31/01/2022
Field of study

OPUS - University of Technology Sydney

Recommended from our members

Compiler and system for resilient distributed heterogeneous graph analytics

Author: Gill Gurbinder Singh
Publication venue
Publication date: 11/01/2021
Field of study

Graph analytics systems are used in a wide variety of applications including health care, electronic circuit design, machine learning, and cybersecurity. Graph analytics systems must handle very large graphs such as the Facebook friends graph, which has more than a billion nodes and 200 billion edges. Since machines have limited main memory, distributed-memory clusters with sufficient memory and computation power are required for processing of these graphs. In distributed graph analytics, the graph is partitioned among the machines in a cluster, and communication between partitions is implemented using a substrate like MPI. However, programming distributed-memory systems are not easy and the recent trend towards the processor heterogeneity has added to this complexity. To simplify the programming of graph applications on such platforms, this dissertation first presents a compiler called Abelian that translates shared-memory descriptions of graph algorithms written in the Galois programming model into efficient code for distributed-memory platforms with heterogeneous processors. An important runtime parameter to the compiler-generated distributed code is the partitioning policy. We present an experimental study of partitioning strategies for distributed work-efficient graph analytics applications on different CPU architecture clusters at large scale (up to 256 machines). Based on the study we present a simple rule of thumb to select among myriad policies. Another challenge of distributed graph analytics that we address in this dissertation is to deal with machine fail-stop failures, which is an important concern especially for long-running graph analytics applications on large clusters. We present a novel communication and synchronization substrate called Phoenix that leverages the algorithmic properties of graph analytics applications to recover from faults with zero overheads during fault-free execution and show that Phoenix is 24x faster than previous state-of-the-art systems. In this dissertation, we also look at the new opportunities for graph analytics on massive datasets brought by a new kind of byte-addressable memory technology with higher density and lower cost than DRAM such as intel Optane DC Persistent Memory. This enables the design of affordable systems that support up to 6TB of randomly accessible memory. In this dissertation, we present key runtime and algorithmic principles to consider when performing graph analytics on massive datasets on Optane DC Persistent Memory as well as highlight ideas that apply to graph analytics on all large-memory platforms. Finally, we show that our distributed graph analytics infrastructure can be used for a new domain of applications, in particular, embedding algorithms such as Word2Vec. Word2Vec trains the vector representations of words (also known as word embeddings) on large text corpus and resulting vector embeddings have been shown to capture semantic and syntactic relationships among words. Other examples include Node2Vec, Code2Vec, Sequence2Vec, etc (collectively known as Any2Vec) with a wide variety of uses. We formulate the training of such applications as a graph problem and present GraphAny2Vec, a distributed Any2Vec training framework that leverages the state-of-the-art distributed heterogeneous graph analytics infrastructure developed in this dissertation to scale Any2Vec training to large distributed clusters. GraphAny2Vec also demonstrates a novel way of combining model gradients during training, which allows it to scale without losing accuracyComputer Science

Texas ScholarWorks

Prodigy: Improving the Memory Latency of Data-Indirect Irregular Workloads Using Hardware-Software Co-Design

Author: Ahmadi Agreen
Austin Todd
Behroozi Armand
Dreslinski Ronald
Kaszyk Kuba
Li Lu
Mahlke Scott
May Kyle
Morton John Magnus
Mudge Trevor
Nguyen Brandon
O'Boyle Michael F P
Sun Jiawen
Talati Nishil
Vasiladiotis Christos
Verma Tarunesh
Yang Yichen
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 22/04/2021
Field of study

Edinburgh Research Explorer

Exploring Multi-Level Parallelism For Graph-Based Applications Via Algorithm And System Co-Design

Author: Peng Zhen
Publication venue: W&M ScholarWorks
Publication date: 01/01/2022
Field of study

Graph processing is at the heart of many modern applications where graphs are used as the basic data structure to represent the entities of interest and the relationships between them. Improving the performance of graph-based applications, especially using parallelism techniques, has drawn significant interest both in academia and industry. On the one hand, modern CPU architectures are able to provide massive computational power by using sophisticated memory hierarchy and multi-level parallelism, including thread-level parallelism, data-level parallelism, etc. On the other hand, graph processing workloads are notoriously challenging for achieving high performance due to their irregular computation pattern and unpredictable control flow. Therefore, how to accelerate the performance of graph-based applications using parallelism is still an open question. This dissertation focuses on providing high performance for graph-based applications. To take full advantage of multi-level parallelism resources provided by CPUs, this dissertation studies the characteristics of graph-based applications and matches their parallel solutions with the underlying hardware via algorithm and system co-design. This dissertation divides graph-based applications into three categories: typical graph algorithms, sequential graph-based applications, and applications with graph-based solutions. The first category comprises typical graph algorithms with available parallel solutions. This dissertation proposes GraphPhi as a new approach to graph processing on emerging Intel Xeon Phi-like architectures. The second category includes specialized graph applications without nontrivial parallel solutions. This dissertation studies a state-of-the-art 2-hop labeling approach named Pruned Landmark Labeling (PLL). This dissertation proposes Batched Vertex-Centric PLL (BVC-PLL), which breaks PLL\u27s inherent dependencies and parallelizes it in a scalable way. The third category includes applications that rely on graph-based solutions. This dissertation studies the sequential search algorithm for the graph-based indexing methods used for the Approximate Nearest Neighbor Search (ANNS) problem. This dissertation proposes Speed-ANN, a parallel similarity search algorithm that reveals hidden intra-query parallelism to accelerate the search speed while fulfilling the high accuracy requirement. Moreover, this dissertation further explores the optimization opportunities for computational graph-based deep neural network inference running on tiny devices, specifically microcontrollers (MCUs). Altogether, this dissertation studies graph-based applications and improves their performance by providing solutions of multi-level parallelism via algorithm and system co-design to match them with the underlying multi-core CPU architectures

College of William & Mary: W&M Publish