Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code
This paper introduces Tiramisu, a polyhedral framework designed to generate
high performance code for multiple platforms including multicores, GPUs, and
distributed machines. Tiramisu introduces a scheduling language with novel
extensions to explicitly manage the complexities that arise when targeting
these systems. The framework is designed for the areas of image processing,
stencils, linear algebra and deep learning. Tiramisu has two main features: it
relies on a flexible representation based on the polyhedral model and it has a
rich scheduling language allowing fine-grained control of optimizations.
Tiramisu uses a four-level intermediate representation that allows full
separation between the algorithms, loop transformations, data layouts, and
communication. This separation simplifies targeting multiple hardware
architectures with the same algorithm. We evaluate Tiramisu by writing a set of
image processing, deep learning, and linear algebra benchmarks and compare them
with state-of-the-art compilers and hand-tuned libraries. We show that Tiramisu
matches or outperforms existing compilers and libraries on different hardware
architectures, including multicore CPUs, GPUs, and distributed machines.
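The separation of algorithm from schedule described above can be illustrated with a minimal sketch. This is not Tiramisu's actual API (which is a C++ polyhedral framework); it is a hypothetical Python illustration of the core idea: the algorithm defines *what* is computed, while a schedule such as loop tiling changes only *how* the iteration space is traversed, leaving results unchanged.

```python
# Hypothetical sketch of algorithm/schedule separation (not Tiramisu's
# actual API): the same computation expressed with two different loop
# orders, where the tiled schedule improves locality but not results.

def algorithm(n):
    """The 'what': compute out[i][j] = i * j over an n x n grid."""
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[i][j] = i * j
    return out

def tiled_schedule(n, tile=4):
    """The 'how': the same computation, iterated tile by tile."""
    out = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, n)):
                    out[i][j] = i * j
    return out

# Schedules reorder work without changing the computed values.
assert algorithm(8) == tiled_schedule(8)
```

In a polyhedral framework this reordering is expressed declaratively as a scheduling directive rather than by rewriting the loops by hand, which is what lets the same algorithm target different hardware.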
Languages and Compilers for Writing Efficient High-Performance Computing Applications
Many everyday applications, such as web search, speech recognition, and weather prediction, are executed on high-performance systems containing thousands of Central Processing Units (CPUs) and Graphics Processing Units (GPUs). These applications can be written either in low-level programming languages, such as NVIDIA CUDA, or in domain specific languages, like Halide for image processing and PyTorch for machine learning. Despite the popularity of these languages, programmers face several challenges when developing efficient high-performance computing applications. First, since every hardware platform supports a different low-level programming model, programmers need to rewrite their applications in another programming language to utilize new hardware. Second, writing efficient code involves restructuring the computation to ensure (i) regular memory access patterns, (ii) non-divergent control flow, and (iii) complete utilization of the different programmer-managed caches. Furthermore, since these low-level optimizations are known only to hardware experts, it is difficult for a domain expert to write optimized code for new computations. Third, existing domain specific languages suffer from optimization barriers in their language constructs that prevent new optimizations, and hence these languages provide sub-optimal performance. To address these challenges, this thesis presents the following novel abstractions and compiler techniques for writing image processing and machine learning applications that can run efficiently on a variety of high-performance systems. First, this thesis presents techniques to optimize image processing programs on GPUs using the features of modern GPUs. These techniques improve the concurrency and register usage of generated code to provide better performance than the state of the art.
Second, this thesis presents NextDoor, the first system to provide an abstraction for writing graph sampling applications and executing them efficiently on GPUs. Third, this thesis presents CoCoNet, a domain specific language for co-optimizing communication and computation in distributed machine learning workloads. By breaking the optimization barriers in existing domain specific languages, these techniques help programmers write correct and efficient code for diverse high-performance computing workloads.
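The graph sampling abstraction mentioned above can be sketched in a few lines. This is not NextDoor's actual API; it is an illustrative Python example of one common sampling application, a fixed-length random walk, written as a per-step transition over an adjacency list — the kind of per-sample computation such a framework can batch and execute in parallel on a GPU.

```python
import random

# Illustrative sketch only (not NextDoor's API): a random walk as a
# graph sampling application over an adjacency-list graph.

def random_walk(adj, start, length, rng):
    """Walk up to `length` steps from `start`; stop early at a sink node."""
    walk = [start]
    node = start
    for _ in range(length):
        neighbors = adj.get(node, [])
        if not neighbors:
            break  # no outgoing edges: the walk ends here
        node = rng.choice(neighbors)
        walk.append(node)
    return walk

adj = {0: [1, 2], 1: [2], 2: [0]}
walk = random_walk(adj, 0, 5, random.Random(42))
# Every consecutive pair of nodes in the walk is an edge of the graph.
assert all(b in adj[a] for a, b in zip(walk, walk[1:]))
```

Because each walk depends only on its own state, many such walks can be launched independently, which is what makes this workload a good fit for GPU execution.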
Low-Code/No-Code Artificial Intelligence Platforms for the Health Informatics Domain
In the contemporary health informatics space, Artificial Intelligence (AI) has become a necessity for extracting actionable knowledge in a timely manner. Low-Code/No-Code (LCNC) AI platforms enable domain experts to leverage the value that AI has to offer by lowering the technical skills overhead. We develop domain-specific, service-oriented platforms in the context of two subdomains of health informatics. In this work we address the core principles and the architectures of these platforms, whose functionality we are constantly extending. Our work conforms to best practices with respect to the integration and interoperability of external services and provides process orchestration in an LCNC model-driven fashion. We chose the CINCO product DIME and a bespoke tool developed in CINCO Cloud to serve as the underlying infrastructure for our LCNC platforms, which address the requirements of our two application domains: public health and biomedical research. For public health, we present an environment for building AI-driven web applications for the automated evaluation of Web-based Health Information (WBHI). For biomedical research, we present an AI-driven workflow environment for the computational analysis of highly-plexed tissue images. We extended both underlying application stacks to support the various AI service functionality needed to address the requirements of the two application domains. The two case studies presented outline our methodology of developing these platforms through co-design with experts in the respective domains. Moving forward, we anticipate increasing re-use of components, which will reduce the development overhead of extending our existing platforms or developing new applications in similar domains.
TensorFlow as a DSL for stencil-based computation on the Cerebras Wafer Scale Engine
The Cerebras Wafer Scale Engine (WSE) is an accelerator that combines
hundreds of thousands of AI-cores onto a single chip. Whilst this technology
has been designed for machine learning workloads, the significant amount of
available raw compute means that it is also a very interesting potential target
for accelerating traditional HPC computational codes. Many of these algorithms
are stencil-based, where update operations involve contributions from
neighbouring elements, and in this paper we explore the suitability of the WSE
for such codes, compared to CPUs and GPUs, from the perspective of early
adopters of the technology. Using TensorFlow as the interface, we
explore the performance and demonstrate that, whilst there is still work to be
done around exposing the programming interface to users, performance of the WSE
is impressive: in our experiments it outperforms four V100 GPUs by a factor of
two and a half, and two Intel Xeon Platinum CPUs by a factor of around 114. There is
significant potential therefore for this technology to play an important role
in accelerating HPC codes on future exascale supercomputers.

Comment: This preprint has not undergone any post-submission improvements or corrections. Preprint of paper submitted to the Euro-Par DSL-HPC workshop.
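The stencil pattern at the heart of the paper above can be made concrete with a small sketch. The paper expresses such updates through TensorFlow for the WSE; the following is instead a plain Python illustration of a 5-point Jacobi-style stencil, where each interior cell is updated from itself and its four neighbours.

```python
# Minimal sketch of a 5-point stencil update (Jacobi-style averaging).
# Each interior cell becomes the mean of itself and its four neighbours;
# boundary cells are left unchanged.

def jacobi_step(grid):
    n, m = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # copy so boundaries are preserved
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = (grid[i][j]
                         + grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1]) / 5.0
    return new

# A hot value at the centre spreads to its neighbours in one step.
grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = 5.0
out = jacobi_step(grid)
assert out[2][2] == 1.0 and out[1][2] == 1.0
```

Because every cell's update reads only a fixed local neighbourhood, this access pattern maps naturally onto the WSE's grid of cores, where each core holds a tile of the domain and exchanges only halo cells with its neighbours.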