Search CORE

131,627 research outputs found

Introducing Molly: Distributed Memory Parallelization with LLVM

Author: Kruse Michael
Publication venue
Publication date: 01/01/2013
Field of study

Programming for distributed memory machines has always been a tedious task, but necessary because compilers have not been sufficiently able to optimize for such machines themselves. Molly is an extension to the LLVM compiler toolchain that is able to distribute and reorganize workload and data if the program is organized in statically determined loop control-flows. These are represented as polyhedral integer-point sets that allow program transformations applied on them. Memory distribution and layout can be declared by the programmer as needed and the necessary asynchronous MPI communication is generated automatically. The primary motivation is to run Lattice QCD simulations on IBM Blue Gene/Q supercomputers, but since the implementation is not yet completed, this paper shows the capabilities on Conway's Game of Life

arXiv.org e-Print Archive

HAL-CentraleSupelec

HAL - Lille 3

INRIA a CCSD electronic archive server

HAL-Rennes 1

Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code

Author: Akkas Abdurrahman
Amarasinghe Saman
Baghdadi Riyadh
Del Sozzo Emanuele
Kamil Shoaib
Ray Jessica
Romdhane Malek Ben
Suriana Patricia
Zhang Yunming
Publication venue
Publication date: 20/12/2018
Field of study

This paper introduces Tiramisu, a polyhedral framework designed to generate high performance code for multiple platforms including multicores, GPUs, and distributed machines. Tiramisu introduces a scheduling language with novel extensions to explicitly manage the complexities that arise when targeting these systems. The framework is designed for the areas of image processing, stencils, linear algebra and deep learning. Tiramisu has two main features: it relies on a flexible representation based on the polyhedral model and it has a rich scheduling language allowing fine-grained control of optimizations. Tiramisu uses a four-level intermediate representation that allows full separation between the algorithms, loop transformations, data layouts, and communication. This separation simplifies targeting multiple hardware architectures with the same algorithm. We evaluate Tiramisu by writing a set of image processing, deep learning, and linear algebra benchmarks and compare them with state-of-the-art compilers and hand-tuned libraries. We show that Tiramisu matches or outperforms existing compilers and libraries on different hardware architectures, including multicore CPUs, GPUs, and distributed machines.Comment: arXiv admin note: substantial text overlap with arXiv:1803.0041

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Politecnico di Milano

A compiler approach to scalable concurrent program design

Author: Foster Ian
Taylor Stephen
Publication venue: 'California Institute of Technology Library'
Publication date: 01/01/1992
Field of study

The programmer's most powerful tool for controlling complexity in program design is abstraction. We seek to use abstraction in the design of concurrent programs, so as to separate design decisions concerned with decomposition, communication, synchronization, mapping, granularity, and load balancing. This paper describes programming and compiler techniques intended to facilitate this design strategy. The programming techniques are based on a core programming notation with two important properties: the ability to separate concurrent programming concerns, and extensibility with reusable programmer-defined abstractions. The compiler techniques are based on a simple transformation system together with a set of compilation transformations and portable run-time support. The transformation system allows programmer-defined abstractions to be defined as source-to-source transformations that convert abstractions into the core notation. The same transformation system is used to apply compilation transformations that incrementally transform the core notation toward an abstract concurrent machine. This machine can be implemented on a variety of concurrent architectures using simple run-time support. The transformation, compilation, and run-time system techniques have been implemented and are incorporated in a public-domain program development toolkit. This toolkit operates on a wide variety of networked workstations, multicomputers, and shared-memory multiprocessors. It includes a program transformer, concurrent compiler, syntax checker, debugger, performance analyzer, and execution animator. A variety of substantial applications have been developed using the toolkit, in areas such as climate modeling and fluid dynamics

CiteSeerX

Caltech Authors

System software for the finite element machine

Author: Crockett T. W.
Knott J. D.
Publication venue
Publication date
Field of study

The Finite Element Machine is an experimental parallel computer developed at Langley Research Center to investigate the application of concurrent processing to structural engineering analysis. This report describes system-level software which has been developed to facilitate use of the machine by applications researchers. The overall software design is outlined, and several important parallel processing issues are discussed in detail, including processor management, communication, synchronization, and input/output. Based on experience using the system, the hardware architecture and software design are critiqued, and areas for further work are suggested

NASA Technical Reports Server

Using formal methods to develop WS-BPEL applications

Author: Abouzaid
Alessandro Lapadula
Bauer
Bettini
Bettini
Bocchi
Boreale
Bruni
Bruni
Butler
Carpineti
Cesari
Erich
Fantechi
Ferrari
Ferrari
Francesco Tiezzi
Geguang
Guidi
Guidi
Hallwyl
Laneve
Lapadula
Lapadula
Lapadula
Lohmann
Mayer
Mazzara
Meredith
Montesi
Prandi
Rosario Pugliese
ter Beek
ter Beek
van der Aalst
Publication venue: 'Elsevier BV'
Publication date: 01/01/2012
Field of study

In recent years, WS-BPEL has become a de facto standard language for orchestration of Web Services. However, there are still some well-known difficulties that make programming in WS-BPEL a tricky task. In this paper, we firstly point out major loose points of the WS-BPEL specification by means of many examples, some of which are also exploited to test and compare the behaviour of three of the most known freely available WS-BPEL engines. We show that, as a matter of fact, these engines implement different semantics, which undermines portability of WS-BPEL programs over different platforms. Then we introduce Blite, a prototypical orchestration language equipped with a formal operational semantics, which is closely inspired by, but simpler than, WS-BPEL. Indeed, Blite is designed around some of WS-BPEL distinctive features like partner links, process termination, message correlation, long-running business transactions and compensation handlers. Finally, we present BliteC, a software tool supporting a rapid and easy development of WS-BPEL applications via translation of service orchestrations written in Blite into executable WS-BPEL programs. We illustrate our approach by means of a running example borrowed from the official specification of WS-BPEL

Elsevier - Publisher Connector

Crossref

Archivio istituzionale della ricerca - Università di Camerino

IMT Institutional Repository