    Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code

    This paper introduces Tiramisu, a polyhedral framework designed to generate high-performance code for multiple platforms, including multicores, GPUs, and distributed machines. Tiramisu introduces a scheduling language with novel extensions to explicitly manage the complexities that arise when targeting these systems. The framework is designed for the areas of image processing, stencils, linear algebra, and deep learning. Tiramisu has two main features: it relies on a flexible representation based on the polyhedral model, and it has a rich scheduling language allowing fine-grained control of optimizations. Tiramisu uses a four-level intermediate representation that allows full separation between the algorithms, loop transformations, data layouts, and communication. This separation simplifies targeting multiple hardware architectures with the same algorithm. We evaluate Tiramisu by writing a set of image processing, deep learning, and linear algebra benchmarks and comparing them with state-of-the-art compilers and hand-tuned libraries. We show that Tiramisu matches or outperforms existing compilers and libraries on different hardware architectures, including multicore CPUs, GPUs, and distributed machines.
    Comment: arXiv admin note: substantial text overlap with arXiv:1803.0041
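    To make the algorithm/schedule separation concrete, here is a minimal C++ sketch. It does not use Tiramisu's actual API; it simply shows, by hand, the kind of transformation (tiling plus parallelization) that a polyhedral scheduling language lets you request declaratively while leaving the algorithm untouched.

```cpp
// Illustrative sketch only; this is NOT Tiramisu's actual C++ API.
// The same blur is written twice: once as the plain "algorithm", and once
// after two scheduling decisions (tiling + parallelization) of the kind a
// polyhedral scheduling language applies without touching the algorithm.
#include <vector>

// Algorithm: a vertical 3-point blur over an H x W image.
void blur_naive(const float* in, float* out, int H, int W) {
    for (int y = 1; y < H - 1; ++y)
        for (int x = 0; x < W; ++x)
            out[y * W + x] = (in[(y - 1) * W + x] + in[y * W + x]
                              + in[(y + 1) * W + x]) / 3.0f;
}

// Schedule: the same computation after tiling both loops and parallelizing
// the outer tile loops; T is a tunable scheduling knob.
void blur_scheduled(const float* in, float* out, int H, int W) {
    const int T = 32;
    #pragma omp parallel for collapse(2)
    for (int yy = 1; yy < H - 1; yy += T)
        for (int xx = 0; xx < W; xx += T)
            for (int y = yy; y < yy + T && y < H - 1; ++y)
                for (int x = xx; x < xx + T && x < W; ++x)
                    out[y * W + x] = (in[(y - 1) * W + x] + in[y * W + x]
                                      + in[(y + 1) * W + x]) / 3.0f;
}

int main() {
    const int H = 256, W = 256;
    std::vector<float> in(H * W, 1.0f), out(H * W, 0.0f);
    blur_scheduled(in.data(), out.data(), H, W);  // same result as blur_naive
}
```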

    Manual and Automatic Translation From Sequential to Parallel Programming On Cloud Systems

    Cloud computing has gradually evolved into an infrastructural tool for a variety of scientific research and computing applications, and it has become a trend for many institutions and organizations to migrate their products from local servers to the cloud. One of the current challenges in cloud computing is running software efficiently on cloud platforms, since many legacy codes cannot be executed in parallel in cloud contexts, which wastes the cloud’s computing power. To address this problem, we have researched ways to translate code from sequential to parallel cloud computing using three categories of translation methods: manual, automatic, and semi-automatic. Manual translation yields the best performance of the three, but it is costly to manually redesign and convert existing sequential codes into cloud codes. Automatic translation of sequential codes to parallel cloud applications is therefore one approach to the problem of code migration to a cloud infrastructure. During this research, two automatic code translators, Java to MapReduce (J2M) and Java to Spark (J2S), were developed to translate code automatically from sequential Java to MapReduce and Spark applications. A semi-automatic translation method, combining manual and automatic translation, is also proposed; it performs well on large amounts of data with small fragment sizes. This dissertation details our research on sequential-to-parallel cloud code translation over the last four years. The experimental results not only indicate that the translators can precisely translate a sequential Java program into parallel cloud applications but also show that the translated applications run faster, and we expect an almost linear rate of speedup when processing large datasets. However, some constraints still need to be overcome before more features can be implemented in future work. We believe that our translators are ideal models for code migration and will play an important role in the transition era of cloud computing
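    The structural contract such a translator must establish can be sketched briefly. The following C++ example is purely illustrative (the dissertation's translators emit Java for Hadoop and Spark, and all names here are hypothetical): it shows a sequential word-count loop split into a map phase that emits independent key/value pairs and a reduce phase that combines values per key, which is the decomposition that lets cloud workers process records in parallel.

```cpp
// Hedged sketch of the sequential-to-MapReduce rewrite, in C++ for
// illustration only. Names are invented, not the translators' output.
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Sequential original: count word occurrences in one pass.
std::map<std::string, int> count_seq(const std::vector<std::string>& words) {
    std::map<std::string, int> counts;
    for (const auto& w : words) ++counts[w];
    return counts;
}

// "Map" phase: each record independently emits a (word, 1) pair, so the
// records can be processed in parallel across cloud workers.
std::vector<std::pair<std::string, int>> map_phase(const std::vector<std::string>& words) {
    std::vector<std::pair<std::string, int>> pairs;
    for (const auto& w : words) pairs.emplace_back(w, 1);
    return pairs;
}

// "Reduce" phase: values are combined per key after the shuffle.
std::map<std::string, int> reduce_phase(const std::vector<std::pair<std::string, int>>& pairs) {
    std::map<std::string, int> counts;
    for (const auto& [k, v] : pairs) counts[k] += v;
    return counts;
}

int main() {
    std::vector<std::string> words{"cloud", "code", "cloud"};
    for (const auto& [k, v] : reduce_phase(map_phase(words)))
        std::cout << k << ": " << v << "\n";  // cloud: 2, code: 1
}
```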

    Software Support for Irregular and Loosely Synchronous Problems

    A large class of scientific and engineering applications may be classified as irregular and loosely synchronous from the perspective of parallel processing. We present a partial classification of such problems. This classification has motivated us to enhance Fortran D to provide language support for irregular, loosely synchronous problems. We present techniques for parallelization of such problems in the context of Fortran D
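    What makes such problems "irregular" is easiest to see in code. The C++ fragment below is purely illustrative (the paper's language support targets Fortran D): it shows an edge sweep over an unstructured mesh, where the indirection array is unknown until run time, so the data distribution and communication pattern cannot be derived statically, and the computation is loosely synchronous in that all updates of one sweep complete before the next phase begins.

```cpp
// Illustrative C++ only; the paper's language extensions are for Fortran D.
// x[u] and x[v] are data-dependent accesses through the run-time array `edge`,
// which is what makes the loop irregular: a compiler cannot statically
// partition the accesses, so the communication schedule must be computed
// at run time before the parallel sweep executes.
#include <vector>

void edge_sweep(const std::vector<int>& edge, const std::vector<double>& x,
                std::vector<double>& y) {
    const int nedges = static_cast<int>(edge.size()) / 2;
    for (int e = 0; e < nedges; ++e) {
        const int u = edge[2 * e], v = edge[2 * e + 1];
        const double flux = x[u] - x[v];
        y[u] += flux;   // loosely synchronous: all updates of this sweep
        y[v] -= flux;   // finish before the next computational phase
    }
}

int main() {
    std::vector<int> edge{0, 1, 1, 2, 2, 0};  // a small triangle "mesh"
    std::vector<double> x{1.0, 2.0, 3.0}, y(3, 0.0);
    edge_sweep(edge, x, y);
}
```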

    A Theoretical Approach Involving Recurrence Resolution, Dependence Cycle Statement Ordering and Subroutine Transformation for the Exploitation of Parallelism in Sequential Code.

    To exploit parallelism in Fortran code, this dissertation studies the following three issues: (1) recurrence resolution in Do-loops for vector processing, (2) dependence cycle statement ordering in Do-loops for parallel processing, and (3) subroutine parallelization. For recurrence resolution, the major findings include: (1) the node splitting algorithm cannot be used directly to break an essential antidependence link whose source variable is itself the sink variable of another true dependence, so a correction method is proposed; (2) a sink variable renaming technique is capable of breaking an antidependence and/or output-dependence link; (3) for recurrences formed by only true dependences, a dynamic dependence concept and the derived technique are powerful; and (4) by integrating related techniques, an algorithm for resolving a general multistatement recurrence is developed.

    The performance of a parallel loop is determined by the level of parallelism and the time delay due to interprocessor communication and synchronization. For a dependence cycle of a single parallel loop executed in a general synchronization mode, the parallelism exposed varies with the alignment of statements. Statements are reordered on the basis of the execution time of the loop as estimated at compile time. An improved timing formula and a derived statement ordering algorithm are proposed. An extension of this algorithm to multiple perfectly nested Do-loops with a simple global dependence cycle is also presented.

    The subroutine is a potential source of parallel processing. Several problems must be solved for subroutine parallelization: (1) the precedence of parallel executions of subroutines, (2) identification of the optimum execution mode for each subroutine, and (3) the restructuring of a serial program. A five-step approach to parallelizing the subroutines called by a calling subroutine is proposed: (1) computation of control dependence, (2) approximation of the global effects of subroutines, (3) analysis of data dependence, (4) identification of execution mode, and (5) restructuring of calling and called subroutines. Applying these five steps recursively to different levels of calling subroutines in a program addresses the parallelization of subroutines
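    A short worked example makes the recurrence-resolution setting concrete. The C++ sketch below is illustrative only (the dissertation works in Fortran, and all names here are invented): it shows the classic node splitting transformation, in which a loop-carried antidependence on x is broken by copying the conflicting read into a fresh temporary, after which the loop can be distributed into three independently vectorizable loops.

```cpp
// Illustrative C++ rendering of node splitting; not the dissertation's code.
#include <vector>

// Original loop: S1 reads x[i+1], which S2 overwrites in the same iteration,
// creating an antidependence that blocks vectorization of the loop.
void original(std::vector<double>& a, std::vector<double>& x,
              const std::vector<double>& b, int n) {
    for (int i = 0; i < n; ++i) {
        a[i] = x[i + 1] + x[i];   // S1: reads x[i+1] before S2 writes it
        x[i + 1] = b[i] + 32.0;   // S2: writes x[i+1]
    }
}

// After node splitting: the conflicting read is copied into a temporary t
// first, so the antidependence is broken and loop distribution yields three
// loops, each free of loop-carried dependences and thus vectorizable.
void split(std::vector<double>& a, std::vector<double>& x,
           const std::vector<double>& b, int n) {
    std::vector<double> t(n);
    for (int i = 0; i < n; ++i) t[i] = x[i + 1];        // copy before overwrite
    for (int i = 0; i < n; ++i) x[i + 1] = b[i] + 32.0; // S2
    for (int i = 0; i < n; ++i) a[i] = t[i] + x[i];     // S1, via t
}

int main() {
    const int n = 4;
    std::vector<double> a(n), x(n + 1, 1.0), b(n, 0.0);
    split(a, x, b, n);  // produces the same a and x as original()
}
```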

    Advances in Parallel-Stage Decoupled Software Pipelining Leveraging Loop Distribution, Stream-Computing and the SSA Form

    Decoupled Software Pipelining (DSWP) is a program partitioning method enabling compilers to extract pipeline parallelism from sequential programs. Parallel-Stage DSWP (PS-DSWP) is an extension that also exploits the data parallelism within pipeline filters. This paper presents the preliminary design of a new PS-DSWP method that handles arbitrary structured control flow, offers slightly better algorithmic complexity, naturally exploits nested parallelism with communication across arbitrary levels, and integrates seamlessly with data-flow parallel programming environments. It is inspired by loop distribution and supports nested/structured partitioning along the hierarchy of control dependences. The method relies on a data-flow streaming extension of OpenMP. These advances are made possible by progress in compiler intermediate representations. We describe our usage of the Static Single Assignment (SSA) form, how we extend it to the context of concurrent streaming tasks, and the benefits and challenges this brings for PS-DSWP
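    The shape of the parallelism PS-DSWP extracts can be sketched by hand. The C++ example below is a deliberately minimal sketch using plain threads and a locked queue, not the paper's OpenMP streaming extension: a loop is split into a sequential stage that walks the iteration space and a consumer stage that does independent per-item work. Because the consumer's iterations are mutually independent, PS-DSWP may replicate that stage, which is exactly what the queue-based decoupling permits.

```cpp
// Minimal hand-written pipeline sketch; not the paper's streaming extension.
// The locked queue stands in for the compiler-inserted stream between stages.
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>

std::queue<int> stream;          // channel between pipeline stages
std::mutex m;
std::condition_variable cv;
bool done = false;

// Stage 1 (sequential): walks the iteration space and emits work items.
void produce(int n) {
    for (int i = 0; i < n; ++i) {
        std::lock_guard<std::mutex> lk(m);
        stream.push(i);
        cv.notify_one();
    }
    { std::lock_guard<std::mutex> lk(m); done = true; }
    cv.notify_all();
}

// Stage 2 (the "parallel stage"): consumes items; its iterations are
// independent, so this stage can be replicated across threads.
void consume() {
    for (;;) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return !stream.empty() || done; });
        if (stream.empty()) return;  // producer finished and queue drained
        int i = stream.front();
        stream.pop();
        lk.unlock();
        std::printf("processed %d\n", i * i);  // independent per-item work
    }
}

int main() {
    std::thread t1(produce, 8), t2(consume), t3(consume);  // replicated stage
    t1.join(); t2.join(); t3.join();
}
```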