Search CORE

647 research outputs found

The Fortran parallel transformer and its programming environment

Author: D'Hollander Erik
WANG Q
ZHANG FB
Publication venue: 'Elsevier BV'
Publication date: 01/01/1998
Field of study

PENCIL: Towards a Platform-Neutral Compute Intermediate Language for DSLs

Author: Baghdadi Riyadh
Cohen Albert
Donaldson Alastair F.
Grosser Tobias
Guelton Serge
Inoue Jun
Kouveli Georgia
Kravets Alexey
Lokhmotov Anton
Nugteren Cedric
Verdoolaege Sven
Waters Fraser
Publication venue
Publication date: 16/11/2012
Field of study

We motivate the design and implementation of a platform-neutral compute intermediate language (PENCIL) for productive and performance-portable accelerator programming

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Recommended from our members

Percolation-based compiling for evaluation of parallelism and hardware design trade-offs

Author: Potasman Roni
Publication venue: eScholarship, University of California
Publication date: 01/01/1991
Field of study

This thesis investigates parallelism and hardware design trade-offs of parallel and pipelined architectures. To explore these trade-offs we developed a retargetable compiler based on a set of powerful code transformations called Percolation Scheduling (PS) that map programs with real-time constraints and/or massive time requirements onto synchronous, parallel, high-performance or semi-custom architectures.High-performance is achieved through extraction of application inherent fine-grain parallelism and the use of a suitable architecture. Exploiting fine-grain parallelism is a critical part of exploiting all of the parallelism available in a given program, particularly since highly irregular forms of parallelism are often not visible at coarser levels and since the use of low-level parallelism has a multiplicative effect on the overall performance.To extract substantial parallelism from both the hardware and the compiler, we use a clean, highly parallel VLIW-like architecture that is synchronous, has multiple functional units and has a single program counter. The use of a hazard-free and homogeneous architecture does not result only in a better VLSI design but also considerably increases the compiler's ability to produce better code. To further enhance parallelism we modified the uni-cycle VLIW model and extended the transformations such that pipelined units that provide extra parallelism are used.Another approach presented is of resource constrained scheduling (RCS). Since the RCS problem is known to be NP-hard, in practice it may be solved only by a heuristic approach. We argue that using the heuristic after extraction of the unlimited-resources schedule may yield better results than if the heuristic has been applied at the beginning of the scheduling process.Through a series of benchmarks we evaluate hardware design trade-offs and show that speed-ups on average of one order of magnitude are feasible with sufficient functional units. However, when resources are limited we show that the number of functional units needed may be optimized for a particular suite of application programs

eScholarship - University of California

Parallel machine architecture and compiler design facilities

Author: Kuck David J.
Padua David
Sameh Ahmed
Veidenbaum Alex
Yew Pen-Chung
Publication venue
Publication date
Field of study

The objective is to provide an integrated simulation environment for studying and evaluating various issues in designing parallel systems, including machine architectures, parallelizing compiler techniques, and parallel algorithms. The status of Delta project (which objective is to provide a facility to allow rapid prototyping of parallelized compilers that can target toward different machine architectures) is summarized. Included are the surveys of the program manipulation tools developed, the environmental software supporting Delta, and the compiler research projects in which Delta has played a role

NASA Technical Reports Server

Dynamic common sub-expression elimination during scheduling in high-level synthesis

Author: Alex Nicolau
Mehrdad Reshadi
Nick Savoiu
Nikil Dutt
Rajesh Gupta
Sumit Gupta
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2004
Field of study

Crossref

Compilation techniques for irregular problems on parallel machines

Author: Das Subhendu
Publication venue: W&M ScholarWorks
Publication date: 01/01/1994
Field of study

Massively parallel computers have ushered in the era of teraflop computing. Even though large and powerful machines are being built, they are used by only a fraction of the computing community. The fundamental reason for this situation is that parallel machines are difficult to program. Development of compilers that automatically parallelize programs will greatly increase the use of these machines.;A large class of scientific problems can be categorized as irregular computations. In this class of computation, the data access patterns are known only at runtime, creating significant difficulties for a parallelizing compiler to generate efficient parallel codes. Some compilers with very limited abilities to parallelize simple irregular computations exist, but the methods used by these compilers fail for any non-trivial applications code.;This research presents development of compiler transformation techniques that can be used to effectively parallelize an important class of irregular programs. A central aim of these transformation techniques is to generate codes that aggressively prefetch data. Program slicing methods are used as a part of the code generation process. In this approach, a program written in a data-parallel language, such as HPF, is transformed so that it can be executed on a distributed memory machine. An efficient compiler runtime support system has been developed that performs data movement and software caching

College of William & Mary: W&M Publish

Doctor of Philosophy

Author: Venkat Anand
Publication venue: University of Utah
Publication date: 01/01/2016
Field of study

dissertationSparse matrix codes are found in numerous applications ranging from iterative numerical solvers to graph analytics. Achieving high performance on these codes has however been a significant challenge, mainly due to array access indirection, for example, of the form A[B[i]]. Indirect accesses make precise dependence analysis impossible at compile-time, and hence prevent many parallelizing and locality optimizing transformations from being applied. The expert user relies on manually written libraries to tailor the sparse code and data representations best suited to the target architecture from a general sparse matrix representation. However libraries have limited composability, address very specific optimization strategies, and have to be rewritten as new architectures emerge. In this dissertation, we explore the use of the inspector/executor methodology to accomplish the code and data transformations to tailor high performance sparse matrix representations. We devise and embed abstractions for such inspector/executor transformations within a compiler framework so that they can be composed with a rich set of existing polyhedral compiler transformations to derive complex transformation sequences for high performance. We demonstrate the automatic generation of inspector/executor code, which orchestrates code and data transformations to derive high performance representations for the Sparse Matrix Vector Multiply kernel in particular. We also show how the same transformations may be integrated into sparse matrix and graph applications such as Sparse Matrix Matrix Multiply and Stochastic Gradient Descent, respectively. The specific constraints of these applications, such as problem size and dependence structure, necessitate unique sparse matrix representations that can be realized using our transformations. Computations such as Gauss Seidel, with loop carried dependences at the outer most loop necessitate different strategies for high performance. Specifically, we organize the computation into level sets or wavefronts of irregular size, such that iterations of a wavefront may be scheduled in parallel but different wavefronts have to be synchronized. We demonstrate automatic code generation of high performance inspectors that do explicit dependence testing and level set construction at runtime, as well as high performance executors, which are the actual parallelized computations. For the above sparse matrix applications, we automatically generate inspector/executor code comparable in performance to manually tuned libraries

The University of Utah: J. Willard Marriott Digital Library