
    PIPS Is not (just) Polyhedral Software: Adding GPU Code Generation in PIPS

    Parallel and heterogeneous computing are growing in audience thanks to the increased performance brought by ubiquitous manycores and GPUs. However, available programming models, like OpenCL or CUDA, are far from straightforward to use. As a consequence, several automated or semi-automated approaches have been proposed to generate hardware-level codes from high-level sequential sources. Polyhedral models are becoming more popular because of their combination of expressiveness, compactness, and accurate abstraction of the data-parallel behaviour of programs. These models provide automatic or semi-automatic parallelization and code transformation capabilities that target such modern parallel architectures. PIPS is a quarter-century-old source-to-source transformation framework that initially targeted parallel machines but has since evolved to include other targets. PIPS uses abstract interpretation on an integer polyhedral lattice to represent program code, allowing linear relation analysis on integer variables in an interprocedural way. The same representation is used for the dependence test and the convex array region analysis. The polyhedral model is also, more classically, used to schedule code from linear constraints. In this paper, we illustrate the features of this compiler infrastructure on a hypothetical input code, demonstrating the combination of polyhedral and non-polyhedral transformations. PIPS interprocedural polyhedral analyses are used to generate data transfers and are combined with non-polyhedral transformations to achieve efficient CUDA code generation.
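
    The data-transfer side of such a flow can be pictured with a small sketch. The snippet below is not PIPS output or its API; it only illustrates, under assumed names (Region, plan_transfers), how rectangular over-approximations of the array regions read and written by a kernel could be turned into coalesced host-to-device and device-to-host copies.

```python
# Minimal sketch (not PIPS itself): convex array regions, simplified here
# to per-dimension index intervals, drive which slices of an array must be
# copied to and from the GPU around a generated kernel.
# All names (Region, plan_transfers) are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Region:
    """A rectangular over-approximation of the elements touched."""
    array: str
    bounds: list[tuple[int, int]]   # inclusive (lo, hi) per dimension
    mode: str                       # "read" or "write"

def plan_transfers(regions):
    """Turn read/write regions into host<->device copy directives."""
    copies = []
    for r in regions:
        extent = "".join(f"[{lo}:{hi}]" for lo, hi in r.bounds)
        if r.mode == "read":        # used by the kernel: copy in
            copies.append(f"memcpy_host_to_device({r.array}{extent})")
        else:                       # defined by the kernel: copy out
            copies.append(f"memcpy_device_to_host({r.array}{extent})")
    return copies

# For a kernel reading a[0:n] and writing b[1:n-1], the planner emits one
# upload and one download instead of per-element traffic.
if __name__ == "__main__":
    n = 1024
    regions = [Region("a", [(0, n)], "read"),
               Region("b", [(1, n - 1)], "write")]
    for c in plan_transfers(regions):
        print(c)
```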

    A compiler approach to scalable concurrent program design

    The programmer's most powerful tool for controlling complexity in program design is abstraction. We seek to use abstraction in the design of concurrent programs, so as to separate design decisions concerned with decomposition, communication, synchronization, mapping, granularity, and load balancing. This paper describes programming and compiler techniques intended to facilitate this design strategy. The programming techniques are based on a core programming notation with two important properties: the ability to separate concurrent programming concerns, and extensibility with reusable programmer-defined abstractions. The compiler techniques are based on a simple transformation system together with a set of compilation transformations and portable run-time support. The transformation system allows programmer-defined abstractions to be defined as source-to-source transformations that convert abstractions into the core notation. The same transformation system is used to apply compilation transformations that incrementally transform the core notation toward an abstract concurrent machine. This machine can be implemented on a variety of concurrent architectures using simple run-time support. The transformation, compilation, and run-time system techniques have been implemented and are incorporated in a public-domain program development toolkit. This toolkit operates on a wide variety of networked workstations, multicomputers, and shared-memory multiprocessors. It includes a program transformer, concurrent compiler, syntax checker, debugger, performance analyzer, and execution animator. A variety of substantial applications have been developed using the toolkit, in areas such as climate modeling and fluid dynamics.
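
    A toy sketch can make the transformation-system idea concrete. The snippet below is not the paper's notation; it assumes an invented term representation and a single invented abstraction ("parmap") to show how a programmer-defined abstraction can be expanded into a small core of concurrent primitives by source-to-source rewriting.

```python
# Toy sketch (not the paper's actual notation): programmer-defined
# abstractions are expressed as source-to-source rewrite rules that expand
# them into a small core of concurrent primitives.
# The terms ("parmap", "par", "call", "seq") are illustrative assumptions.

def expand_parmap(term):
    """Rewrite ('parmap', f, xs) into a parallel block of calls."""
    if isinstance(term, tuple) and term and term[0] == "parmap":
        _, f, xs = term
        return ("par", [("call", f, x) for x in xs])
    return term

def transform(term, rules):
    """Apply rewrite rules bottom-up until the term is in core notation."""
    if isinstance(term, tuple):
        term = tuple(transform(t, rules) for t in term)
    elif isinstance(term, list):
        term = [transform(t, rules) for t in term]
    for rule in rules:
        term = rule(term)
    return term

if __name__ == "__main__":
    program = ("seq", [("parmap", "solve", ["tile0", "tile1", "tile2"])])
    print(transform(program, [expand_parmap]))
    # ('seq', [('par', [('call', 'solve', 'tile0'), ...])])
```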

    Needed Computations Shortcutting Needed Steps

    We define a compilation scheme for a constructor-based, strongly-sequential, graph rewriting system which shortcuts some needed steps. The object code is another constructor-based graph rewriting system. This system is normalizing for the original system when using an innermost strategy. Consequently, the object code can be easily implemented by eager functions in a variety of programming languages. We modify this object code in a way that avoids total or partial construction of the contracta of some needed steps of a computation. When computing normal forms in this way, both memory consumption and execution time are reduced compared to ordinary rewriting computations in the original system.
    Comment: In Proceedings TERMGRAPH 2014, arXiv:1505.0681
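
    The general idea can be sketched with a toy system. The snippet below is not the paper's compilation scheme; under invented names, it contrasts ordinary object code, which builds every contractum as a constructor term, with a shortcutting variant that produces the result of a needed step without materializing the intermediate term.

```python
# Toy sketch (not the paper's exact scheme): rules of a constructor-based
# rewrite system compiled into eager functions, with one variant that
# shortcuts a needed step by never building its contractum.
#
# Source system over Peano numbers:
#   add(Zero, y)    -> y
#   add(Succ(x), y) -> Succ(add(x, y))
#   double(x)       -> add(x, x)

def peano(n):
    """Build the Peano representation of n as nested constructor terms."""
    return ("Zero",) if n == 0 else ("Succ", peano(n - 1))

# Ordinary object code: every contractum is constructed as a term.
def add_naive(x, y):
    return y if x == ("Zero",) else ("Succ", add_naive(x[1], y))

def double_naive(x):
    return add_naive(x, x)          # builds the add(x, x) redex's results

# Shortcutting object code: the contracta of the needed add-steps inside
# double are never constructed; the normal form is assembled directly.
def double_shortcut(x):
    if x == ("Zero",):
        return ("Zero",)
    return ("Succ", ("Succ", double_shortcut(x[1])))

if __name__ == "__main__":
    assert double_naive(peano(2)) == double_shortcut(peano(2)) == peano(4)
```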

    Abstract State Machines 1988-1998: Commented ASM Bibliography

    An annotated bibliography of papers which deal with or use Abstract State Machines (ASMs), as of January 1998.
    Comment: Also maintained as a BibTeX file at http://www.eecs.umich.edu/gasm

    Realising the open virtual commissioning of modular automation systems

    To address the challenges in the automotive industry posed by the need to rapidly manufacture more product variants, and the resultant need for more adaptable production systems, radical changes are now required in the way in which such systems are developed and implemented. In this context, two enabling approaches for achieving more agile manufacturing, namely modular automation systems and virtual commissioning, are briefly reviewed in this contribution. Ongoing research conducted at Loughborough University, which aims to provide a modular approach to automation systems design coupled with a virtual engineering toolset for the (re)configuration of such manufacturing automation systems, is reported. The problems faced in the virtual commissioning of modular automation systems are outlined. AutomationML, an emerging neutral data format with the potential to address these integration problems, is discussed. The paper proposes and illustrates a collaborative framework in which AutomationML is adopted for the data exchange and data representation of related models to enable efficient open virtual prototype construction and virtual commissioning of modular automation systems. A case study shows how to create a data model based on AutomationML for describing a modular automation system.
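
    As a rough illustration of what such a data model could look like, the sketch below builds a tiny AutomationML-style document with the Python standard library. The element names follow the CAEX schema that AutomationML builds on (CAEXFile, InstanceHierarchy, InternalElement, Attribute), but the structure is simplified and all station and device names are invented; it is not the paper's case-study model.

```python
# Hedged sketch of an AutomationML/CAEX-style instance hierarchy for a
# modular automation system. Element names are simplified assumptions;
# station and attribute names are invented for illustration.

import xml.etree.ElementTree as ET

def internal_element(parent, name, **attributes):
    """Add one component (e.g. a station or actuator) to the hierarchy."""
    elem = ET.SubElement(parent, "InternalElement", Name=name)
    for attr_name, value in attributes.items():
        attr = ET.SubElement(elem, "Attribute", Name=attr_name)
        ET.SubElement(attr, "Value").text = str(value)
    return elem

root = ET.Element("CAEXFile", FileName="assembly_cell.aml")
hierarchy = ET.SubElement(root, "InstanceHierarchy", Name="AssemblyCell")

station = internal_element(hierarchy, "ClampStation", CycleTimeSeconds=4.2)
internal_element(station, "ClampActuator", Stroke_mm=50)
internal_element(station, "PartPresentSensor", SignalType="digital")

print(ET.tostring(root, encoding="unicode"))
```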

    A Compiler and Runtime Infrastructure for Automatic Program Distribution

    This paper presents the design and the implementation of a compiler and runtime infrastructure for automatic program distribution. We are building a research infrastructure that enables experimentation with various program partitioning and mapping strategies and the study of automatic distribution's effect on resource consumption (e.g., CPU, memory, communication). Since many optimization techniques are faced with conflicting optimization targets (e.g., memory and communication), we believe that it is important to be able to study their interaction. We present a set of techniques that enable flexible resource modeling and program distribution. These are: dependence analysis, weighted graph partitioning, code and communication generation, and profiling. We have developed these ideas in the context of the Java language. We present in detail the design and implementation of each of the techniques as part of our compiler and runtime infrastructure. Then, we evaluate our design and present preliminary experimental data for each component, as well as for the entire system.
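
    The weighted-graph-partitioning step can be sketched in a few lines. The snippet below is not the paper's implementation: it assumes nodes weighted by estimated CPU or memory cost and edges weighted by estimated communication volume, and uses a simple greedy heuristic to assign each node to the machine that adds the least edge cut while keeping the load roughly balanced.

```python
# Minimal sketch (illustrative only) of weighted graph partitioning for
# program distribution: greedily place each program unit on the machine
# that minimizes added communication cost under a load-balance cap.
# All node names, weights, and the heuristic itself are assumptions.

def partition(nodes, edges, machines=2, imbalance=1.25):
    """nodes: {name: weight}; edges: {(a, b): communication_weight}."""
    cap = imbalance * sum(nodes.values()) / machines
    load = [0.0] * machines
    place = {}
    # Place heavy nodes first so balancing has room to react.
    for name in sorted(nodes, key=nodes.get, reverse=True):
        best, best_cost = None, None
        for m in range(machines):
            if load[m] + nodes[name] > cap:
                continue
            # Communication added if `name` lands on machine m.
            cut = sum(w for (a, b), w in edges.items()
                      if name in (a, b)
                      and place.get(b if a == name else a, m) != m)
            if best_cost is None or cut < best_cost:
                best, best_cost = m, cut
        if best is None:                  # nothing fits: least-loaded machine
            best = load.index(min(load))
        place[name] = best
        load[best] += nodes[name]
    return place

if __name__ == "__main__":
    nodes = {"parser": 3, "solver": 9, "logger": 1, "ui": 2}
    edges = {("parser", "solver"): 5, ("solver", "logger"): 1,
             ("ui", "parser"): 4}
    print(partition(nodes, edges))
```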

    Concurrent Model Transformations with Linda

    Nowadays, model transformation languages and engines use a sequential execution model; that is, only one execution thread deals with the whole transformation. However, model transformations dealing with very large models, such as those used in biology or aerospace applications, require concurrent solutions in order to speed up their performance. In this ongoing work we explore the use of Linda for implementing a set of basic mechanisms to enable concurrent model transformations, and present our initial results.
    Projects TIN2011-23795, TIN2011-15497-E, and the Andalucía Tech Campus of Excellence.
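
    The basic mechanism can be sketched as follows. The snippet is not the authors' implementation: it stands in for a Linda tuple space with a thread-safe queue (a real Linda system would also offer pattern-matched rd/in operations) and uses an invented Class-to-Table rule to show how several workers can transform model elements concurrently.

```python
# Minimal sketch of Linda-style concurrent model transformation: a tuple
# space decouples workers, so several threads transform model elements in
# parallel. The tuple space here is just a thread-safe queue; the
# "Class" -> "Table" rule and all element names are invented examples.

import queue
import threading

source_space = queue.Queue()   # tuples describing source model elements
target_space = queue.Queue()   # tuples describing transformed elements

def worker():
    while True:
        tup = source_space.get()          # Linda 'in': take a tuple
        if tup is None:                   # poison pill: stop this worker
            break
        kind, name = tup
        if kind == "Class":               # one simple transformation rule
            target_space.put(("Table", name.lower() + "s"))

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

for name in ["Customer", "Order", "Invoice"]:
    source_space.put(("Class", name))     # Linda 'out': publish a tuple

for _ in threads:                          # one poison pill per worker
    source_space.put(None)
for t in threads:
    t.join()

print(sorted(target_space.queue))
```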

    Using shared-data localization to reduce the cost of inspector-execution in unified-parallel-C programs

    Programs written in the Unified Parallel C (UPC) language can access any location of the entire local and remote address space via read/write operations. However, UPC programs that contain fine-grained shared accesses can exhibit performance degradation. One solution is to use the inspector-executor technique to coalesce fine-grained shared accesses into larger remote access operations. A straightforward implementation of the inspector-executor transformation results in excessive instrumentation that hinders performance. This paper addresses this issue and introduces various techniques that aim at reducing the generated instrumentation code: a shared-data localization transformation based on Constant-Stride Linear Memory Descriptors (CSLMADs), the inlining of data locality checks, and the use of an index vector to aggregate the data. Finally, the paper introduces a lightweight loop code motion transformation to privatize shared scalars that were propagated through the loop body. A performance evaluation, using up to 2048 cores of a POWER 775, explores the impact of each optimization and characterizes the overheads of UPC programs. It also shows that the presented optimizations increase the performance of UPC programs by up to 1.8x over their hand-optimized UPC counterparts for applications with regular accesses, and by up to 6.3x for applications with irregular accesses.
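
    The inspector-executor pattern the paper optimizes can be sketched as follows. This is illustrative Python, not UPC or the paper's compiler output: the bulk_remote_get helper stands in for whatever coalesced communication primitive the runtime provides, and the index array is an invented irregular access pattern.

```python
# Minimal sketch of the inspector-executor idea: instead of issuing one
# fine-grained remote read per a[idx[i]], an inspector collects the needed
# indices, one bulk transfer fetches them, and the executor then runs the
# loop entirely on local data. bulk_remote_get is an assumed placeholder.

def bulk_remote_get(shared_array, indices):
    """Placeholder for one coalesced remote transfer of many elements."""
    return {i: shared_array[i] for i in indices}

def inspector_executor(shared_array, idx, body):
    # Inspector: record which shared elements the loop will touch.
    needed = sorted(set(idx))
    # One aggregated communication instead of len(idx) fine-grained ones.
    local = bulk_remote_get(shared_array, needed)
    # Executor: the original loop, now reading only local data.
    return [body(local[i]) for i in idx]

if __name__ == "__main__":
    shared = list(range(0, 1000, 10))      # pretend this lives remotely
    idx = [3, 7, 3, 42, 7]                 # irregular access pattern
    print(inspector_executor(shared, idx, lambda x: x * 2))
```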