
    PIPS Is not (just) Polyhedral Software: Adding GPU Code Generation in PIPS

    Parallel and heterogeneous computing are growing in audience thanks to the increased performance brought by ubiquitous manycores and GPUs. However, available programming models, like OpenCL or CUDA, are far from straightforward to use. As a consequence, several automated or semi-automated approaches have been proposed to generate hardware-level codes from high-level sequential sources. Polyhedral models are becoming more popular because of their combination of expressiveness, compactness, and accurate abstraction of the data-parallel behaviour of programs. These models provide automatic or semi-automatic parallelization and code transformation capabilities that target such modern parallel architectures. PIPS is a quarter-century-old source-to-source transformation framework that initially targeted parallel machines but has since evolved to include other targets. PIPS uses abstract interpretation on an integer polyhedral lattice to represent program code, allowing linear relation analysis on integer variables in an interprocedural way. The same representation is used for the dependence test and the convex array region analysis. The polyhedral model is also, more classically, used to schedule code from linear constraints. In this paper, we illustrate the features of this compiler infrastructure on a hypothetical input code, demonstrating the combination of polyhedral and non-polyhedral transformations. PIPS interprocedural polyhedral analyses are used to generate data transfers and are combined with non-polyhedral transformations to achieve efficient CUDA code generation.
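
    The data-transfer side of such a flow can be pictured with a small sketch. The snippet below is not PIPS output or its API; it only illustrates, under assumed names (Region, plan_transfers), how rectangular over-approximations of the array regions read and written by a kernel could be turned into coalesced host-to-device and device-to-host copies.

```python
# Minimal sketch (not PIPS itself): convex array regions, simplified here
# to per-dimension index intervals, drive which slices of an array must be
# copied to and from the GPU around a generated kernel.
# All names (Region, plan_transfers) are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Region:
    """A rectangular over-approximation of the elements touched."""
    array: str
    bounds: list[tuple[int, int]]   # inclusive (lo, hi) per dimension
    mode: str                       # "read" or "write"

def plan_transfers(regions):
    """Turn read/write regions into host<->device copy directives."""
    copies = []
    for r in regions:
        extent = "".join(f"[{lo}:{hi}]" for lo, hi in r.bounds)
        if r.mode == "read":        # used by the kernel: copy in
            copies.append(f"memcpy_host_to_device({r.array}{extent})")
        else:                       # defined by the kernel: copy out
            copies.append(f"memcpy_device_to_host({r.array}{extent})")
    return copies

# For a kernel reading a[0:n] and writing b[1:n-1], the planner emits one
# upload and one download instead of per-element traffic.
if __name__ == "__main__":
    n = 1024
    regions = [Region("a", [(0, n)], "read"),
               Region("b", [(1, n - 1)], "write")]
    for c in plan_transfers(regions):
        print(c)
```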

    A compiler approach to scalable concurrent program design

    The programmer's most powerful tool for controlling complexity in program design is abstraction. We seek to use abstraction in the design of concurrent programs, so as to separate design decisions concerned with decomposition, communication, synchronization, mapping, granularity, and load balancing. This paper describes programming and compiler techniques intended to facilitate this design strategy. The programming techniques are based on a core programming notation with two important properties: the ability to separate concurrent programming concerns, and extensibility with reusable programmer-defined abstractions. The compiler techniques are based on a simple transformation system together with a set of compilation transformations and portable run-time support. The transformation system allows programmer-defined abstractions to be defined as source-to-source transformations that convert abstractions into the core notation. The same transformation system is used to apply compilation transformations that incrementally transform the core notation toward an abstract concurrent machine. This machine can be implemented on a variety of concurrent architectures using simple run-time support. The transformation, compilation, and run-time system techniques have been implemented and are incorporated in a public-domain program development toolkit. This toolkit operates on a wide variety of networked workstations, multicomputers, and shared-memory multiprocessors. It includes a program transformer, concurrent compiler, syntax checker, debugger, performance analyzer, and execution animator. A variety of substantial applications have been developed using the toolkit, in areas such as climate modeling and fluid dynamics.
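
    A toy sketch can make the transformation-system idea concrete. The snippet below is not the paper's notation; it assumes an invented term representation and a single invented abstraction ("parmap") to show how a programmer-defined abstraction can be expanded into a small core of concurrent primitives by source-to-source rewriting.

```python
# Toy sketch (not the paper's actual notation): programmer-defined
# abstractions are expressed as source-to-source rewrite rules that expand
# them into a small core of concurrent primitives.
# The terms ("parmap", "par", "call", "seq") are illustrative assumptions.

def expand_parmap(term):
    """Rewrite ('parmap', f, xs) into a parallel block of calls."""
    if isinstance(term, tuple) and term and term[0] == "parmap":
        _, f, xs = term
        return ("par", [("call", f, x) for x in xs])
    return term

def transform(term, rules):
    """Apply rewrite rules bottom-up until the term is in core notation."""
    if isinstance(term, tuple):
        term = tuple(transform(t, rules) for t in term)
    elif isinstance(term, list):
        term = [transform(t, rules) for t in term]
    for rule in rules:
        term = rule(term)
    return term

if __name__ == "__main__":
    program = ("seq", [("parmap", "solve", ["tile0", "tile1", "tile2"])])
    print(transform(program, [expand_parmap]))
    # ('seq', [('par', [('call', 'solve', 'tile0'), ...])])
```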

    Needed Computations Shortcutting Needed Steps

    We define a compilation scheme for a constructor-based, strongly-sequential, graph rewriting system which shortcuts some needed steps. The object code is another constructor-based graph rewriting system. This system is normalizing for the original system when using an innermost strategy. Consequently, the object code can be easily implemented by eager functions in a variety of programming languages. We modify this object code in a way that avoids total or partial construction of the contracta of some needed steps of a computation. When computing normal forms in this way, both memory consumption and execution time are reduced compared to ordinary rewriting computations in the original system.
    Comment: In Proceedings TERMGRAPH 2014, arXiv:1505.0681
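
    The general idea can be sketched with a toy system. The snippet below is not the paper's compilation scheme; under invented names, it contrasts ordinary object code, which builds every contractum as a constructor term, with a shortcutting variant that produces the result of a needed step without materializing the intermediate term.

```python
# Toy sketch (not the paper's exact scheme): rules of a constructor-based
# rewrite system compiled into eager functions, with one variant that
# shortcuts a needed step by never building its contractum.
#
# Source system over Peano numbers:
#   add(Zero, y)    -> y
#   add(Succ(x), y) -> Succ(add(x, y))
#   double(x)       -> add(x, x)

def peano(n):
    """Build the Peano representation of n as nested constructor terms."""
    return ("Zero",) if n == 0 else ("Succ", peano(n - 1))

# Ordinary object code: every contractum is constructed as a term.
def add_naive(x, y):
    return y if x == ("Zero",) else ("Succ", add_naive(x[1], y))

def double_naive(x):
    return add_naive(x, x)          # builds the add(x, x) redex's results

# Shortcutting object code: the contracta of the needed add-steps inside
# double are never constructed; the normal form is assembled directly.
def double_shortcut(x):
    if x == ("Zero",):
        return ("Zero",)
    return ("Succ", ("Succ", double_shortcut(x[1])))

if __name__ == "__main__":
    assert double_naive(peano(2)) == double_shortcut(peano(2)) == peano(4)
```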

    Abstract State Machines 1988-1998: Commented ASM Bibliography

    An annotated bibliography of papers which deal with or use Abstract State Machines (ASMs), as of January 1998.
    Comment: Also maintained as a BibTeX file at http://www.eecs.umich.edu/gasm

    Realising the open virtual commissioning of modular automation systems

    To address the challenges in the automotive industry posed by the need to rapidly manufacture more product variants, and the resultant need for more adaptable production systems, radical changes are now required in the way in which such systems are developed and implemented. In this context, two enabling approaches for achieving more agile manufacturing, namely modular automation systems and virtual commissioning, are briefly reviewed in this contribution. Ongoing research conducted at Loughborough University, which aims to provide a modular approach to automation systems design coupled with a virtual engineering toolset for the (re)configuration of such manufacturing automation systems, is reported. The problems faced in the virtual commissioning of modular automation systems are outlined. AutomationML, an emerging neutral data format with the potential to address these integration problems, is discussed. The paper proposes and illustrates a collaborative framework in which AutomationML is adopted for the data exchange and data representation of related models to enable efficient open virtual prototype construction and virtual commissioning of modular automation systems. A case study shows how to create a data model based on AutomationML for describing a modular automation system.
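
    As a rough illustration of what such a data model could look like, the sketch below builds a tiny AutomationML-style document with the Python standard library. The element names follow the CAEX schema that AutomationML builds on (CAEXFile, InstanceHierarchy, InternalElement, Attribute), but the structure is simplified and all station and device names are invented; it is not the paper's case-study model.

```python
# Hedged sketch of an AutomationML/CAEX-style instance hierarchy for a
# modular automation system. Element names are simplified assumptions;
# station and attribute names are invented for illustration.

import xml.etree.ElementTree as ET

def internal_element(parent, name, **attributes):
    """Add one component (e.g. a station or actuator) to the hierarchy."""
    elem = ET.SubElement(parent, "InternalElement", Name=name)
    for attr_name, value in attributes.items():
        attr = ET.SubElement(elem, "Attribute", Name=attr_name)
        ET.SubElement(attr, "Value").text = str(value)
    return elem

root = ET.Element("CAEXFile", FileName="assembly_cell.aml")
hierarchy = ET.SubElement(root, "InstanceHierarchy", Name="AssemblyCell")

station = internal_element(hierarchy, "ClampStation", CycleTimeSeconds=4.2)
internal_element(station, "ClampActuator", Stroke_mm=50)
internal_element(station, "PartPresentSensor", SignalType="digital")

print(ET.tostring(root, encoding="unicode"))
```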

    A Compiler and Runtime Infrastructure for Automatic Program Distribution

    This paper presents the design and the implementation of a compiler and runtime infrastructure for automatic program distribution. We are building a research infrastructure that enables experimentation with various program partitioning and mapping strategies and the study of automatic distribution's effect on resource consumption (e.g., CPU, memory, communication). Since many optimization techniques are faced with conflicting optimization targets (e.g., memory and communication), we believe that it is important to be able to study their interaction. We present a set of techniques that enable flexible resource modeling and program distribution. These are: dependence analysis, weighted graph partitioning, code and communication generation, and profiling. We have developed these ideas in the context of the Java language. We present in detail the design and implementation of each of the techniques as part of our compiler and runtime infrastructure. Then, we evaluate our design and present preliminary experimental data for each component, as well as for the entire system.
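
    The weighted-graph-partitioning step can be sketched in a few lines. The snippet below is not the paper's implementation: it assumes nodes weighted by estimated CPU or memory cost and edges weighted by estimated communication volume, and uses a simple greedy heuristic to assign each node to the machine that adds the least edge cut while keeping the load roughly balanced.

```python
# Minimal sketch (illustrative only) of weighted graph partitioning for
# program distribution: greedily place each program unit on the machine
# that minimizes added communication cost under a load-balance cap.
# All node names, weights, and the heuristic itself are assumptions.

def partition(nodes, edges, machines=2, imbalance=1.25):
    """nodes: {name: weight}; edges: {(a, b): communication_weight}."""
    cap = imbalance * sum(nodes.values()) / machines
    load = [0.0] * machines
    place = {}
    # Place heavy nodes first so balancing has room to react.
    for name in sorted(nodes, key=nodes.get, reverse=True):
        best, best_cost = None, None
        for m in range(machines):
            if load[m] + nodes[name] > cap:
                continue
            # Communication added if `name` lands on machine m.
            cut = sum(w for (a, b), w in edges.items()
                      if name in (a, b)
                      and place.get(b if a == name else a, m) != m)
            if best_cost is None or cut < best_cost:
                best, best_cost = m, cut
        if best is None:                  # nothing fits: least-loaded machine
            best = load.index(min(load))
        place[name] = best
        load[best] += nodes[name]
    return place

if __name__ == "__main__":
    nodes = {"parser": 3, "solver": 9, "logger": 1, "ui": 2}
    edges = {("parser", "solver"): 5, ("solver", "logger"): 1,
             ("ui", "parser"): 4}
    print(partition(nodes, edges))
```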

    Concurrent Model Transformations with Linda

    Nowadays, model transformation languages and engines use a sequential execution model; that is, only one execution thread deals with the whole transformation. However, model transformations dealing with very large models, such as those used in biology or aerospace applications, require concurrent solutions in order to speed up their performance. In this ongoing work we explore the use of Linda for implementing a set of basic mechanisms to enable concurrent model transformations, and present our initial results.
    Projects TIN2011-23795, TIN2011-15497-E, and the Andalucía Tech Campus of Excellence.
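
    The basic mechanism can be sketched as follows. The snippet is not the authors' implementation: it stands in for a Linda tuple space with a thread-safe queue (a real Linda system would also offer pattern-matched rd/in operations) and uses an invented Class-to-Table rule to show how several workers can transform model elements concurrently.

```python
# Minimal sketch of Linda-style concurrent model transformation: a tuple
# space decouples workers, so several threads transform model elements in
# parallel. The tuple space here is just a thread-safe queue; the
# "Class" -> "Table" rule and all element names are invented examples.

import queue
import threading

source_space = queue.Queue()   # tuples describing source model elements
target_space = queue.Queue()   # tuples describing transformed elements

def worker():
    while True:
        tup = source_space.get()          # Linda 'in': take a tuple
        if tup is None:                   # poison pill: stop this worker
            break
        kind, name = tup
        if kind == "Class":               # one simple transformation rule
            target_space.put(("Table", name.lower() + "s"))

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

for name in ["Customer", "Order", "Invoice"]:
    source_space.put(("Class", name))     # Linda 'out': publish a tuple

for _ in threads:                          # one poison pill per worker
    source_space.put(None)
for t in threads:
    t.join()

print(sorted(target_space.queue))
```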

    Using shared-data localization to reduce the cost of inspector-execution in unified-parallel-C programs

    Programs written in the Unified Parallel C (UPC) language can access any location of the entire local and remote address space via read/write operations. However, UPC programs that contain fine-grained shared accesses can exhibit performance degradation. One solution is to use the inspector-executor technique to coalesce fine-grained shared accesses into larger remote access operations. A straightforward implementation of the inspector-executor transformation results in excessive instrumentation that hinders performance. This paper addresses this issue and introduces various techniques that aim at reducing the generated instrumentation code: a shared-data localization transformation based on Constant-Stride Linear Memory Descriptors (CSLMADs), the inlining of data locality checks, and the use of an index vector to aggregate the data. Finally, the paper introduces a lightweight loop code motion transformation to privatize shared scalars that were propagated through the loop body. A performance evaluation, using up to 2048 cores of a POWER 775, explores the impact of each optimization and characterizes the overheads of UPC programs. It also shows that the presented optimizations increase the performance of UPC programs by up to 1.8x over their hand-optimized UPC counterparts for applications with regular accesses, and by up to 6.3x for applications with irregular accesses.
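
    The inspector-executor pattern the paper optimizes can be sketched as follows. This is illustrative Python, not UPC or the paper's compiler output: the bulk_remote_get helper stands in for whatever coalesced communication primitive the runtime provides, and the index array is an invented irregular access pattern.

```python
# Minimal sketch of the inspector-executor idea: instead of issuing one
# fine-grained remote read per a[idx[i]], an inspector collects the needed
# indices, one bulk transfer fetches them, and the executor then runs the
# loop entirely on local data. bulk_remote_get is an assumed placeholder.

def bulk_remote_get(shared_array, indices):
    """Placeholder for one coalesced remote transfer of many elements."""
    return {i: shared_array[i] for i in indices}

def inspector_executor(shared_array, idx, body):
    # Inspector: record which shared elements the loop will touch.
    needed = sorted(set(idx))
    # One aggregated communication instead of len(idx) fine-grained ones.
    local = bulk_remote_get(shared_array, needed)
    # Executor: the original loop, now reading only local data.
    return [body(local[i]) for i in idx]

if __name__ == "__main__":
    shared = list(range(0, 1000, 10))      # pretend this lives remotely
    idx = [3, 7, 3, 42, 7]                 # irregular access pattern
    print(inspector_executor(shared, idx, lambda x: x * 2))
```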