Search CORE

52,476 research outputs found

A static heap analysis for shape and connectivity: Unified memory analysis: The base framework

Author: Hermenegildo Manuel V.
Kapur Deepak
Marron Mark
Stefanovic Darko
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/2007
Field of study

Modeling the evolution of the state of program memory during program execution is critical to many parallehzation techniques. Current memory analysis techniques either provide very accurate information but run prohibitively slowly or produce very conservative results. An approach based on abstract interpretation is presented for analyzing programs at compile time, which can accurately determine many important program properties such as aliasing, logical data structures and shape. These properties are known to be critical for transforming a single threaded program into a versión that can be run on múltiple execution units in parallel. The analysis is shown to be of polynomial complexity in the size of the memory heap. Experimental results for benchmarks in the Jolden suite are given. These results show that in practice the analysis method is efflcient and is capable of accurately determining shape information in programs that créate and manipúlate complex data structures

Archivo Digital UPM

GraphX: Unifying Data-Parallel and Graph-Parallel Analytics

Author: Crankshaw Daniel
Dave Ankur
Franklin Michael J.
Gonzalez Joseph E.
Stoica Ion
Xin Reynold S.
Publication venue
Publication date: 11/02/2014
Field of study

From social networks to language modeling, the growing scale and importance of graph data has driven the development of numerous new graph-parallel systems (e.g., Pregel, GraphLab). By restricting the computation that can be expressed and introducing new techniques to partition and distribute the graph, these systems can efficiently execute iterative graph algorithms orders of magnitude faster than more general data-parallel systems. However, the same restrictions that enable the performance gains also make it difficult to express many of the important stages in a typical graph-analytics pipeline: constructing the graph, modifying its structure, or expressing computation that spans multiple graphs. As a consequence, existing graph analytics pipelines compose graph-parallel and data-parallel systems using external storage systems, leading to extensive data movement and complicated programming model. To address these challenges we introduce GraphX, a distributed graph computation framework that unifies graph-parallel and data-parallel computation. GraphX provides a small, core set of graph-parallel operators expressive enough to implement the Pregel and PowerGraph abstractions, yet simple enough to be cast in relational algebra. GraphX uses a collection of query optimization techniques such as automatic join rewrites to efficiently implement these graph-parallel operators. We evaluate GraphX on real-world graphs and workloads and demonstrate that GraphX achieves comparable performance as specialized graph computation systems, while outperforming them in end-to-end graph pipelines. Moreover, GraphX achieves a balance between expressiveness, performance, and ease of use

arXiv.org e-Print Archive

CiteSeerX

A compiler approach to scalable concurrent program design

Author: Foster Ian
Taylor Stephen
Publication venue: 'California Institute of Technology Library'
Publication date: 01/01/1992
Field of study

The programmer's most powerful tool for controlling complexity in program design is abstraction. We seek to use abstraction in the design of concurrent programs, so as to separate design decisions concerned with decomposition, communication, synchronization, mapping, granularity, and load balancing. This paper describes programming and compiler techniques intended to facilitate this design strategy. The programming techniques are based on a core programming notation with two important properties: the ability to separate concurrent programming concerns, and extensibility with reusable programmer-defined abstractions. The compiler techniques are based on a simple transformation system together with a set of compilation transformations and portable run-time support. The transformation system allows programmer-defined abstractions to be defined as source-to-source transformations that convert abstractions into the core notation. The same transformation system is used to apply compilation transformations that incrementally transform the core notation toward an abstract concurrent machine. This machine can be implemented on a variety of concurrent architectures using simple run-time support. The transformation, compilation, and run-time system techniques have been implemented and are incorporated in a public-domain program development toolkit. This toolkit operates on a wide variety of networked workstations, multicomputers, and shared-memory multiprocessors. It includes a program transformer, concurrent compiler, syntax checker, debugger, performance analyzer, and execution animator. A variety of substantial applications have been developed using the toolkit, in areas such as climate modeling and fluid dynamics

CiteSeerX

Caltech Authors

From Loop Transformation to Hardware Generation

Author: BEYLS K
CHRISTIAENS M
Devos Harald
Stroobandt Dirk
Van Campenhout Jan
Publication venue: Veldhoven
Publication date: 01/01/2006
Field of study

Ghent University Academic Bibliography

GPUVerify: A Verifier for GPU Kernels

Author: Adam Betts
Alastair Donaldson
Boyer M.
Cadar C.
Collingbourne P.
Graf S.
Leino K. R. M.
Li G.
Lokhmotov A.
Nathan Chong
Nyland L.
Paul Thomson
Shaz Qadeer
Tripakis S.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2012
Field of study

We present a technique for verifying race- and divergence-freedom of GPU kernels that are written in mainstream ker-nel programming languages such as OpenCL and CUDA. Our approach is founded on a novel formal operational se-mantics for GPU programming termed synchronous, delayed visibility (SDV) semantics. The SDV semantics provides a precise definition of barrier divergence in GPU kernels and allows kernel verification to be reduced to analysis of a sequential program, thereby completely avoiding the need to reason about thread interleavings, and allowing existing modular techniques for program verification to be leveraged. We describe an efficient encoding for data race detection and propose a method for automatically inferring loop invari-ants required for verification. We have implemented these techniques as a practical verification tool, GPUVerify, which can be applied directly to OpenCL and CUDA source code. We evaluate GPUVerify with respect to a set of 163 kernels drawn from public and commercial sources. Our evaluation demonstrates that GPUVerify is capable of efficient, auto-matic verification of a large number of real-world kernels

CiteSeerX

Crossref

Spiral - Imperial College Digital Repository

Static Application-Level Race Detection in STM Haskell using Contracts

Author: Demeyer Romain
Vanhoof Wim
Publication venue: 'Open Publishing Association'
Publication date: 08/12/2013
Field of study

Writing concurrent programs is a hard task, even when using high-level synchronization primitives such as transactional memories together with a functional language with well-controlled side-effects such as Haskell, because the interferences generated by the processes to each other can occur at different levels and in a very subtle way. The problem occurs when a thread leaves or exposes the shared data in an inconsistent state with respect to the application logic or the real meaning of the data. In this paper, we propose to associate contracts to transactions and we define a program transformation that makes it possible to extend static contract checking in the context of STM Haskell. As a result, we are able to check statically that each transaction of a STM Haskell program handles the shared data in a such way that a given consistency property, expressed in the form of a user-defined boolean function, is preserved. This ensures that bad interference will not occur during the execution of the concurrent program.Comment: In Proceedings PLACES 2013, arXiv:1312.2218. [email protected]; [email protected]

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Repository of the University of Namur

Costing JIT Traces

Author: Maier Patrick
Morton John Magnus
Trinder Philip
Publication venue: School of Computing Science, University of Glasgow
Publication date
Field of study

Tracing JIT compilation generates units of compilation that are easy to analyse and are known to execute frequently. The AJITPar project aims to investigate whether the information in JIT traces can be used to make better scheduling decisions or perform code transformations to adapt the code for a specific parallel architecture. To achieve this goal, a cost model must be developed to estimate the execution time of an individual trace. This paper presents the design and implementation of a system for extracting JIT trace information from the Pycket JIT compiler. We define three increasingly parametric cost models for Pycket traces. We perform a search of the cost model parameter space using genetic algorithms to identify the best weightings for those parameters. We test the accuracy of these cost models for predicting the cost of individual traces on a set of loop-based micro-benchmarks. We also compare the accuracy of the cost models for predicting whole program execution time over the Pycket benchmark suite. Our results show that the weighted cost model using the weightings found from the genetic algorithm search has the best accuracy

Enlighten

Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code

Author: Akkas Abdurrahman
Amarasinghe Saman
Baghdadi Riyadh
Del Sozzo Emanuele
Kamil Shoaib
Ray Jessica
Romdhane Malek Ben
Suriana Patricia
Zhang Yunming
Publication venue
Publication date: 20/12/2018
Field of study

This paper introduces Tiramisu, a polyhedral framework designed to generate high performance code for multiple platforms including multicores, GPUs, and distributed machines. Tiramisu introduces a scheduling language with novel extensions to explicitly manage the complexities that arise when targeting these systems. The framework is designed for the areas of image processing, stencils, linear algebra and deep learning. Tiramisu has two main features: it relies on a flexible representation based on the polyhedral model and it has a rich scheduling language allowing fine-grained control of optimizations. Tiramisu uses a four-level intermediate representation that allows full separation between the algorithms, loop transformations, data layouts, and communication. This separation simplifies targeting multiple hardware architectures with the same algorithm. We evaluate Tiramisu by writing a set of image processing, deep learning, and linear algebra benchmarks and compare them with state-of-the-art compilers and hand-tuned libraries. We show that Tiramisu matches or outperforms existing compilers and libraries on different hardware architectures, including multicore CPUs, GPUs, and distributed machines.Comment: arXiv admin note: substantial text overlap with arXiv:1803.0041

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Politecnico di Milano

Refining SCJ Mission Specifications into Parallel Handler Designs

Author: Cavalcanti Ana
Zeyda Frank
Publication venue: 'Open Publishing Association'
Publication date: 01/05/2013
Field of study

Safety-Critical Java (SCJ) is a recent technology that restricts the execution and memory model of Java in such a way that applications can be statically analysed and certified for their real-time properties and safe use of memory. Our interest is in the development of comprehensive and sound techniques for the formal specification, refinement, design, and implementation of SCJ programs, using a correct-by-construction approach. As part of this work, we present here an account of laws and patterns that are of general use for the refinement of SCJ mission specifications into designs of parallel handlers used in the SCJ programming paradigm. Our notation is a combination of languages from the Circus family, supporting state-rich reactive models with the addition of class objects and real-time properties. Our work is a first step to elicit laws of programming for SCJ and fits into a refinement strategy that we have developed previously to derive SCJ programs.Comment: In Proceedings Refine 2013, arXiv:1305.563

arXiv.org e-Print Archive

Directory of Open Access Journals