
    POSE : A mathematical and visual modelling tool to guide energy aware code optimisation

    Performance engineers are beginning to explore software-level optimisation as a means to reduce the energy consumed when running their codes. This paper presents POSE, a mathematical and visual modelling tool which highlights the relationship between runtime and power consumption. POSE allows developers to assess whether power optimisation is worth pursuing for their codes. We demonstrate POSE by studying the power optimisation characteristics of applications from the Mantevo and Rodinia benchmark suites. We show that LavaMD has the most scope for CPU power optimisation, with improvements in Energy Delay Squared Product (ED2P) of up to 30.59%. Conversely, MiniMD offers the least scope, with improvements to the same metric limited to 7.60%. We also show that no power optimised version of MiniMD operating below 2.3 GHz can match the ED2P performance of the original code running at 3.2 GHz. For LavaMD this limit is marginally less restrictive at 2.2 GHz.
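
    The ED2P comparisons above follow directly from the metric's definition: Energy Delay Squared Product is energy multiplied by the square of runtime, so it weights runtime more heavily than energy. A minimal sketch of the calculation; the measurement values are hypothetical, not taken from the paper:

```python
def ed2p(energy_joules, runtime_seconds):
    """Energy Delay Squared Product: E * t^2 (lower is better)."""
    return energy_joules * runtime_seconds ** 2

def ed2p_improvement(baseline, optimised):
    """Percentage reduction in ED2P of an optimised run vs. a baseline."""
    return 100.0 * (1.0 - optimised / baseline)

# Hypothetical measurements (illustrative only):
base = ed2p(energy_joules=1200.0, runtime_seconds=10.0)  # baseline run
opt = ed2p(energy_joules=1000.0, runtime_seconds=9.5)    # power-optimised run
print(f"ED2P improvement: {ed2p_improvement(base, opt):.2f}%")
```

    Because runtime enters squared, a small slowdown can wipe out a large energy saving; this is why the paper can bound the frequencies below which no power-optimised variant can match the original code's ED2P.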

    A Semantics for Parallel Programming with BSP

    The BSP model is an established practical general-purpose parallel programming model. This paper presents a semantics for the model which provides a foundation for formal development. The parallel-by-merge method is used to express parallel composition; we show that this method can be made to capture the behaviour of a BSP process more accurately if existing constraints on this approach are relaxed; this correction has consequences for many other models of shared-state concurrency. We use the model to establish some simple identities that hold in the BSP model. 1 Introduction This paper presents a predicative semantics for practical high-performance parallel computing in the BSP [15] model. BSP is an increasingly popular approach to the programming of real-world problems on practical parallel machines. The other prominent systems are PVM and MPI. These three approaches cover the vast majority of implementations of parallel applications. The BSP model is unique amongst these other model..
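
    The discipline the semantics formalises can be illustrated with a minimal, sequential sketch of a BSP superstep: each process computes locally and posts messages, and messages become visible only at the barrier that ends the superstep. The names below (run_superstep, step) are illustrative, not taken from the paper:

```python
def run_superstep(local_states, step):
    """One BSP superstep: local computation, then barrier-time delivery."""
    # Local computation phase: each process produces a list of
    # (destination, payload) messages from its own state.
    outboxes = [step(pid, state) for pid, state in enumerate(local_states)]
    # Barrier: messages are delivered only after every process has finished.
    inboxes = [[] for _ in local_states]
    for msgs in outboxes:
        for dest, payload in msgs:
            inboxes[dest].append(payload)
    return inboxes

# Example: 4 processes each send their pid to the next process in a ring.
def step(pid, state):
    return [((pid + 1) % 4, pid)]

print(run_superstep([None] * 4, step))  # [[3], [0], [1], [2]]
```

    Because no message is observable before the barrier, the result of a superstep is independent of the order in which processes run, which is what makes a compositional (parallel-by-merge) semantics tractable.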

    An Object-Oriented Programming Model for BSP Computations

    This project is aimed at addressing the problem of programming for parallel computers. For too long parallel computing has been neglected in the development of programming languages. In spite of great advances in languages for sequential machines in recent decades, brought about by new ideas such as object-oriented programming, parallel programming is often carried out in FORTRAN or C. Existing parallel languages lead to the development of badly written, unmaintainable and inflexible code, and this project puts forward object-oriented programming as a solution. I develop a small prototype implementation of a language which is object-oriented and conforms to the BSP model. In this prototype, I have developed methods for the simple declaration of distributed objects on configurable sets of processes and provided automatic generation of functions for synchronisation and communication of objects. In the model proposed, every distributed object is derived from (and located on) a unique set of p..
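
    The idea of a distributed object with generated synchronisation and communication can be sketched as follows. This is a toy illustration of the concept only, assuming nothing about the prototype language's actual syntax; all names here (Distributed, sync, combine) are hypothetical:

```python
class Distributed:
    """Toy distributed object: one replica per process in `procs`;
    sync() stands in for the generated communication/synchronisation step."""
    def __init__(self, procs, value):
        self.replicas = {pid: value for pid in procs}

    def update(self, pid, value):
        # A purely local write on one process; no communication happens here.
        self.replicas[pid] = value

    def sync(self, combine):
        # Barrier-style step: combine all replicas and broadcast the result,
        # so every process sees the same value afterwards.
        merged = combine(self.replicas.values())
        for pid in self.replicas:
            self.replicas[pid] = merged

counter = Distributed(procs=range(4), value=0)
for pid in range(4):
    counter.update(pid, pid + 1)  # each process writes its own replica
counter.sync(sum)                 # combine at the superstep boundary
print(counter.replicas[0])        # 10
```

    In the abstract's model the declaration of such an object would also fix the set of processes it lives on, so the communication pattern of sync() can be generated automatically.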

    Transgressing The Boundaries: Unified Scalable Parallel Programming

    The diverse architectural features of parallel computers, and the lack of commonly accepted parallel-programming environments, have meant that software development for these systems is significantly more difficult than in the sequential case. Until better approaches are developed, the programming environment will remain a serious obstacle to mainstream scalable parallel computing. The work reported in this paper attempts to integrate architecture-independent scalable parallel programming in the Bulk Synchronous Parallel (BSP) model with shared-memory parallel programming using the theoretical PRAM model. We start with a discussion of problem parallelism, that is, the parallelism inherent to a problem instead of a specific algorithm, and the parallel-programming techniques that allow the capture of this notion. We then review the ubiquitous PRAM model in terms of the model's pragmatic limitations, where particular attention is paid to simulations on practical machines. The BSP model i..

    Coupling DDT and Marmot for Debugging of MPI Applications

    Parallel programming is a complex and, since the dawn of the multi-core era, increasingly common task that can be alleviated considerably by tools supporting the application development and porting process. Existing tools, namely the MPI correctness checker Marmot and the parallel debugger DDT, have so far been used on a wide range of platforms as stand-alone tools to cover different aspects of correctness debugging. In this paper we describe first steps towards coupling these two tools to provide application developers with a powerful and user-friendly environment.


    Bandwidth, Space and Computation Efficient PRAM Programming: The BSP Approach

    In this paper we investigate the tractability of PRAM simulations on the IBM SP2 system through the Bulk-Synchronous Parallel model. We present a portable C++ class library that provides PRAM style shared memory facilities for any parallel or distributed system. We also obtain almost optimal speedup for representative PRAM algorithms, thus demonstrating the suitability of our methods for high performance parallel systems. 1 Introduction The Parallel Random Access Machine (PRAM) [1] has been one of the most widely used models of parallel computing. The PRAM is an ideal parallel computer: a potentially unbounded set of processors sharing a global address space. The processors work synchronously and during each time step each processor either performs a computation or accesses a single data-word from the global address space in unit time. The PRAM thus abstracts parallelism by stripping away considerations such as communication latency, memory and network conflicts during routing, bandwi..
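
    The core of such a PRAM simulation is that a shared-memory access becomes message routing across supersteps: the global address space is distributed over the BSP processes, a read is a request message in one superstep and a reply in the next. A minimal sketch of this idea; the names and block distribution below are illustrative assumptions, not the paper's C++ library:

```python
P = 4                     # number of BSP processes
shared = list(range(16))  # the simulated PRAM global address space
block = len(shared) // P  # block distribution: cell a lives on process a // block

def owner(addr):
    """Process that holds global address `addr` under block distribution."""
    return addr // block

def pram_read(requests):
    """requests[pid] = address that process pid wants to read.
    Superstep 1 routes requests to owners; superstep 2 routes replies back."""
    # Superstep 1: each process sends its read request to the owning process.
    incoming = [[] for _ in range(P)]
    for pid, addr in enumerate(requests):
        incoming[owner(addr)].append((pid, addr))
    # Superstep 2: owners service the requests and send the values back.
    replies = [None] * P
    for reqs in incoming:
        for pid, addr in reqs:
            replies[pid] = shared[addr]
    return replies

print(pram_read([0, 5, 10, 15]))  # [0, 5, 10, 15]
```

    Randomising the address-to-process map (e.g. by hashing) spreads requests evenly and is the standard way such simulations approach the near-optimal speedups the abstract reports.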