8 research outputs found

    Mainstream parallel array programming on Cell

    We present the E♯ compiler and runtime library for the ‘F’ subset of the Fortran 95 programming language. ‘F’ provides first-class support for arrays, allowing E♯ to implicitly evaluate array expressions in parallel using the SPU coprocessors of the Cell Broadband Engine. We present performance results from four benchmarks, all of which demonstrate absolute speedups over equivalent ‘C’ or Fortran versions running on the PPU host processor. A significant benefit of this straightforward approach is that a serial implementation of any code is always available, providing code longevity and a familiar development paradigm.
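
    As an illustration of the idea (a rough analogue only, not code from the paper): in ‘F’, a whole-array expression is a single statement that the E♯ runtime is free to evaluate in parallel across the SPUs. The closest standard C++ counterpart to such first-class array expressions is std::valarray; the sketch below shows only the serial semantics, with any parallel offload left to a runtime.

        #include <valarray>
        #include <iostream>

        int main() {
            // Whole-array values, loosely analogous to Fortran 95 / 'F' array syntax.
            std::valarray<float> a = {1, 2, 3, 4};
            std::valarray<float> b = {5, 6, 7, 8};
            float s = 0.5f;

            // A single array expression with no explicit loop; a parallelising
            // runtime (such as E# targeting the Cell SPUs) may evaluate the
            // elements concurrently without any change to the source.
            std::valarray<float> c = a * b + s;

            for (float x : c) std::cout << x << ' ';
            std::cout << '\n';
        }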

    The PEPPHER Approach to Programmability and Performance Portability for Heterogeneous Many-core Architectures

    The European FP7 project PEPPHER addresses programmability and performance portability for current and emerging heterogeneous many-core architectures. As its main idea, the project proposes a multi-level parallel execution model composed of potentially parallelized components that exist in variants suitable for different types of cores, memory configurations, input characteristics, and optimization criteria, and couples this with dynamic and static resource- and architecture-aware scheduling mechanisms. Crucial to PEPPHER is that components can be made performance-aware, allowing for more efficient dynamic and static scheduling on the concrete, available resources. The flexibility provided in the software model, combined with a customizable, heterogeneous, memory- and topology-aware run-time system, is key to efficiently exploiting the resources of each concrete hardware configuration. The project takes a holistic approach, relying on existing paradigms, interfaces, and languages for the parallelization of components, and develops a prototype framework, a methodology for extending the framework, and guidelines for constructing performance-portable software and systems, including paths for migrating existing software, for heterogeneous many-core processors. This paper gives a high-level project overview and presents a specific example showing how the PEPPHER component variant model and resource-aware run-time system enable performance portability of a numerical kernel.
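
    The component-variant idea can be pictured with a small sketch (hypothetical names and a toy cost model; not the actual PEPPHER interfaces): each component carries several implementation variants plus a performance estimate, and a scheduler selects the variant with the lowest predicted cost on the resources currently available.

        #include <cstddef>
        #include <functional>
        #include <iostream>
        #include <limits>
        #include <string>
        #include <vector>

        // Hypothetical performance-aware component variant (illustration only).
        struct Variant {
            std::string target;                          // e.g. "cpu" or "gpu"
            std::function<double(std::size_t)> cost;     // predicted cost for problem size n
            std::function<void(std::size_t)> run;        // the implementation itself
        };

        // Pick and run the cheapest variant whose target device is available.
        void run_best(const std::vector<Variant>& variants, std::size_t n,
                      const std::vector<std::string>& available) {
            const Variant* best = nullptr;
            double best_cost = std::numeric_limits<double>::max();
            for (const auto& v : variants) {
                bool usable = false;
                for (const auto& dev : available) usable = usable || (dev == v.target);
                if (usable && v.cost(n) < best_cost) { best_cost = v.cost(n); best = &v; }
            }
            if (best) best->run(n);
        }

        int main() {
            std::vector<Variant> scale = {
                {"cpu", [](std::size_t n) { return 0.01 * n; },          // no launch overhead
                        [](std::size_t n) { std::cout << "CPU variant, n=" << n << '\n'; }},
                {"gpu", [](std::size_t n) { return 50.0 + 0.0001 * n; }, // overhead, then cheap
                        [](std::size_t n) { std::cout << "GPU variant, n=" << n << '\n'; }},
            };
            run_best(scale, 100, {"cpu", "gpu"});      // small problem: CPU variant wins
            run_best(scale, 1000000, {"cpu", "gpu"});  // large problem: GPU variant wins
        }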

    Programmability and Performance Portability Aspects of Heterogeneous Multi-/Manycore Systems

    We discuss three complementary approaches that can provide both portability and an increased level of abstraction for the programming of heterogeneous multicore systems. Together, these approaches also support performance portability, as currently investigated in the EU FP7 project PEPPHER. In particular, we consider (1) a library-based approach, here represented by the integration of the SkePU C++ skeleton programming library with the StarPU runtime system for dynamic scheduling and dynamic selection of suitable execution units for parallel tasks; (2) a language-based approach, here represented by the Offload-C++ high-level language extensions and the Offload compiler to generate platform-specific code; and (3) a component-based approach, specifically the PEPPHER component system for annotating user-level application components with performance metadata, thereby preparing them for performance-aware composition. We discuss the strengths and weaknesses of these approaches and show how they could complement each other in an integrated programming framework for heterogeneous multicore systems.
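
    The skeleton idea in approach (1) can be illustrated generically (this is not SkePU's or StarPU's actual API, just the shape of the abstraction): the application supplies only the element-wise operation, and the skeleton owns the decision of how, and on which execution unit, to run it.

        #include <algorithm>
        #include <execution>
        #include <iostream>
        #include <vector>

        // Generic "map" skeleton: callers provide element-wise logic only; the
        // skeleton decides how to execute it. Here a C++17 parallel execution
        // policy stands in for a runtime that could also select a GPU backend.
        template <typename T, typename F>
        std::vector<T> map_skeleton(const std::vector<T>& in, F f) {
            std::vector<T> out(in.size());
            std::transform(std::execution::par, in.begin(), in.end(), out.begin(), f);
            return out;
        }

        int main() {
            std::vector<float> v = {1.f, 2.f, 3.f, 4.f};
            auto squared = map_skeleton(v, [](float x) { return x * x; });
            for (float x : squared) std::cout << x << ' ';
            std::cout << '\n';
        }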

    Selection of Task Implementations in the Nanos++ Runtime

    New heterogeneous systems and hardware accelerators can give higher levels of computational power to high-performance computers. However, this does not come for free: the more heterogeneous the system, the more complex the programming task becomes in terms of resource utilization. OmpSs is a task-based programming model and framework focused on the automatic parallelization of sequential applications. We present a set of extensions to this framework: we show how the application programmer can expose different specialized versions of tasks (i.e. pieces of specific code targeted and optimized for a particular architecture) and how the framework chooses between these versions at runtime to obtain the best performance achievable for the given application. From our results, obtained on a multi-GPU system, we show that our proposal gives flexibility to the application's source code and can potentially increase the application's performance.
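
    The extension can be sketched with OmpSs-style annotations (the clause spellings below are an approximation for illustration and are not checked against a particular Nanos++ release): an accelerator-specific function declares that it implements an existing task, and the runtime is then free to pick whichever version it expects to perform best on the devices at hand.

        // Approximate OmpSs-style annotations; clause spellings are illustrative.

        // Baseline task: a generic SMP (host CPU) implementation.
        #pragma omp target device(smp)
        #pragma omp task in(a[0;n]) out(b[0;n])
        void scale_task(const float *a, float *b, int n, float s);

        // Specialized version of the same task for CUDA GPUs. The 'implements'
        // clause marks it as an alternative implementation of scale_task, so the
        // runtime can choose between the two versions at run time based on
        // device availability and expected performance.
        #pragma omp target device(cuda) implements(scale_task)
        #pragma omp task in(a[0;n]) out(b[0;n])
        void scale_task_cuda(const float *a, float *b, int n, float s);

        void run(const float *a, float *b, int n, float s) {
            scale_task(a, b, n, s);  // call site is unchanged; the runtime
            #pragma omp taskwait     // schedules a concrete version and waits
        }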

    Self-adaptive OmpSs tasks in heterogeneous environments

    As new heterogeneous systems and hardware accelerators appear, high-performance computers can reach a higher level of computational power. Nevertheless, this does not come for free: the more heterogeneous the system, the more complex the programming task becomes in terms of resource management. OmpSs is a task-based programming model and framework focused on the runtime exploitation of parallelism from annotated sequential applications. This paper presents a set of extensions to this framework: we show how the application programmer can expose different specialized versions of tasks (i.e. pieces of specific code targeted and optimized for a particular architecture) and how the system can choose between these versions at runtime to obtain the best performance achievable for the given application. From the results obtained on a multi-GPU system, we show that our proposal gives flexibility to the application's source code and can potentially increase the application's performance. This work has been supported by the European Commission through the ENCORE project (FP7-248647), the TERAFLUX project (FP7-249013), the TEXT project (FP7-261580), the HiPEAC-3 Network of Excellence (FP7-ICT 287759), the Intel-BSC Exascale Lab collaboration project, the Spanish Ministry of Education (CSD2007-00050 and the FPU program), the Computación de Altas Prestaciones V and VI projects (TIN2007-60625, TIN2012-34557), and the Generalitat de Catalunya (2009-SGR-980).

    Automatic analysis of DMA races using model checking and k-induction

    Modern multicore processors, such as the Cell Broadband Engine, achieve high performance by equipping accelerator cores with small "scratch-pad" memories. The price for increased performance is higher programming complexity: the programmer must manually orchestrate data movement using direct memory access (DMA) operations. Programming using asynchronous DMA operations is error-prone, and DMA races can lead to nondeterministic bugs which are hard to reproduce and fix. We present a method for DMA race analysis in C programs. Our method works by automatically instrumenting a program with assertions modeling the semantics of a memory flow controller. The instrumented program can then be analyzed using state-of-the-art software model checkers. We show that bounded model checking is effective for detecting DMA races in buggy programs. To enable automatic verification of the correctness of instrumented programs, we present a new formulation of k-induction geared towards software, as a proof rule operating on loops. Our techniques are implemented as a tool, Scratch, which we apply to a large set of programs supplied with the IBM Cell SDK, in which we discover a previously unknown bug. Our experimental results indicate that our k-induction method performs extremely well on this problem class. To our knowledge, this marks both the first application of k-induction to software verification, and the first example of software model checking in the context of heterogeneous multicore processors. © Springer Science+Business Media, LLC 2011
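
    The instrumentation can be pictured with a simplified model (invented helper names; not the actual assertions emitted by the Scratch tool): each outstanding DMA transfer is recorded, and an assertion fails if a new transfer overlaps a region whose earlier transfer has not yet been waited on.

        #include <cassert>
        #include <cstddef>
        #include <vector>

        // Simplified model of the inserted checks: track in-flight DMA regions
        // and assert that no new transfer overlaps one that is still pending.
        struct Pending { const char* lo; const char* hi; int tag; };
        static std::vector<Pending> pending;

        static bool overlaps(const char* lo, const char* hi, const Pending& p) {
            return lo < p.hi && p.lo < hi;
        }

        // Invoked by the instrumentation wherever the program issues an
        // asynchronous DMA get/put on a local-store buffer.
        void model_dma(const void* local, std::size_t size, int tag) {
            const char* lo = static_cast<const char*>(local);
            const char* hi = lo + size;
            for (const auto& p : pending)
                assert(!overlaps(lo, hi, p) && "DMA race: overlaps an in-flight transfer");
            pending.push_back({lo, hi, tag});
        }

        // Invoked wherever the program waits on a DMA tag group; the matching
        // transfers are no longer in flight afterwards.
        void model_wait(int tag) {
            for (std::size_t i = 0; i < pending.size();)
                if (pending[i].tag == tag) pending.erase(pending.begin() + i);
                else ++i;
        }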

    SHAPES: Easy and high-level memory layouts

    CPU speeds have vastly exceeded those of RAM. As such, developers who aim to achieve high performance on modern architectures will most likely need to consider how to use CPU caches effectively, and hence how to place data in memory so as to exploit spatial locality and achieve high memory bandwidth. Performing such manual memory optimisations usually sacrifices readability, maintainability, memory safety, and object abstraction. This is further exacerbated in managed languages, such as Java and C#, where the runtime abstracts away memory from the developer and such optimisations are therefore almost impossible. To that end, we present in this thesis a language extension called SHAPES. SHAPES aims to offer developers more fine-grained control over the placement of data, without sacrificing memory safety or object abstraction, hence retaining the expressiveness and familiarity of OOP. SHAPES introduces the concepts of pools and layouts; programmers group related objects into pools, and specify how objects are laid out in these pools. Classes and types are annotated with pool parameters, which allow placement aspects to be changed orthogonally to how the business logic operates on the objects in the pool. These design decisions disentangle business logic and memory concerns. We provide a formal model of SHAPES, present its type and memory safety model and its translation into a low-level language, and explain why we can expect SHAPES to be compiled efficiently in terms of the runtime representation of objects and the access to their fields. Moreover, we present SHAPES-z, an implementation of SHAPES as an embeddable language, and shapeszc, the compiler for SHAPES-z. We provide our design and implementation considerations for SHAPES-z and shapeszc. Finally, we evaluate the performance of SHAPES and SHAPES-z through case studies.
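
    An informal C++ analogue of the pool/layout separation (SHAPES has its own syntax; this only mirrors the design intent): the business logic is written once against a pool interface, while the pool type chosen as a parameter decides whether the objects live in an array-of-structs or a struct-of-arrays layout.

        #include <cstddef>
        #include <vector>

        // Two layout policies for a pool of 2-D points. The layout is selected
        // by the pool parameter, not by the code that uses the objects, loosely
        // mirroring how SHAPES changes placement orthogonally to business logic.
        struct AosPool {                    // array-of-structs layout
            struct P { float x, y; };
            std::vector<P> data;
            std::size_t size() const { return data.size(); }
            float& x(std::size_t i) { return data[i].x; }
            float& y(std::size_t i) { return data[i].y; }
        };

        struct SoaPool {                    // struct-of-arrays layout: better
            std::vector<float> xs, ys;      // spatial locality when only x is read
            std::size_t size() const { return xs.size(); }
            float& x(std::size_t i) { return xs[i]; }
            float& y(std::size_t i) { return ys[i]; }
        };

        // Business logic written once, parameterised by the pool (layout) type.
        template <typename Pool>
        float sum_x(Pool& pool) {
            float s = 0.f;
            for (std::size_t i = 0; i < pool.size(); ++i) s += pool.x(i);
            return s;
        }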