636 research outputs found

    The AXIOM software layers

    Get PDF
    AXIOM project aims at developing a heterogeneous computing board (SMP-FPGA).The Software Layers developed at the AXIOM project are explained.OmpSs provides an easy way to execute heterogeneous codes in multiple cores. People and objects will soon share the same digital network for information exchange in a world named as the age of the cyber-physical systems. The general expectation is that people and systems will interact in real-time. This poses pressure onto systems design to support increasing demands on computational power, while keeping a low power envelop. Additionally, modular scaling and easy programmability are also important to ensure these systems to become widespread. The whole set of expectations impose scientific and technological challenges that need to be properly addressed.The AXIOM project (Agile, eXtensible, fast I/O Module) will research new hardware/software architectures for cyber-physical systems to meet such expectations. The technical approach aims at solving fundamental problems to enable easy programmability of heterogeneous multi-core multi-board systems. AXIOM proposes the use of the task-based OmpSs programming model, leveraging low-level communication interfaces provided by the hardware. Modular scalability will be possible thanks to a fast interconnect embedded into each module. To this aim, an innovative ARM and FPGA-based board will be designed, with enhanced capabilities for interfacing with the physical world. Its effectiveness will be demonstrated with key scenarios such as Smart Video-Surveillance and Smart Living/Home (domotics).Peer ReviewedPostprint (author's final draft

    Response-time analysis of DAG tasks supporting heterogeneous computing

    Get PDF
    Hardware platforms are evolving towards parallel and heterogeneous architectures to overcome the increasing necessity of more performance in the real-time domain. Parallel programming models are fundamental to exploit the performance capabilities of these architectures. This paper proposes a novel response time analysis (RTA) for verifying the schedulability of DAG tasks supporting heterogeneous computing. It analyzes the impact of executing part of the DAG in the accelerator device. As a result, the response time upper bound of the system is more precise than the one provided by currently existing RTA targeting homogeneous architectures.This work is supported by the Spanish Ministry of Science and Innovation under contract TIN2015-65316-PPeer ReviewedPostprint (published version

    Unleashing Fine-Grained Parallelism on Embedded Many-Core Accelerators with Lightweight OpenMP Tasking

    Get PDF
    In recent years, programmable many-core accelerators (PMCAs) have been introduced in embedded systems to satisfy stringent performance/Watt requirements. This has increased the urge for programming models capable of effectively leveraging hundreds to thousands of processors. Task-based parallelism has the potential to provide such capabilities, offering high-level abstractions to outline abundant and irregular parallelism in embedded applications. However, efficiently supporting this programming paradigm on embedded PMCAs is challenging, due to the large time and space overheads it introduces. In this paper we describe a lightweight OpenMP tasking runtime environment (RTE) design for a state-of-the-art embedded PMCA, the Kalray MPPA 256. We provide an exhaustive characterization of the costs of our RTE, considering both synthetic workload and real programs, and we compare to several other tasking RTEs. Experimental results confirm that our solution achieves near-ideal parallelization speedups for tasks as small as 5K cycles, and an average speedup of 12 × for real benchmarks, which is 60% higher than what we observe with the original Kalray OpenMP implementation

    Safe Parallelism: Compiler Analysis Techniques for Ada and OpenMP

    Get PDF
    There is a growing need to support parallel computation in Ada to cope with the performance requirements of the most advanced functionalities of safety-critical systems. In that regard, the use of parallel programming models is paramount to exploit the benefits of parallelism. Recent works motivate the use of OpenMP for being a de facto standard in high-performance computing for programming shared memory architectures. These works address two important aspects towards the introduction of OpenMP in Ada: the compatibility of the OpenMP syntax with the Ada language, and the interoperability of the OpenMP and the Ada runtimes, demonstrating that OpenMP complements and supports the structured parallelism approach of the tasklet model. This paper addresses a third fundamental aspect: functional safety from a compiler perspective. Particularly, it focuses on race conditions and considers the fine-grain and unstructured capabilities of OpenMP. Hereof, this paper presents a new compiler analysis technique that: (1) identifies potential race conditions in parallel Ada programs based on OpenMP or Ada tasks or both, and (2) provides solutions for the detected races.This work was supported by the Spanish Ministry of Science and Innovation under contract TIN2015-65316-P, and by the FCT (Portuguese Foundation for Science and Technology) within the CISTER Research Unit (CEC/04234).Peer ReviewedPostprint (author's final draft

    Mixed-data-model heterogeneous compilation and OpenMP offloading

    Get PDF
    Heterogeneous computers combine a general-purpose host processor with domain-specific programmable many-core accelerators, uniting high versatility with high performance and energy efficiency. While the host manages ever-more application memory, accelerators are designed to work mainly on their local memory. This difference in addressed memory leads to a discrepancy between the optimal address width of the host and the accelerator. Today 64-bit host processors are commonplace, but few accelerators exceed 32-bit addressable local memory, a difference expected to increase with 128-bit hosts in the exascale era. Managing this discrepancy requires support for multiple data models in heterogeneous compilers. So far, compiler support for multiple data models has not been explored, which hampers the programmability of such systems and inhibits their adoption. In this work, we perform the first exploration of the feasibility and performance of implementing a mixed-data-mode heterogeneous system. To support this, we present and evaluate the first mixed-data-model compiler, supporting arbitrary address widths on host and accelerator. To hide the inherent complexity and to enable high programmer productivity, we implement transparent offloading on top of OpenMP. The proposed compiler techniques are implemented in LLVM and evaluated on a 64+32-bit heterogeneous SoC. Results on benchmarks from the PolyBench-ACC suite show that memory can be transparently shared between host and accelerator at overheads below 0.7 % compared to 32-bit-only execution, enabling mixed-data-model computers to execute at near-native performance

    Refactoring software to heterogeneous parallel platforms

    Get PDF
    In summary, the papers included in this special issue are representative of the progress achieved by the research community at various levels from the very high level using parallel patterns to lower levels using, for example, transactional software memory. Also the integration of GPUs and FPGAs in the landscape is essential to achieve better performance in different categories of applications. All these innovative research directions will contribute to better achieve the long-term goal of better refactoring of existing applications to new and evolving parallel heterogeneous architectures
    corecore