124 research outputs found

    A design methodology for compositional high-level synthesis of communication-centric SoCs

    Get PDF
    Systems-on-chip are increasingly designed at the system level by combining synthesizable IP components that operate concurrently while interacting through communication channels. CAD-tool vendors support this System-Level Design approach with high-level synthesis tools and libraries of interface primitives implementing the communication protocols. These interfaces absorb timing differences in the hardware-component implementations, thus enabling compositional design. However, they introduce also new challenges in terms of functional correctness and performance optimization. We propose a methodology that combines performance analysis and optimization algorithms to automatically address the issues that SoC designers may accidentally introduce when assembling components that are specified at the system level. Copyright 2014 ACM

    A design methodology for compositional high-level synthesis of communication-centric SoCs.

    Get PDF
    ABSTRACT Systems-on-chip are increasingly designed at the system level by combining synthesizable IP components that operate concurrently while interacting through communication channels. CAD-tool vendors support this System-Level Design approach with high-level synthesis tools and libraries of interface primitives implementing the communication protocols. These interfaces absorb timing differences in the hardware-component implementations, thus enabling compositional design. However, they introduce also new challenges in terms of functional correctness and performance optimization. We propose a methodology that combines performance analysis and optimization algorithms to automatically address the issues that SoC designers may accidentally introduce when assembling components that are specified at the system level

    System-Level Optimization of Accelerator Local Memory for Heterogeneous Systems-on-Chip

    Get PDF
    In modern system-on-chip architectures, specialized accelerators are increasingly used to improve performance and energy efficiency. The growing complexity of these systems requires the use of system-level design methodologies featuring high-level synthesis (HLS) for generating these components efficiently. Existing HLS tools, however, have limited support for the system-level optimization of memory elements, which typically occupy most of the accelerator area. We present a complete methodology for designing the private local memories (PLMs) of multiple accelerators. Based on the memory requirements of each accelerator, our methodology automatically determines an area-efficient architecture for the PLMs to guarantee performance and reduce the memory cost based on technology-related information. We implemented a prototype tool, called Mnemosyne, that embodies our methodology within a commercial HLS flow. We designed 13 complex accelerators for selected applications from two recently-released benchmark suites (Perfect and CortexSuite). With our approach we are able to reduce the memory cost of single accelerators by up to 45%. Moreover, when reusing memory IPs across accelerators, we achieve area savings that range between 17% and 55% compared to the case where the PLMs are designed separately

    Fault-Tolerant Distributed Deployment of Embedded Control Software

    Full text link

    Handling large data sets for high-performance embedded applications in heterogeneous systems-on-chip

    Get PDF
    Local memory is a key factor for the performance of accelerators in SoCs. Despite technology scaling, the gap between on-chip storage and memory footprint of embedded applications keeps widening. We present a solution to preserve the speedup of accelerators when scaling from small to large data sets. Combining specialized DMA and address translation with a software layer in Linux, our design is transparent to user applications and broadly applicable to any class of SoCs hosting high-throughput accelerators. We demonstrate the robustness of our design across many heterogeneous workload scenarios and memory allocation policies with FPGA-based SoC prototypes featuring twelve concurrent accelerators accessing up to 768MB out of 1GB-addressable DRAM
    • …
    corecore