11 research outputs found

    Exploration d'Espace de Conception pour des Techniques de Calculs Approximés pour la Mémoire (Design Space Exploration of Approximate Computing Techniques for Memory)

    Modern digital systems are processing more and more data. This increase in memory requirements must match the processing capabilities and interconnections to avoid the memory wall. Approximate computing techniques exist to alleviate these requirements but usually require a thorough and tedious analysis of the processing pipeline. This paper presents an application-agnostic Design Space Exploration (DSE) of the buffer-sizing process to reduce the memory footprint of applications while guaranteeing an output quality above a defined threshold. The proposed DSE selects the appropriate bit-width and storage type for buffers to satisfy the constraint. We show in this paper that the proposed DSE reduces the memory footprint of the SqueezeNet CNN by 58.6% with identical Top-1 prediction accuracy, and that of the full SKA SDP pipeline by 39.7% without degradation, while testing only a subset of the design space. The proposed DSE is fast enough to be integrated into the design stream of applications.

    Automated Buffer Sizing of Dataflow Applications in a High-Level Synthesis Workflow

    High-Level Synthesis (HLS) tools are mature enough to provide efficient code generation for computation kernels on FPGA hardware. For more complex applications, multiple kernels may be connected by a dataflow graph. Although some tools, such as Xilinx Vitis HLS, support dataflow directives, they lack efficient analysis methods to compute the buffer sizes between kernels in a dataflow graph. This paper proposes an original method to safely approximate such buffer sizes. The first contribution computes an initial overestimation of buffer sizes, without knowing the memory access patterns of kernels. The second contribution iteratively refines those buffer sizes using cosimulation. Moreover, the paper introduces an open-source framework using these methods to facilitate dataflow programming on FPGA using HLS. The proposed methods and framework have been tested on 7 dataflow applications, and outperform Vitis HLS cosimulation in 5 benchmarks, either in terms of BRAM and LUT usage or in terms of exploration time. In the 2 other benchmarks, our best method gets results similar to Vitis HLS. Last but not least, our method admits directed cycles in the application graphs.

    Influence of Dataflow Graph Moldable Parameters on Optimization Criteria

    The integration of static parameters into Synchronous Dataflow (SDF) models enables the customization of an application's functional and non-functional behaviours. However, these parameter values are generally set by the developer for a manual Design Space Exploration (DSE). Instead of a single value, moldable parameters accept a set of alternative values, representing all possible configurations of the application. The DSE is responsible for selecting the best parameter values to optimize a set of criteria such as latency, energy, or memory footprint. However, the DSE process explodes in complexity with the number of parameters and their possible values. In this paper, we study an automated DSE algorithm exploring multiple configurations of a dataflow application. Our experiments show that: 1) only limited sets of configurations lead to Pareto-optimal solutions in a multi-criteria optimization scenario; 2) the impact of individual parameters on optimization criteria can be determined accurately from a limited subset of design points. The approach was evaluated on three image processing applications having from hundreds to thousands of configurations.

    preesm/preesm: ACM TRETS Artifact Evaluation

    Release for ACM TRETS Artifact Evaluation of "Automated Buffer Sizing of Dataflow Applications in a High-Level Synthesis Workflow": https://dl.acm.org/doi/10.1145/3626103

    Approximate Buffers for Reducing Memory Requirements: Case study on SKA

    The memory requirements of digital signal processing and multimedia applications have grown steadily over the last several decades. From embedded systems to supercomputers, the design of computing platforms involves a balance between processing elements and memory sizes to avoid the memory wall. This paper presents an algorithm based on both dataflow and approximate computing approaches to find a good balance between the memory requirements of an application and the quality of the result. The designer of the computing system can use these evaluations early in the design process to make hardware and software design decisions. The proposed method does not require any modification of the algorithm's computations, but optimizes how data are fetched from and written to memory. We show in this paper how the proposed algorithm saves 23.4% of memory for the full SKA SDP signal processing computing pipeline, and up to 68.75% for a wavelet transform in embedded systems.

    Embedded Runtime for Reconfigurable Dataflow Graphs on Manycore Architectures

    Embedded manycore architectures offer energy-efficient supercomputing capabilities but are notoriously difficult to program with traditional parallel Application Programming Interfaces (APIs). To address this challenge, dataflow Models of Computation (MoCs) are increasingly used, as their high level of abstraction eases the automation of computation mapping, memory allocation, and communication management. Reconfigurable dataflow is a class of dataflow MoC that fosters a unique trade-off between application dynamicity and predictability. This paper introduces the first embedded runtime manager enabling the execution of reconfigurable dataflow graphs on a Non-Uniform Memory Access (NUMA) architecture. The proposed runtime manager dynamically deploys reconfigurable dataflow graphs on clustered Processing Elements (PEs) through the Networks-on-Chips (NoCs) of the manycore architecture. An open-source implementation on the Kalray MPPA® processor demonstrates the feasibility and the great potential of such a runtime. The first results with an image processing application show a power efficiency 2.5 times better than on a multicore x86 architecture.