11 research outputs found

Exploration d'Espace de Conception pour des Techniques de Calculs Approximés pour la Mémoire

Author: Miomandre Hugo
Ménard Daniel
Nezan Jean Francois
Publication venue: HAL CCSD
Publication date: 12/07/2022

Modern digital systems are processing more and more data. This increase in memory requirements must match the processing capabilities and interconnections to avoid the memory wall. Approximate computing techniques exist to alleviate these requirements but usually require a thorough and tedious analysis of the processing pipeline. This paper presents an application-agnostic Design Space Exploration (DSE) of the buffer-sizing process to reduce the memory footprint of applications while guaranteeing an output quality above a defined threshold. The proposed DSE selects the appropriate bit-width and storage type for buffers to satisfy the constraint. We show in this paper that the proposed DSE reduces the memory footprint of the SqueezeNet CNN by 58.6% with identical Top-1 prediction accuracy, and the full SKA SDP pipeline by 39.7% without degradation, while only testing a subset of the design space. The proposed DSE is fast enough to be integrated into the design stream of applications.

HAL-CentraleSupelec

HAL-Rennes 1

Automated Buffer Sizing of Dataflow Applications in a High-Level Synthesis Workflow

Author: Dardaillon Mickaël
Honorat Alexandre
Miomandre Hugo
Nezan Jean-François
Publication venue: ACM
Publication date: 29/09/2023

High-Level Synthesis (HLS) tools are mature enough to provide efficient code generation for computation kernels on FPGA hardware. For more complex applications, multiple kernels may be connected by a dataflow graph. Although some tools, such as Xilinx Vitis HLS, support dataflow directives, they lack efficient analysis methods to compute the buffer sizes between kernels in a dataflow graph. This paper proposes an original method to safely approximate such buffer sizes. The first contribution computes an initial overestimation of buffer sizes, without knowing the memory access patterns of kernels. The second contribution iteratively refines those buffer sizes thanks to cosimulation. Moreover, the paper introduces an open source framework using these methods to facilitate dataflow programming on FPGA using HLS. The proposed methods and framework have been tested on 7 dataflow applications, and outperform Vitis HLS cosimulation in 5 benchmarks, either in terms of BRAM and LUT usage, or in terms of exploration time. In the 2 other benchmarks, our best method gets results similar to Vitis HLS. Last but not least, our method admits directed cycles in the application graphs.

HAL-CentraleSupelec

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL-Rennes 1

An Initial Framework for Prototyping the Radio-Interferometric Imaging Pipelines

Author: Desnos Karol
Gac Nicolas
Miomandre Hugo
Nezan Jean-François
Orieux François
Wang Sunrise
Publication venue: HAL CCSD
Publication date: 17/01/2024

HAL-CentraleSupelec

Influence of Dataflow Graph Moldable Parameters on Optimization Criteria

Author: Bourgoin Thomas
Desnos Karol
Honorat Alexandre
Menard Daniel
Miomandre Hugo
Nezan Jean-François
Publication venue: Springer Science and Business Media LLC
Publication date: 20/06/2022

The integration of static parameters into Synchronous Dataflow (SDF) models enables the customization of an application's functional and non-functional behaviours. However, these parameter values are generally set by the developer for a manual Design Space Exploration (DSE). Instead of a single value, moldable parameters accept a set of alternative values, representing all possible configurations of the application. The DSE is responsible for selecting the best parameter values to optimize a set of criteria such as latency, energy, or memory footprint. However, the DSE process explodes in complexity with the number of parameters and their possible values. In this paper, we study an automated DSE algorithm exploring multiple configurations of a dataflow application. Our experiments show that: 1) Only limited sets of configurations lead to Pareto-optimal solutions in a multi-criteria optimization scenario. 2) The impact of individual parameters on optimization criteria is determined accurately from a limited subset of design points. The approach was evaluated on three image processing applications having from hundreds to thousands of configurations.

HAL-CentraleSupelec

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

HAL-Rennes 1

preesm/preesm: ACM TRETS Artifact Evaluation

Author: Dardaillon Mickaël
Desnos Karol
et al.
Honorat Alexandre
Miomandre Hugo
Nezan Jean-Francois
Pelcat Maxime
Publication venue: Zenodo
Publication date: 26/10/2023

Release for ACM TRETS Artifact Evaluation. "Automated Buffer Sizing of Dataflow Applications in a High-Level Synthesis Workflow" https://dl.acm.org/doi/10.1145/3626103

ZENODO

Approximate Buffers for Reducing Memory Requirements: Case study on SKA

Author: Campbell Adam
Ensor Andrew
Griffin Anthony
Hall Seth
Menard Daniel
Miomandre Hugo
Nezan Jean-François
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 20/10/2020

The memory requirements of digital signal processing and multimedia applications have grown steadily over the last several decades. From embedded systems to supercomputers, the design of computing platforms involves a balance between processing elements and memory sizes to avoid the memory wall. This paper presents an algorithm based on both dataflow and approximate computing approaches in order to find a good balance between the memory requirements of an application and the quality of the result. The designer of the computing system can use these evaluations early in the design process to make hardware and software design decisions. The proposed method does not require any modification in the algorithm's computations, but optimizes how data are fetched from and written to memory. We show in this paper how the proposed algorithm saves 23.4% of memory for the full SKA SDP signal processing computing pipeline, and up to 68.75% for a wavelet transform in embedded systems.

HAL-CentraleSupelec

Crossref

HAL-Rennes 1

Embedded Runtime for Reconfigurable Dataflow Graphs on Manycore Architectures

Author: Desnos Karol
Dupont de Dinechin Benoît
Hascoët Julien
Martin Kevin
Miomandre Hugo
Nezan Jean-François
Publication venue: Association for Computing Machinery (ACM)
Publication date: 23/01/2018

Embedded manycore architectures offer energy-efficient super-computing capabilities but are notoriously difficult to program with traditional parallel Application Programming Interfaces (APIs). To address this challenge, dataflow Models of Computation (MoCs) are increasingly used as their high level of abstraction eases the automation of computation mapping, memory allocation, and communication management. Reconfigurable dataflow is a class of dataflow MoC that fosters a unique trade-off between application dynamicity and predictability. This paper introduces the first embedded runtime manager enabling the execution of reconfigurable dataflow graphs on a Non-Uniform Memory Access (NUMA) architecture. The proposed runtime manager dynamically deploys reconfigurable dataflow graphs on clustered Processing Elements (PEs) through the Networks-on-Chips (NoCs) of the manycore architecture. An open-source implementation on the Kalray MPPA® processor demonstrates the feasibility and the great potential of such a runtime. The first results with an image processing application show a power efficiency 2.5 times better than on a multicore x86 architecture.

HAL-CentraleSupelec

HAL-Université de Bretagne Occidentale

HAL-Rennes 1