19 research outputs found

    Optimized fixed point implementation of a local stereo matching algorithm onto C66x DSP

    Stereo matching techniques aim at reconstructing disparity maps from a pair of images. Using them in embedded systems is challenging because of the complexity of state-of-the-art algorithms. An efficient local stereo matching algorithm was chosen from the literature and implemented on a C6678 DSP. Arithmetic simplifications, such as approximation by piecewise linear functions and fixed-point conversions, are proposed. Thanks to factorisation and pre-computing, the memory footprint is reduced by a factor of 13 so that it fits within the memory available on embedded systems. A speed of 14.5 fps (a 60-fold speed-up) has been reached with a small quality loss on the final disparity map.
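
The two arithmetic simplifications named in this abstract can be illustrated with a minimal sketch. The abstract does not say which function is approximated or which fixed-point format is used, so the choice of exp(-x), the Q8 format (`FRAC_BITS`), the breakpoints, and all function names below are assumptions made purely for illustration, not the authors' implementation.

```python
import math

FRAC_BITS = 8            # assumed number of fractional bits (Q8), illustrative only
SCALE = 1 << FRAC_BITS   # 256

def to_fixed(x: float) -> int:
    """Convert a float to its nearest Q8 fixed-point integer."""
    return int(round(x * SCALE))

def to_float(q: int) -> float:
    """Convert a Q8 fixed-point integer back to a float."""
    return q / SCALE

# Assumed breakpoints for exp(-x) on [0, 4]; the table is built once, offline.
BREAKPOINTS = [0.0, 0.5, 1.0, 2.0, 4.0]
TABLE = [(to_fixed(x), to_fixed(math.exp(-x))) for x in BREAKPOINTS]

def pwl_exp_neg_fixed(xq: int) -> int:
    """Piecewise linear approximation of exp(-x) using integer arithmetic only.

    xq is the input in Q8; the result is also in Q8. Inputs outside the
    table range are clamped to the first or last table value.
    """
    if xq <= TABLE[0][0]:
        return TABLE[0][1]
    if xq >= TABLE[-1][0]:
        return TABLE[-1][1]
    for (x0, y0), (x1, y1) in zip(TABLE, TABLE[1:]):
        if xq <= x1:
            # Linear interpolation between the two surrounding breakpoints.
            return y0 + ((y1 - y0) * (xq - x0)) // (x1 - x0)
    return TABLE[-1][1]

if __name__ == "__main__":
    for x in (0.0, 0.25, 1.5, 3.0):
        approx = to_float(pwl_exp_neg_fixed(to_fixed(x)))
        print(f"x={x}: approx={approx:.4f} exact={math.exp(-x):.4f}")
```

On a real DSP the per-segment slopes would more likely be precomputed so that the run-time division disappears; the sketch keeps the division for readability.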

    Intégration rapide de services vidéo Mpeg sur architectures parallèles

    Real-time operation is a strong constraint for audiovisual applications and requires platforms built from several computing units. The goal of our work is to develop a rapid prototyping process on parallel architectures for image processing applications. The prototyping process begins with the description of the algorithms through an object-oriented visual programming interface. This description is then automatically transformed so that it can be used by Syndex, a software tool that evaluates and generates the scheduling of the algorithm's tasks on multiprocessor architectures. We demonstrate the effectiveness of our methodology with the development of a substantial Mpeg-2 application and its multi-DSP implementation.

    How programming models can manage the problem of scaling

    Hardware code generation from dataflow programs

    The elaboration of new systems on embedded targets is becoming more and more complex. In particular, multimedia devices are now implemented using mixed hardware and software architectures, which improve the computational power but also increase the design complexity and the time to market. New design flows have been developed to help designers in the development of complex architectures. These design flows are often based on the use of languages with a higher level of abstraction. RVC-CAL is a dataflow programming language that provides suitable features in this context. An RVC-CAL dataflow program can be compiled to various target software languages (e.g. C, Java, LLVM) with the Open RVC-CAL Compiler (Orcc). In this paper, we present a new hardware code generator that generates high-quality, portable VHDL code with a hierarchical architecture from an RVC-CAL dataflow program in a matter of seconds. The paper explains the underlying principles of the hardware code generator and presents the results obtained from an Inverse DCT described as an RVC-CAL dataflow program.

    Exploration d'Espace de Conception pour des Techniques de Calculs Approximés pour la Mémoire

    Modern digital systems are processing more and more data. This increase in memory requirements must match the processing capabilities and interconnections to avoid the memory wall. Approximate computing techniques exist to alleviate these requirements but usually require a thorough and tedious analysis of the processing pipeline. This paper presents an application-agnostic Design Space Exploration (DSE) of the buffer-sizing process to reduce the memory footprint of applications while guaranteeing an output quality above a defined threshold. The proposed DSE selects the appropriate bit-width and storage type for buffers to satisfy the constraint. We show in this paper that the proposed DSE reduces the memory footprint of the SqueezeNet CNN by 58.6% with identical Top-1 prediction accuracy, and that of the full SKA SDP pipeline by 39.7% without degradation, while testing only a subset of the design space. The proposed DSE is fast enough to be integrated into the design stream of applications.
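
The idea of choosing buffer bit-widths under a quality threshold while evaluating only part of the design space can be sketched with a simple greedy search. This is not the DSE algorithm of the paper, which the abstract does not detail: the greedy strategy, the function names, the `quality` callback, and the candidate bit-widths are all hypothetical, and the storage-type dimension mentioned in the abstract is not modelled.

```python
from typing import Callable, Dict, List

Config = Dict[str, int]  # buffer name -> chosen bit-width

def explore_bit_widths(
    buffers: Dict[str, int],            # buffer name -> number of elements
    candidates: List[int],              # allowed bit-widths, e.g. [8, 16, 32]
    quality: Callable[[Config], float], # hypothetical application quality metric
    threshold: float,                   # minimum acceptable quality
) -> Config:
    """Greedy sketch: narrow one buffer at a time while quality holds."""
    widest = max(candidates)
    config: Config = {name: widest for name in buffers}
    # Visit the largest buffers first, since they dominate the footprint.
    for name in sorted(buffers, key=buffers.get, reverse=True):
        # Try the narrowest widths first and keep the first one that passes.
        for bits in sorted(candidates):
            trial = {**config, name: bits}
            if quality(trial) >= threshold:
                config = trial
                break
    return config

def footprint_bits(buffers: Dict[str, int], config: Config) -> int:
    """Total memory footprint in bits for a given configuration."""
    return sum(buffers[name] * config[name] for name in buffers)

if __name__ == "__main__":
    # Toy setup: buffer "a" needs at least 16 bits, buffer "b" tolerates 8.
    toy_buffers = {"a": 1000, "b": 10}

    def toy_quality(cfg: Config) -> float:
        return 1.0 if cfg["a"] >= 16 else 0.0

    best = explore_bit_widths(toy_buffers, [8, 16, 32], toy_quality, 0.5)
    print(best, footprint_bits(toy_buffers, best))
```

The search evaluates at most one configuration per buffer and candidate width, rather than every combination, which is one simple way to test only a subset of the design space.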

    Implementation of a Fast Fourier Transform Algorithm onto a Manycore Processor

    The Fourier transform is the main processing step applied to data collected from the Square Kilometre Array (SKA) receivers. The requirement is to compute a Fourier transform of 2^19 real byte samples in real time while minimizing the power consumption. We address this challenge by optimizing an FFT implementation for execution on the Kalray MPPA manycore processor. Although this processor delivers high floating-point performance, we use fixed-point number representations in order to reduce the memory consumption and the I/O bandwidth. The result is an execution time of 1.07 ms per FFT, including data transfers. This makes it possible to use only two first-generation MPPA chips per flow of data coming from the receivers, for a total power consumption of 17.4 W.