8 research outputs found

    Optimizations for real-time implementation of H264/AVC video encoder on DSP processor

    Real-time H.264/AVC high-definition video encoding is a challenging workload for most existing programmable processors. Newer programmable processors, such as the Graphics Processing Unit (GPU) and the multicore Digital Signal Processor (DSP), offer a very promising way to overcome these constraints. This paper presents an optimized implementation of an H.264/AVC video encoder on a single core among the six cores of the TMS320C6472 DSP for Common Intermediate Format (CIF, 352x288) resolution, as a first step toward a multicore implementation for standard and high definitions (SD, HD). Algorithmic optimization is applied to the intra prediction module to reduce computation time. Furthermore, based on the DSP's architectural features, various structural and hardware optimizations are adopted to minimize external memory access. The parallelism between CPU processing and data transfers is fully exploited using an Enhanced Direct Memory Access (EDMA) controller. Experimental results show that the proposed optimizations together, on a single core running at 700 MHz for CIF resolution, improve the encoding speed by up to 42.91%. They allow real-time encoding at 25 f/s without inducing any Peak Signal to Noise Ratio (PSNR) degradation or bit-rate increase, and make it possible to achieve real-time implementation for SD and HD resolutions when exploiting multicore features.
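To put the real-time target in perspective, the figures quoted in the abstract (700 MHz, CIF, 25 f/s) can be turned into a rough per-macroblock cycle budget. This is only a back-of-the-envelope sketch: the 16x16 macroblock size comes from H.264, the function names are illustrative, and the calculation assumes every clock cycle is available to the encoder, which the paper does not claim.

```python
# Rough per-macroblock cycle budget for real-time CIF encoding on one core.
CLOCK_HZ = 700_000_000    # core clock quoted in the abstract
WIDTH, HEIGHT = 352, 288  # CIF resolution quoted in the abstract
FPS = 25                  # real-time target quoted in the abstract
MB_SIZE = 16              # H.264 macroblock edge, in luma samples


def macroblocks_per_frame(width, height, mb_size=MB_SIZE):
    """Count macroblocks, rounding up so partial blocks are included."""
    return -(-width // mb_size) * -(-height // mb_size)


def cycles_per_macroblock(clock_hz, width, height, fps):
    """Cycles available per macroblock if the whole core is dedicated."""
    return clock_hz / (macroblocks_per_frame(width, height) * fps)


mbs = macroblocks_per_frame(WIDTH, HEIGHT)
budget = cycles_per_macroblock(CLOCK_HZ, WIDTH, HEIGHT, FPS)
print(f"{mbs} macroblocks/frame, about {budget:.0f} cycles per macroblock")
```

Under these assumptions a CIF frame holds 396 macroblocks, so the encoder has roughly 70,700 cycles for each one, including any time the CPU spends waiting on memory.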

    Fast Motion Estimation’s Configuration Using Diamond Pattern and ECU, CFM, and ESD Modes for Reducing HEVC Computational Complexity

    The high performance of the High Efficiency Video Coding (HEVC) standard makes it well suited to high-definition resolutions. Nevertheless, this encoding performance comes with a tremendous increase in encoding complexity compared to the earlier H.264 video codec. The HEVC complexity is mainly due to the motion estimation (ME) module, which accounts for a large part of the encoding time, and much research therefore focuses on optimizing this module. Some works pursue hardware solutions that exploit the parallel processing of FPGA, GPU, or other multicore architectures, while others focus on software optimizations by introducing fast mode decision algorithms. In this context, this article proposes a fast HEVC encoder configuration to speed up the encoding process. The fast configuration uses options such as the early skip detection (ESD), the early CU termination (ECU), and the coded block flag (CBF) fast method (CFM) modes. For the ME algorithm, the diamond search (DS) is used in the encoding process across several video resolutions. A time saving of around 46.75% is obtained with an acceptable distortion in terms of video quality and bitrate compared to the reference test model HM 16.2. The contribution is compared with other works for better evaluation.
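The abstract does not spell out the diamond search itself, so the following is a minimal sketch of the textbook two-pattern variant (a large diamond repeated until its centre wins, then one small diamond step), not the HM implementation. The names `sad` and `diamond_search`, the 8x8 block size, the search range of 7, and the representation of frames as lists of rows are all assumptions made for illustration.

```python
# Textbook diamond search for one block, using the sum of absolute
# differences (SAD) as the matching cost. Illustrative sketch only.
LDSP = [(0, 0), (0, -2), (0, 2), (-2, 0), (2, 0),
        (-1, -1), (-1, 1), (1, -1), (1, 1)]       # large diamond pattern
SDSP = [(0, 0), (0, -1), (0, 1), (-1, 0), (1, 0)]  # small diamond pattern


def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the current block and the reference block shifted by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def diamond_search(cur, ref, bx, by, n=8, search_range=7):
    """Return the motion vector (dx, dy) found for the block at (bx, by)."""
    height, width = len(ref), len(ref[0])

    def valid(dx, dy):
        # Stay inside the search window and inside the reference frame.
        return (abs(dx) <= search_range and abs(dy) <= search_range
                and 0 <= bx + dx and bx + dx + n <= width
                and 0 <= by + dy and by + dy + n <= height)

    def best_of(center, pattern):
        cx, cy = center
        candidates = [(cx + px, cy + py) for px, py in pattern]
        candidates = [c for c in candidates if valid(*c)]
        # min() keeps the first minimum, and the centre is listed first,
        # so ties favour the centre and the loop below always terminates.
        return min(candidates, key=lambda c: sad(cur, ref, bx, by, c[0], c[1], n))

    center = (0, 0)
    while True:
        best = best_of(center, LDSP)
        if best == center:
            break
        center = best
    return best_of(center, SDSP)
```

Each large-diamond step evaluates at most nine positions, which is where the time saving over an exhaustive search of the whole window comes from. The price is that the search can settle in a local minimum on textured content.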

    Définition et implantation matérielle d'un estimateur de mouvement configurable pour la compression vidéo adaptative

    The aim of this thesis was to define and implement a hardware architecture for a configurable motion estimator capable of supporting various search strategies with the desired accuracy for adaptive video compression, as part of the design of a next-generation video compression platform that adapts closely to its environment. This need for adaptability has several origins. Firstly, current systems are designed to adapt to the diversity and heterogeneity of current terminals and media. Secondly, the use of the information contained in a video scene depends on the intended application and on user needs. The information may therefore be exploited in a completely inhomogeneous way, spatially or temporally; for instance, the spatial use of the scene becomes irregular when regions of interest are defined in the image, automatically or manually. The quality of the video, and hence of the compression, must be able to adapt in order to limit the amount of data to transmit, so this quality depends on how the video scene itself evolves. This objective, a modest part of the challenge posed by the development of digital video, requires faster processing and a high compression ratio. This thesis proposes a flexible hardware implementation of the motion estimator in which the integer motion search algorithm can be changed and the fractional search and the variable block size can be selected and adjusted. It also reviews the work carried out in this field and compares the results objectively with the state of the art. This novel architecture, designed specifically for FPGA targets, offers high-speed processing for a configuration that supports variable block sizes and quarter-pel refinement, as described in H.264. The proposed low-cost architecture, based on a Virtex 6 FPGA, can process integer motion estimation on 1080 HD video streams at 13 fps using the full search strategy (108k macroblocks/s) and at up to 223 fps using diamond search (1.8M macroblocks/s). Moreover, subpel refinement in quarter-pel mode is performed at 232k macroblocks/s.
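The macroblock rates and frame rates quoted in the abstract can be cross-checked with a short calculation. The sketch below assumes 16x16 macroblocks and a 1920x1080 frame whose height is padded up to a whole number of macroblock rows (68 rows, 8160 macroblocks per frame); the thesis may count macroblocks differently, and the function names are illustrative.

```python
# Convert the quoted macroblock throughput into frames per second at HD 1080.
MB_SIZE = 16  # H.264 macroblock edge, in luma samples


def macroblocks_per_frame(width, height, mb_size=MB_SIZE):
    """Count macroblocks, rounding up so partial rows are included."""
    return -(-width // mb_size) * -(-height // mb_size)


def frames_per_second(mb_per_second, width=1920, height=1080):
    """Frame rate sustained by a given macroblock throughput."""
    return mb_per_second / macroblocks_per_frame(width, height)


quoted_rates = [
    ("full search", 108_000),
    ("diamond search", 1_800_000),
    ("quarter-pel refinement", 232_000),
]
for label, rate in quoted_rates:
    print(f"{label}: {frames_per_second(rate):.1f} fps")
```

With these assumptions the full search rate works out to about 13 fps, matching the abstract, and the diamond search rate to about 221 fps, which is consistent with the quoted 223 fps given that 1.8M is a rounded figure.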

    Autonomous Recovery Of Reconfigurable Logic Devices Using Priority Escalation Of Slack

    Field Programmable Gate Array (FPGA) devices offer a suitable platform for survivable hardware architectures in mission-critical systems. In this dissertation, active dynamic redundancy-based fault-handling techniques are proposed which exploit the dynamic partial reconfiguration capability of SRAM-based FPGAs. Self-adaptation is realized by employing reconfiguration in the detection, diagnosis, and recovery phases. To extend these concepts to semiconductor aging and process variation in the deep submicron era, resilient adaptable processing systems are sought to maintain quality and throughput requirements despite the vulnerabilities of the underlying computational devices. A new approach to autonomous fault-handling which addresses these goals is developed using only a uniplex hardware arrangement. It operates by observing a health metric to achieve Fault Demotion using Reconfigurable Slack (FaDReS). Here an autonomous fault isolation scheme is employed which neither requires test vectors nor suspends the computational throughput, but instead observes the value of a health metric based on runtime input. The deterministic flow of the fault isolation scheme guarantees success in a bounded number of reconfigurations of the FPGA fabric. FaDReS is then extended to the Priority Using Resource Escalation (PURE) online redundancy scheme, which considers fault-isolation latency and throughput trade-offs under a dynamic spare arrangement. While deep-submicron designs introduce new challenges, the use of adaptive techniques is seen to provide several promising avenues for improving resilience. The scheme developed is demonstrated by the hardware design of various signal processing circuits and their implementation on a Xilinx Virtex-4 FPGA device. These include a Discrete Cosine Transform (DCT) core, a Motion Estimation (ME) engine, a Finite Impulse Response (FIR) filter, a Support Vector Machine (SVM), and Advanced Encryption Standard (AES) blocks, in addition to MCNC benchmark circuits. A significant reduction in power consumption is achieved, ranging from 83% for low motion-activity scenes to 12.5% for high motion-activity video scenes in a novel ME engine configuration. For a typical benchmark video sequence, PURE is shown to maintain a PSNR baseline near 32 dB. The diagnosability, reconfiguration latency, and resource overhead of each approach are analyzed. Compared to previous alternatives, PURE maintains a PSNR within a difference of 4.02 dB to 6.67 dB from the fault-free baseline by escalating healthy resources to higher-priority signal processing functions. The results indicate the benefits of priority-aware resiliency over conventional redundancy approaches in terms of fault recovery, power consumption, and resource-area requirements. Together, these provide a broad range of strategies to achieve autonomous recovery of reconfigurable logic devices under a variety of constraints, operating conditions, and optimization criteria.
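The PSNR figures in the abstract use the standard definition, PSNR = 10 log10(peak^2 / MSE). A minimal sketch of it for 8-bit samples follows; the function name `psnr` and the flat-sequence input format are assumptions for illustration, and the dissertation's own measurement setup is not described here.

```python
import math


def psnr(reference, degraded, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences."""
    if len(reference) != len(degraded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10 * math.log10(peak ** 2 / mse)
```

Because the scale is logarithmic, a fixed drop in dB corresponds to a fixed multiplicative increase in mean squared error, so a gap of a few dB from the fault-free baseline represents a noticeably larger error.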

    Contribution à l'implantation optimisée de l'estimateur de mouvement de la norme H.264 sur plates-formes multi composants par extension de la méthode AAA

    Mixed architectures containing both programmable and reconfigurable devices can provide the computing performance needed to meet the constraints of real-time applications. However, implementing and optimizing these applications on this kind of architecture is a complex and very time-consuming task. In this context, we propose a rapid prototyping tool for this type of architecture. The tool is based on our extension of the Adequacy Algorithm Architecture (AAA) methodology. It automatically performs optimized partitioning and scheduling of the application's operations on the components of the target architecture and generates the corresponding code. We used this tool to implement the H.264/AVC motion estimator on an architecture composed of an Altera Nios II processor and an Altera Stratix III FPGA. This allowed us to verify that the tool runs correctly and to validate our automatic mixed-code generator.

    Hardware implementation and validation of the fast variable block size motion estimation architecture for H.264/AVC

    Block-matching motion estimation is the heart of a video coding system. It leads to a high compression ratio, but it is time consuming and computationally intensive. Many fast-search block-matching motion estimation algorithms have been developed to minimize the number of search positions and speed up computation, but they do not take into account how effectively they can be implemented in hardware. In this paper, we propose an efficient hardware architecture for the fast line diamond parallel search (LDPS) algorithm with variable block size motion estimation (VBSME) for the H.264/AVC video coding system. The design is described in VHDL and synthesized to an Altera Stratix III FPGA and to TSMC 0.18 µm standard cells. The hardware architecture reaches a throughput of up to 78 million pixels per second at a clock frequency of 83.5 MHz and uses only 28 kgates when mapped to standard cells. Finally, a system on a programmable chip (SoPC) implementation and validation of the proposed design as an IP core is presented using an embedded video system.