5 research outputs found

    Decoding-complexity-aware HEVC encoding using a complexity–rate–distortion model

    Get PDF
    The energy consumption of Consumer Electronic (CE) devices during media playback is inexorably linked to the computational complexity of decoding compressed video. Reducing a CE device's the energy consumption is therefore becoming ever more challenging with the increasing video resolutions and the complexity of the video coding algorithms. To this end, this paper proposes a framework that alters the video bit stream to reduce the decoding complexity and simultaneously limits the impact on the coding efficiency. In this context, this paper (i) first performs an analysis to determine the trade-off between the decoding complexity, video quality and bit rate with respect to a reference decoder implementation on a General Purpose Processor (GPP) architecture. Thereafter, (ii) a novel generic decoding complexity-aware video coding algorithm is proposed to generate decoding complexity-rate-distortion optimized High Efficiency Video Coding (HEVC) bit streams. The experimental results reveal that the bit streams generated by the proposed algorithm achieve 29.43% and 13.22% decoding complexity reductions for a similar video quality with minimal coding efficiency impact compared to the state-of-the-art approaches when applied to the HM16.0 and openHEVC decoder implementations, respectively. In addition, analysis of the energy consumption behavior for the same scenarios reveal up to 20% energy consumption reductions while achieving a similar video quality to that of HM 16.0 encoded HEVC bit streams

    Systematic Design Space Exploration of Dynamic Dataflow Programs for Multi-core Platforms

    Get PDF
    The limitations of clock frequency and power dissipation of deep sub-micron CMOS technology have led to the development of massively parallel computing platforms. They consist of dozens or hundreds of processing units and offer a high degree of parallelism. Taking advantage of that parallelism and transforming it into high program performances requires the usage of appropriate parallel programming models and paradigms. Currently, a common practice is to develop parallel applications using methods evolving directly from sequential programming models. However, they lack the abstractions to properly express the concurrency of the processes. An alternative approach is to implement dataflow applications, where the algorithms are described in terms of streams and operators thus their parallelism is directly exposed. Since algorithms are described in an abstract way, they can be easily ported to different types of platforms. Several dataflow models of computation (MoCs) have been formalized so far. They differ in terms of their expressiveness (ability to handle dynamic behavior) and complexity of analysis. So far, most of the research efforts have focused on the simpler cases of static dataflow MoCs, where many analyses are possible at compile-time and several optimization problems are greatly simplified. At the same time, for the most expressive and the most difficult to analyze dynamic dataflow (DDF), there is still a dearth of tools supporting a systematic and automated analysis minimizing the programming efforts of the designer. The objective of this Thesis is to provide a complete framework to analyze, evaluate and refactor DDF applications expressed using the RVC-CAL language. The methodology relies on a systematic design space exploration (DSE) examining different design alternatives in order to optimize the chosen objective function while satisfying the constraints. The research contributions start from a rigorous DSE problem formulation. This provides a basis for the definition of a complete and novel analysis methodology enabling systematic performance improvements of DDF applications. Different stages of the methodology include exploration heuristics, performance estimation and identification of refactoring directions. All of the stages are implemented as appropriate software tools. The contributions are substantiated by several experiments performed with complex dynamic applications on different types of physical platforms

    Traçage et profilage d'applications d'apprentissage automatique de type flot de données utilisant un processeur graphique

    Get PDF
    Actuellement, les besoins en puissance de calcul sont de plus en plus importants, alors que les améliorations au niveau du matériel commencent à ralentir. La puissance des processeurs et notamment leur fréquence de fonctionnement stagnent pour des raisons physiques comme la finesse de gravure ou la dissipation de chaleur. Afin de surpasser ces limites, le calcul en parallèle semble être une solution prometteuse avec l’utilisation d’architectures hétérogènes. Ces dernières mettent en oeuvre une combinaison de plusieurs unités de calculs de types possiblement différents, ce qui leur permet d’offrir un fonctionnement hautement parallèle. Malgré tout, utiliser l’ensemble du matériel efficacement reste difficile, et la programmation au niveau logiciel de ces architectures pose problème. Par conséquent, différents modèles ont émergé avec notamment les approches flot de données. Ces dernières proposent des caractéristiques très adaptées pour ce genre de contexte parallèle. Elles permettent de programmer plus facilement les différentes unités de calcul afin de bénéficier au maximum du matériel disponible. Dans un contexte de recherche de performance optimale, il est essentiel d’avoir des outils permettant de diagnostiquer d’éventuels problèmes. Quelques solutions ont déjà pu démontrer leur efficacité dans le cas d’un modèle de programmation plus traditionnel et séquentiel, utilisant ou non un processeur graphique. On retrouve par exemple des outils comme LTTng ou Ftrace destinés à l’analyse du processeur central. Concernant les processeurs graphiques, les outils propriétaires et à sources fermées, proposés par les constructeurs sont en général les plus complets et privilégiés par les programmeurs. Cela présente toutefois une limite, puisque les solutions ne sont pas générales et restent dépendantes du matériel proposé par un constructeur. Par ailleurs, elles offrent une flexibilité limitée avec des visualisations et analyses définies et fixes qui ne peuvent ni être modifiées ni améliorées en fonction des besoins d’un utilisateur. Finalement, aucun outil existant ne s’intéresse spécifiquement aux modèles flot de données.----------ABSTRACT: Recently, increasing computing capabilities have been required in various areas like scientific computing, video games and graphical rendering or artificial intelligence. These domains usually involve the processing of a large amount of data, intended to be performed as fast as possible. Unfortunately, hardware improvements have recently slowed down. The CPU clock speed, for example, is not increasing much any more, possibly nearing technological limits. Physical constraints like the heat dissipation or fine etching are the main reasons for that. Consequently, new opportunities like parallel processing using heterogeneous architectures became popular. In this context, the traditional processors get support from other computing units like graphical processors. In order to program these, the dataflow model offers several advantages. It is inherently parallel and thus well adapted. In this context, guaranteeing optimal performances is another main concern. For that, tracing and profiling central processing and graphical processing units are two useful techniques that can be considered. Several tools exist, like LTTng and FTrace that can trace the operating system and focus on the central processor. In addition, proprietary tools offered by hardware vendors can help to analyze and monitor the graphical processor. However, these tools are specific to one type of hardware and lack flexibility. Moreover, none of them target in particular dataflow applications executed on a heterogeneous platform

    Design of multicore HEVC decoders using actor-based dataflow models and OpenMP

    No full text
    International audienceNew multimedia portable devices support increasing spatial and temporal video resolutions as well as new and more efficient video codecs. This scenario leads to the use of multicore chips in order to comply with the higher computational loads. In a context dominated by the time to market pressure, the programming of multicore chips is a challenge. In this work, a method based on a combination of dataflow models and OpenMP is proposed for quick design of HEVC decoders using multicore chips. The proposed method has been automated by its integration into an open source framework. Tests have been carried out by implementing the same HEVC decoder targeted on three different multicore chips

    Design of Multicore HEVC Decoders Using Actor-based Dataflow Models and OpenMP

    No full text
    International audienceThis paper explains the design of a new back-end for the Open Reconfigurable Video Coding CAL Actor Language compiler framework (Orcc). This back-end uses the OpenMP API instead of the pthreads library to automatically generate C code for any multicore architecture with OpenMP support. With this back-end, implementations of an HEVC decoder have been automatically generated and tested for three different multicore target architectures. The new back-end does not introduce a performance penalty regarding to the actual C back-end, and enables the use of new target architectures, such as DSPs
    corecore