240 research outputs found

    Dynamic Voltage Scaling Techniques for Power Efficient Video Decoding

    Get PDF
    This paper presents a comparison of power-aware video decoding techniques that utilize dynamic voltage scaling (DVS). These techniques reduce the power consumption of a processor by exploiting high frame variability within a video stream. This is done through scaling of the voltage and frequency of the processor during the video decoding process. However, DVS causes frame deadline misses due to inaccuracies in decoding time predictions and granularity of processor settings used. Four techniques were simulated and compared in terms of power consumption, accuracy, and deadline misses. In addition, this paper proposes the frame-data computation aware (FDCA) technique, which is a useful power-saving technique not only for stored video but also for real-time video applications. The FDCA method is compared with the GOP, Direct, and Dynamic methods, which tend to be more suited for stored video applications. The simulation results indicated that the Dynamic per-frame technique, where the decoding time prediction adapts to the particular video being decoded, provides the most power saving with performance comparable to the ideal case. On the other hand, the FDCA method consumes more power than the Dynamic method but can be used for stored video and real-time time video scenarios without the need for any preprocessing. Our findings also indicate that, in general, DVS improves power savings, but the number of deadline misses also increase as the number of available processor settings increases. More importantly, most of these deadline misses are within 10–20% of the playout interval and thus have minimal affect on video quality. However, video clips with high variability in frame complexities combined with inaccurate decoding time predictions may degrade the video quality. Finally, our results show that a processor with 13 voltage/frequency settings is sufficient to achieve near maximum performance with the experimental environment and the video workloads we have used

    Dynamic Voltage Scaling Techniques for Power Efficient Video Decoding

    Get PDF
    This paper presents a comparison of power-aware video decoding techniques that utilize dynamic voltage scaling (DVS). These techniques reduce the power consumption of a processor by exploiting high frame variability within a video stream. This is done through scaling of the voltage and frequency of the processor during the video decoding process. However, DVS causes frame deadline misses due to inaccuracies in decoding time predictions and granularity of processor settings used. Four techniques were simulated and compared in terms of power consumption, accuracy, and deadline misses. In addition, this paper proposes the frame-data computation aware (FDCA) technique, which is a useful power-saving technique not only for stored video but also for real-time video applications. The FDCA method is compared with the GOP, Direct, and Dynamic methods, which tend to be more suited for stored video applications. The simulation results indicated that the Dynamic per-frame technique, where the decoding time prediction adapts to the particular video being decoded, provides the most power saving with performance comparable to the ideal case. On the other hand, the FDCA method consumes more power than the Dynamic method but can be used for stored video and real-time time video scenarios without the need for any preprocessing. Our findings also indicate that, in general, DVS improves power savings, but the number of deadline misses also increase as the number of available processor settings increases. More importantly, most of these deadline misses are within 10–20% of the playout interval and thus have minimal affect on video quality. However, video clips with high variability in frame complexities combined with inaccurate decoding time predictions may degrade the video quality. Finally, our results show that a processor with 13 voltage/frequency settings is sufficient to achieve near maximum performance with the experimental environment and the video workloads we have used

    Application-directed voltage scaling

    Full text link

    Energy efficient enabling technologies for semantic video processing on mobile devices

    Get PDF
    Semantic object-based processing will play an increasingly important role in future multimedia systems due to the ubiquity of digital multimedia capture/playback technologies and increasing storage capacity. Although the object based paradigm has many undeniable benefits, numerous technical challenges remain before the applications becomes pervasive, particularly on computational constrained mobile devices. A fundamental issue is the ill-posed problem of semantic object segmentation. Furthermore, on battery powered mobile computing devices, the additional algorithmic complexity of semantic object based processing compared to conventional video processing is highly undesirable both from a real-time operation and battery life perspective. This thesis attempts to tackle these issues by firstly constraining the solution space and focusing on the human face as a primary semantic concept of use to users of mobile devices. A novel face detection algorithm is proposed, which from the outset was designed to be amenable to be offloaded from the host microprocessor to dedicated hardware, thereby providing real-time performance and reducing power consumption. The algorithm uses an Artificial Neural Network (ANN), whose topology and weights are evolved via a genetic algorithm (GA). The computational burden of the ANN evaluation is offloaded to a dedicated hardware accelerator, which is capable of processing any evolved network topology. Efficient arithmetic circuitry, which leverages modified Booth recoding, column compressors and carry save adders, is adopted throughout the design. To tackle the increased computational costs associated with object tracking or object based shape encoding, a novel energy efficient binary motion estimation architecture is proposed. Energy is reduced in the proposed motion estimation architecture by minimising the redundant operations inherent in the binary data. Both architectures are shown to compare favourable with the relevant prior art

    Exploring Processor and Memory Architectures for Multimedia

    Get PDF
    Multimedia has become one of the cornerstones of our 21st century society and, when combined with mobility, has enabled a tremendous evolution of our society. However, joining these two concepts introduces many technical challenges. These range from having sufficient performance for handling multimedia content to having the battery stamina for acceptable mobile usage. When taking a projection of where we are heading, we see these issues becoming ever more challenging by increased mobility as well as advancements in multimedia content, such as introduction of stereoscopic 3D and augmented reality. The increased performance needs for handling multimedia come not only from an ongoing step-up in resolution going from QVGA (320x240) to Full HD (1920x1080) a 27x increase in less than half a decade. On top of this, there is also codec evolution (MPEG-2 to H.264 AVC) that adds to the computational load increase. To meet these performance challenges there has been processing and memory architecture advances (SIMD, out-of-order superscalarity, multicore processing and heterogeneous multilevel memories) in the mobile domain, in conjunction with ever increasing operating frequencies (200MHz to 2GHz) and on-chip memory sizes (128KB to 2-3MB). At the same time there is an increase in requirements for mobility, placing higher demands on battery-powered systems despite the steady increase in battery capacity (500 to 2000mAh). This leaves negative net result in-terms of battery capacity versus performance advances. In order to make optimal use of these architectural advances and to meet the power limitations in mobile systems, there is a need for taking an overall approach on how to best utilize these systems. The right trade-off between performance and power is crucial. On top of these constraints, the flexibility aspects of the system need to be addressed. All this makes it very important to reach the right architectural balance in the system. The first goal for this thesis is to examine multimedia applications and propose a flexible solution that can meet the architectural requirements in a mobile system. Secondly, propose an automated methodology of optimally mapping multimedia data and instructions to a heterogeneous multilevel memory subsystem. The proposed methodology uses constraint programming for solving a multidimensional optimization problem. Results from this work indicate that using today’s most advanced mobile processor technology together with a multi-level heterogeneous on-chip memory subsystem can meet the performance requirements for handling multimedia. By utilizing the automated optimal memory mapping method presented in this thesis lower total power consumption can be achieved, whilst performance for multimedia applications is improved, by employing enhanced memory management. This is achieved through reduced external accesses and better reuse of memory objects. This automatic method shows high accuracy, up to 90%, for predicting multimedia memory accesses for a given architecture

    Towards adaptive balanced computing (ABC) using reconfigurable functional caches (RFCs)

    Get PDF
    The general-purpose computing processor performs a wide range of functions. Although the performance of general-purpose processors has been steadily increasing, certain software technologies like multimedia and digital signal processing applications demand ever more computing power. Reconfigurable computing has emerged to combine the versatility of general-purpose processors with the customization ability of ASICs. The basic premise of reconfigurability is to provide better performance and higher computing density than fixed configuration processors. Most of the research in reconfigurable computing is dedicated to on-chip functional logic. If computing resources are adaptable to the computing requirement, the maximum performance can be achieved. To overcome the gap between processor and memory technology, the size of on-chip cache memory has been consistently increasing. The larger cache memory capacity, though beneficial in general, does not guarantee a higher performance for all the applications as they may not utilize all of the cache efficiently. To utilize on-chip resources effectively and to accelerate the performance of multimedia applications specifically, we propose a new architecture---Adaptive Balanced Computing (ABC). ABC uses dynamic resource configuration of on-chip cache memory by integrating Reconfigurable Functional Caches (RFC). RFC can work as a conventional cache or as a specialized computing unit when necessary. In order to convert a cache memory to a computing unit, we include additional logic to embed multi-bit output LUTs into the cache structure. We add the reconfigurability of cache memory to a conventional processor with minimal modification to the load/store microarchitecture and with minimal compiler assistance. ABC architecture utilizes resources more efficiently by reconfiguring the cache memory to computing units dynamically. The area penalty for this reconfiguration is about 50--60% of the memory cell cache array-only area with faster cache access time. In a base array cache (parallel decoding caches), the area penalty is 10--20% of the data array with 1--2% increase in the cache access time. However, we save 27% for FIR and 44% for DCT/IDCT in area with respect to memory cell array cache and about 80% for both applications with respect to base array cache if we were to implement all these units separately (such as ASICs). The simulations with multimedia and DSP applications (DCT/IDCT and FIR/IIR) show that the resource configuration with the RFC speedups ranging from 1.04X to 3.94X in overall applications and from 2.61X to 27.4X in the core computations. The simulations with various parameters indicate that the impact of reconfiguration can be minimized if an appropriate cache organization is selected

    Performance and Energy Consumption Characterization and Modeling of Video Decoding on Multi-core Heterogenous SoC and their Applications

    Get PDF
    To meet the increasing complexity of mobile multimedia applications, the System on Chip (SoC) equipping modern mobile devices integrate powerful heterogeneous processing elements among which General Purpose Processors (GPP), Digital Signal Processors (DSP), hardware accelerator are the most common ones.Due to the ever-growing gap between battery lifetime and hardware/software complexity in addition to application computing power needs, the energy saving issue becomes crucial in the design of such systems. In this context, we propose a study aiming to enhance the understanding of the energy consumption behavior of video decoding on these kinds of systems. Accordingly, an end-to-end methodology for characterizing and modeling the performance and the energy consumption of video decoding on GPP and DSP is proposed. The characterization step is based on an exhaustive experimental methodology for evaluating, at different abstraction levels, the performance and the energy consumption of video decoding. It was achieved on embedded platforms on which were executed a wide range of video decoding configurations. This step highlighted the importance to consider different parameters which may pertain to different abstraction levels in evaluating the overall energy efficiency of a given system. The measurements obtained in this step were used to build empirically performance and energy models for video decoding on both GPP and DSP. The proposed models gave very accurate estimation (R 2 = 97%) of both the performance and the energy consumption of video decoding in terms of a rich set of parameters including the video quality and the processor frequency. Moreover, based on a multi-level characterization and sub-model decomposition approaches, we show how the developed models, unlike classic empirical models, are easily and rapidly generalizable to other platforms.Some possible applications using the developed models, in the context of adaptive video decoding, were proposed. In general, it consists to use the capability of the proposed performance model to predict the decoding time of a given video quality in dimensioning/scheduling the processing resources. Due to the increasing demand on High Definition (HD), the characterization methodology was extended to consider HD video decoding on both parallel multi-cores and hardware video accelerator. This part highlighted the potential of parallelism video decoding to increase the energy efficiency of video decoding and point out some open issues in this domain.Pour répondre à la complexité croissante des applications multimédia mobiles, les systèmes sur puce équipant les appareils mobiles modernes intègrent des unités de calcul puissantes et hétérogène. Parmi ces units de calcul, on peut trouver des processeurs à usage général, des processeur de traitement de signal et des accélérateurs matériels. En raison de l’écart toujours croissant entre la durée de vie des batteries et la demande de plus en plus importante en puissance de calcul, l’économie d’énergie devient un enjeu crucial dans la conception des systèmes mobiles. Cette problématique est accentuée par l’augmentation de la complexité des logiciels et architectures matériels utilisés. Dans ce contexte, nous proposons une étude visant à améliorer la compréhension des considérations énergétiques du décodage vidéo sur ce genre de systèmes. Nous proposerons ainsi une méthodologie pour la caractérisation et la modélisation des performances et de la consommation d’énergie du décodage vidéo, aussi bien sur des processeurs à usage général de type ARM que sur un processeurde traitement de signal. L’étape de caractérisation est basée sur une méthodologie expérimentale pour évaluer de façon exhaustive et à différents niveaux d’abstraction, les performances et la consommation d’énergie du décodage vidéo. Cette caractérisation a été réalisée sur des plates-formes embarquées sur lesquels ont été exécutés un large éventail de configurations du décodage vidéo. Cette étape a souligné l’importance d’examiner différents paramètres qui peuvent se rapporter à différents niveaux d’abstraction dans l’évaluation de l’efficacité énergétique globale d’un système donné. Les mesures obtenues dans cette étape ont été utilisées pour construire empiriquement des modèles de performance et de consommation d’énergie pour le décodage vidéo à la fois sur des processeurs à usage général type ARM et sur un processeur de traitement de signal. Les modèles proposés peuvent estimer avec une grande précision (R 2 = 97%) la performance et la consommation d’énergie de décodage vidéo en fonction d’un nombre de paramètres comprenant la qualité de la vidéo et la fréquence du processeur. En plus, en se basant sur une caractérisation multi-niveaux et une approches de modélisation par décomposition en sous-modèles, nous montrons comment les modèles développés, contrairement aux modèles empiriques classiques, sont facilement et rapidement généralisables à d’autres plates-formes. Nous proposerons également certaines applications possibles des modèles développés, dans le cadre du décodage vidéo adaptatif. En général, cela consiste à exploiter la capacité du modèle de performance proposé pour prédire le temps de décodage d’une qualité vidéo donnée afin de mieux dimensionner les ressources de calculs dans un but de réduire leur consommationd’énergie

    Parallelism and the software-hardware interface in embedded systems

    Get PDF
    This thesis by publications addresses issues in the architecture and microarchitecture of next generation, high performance streaming Systems-on-Chip through quantifying the most important forms of parallelism in current and emerging embedded system workloads. The work consists of three major research tracks, relating to data level parallelism, thread level parallelism and the software-hardware interface which together reflect the research interests of the author as they have been formed in the last nine years. Published works confirm that parallelism at the data level is widely accepted as the most important performance leverage for the efficient execution of embedded media and telecom applications and has been exploited via a number of approaches the most efficient being vectorlSIMD architectures. A further, complementary and substantial form of parallelism exists at the thread level but this has not been researched to the same extent in the context of embedded workloads. For the efficient execution of such applications, exploitation of both forms of parallelism is of paramount importance. This calls for a new architectural approach in the software-hardware interface as its rigidity, manifested in all desktop-based and the majority of embedded CPU's, directly affects the performance ofvectorized, threaded codes. The author advocates a holistic, mature approach where parallelism is extracted via automatic means while at the same time, the traditionally rigid hardware-software interface is optimized to match the temporal and spatial behaviour of the embedded workload. This ultimate goal calls for the precise study of these forms of parallelism for a number of applications executing on theoretical models such as instruction set simulators and parallel RAM machines as well as the development of highly parametric microarchitectural frameworks to encapSUlate that functionality.EThOS - Electronic Theses Online ServiceGBUnited Kingdo

    Software-controlled processor speed setting for low-power streaming multimedia

    Full text link

    Quality-aware performance analysis for multimedia MPSoC platforms

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH
    corecore