
    Dynamic Voltage Scaling Techniques for Power Efficient Video Decoding

    This paper presents a comparison of power-aware video decoding techniques that utilize dynamic voltage scaling (DVS). These techniques reduce the power consumption of a processor by exploiting high frame variability within a video stream. This is done by scaling the voltage and frequency of the processor during the video decoding process. However, DVS causes frame deadline misses due to inaccuracies in decoding time predictions and the granularity of the processor settings used. Four techniques were simulated and compared in terms of power consumption, accuracy, and deadline misses. In addition, this paper proposes the frame-data computation aware (FDCA) technique, which is a useful power-saving technique not only for stored video but also for real-time video applications. The FDCA method is compared with the GOP, Direct, and Dynamic methods, which tend to be more suited for stored video applications. The simulation results indicated that the Dynamic per-frame technique, where the decoding time prediction adapts to the particular video being decoded, provides the most power saving with performance comparable to the ideal case. On the other hand, the FDCA method consumes more power than the Dynamic method but can be used for stored video and real-time video scenarios without the need for any preprocessing. Our findings also indicate that, in general, DVS improves power savings, but the number of deadline misses also increases as the number of available processor settings increases. More importantly, most of these deadline misses are within 10–20% of the playout interval and thus have minimal effect on video quality. However, video clips with high variability in frame complexities combined with inaccurate decoding time predictions may degrade the video quality. Finally, our results show that a processor with 13 voltage/frequency settings is sufficient to achieve near-maximum performance with the experimental environment and the video workloads we have used.
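    The core DVS decision described above can be sketched as follows: given a predicted decode cost for the next frame and its playout deadline, pick the lowest discrete frequency setting predicted to meet the deadline. This is a minimal illustrative sketch, not the paper's implementation; the function name, the cycle estimate, and the 13 evenly spaced settings are assumptions.

    ```python
    # Hedged sketch of per-frame DVS frequency selection, assuming a
    # discrete set of frequency settings and a decode-time predictor
    # (both hypothetical; the paper's actual models are not reproduced here).

    def pick_frequency(predicted_cycles, deadline_s, freqs_hz):
        """Choose the lowest frequency predicted to finish the frame
        before its playout deadline; fall back to full speed otherwise."""
        for f in sorted(freqs_hz):
            if predicted_cycles / f <= deadline_s:
                return f
        return max(freqs_hz)  # prediction says the deadline will be missed

    # Example: 13 settings from 200 MHz to 1.4 GHz (values are illustrative)
    settings = [200e6 + i * 100e6 for i in range(13)]
    f = pick_frequency(predicted_cycles=20e6, deadline_s=1 / 30,
                       freqs_hz=settings)  # 600 MHz suffices for a 30 fps frame
    ```

    A mispredicted (too low) cycle count is exactly how the deadline misses discussed in the abstract arise: the chosen frequency is then too slow for the actual decode work.
    
    
    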

    Characterization of L3 Cache Behavior of SPECjAppServer2002 and TPC-C

    With the proliferation of e-businesses, Java™ middleware and OLTP applications are gaining importance. As the gap between CPU and memory latencies continues to increase, the performance of these applications running on multiprocessor systems will become further limited by the memory system. This study characterizes the memory behavior of such applications using the SPECjAppServer2002 and TPC-C benchmarks running on a real multiprocessor system. More specifically, shared and private L3 caches with invalidation- and update-based coherence protocols are evaluated using the Programmable Hardware-Assisted Cache Emulator (PHA$E). We found that coherency misses increase with larger private L3 caches, constituting more than 15% of all misses for both benchmarks. Additionally, a saturation point was observed at which employing a larger private cache yields no further improvement in miss ratio. Conversely, the shared L3 cache design was observed to be more scalable since it does not suffer from coherence misses. Our limit study shows that the existing Write-Broadcast policy, which updates line copies in other caches during a write on a shared line, has the potential to simultaneously reduce the private cache miss ratio and bus traffic. For example, at 64MB, it reduces the miss ratio by 53% and 44% for SPECjAppServer2002 and TPC-C respectively, while lowering the bus traffic by 18% and 11%. Overall, the policy can eliminate the aforementioned saturation point and allows for a private cache miss ratio comparable with that of a shared cache.
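    The distinction the abstract draws between invalidation- and update-based (Write-Broadcast) coherence can be illustrated with a toy model: on a write to a shared line, an invalidation protocol discards the other caches' copies (turning their next read into a coherence miss), while an update protocol refreshes the copies in place. This is a hedged, greatly simplified sketch; the class, function names, and counters are assumptions, not the PHA$E emulator's interface.

    ```python
    # Toy contrast of write-invalidate vs. write-broadcast (update) coherence
    # for private caches. All structures here are illustrative.

    class Cache:
        def __init__(self):
            self.lines = {}   # addr -> cached value
            self.misses = 0   # cold + coherence misses combined

        def read(self, addr, memory):
            if addr not in self.lines:
                self.misses += 1
                self.lines[addr] = memory.get(addr, 0)
            return self.lines[addr]

    def write(addr, value, writer, others, memory, policy):
        memory[addr] = value
        writer.lines[addr] = value
        for c in others:
            if addr in c.lines:
                if policy == "invalidate":
                    del c.lines[addr]     # next read is a coherence miss
                else:                      # "broadcast": update copy in place
                    c.lines[addr] = value

    mem = {}
    a, b = Cache(), Cache()
    a.read(0, mem); b.read(0, mem)         # both caches hold line 0 (cold misses)
    write(0, 7, a, [b], mem, "invalidate")
    b.read(0, mem)                         # coherence miss under invalidation
    write(0, 9, a, [b], mem, "broadcast")
    v = b.read(0, mem)                     # hit: the copy was updated in place
    ```

    The trade-off the abstract quantifies follows from this behavior: broadcast avoids coherence misses on shared lines at the cost of extra bus traffic per write, which its limit study finds can still be a net win at large private cache sizes.
    
    
    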