21 research outputs found

    Energy Efficient Task Mapping and Resource Management on Multi-core Architectures

    Reducing energy consumption of parallel applications executing on chip multi-processors (CMPs) is important for green computing. Hardware vendors have been developing a variety of system features to support energy efficient computing, for example, integrating asymmetric core types on a single chip referred to as static asymmetry and supporting dynamic voltage and frequency scaling (DVFS) referred to as dynamic asymmetry. A common parallelization scheme to exploit CMPs is task parallelism, which can express a wide range of computations in the form of task directed acyclic graphs (DAGs). Existing studies that target energy efficient task scheduling have demonstrated the benefits of leveraging DVFS, particularly per-core DVFS. Their scheduling decisions are mainly based on heuristics, such as task criticality, task dependencies and workload sizes. To enable energy efficient task scheduling, we identify multiple crucial factors that influence energy consumption - varying task characteristics, exploitation of intra-task parallelism (task moldability), and task granularity - which we collectively refer to as task heterogeneity. Task heterogeneity and architecture asymmetry features together complicate the task scheduling problem, since the most energy efficient configuration of resource allocation and frequency setting varies with each task. Our analysis shows that leveraging task heterogeneity in conjunction with static and dynamic asymmetry provides significant opportunities for energy reduction. This thesis contributes two scheduling techniques - ERASE and STEER - that target different scenarios. ERASE focuses on fine-grained tasking in environments where DVFS is not under user control. It leverages the insights of task characteristics, task moldability, and instantaneous task parallelism detection for guiding scheduling decisions. ERASE comprises four modules: online performance modeling, power profiling, core activity tracing and a task scheduler. Online performance modeling and power profiling provide the runtime with execution time and power predictions. Core activity tracing offers the instantaneous task parallelism, and the task scheduler combines this information to enable energy predictions and dynamically determine the best resource allocation for each task during runtime. STEER focuses on environments where DVFS is under user control and where the platform comprises multiple asymmetric cores grouped into clusters. STEER explores how much energy could be potentially saved by leveraging static asymmetry, dynamic asymmetry and task heterogeneity in conjunction. STEER comprises two predictive models for performance and power predictions, and a task scheduler that utilizes the models for energy predictions and then identifies the best resource allocation and frequency settings for tasks. Moreover, it applies adaptive scheduling techniques based on task granularity to manage DVFS overheads, and coordinates the cluster frequency settings to reduce interference from concurrently running tasks on cluster-based architectures. The evaluation on an NVIDIA Jetson TX2 shows that ERASE achieves 10% energy savings on average compared to the state-of-the-art DVFS-based schedulers and can adapt to external DVFS changes, and STEER consumes 38% less energy on average than both the state-of-the-art and ERASE.
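
    The idea of combining execution time and power predictions into an energy prediction, then picking the best resource allocation and frequency for a task, can be sketched roughly as follows. This is a minimal illustration and not the ERASE or STEER implementation: the `Config` type, the toy time and power models, and every constant are hypothetical placeholders standing in for the learned models the abstract mentions.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Config:
    core_type: str   # "big" or "little" (static asymmetry)
    num_cores: int   # cores given to a moldable task
    freq_ghz: float  # frequency setting (dynamic asymmetry)


# Hypothetical toy parameters; real systems would learn these from profiling.
SPEED = {"big": 2.0, "little": 1.0}         # relative work done per cycle
IDLE_W = {"big": 0.2, "little": 0.05}       # per-core static power in watts
DYN_COEFF = {"big": 1.0, "little": 0.3}     # per-core dynamic power coefficient


def predict_time_s(work: float, cfg: Config) -> float:
    """Toy performance model with sub-linear speedup over cores."""
    return work / (SPEED[cfg.core_type] * cfg.freq_ghz * cfg.num_cores ** 0.8)


def predict_power_w(cfg: Config) -> float:
    """Toy power model: static part plus a cubic-in-frequency dynamic part."""
    per_core = IDLE_W[cfg.core_type] + DYN_COEFF[cfg.core_type] * cfg.freq_ghz ** 3
    return cfg.num_cores * per_core


def predict_energy_j(work: float, cfg: Config) -> float:
    """Energy prediction is predicted power multiplied by predicted time."""
    return predict_power_w(cfg) * predict_time_s(work, cfg)


def best_config(work: float, configs: list[Config]) -> Config:
    """Pick the configuration with the lowest predicted energy for one task."""
    return min(configs, key=lambda cfg: predict_energy_j(work, cfg))


if __name__ == "__main__":
    candidates = [
        Config(core_type, cores, freq)
        for core_type, cores, freq in product(("big", "little"), (1, 2), (0.5, 1.0, 2.0))
    ]
    print(best_config(10.0, candidates))
```

    With different task characteristics (for example, a task whose work scales differently with frequency or core count), the same search would select a different configuration, which is the per-task variation the abstract describes.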

    Dynamic Voltage Scaling Techniques for Power Efficient Video Decoding

    This paper presents a comparison of power-aware video decoding techniques that utilize dynamic voltage scaling (DVS). These techniques reduce the power consumption of a processor by exploiting high frame variability within a video stream. This is done through scaling of the voltage and frequency of the processor during the video decoding process. However, DVS causes frame deadline misses due to inaccuracies in decoding time predictions and the granularity of the processor settings used. Four techniques were simulated and compared in terms of power consumption, accuracy, and deadline misses. In addition, this paper proposes the frame-data computation aware (FDCA) technique, which is a useful power-saving technique not only for stored video but also for real-time video applications. The FDCA method is compared with the GOP, Direct, and Dynamic methods, which tend to be more suited for stored video applications. The simulation results indicated that the Dynamic per-frame technique, where the decoding time prediction adapts to the particular video being decoded, provides the most power saving with performance comparable to the ideal case. On the other hand, the FDCA method consumes more power than the Dynamic method but can be used for stored video and real-time video scenarios without the need for any preprocessing. Our findings also indicate that, in general, DVS improves power savings, but the number of deadline misses also increases as the number of available processor settings increases. More importantly, most of these deadline misses are within 10–20% of the playout interval and thus have minimal effect on video quality. However, video clips with high variability in frame complexities combined with inaccurate decoding time predictions may degrade the video quality. Finally, our results show that a processor with 13 voltage/frequency settings is sufficient to achieve near maximum performance with the experimental environment and the video workloads we have used.
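
    The per-frame mechanism described above can be sketched as follows: predict the decoding cost of a frame, then run at the lowest processor setting that still meets the frame deadline. This is only an illustrative sketch, not any of the four techniques compared in the paper; the function names, the cycle counts, and the frequency list are hypothetical.

```python
def select_frequency(predicted_cycles: float, deadline_s: float, freqs_hz: list[float]) -> float:
    """Return the lowest available frequency predicted to meet the deadline.

    Falls back to the highest frequency when no setting is fast enough.
    """
    for freq in sorted(freqs_hz):
        if predicted_cycles / freq <= deadline_s:
            return freq
    return max(freqs_hz)


def misses_deadline(actual_cycles: float, freq_hz: float, deadline_s: float) -> bool:
    """A deadline miss occurs when the actual decode time exceeds the deadline."""
    return actual_cycles / freq_hz > deadline_s


if __name__ == "__main__":
    freqs = [200e6, 400e6, 600e6]   # hypothetical processor settings in Hz
    deadline = 1 / 30               # playout interval for 30 frames per second

    chosen = select_frequency(10e6, deadline, freqs)
    print(chosen)
    # An under-predicted frame (15e6 actual cycles instead of 10e6) misses its deadline.
    print(misses_deadline(15e6, chosen, deadline))
```

    A finer set of settings lets the chosen frequency sit closer to the predicted requirement, leaving less slack to absorb prediction error, which is one plausible reading of why deadline misses grow with the number of settings.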

    Dynamic Voltage and Frequency Scaling for Wireless Network-on-Chip

    Previously, research and design of Network-on-Chip (NoC) paradigms were mainly focused on improving the performance of the interconnection networks. With the emerging wide range of low-power applications and energy-constrained high-performance applications, it is highly desirable to have NoCs that are highly energy efficient without incurring a performance penalty. In the design of high-performance massive multi-core chips, power and heat have become dominant constraints. Increased power consumption can raise chip temperature, which in turn can decrease chip reliability and performance and increase cooling costs. It was proven that the Small-world Wireless Network-on-Chip (SWNoC) architecture, which replaces multi-hop wire-line paths in a NoC with high-bandwidth single-hop long-range wireless links, reduces the overall energy dissipation when compared to a wire-line mesh-based NoC architecture. However, the overall energy dissipation of the wireless NoC is still dominated by wire-line links and switches (buffers). Dynamic Voltage Scaling is an efficient technique for significant power savings in microprocessors. It has been proposed and deployed in modern microprocessors by exploiting the variance in processor utilization. In a Network-on-Chip, the wire-line links and buffers are likely not to be fully utilized at all times, even across different applications. Hence, by exploiting these characteristics of the links and buffers over different traffic, a DVFS technique can be incorporated into these switches and wire-line links for large power savings. In this thesis, a history-based DVFS mechanism is proposed. This mechanism uses the past utilization of the wire-line links and buffers to predict the future traffic and accordingly tunes the voltage and frequency of the links and buffers dynamically for each time window. This mechanism dynamically minimizes the power consumption while largely maintaining high system performance. Performance analysis of these DVFS-enabled wireless NoCs shows that the overall energy dissipation is improved by around 40% when compared to Small-world Wireless NoCs.
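
    A history-based controller of the kind described above might look roughly like the sketch below. The abstract only states that past utilization is used to predict future traffic, so the exponentially weighted average predictor, the `HistoryDvfs` class, and the voltage/frequency levels are all assumptions made for illustration.

```python
# Hypothetical operating points: (relative frequency, supply voltage in volts).
LEVELS = [(0.25, 0.6), (0.5, 0.8), (1.0, 1.0)]


class HistoryDvfs:
    """Per-link controller that picks a voltage/frequency level each time window."""

    def __init__(self, levels, weight=0.5):
        self.levels = sorted(levels)
        self.weight = weight     # how strongly the latest window counts (assumed value)
        self.predicted = 1.0     # start conservatively at full utilization

    def level_for(self, utilization):
        """Lowest level whose relative frequency covers the predicted utilization."""
        for freq, volt in self.levels:
            if freq >= utilization:
                return (freq, volt)
        return self.levels[-1]

    def end_of_window(self, measured_utilization):
        """Update the prediction with the last window and return the next level.

        Utilization is assumed to be measured relative to full-frequency capacity.
        """
        self.predicted = (
            self.weight * measured_utilization + (1 - self.weight) * self.predicted
        )
        return self.level_for(self.predicted)


if __name__ == "__main__":
    controller = HistoryDvfs(LEVELS)
    for utilization in (0.2, 0.2, 0.0):
        print(controller.end_of_window(utilization))
```

    The weight trades responsiveness against stability: a larger value tracks traffic bursts faster but changes levels more often.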

    An FPGA-based infrastructure for fine-grained DVFS analysis in high-performance embedded systems

    Emerging technologies provide SoCs with fine-grained DVFS capabilities both in space (number of domains) and time (transients on the order of tens of nanoseconds). Analyzing these systems requires cycle-accurate accounting of rapidly-changing dynamics and complex interactions among accelerators, interconnect, memory, and OS. We present an FPGA-based infrastructure that facilitates such analyses for high-performance embedded systems. We show how our infrastructure can be used to first generate SoCs with loosely-coupled accelerators, and then perform design-space exploration considering several DVFS policies under full-system workload scenarios, sweeping spatial and temporal domain granularity.

    Optimality study of resource binding with multi-Vdds

    Deploying multiple supply voltages (multi-Vdds) on one chip is an important technique to reduce dynamic power consumption. In this work we present an optimality study for resource binding targeting designs with multi-Vdds. This is similar to the voltage-island design concept, except that the granularity of our voltage island is on the functional-unit level as opposed to the core level. We are interested in achieving the maximum number of low-Vdd operations and, at the same time, minimizing switching activity during functional unit binding. To the best of our knowledge, there is no known optimal solution to this problem. To compute an optimal solution for this problem and examine the quality gap between our solution and previous heuristic solutions, we formulate this problem as a min-cost network flow problem, but with special equal-flow constraints. This formulation leads to an easy reduction to the integer linear programming (ILP) solution and also enables an efficient approximate solution by Lagrangian relaxation. Experimental results show that the optimal solution computed based on our formulation provides 7% more low-Vdd operations and also reduces the total switching activity by 20% compared to one of the best known heuristic algorithms that consider multi-Vdd assignments only. Copyright 2006 ACM.
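
    For context, the two objectives above (more low-Vdd operations, less switching activity) both map onto the standard first-order CMOS dynamic power model. The abstract does not state this formula; it is included here as general background, with the usual symbols assumed: switching activity factor, switched capacitance, supply voltage, and clock frequency.

```latex
P_{\text{dyn}} = \alpha \, C \, V_{dd}^{2} \, f
```

    Because the supply voltage enters quadratically, moving an operation to a low-Vdd functional unit reduces its dynamic power more than proportionally, while reducing switching activity lowers the factor \alpha linearly.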