1,035 research outputs found
Energy Efficient Data-Intensive Computing With Mapreduce
Power and energy consumption are critical constraints in data center design and operation. In data centers, MapReduce data-intensive applications demand significant resources and energy. Recognizing the importance and urgency of optimizing energy usage of MapReduce applications, this work aims to provide instrumental tools to measure and evaluate MapReduce energy efficiency and techniques to conserve energy without impacting performance.
Energy conservation for data-intensive computing requires enabling technology to provide detailed and systemic energy information and to identify in the underlying system hardware and software. To address this need, we present eTune, a fine-grained, scalable energy profiling framework for data-intensive computing on large-scale distributed systems. eTune leverages performance monitoring counters (PMCs) on modern computer components and statistically builds power-performance correlation models. Using learned models, eTune augments direct measurement with a software-based power estimator that runs on compute nodes and reports power at multiple levels including node, core, memory, and disks with high accuracy.
Data-intensive computing differs from traditional high performance computing as most execution time is spent in moving data between storage devices, nodes, and components. Since data movements are potential performance and energy bottlenecks, we propose an analysis framework with methods and metrics for evaluating and characterizing costly built-in MapReduce data movements. The revealed data movement energy characteristics can be exploited in system design and resource allocation to improve data-intensive computing energy efficiency.
Finally, we present an optimization technique that targets inefficient built-in MapReduce data movements to conserve energy without impacting performance. The optimization technique allocates the optimal number of compute nodes to applications and dynamically schedules processor frequency during its execution based on data movement characteristics. Experimental results show significant energy savings, though improvements depend on both workload characteristics and policies of resource and dynamic voltage and frequency scheduling.
As data volume doubles every two years and more data centers are put into production, energy consumption is expected to grow further. We expect these studies provide direction and insight in building more energy efficient data-intensive systems and applications, and the tools and techniques are adopted by other researchers for their energy efficient studies
TimeTrader: Exploiting Latency Tail to Save Datacenter Energy for On-line Data-Intensive Applications
Datacenters running on-line, data-intensive applications (OLDIs) consume
significant amounts of energy. However, reducing their energy is challenging
due to their tight response time requirements. A key aspect of OLDIs is that
each user query goes to all or many of the nodes in the cluster, so that the
overall time budget is dictated by the tail of the replies' latency
distribution; replies see latency variations both in the network and compute.
Previous work proposes to achieve load-proportional energy by slowing down the
computation at lower datacenter loads based directly on response times (i.e.,
at lower loads, the proposal exploits the average slack in the time budget
provisioned for the peak load). In contrast, we propose TimeTrader to reduce
energy by exploiting the latency slack in the sub- critical replies which
arrive before the deadline (e.g., 80% of replies are 3-4x faster than the
tail). This slack is present at all loads and subsumes the previous work's
load-related slack. While the previous work shifts the leaves' response time
distribution to consume the slack at lower loads, TimeTrader reshapes the
distribution at all loads by slowing down individual sub-critical nodes without
increasing missed deadlines. TimeTrader exploits slack in both the network and
compute budgets. Further, TimeTrader leverages Earliest Deadline First
scheduling to largely decouple critical requests from the queuing delays of
sub- critical requests which can then be slowed down without hurting critical
requests. A combination of real-system measurements and at-scale simulations
shows that without adding to missed deadlines, TimeTrader saves 15-19% and
41-49% energy at 90% and 30% loading, respectively, in a datacenter with 512
nodes, whereas previous work saves 0% and 31-37%.Comment: 13 page
- …