278 research outputs found
Controlling Network Latency in Mixed Hadoop Clusters: Do We Need Active Queue Management?
With the advent of big data, data center applications are processing vast amounts of unstructured and semi-structured data, in parallel on large clusters, across hundreds to thousands of nodes. The highest performance for these batch big data workloads is achieved using expensive network equipment with large buffers, which accommodate bursts in network traffic and allocate bandwidth fairly even when the network is congested. Throughput-sensitive big data applications are, however, often executed in the same data center as latency-sensitive workloads. For both workloads to be supported well, the network must provide both maximum throughput and low latency. Progress has been made in this direction, as modern network switches support Active Queue Management (AQM) and Explicit Congestion Notifications (ECN), both mechanisms to control the level of queue occupancy, reducing the total network latency. This paper is the first study of the effect of Active Queue Management on both throughput and latency, in the context of Hadoop and the MapReduce programming model. We give a quantitative comparison of four different approaches for controlling buffer occupancy and latency: RED and CoDel, both standalone and also combined with ECN and DCTCP network protocol, and identify the AQM configurations that maintain Hadoop execution time gains from larger buffers within 5%, while reducing network packet latency caused by bufferbloat by up to 85%. Finally, we provide recommendations to administrators of Hadoop clusters as to how to improve latency without degrading the throughput of batch big data workloads.The research leading to these results has received funding from the European Unions Seventh Framework Programme (FP7/2007–2013) under grant agreement number 610456 (Euroserver).
The research was also supported by the Ministry of Economy and Competitiveness of Spain under the contracts TIN2012-34557 and TIN2015-65316-P, Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), HiPEAC-3 Network of Excellence (ICT- 287759), and the Severo Ochoa Program (SEV-2011-00067) of the Spanish Government.Peer ReviewedPostprint (author's final draft
TimeTrader: Exploiting Latency Tail to Save Datacenter Energy for On-line Data-Intensive Applications
Datacenters running on-line, data-intensive applications (OLDIs) consume
significant amounts of energy. However, reducing their energy is challenging
due to their tight response time requirements. A key aspect of OLDIs is that
each user query goes to all or many of the nodes in the cluster, so that the
overall time budget is dictated by the tail of the replies' latency
distribution; replies see latency variations both in the network and compute.
Previous work proposes to achieve load-proportional energy by slowing down the
computation at lower datacenter loads based directly on response times (i.e.,
at lower loads, the proposal exploits the average slack in the time budget
provisioned for the peak load). In contrast, we propose TimeTrader to reduce
energy by exploiting the latency slack in the sub- critical replies which
arrive before the deadline (e.g., 80% of replies are 3-4x faster than the
tail). This slack is present at all loads and subsumes the previous work's
load-related slack. While the previous work shifts the leaves' response time
distribution to consume the slack at lower loads, TimeTrader reshapes the
distribution at all loads by slowing down individual sub-critical nodes without
increasing missed deadlines. TimeTrader exploits slack in both the network and
compute budgets. Further, TimeTrader leverages Earliest Deadline First
scheduling to largely decouple critical requests from the queuing delays of
sub- critical requests which can then be slowed down without hurting critical
requests. A combination of real-system measurements and at-scale simulations
shows that without adding to missed deadlines, TimeTrader saves 15-19% and
41-49% energy at 90% and 30% loading, respectively, in a datacenter with 512
nodes, whereas previous work saves 0% and 31-37%.Comment: 13 page
Job Schedulers for Machine Learning and Data Mining algorithms distributed in Hadoop
The standard scheduler of Hadoop does not consider the characteristics of jobs such as computational demand, inputs / outputs, dependencies, location of the data, etc., which could be a valuable source to allocate resources to jobs in order to optimize their use. The objective of this research is to take advantage of this information for planning, limiting the scope to ML / DM algorithms, in order to improve the execution times with respect to existing schedulers. The aim is to improve Hadoop job schedulers, seeking to optimize the execution times of machine learning and data mining algorithms in Clusters.Facultad de Informátic
Job Schedulers for Machine Learning and Data Mining algorithms distributed in Hadoop
The standard scheduler of Hadoop does not consider the characteristics of jobs such as computational demand, inputs / outputs, dependencies, location of the data, etc., which could be a valuable source to allocate resources to jobs in order to optimize their use. The objective of this research is to take advantage of this information for planning, limiting the scope to ML / DM algorithms, in order to improve the execution times with respect to existing schedulers. The aim is to improve Hadoop job schedulers, seeking to optimize the execution times of machine learning and data mining algorithms in Clusters.Facultad de Informátic
Job Schedulers for Machine Learning and Data Mining algorithms distributed in Hadoop
The standard scheduler of Hadoop does not consider the characteristics of jobs such as computational demand, inputs / outputs, dependencies, location of the data, etc., which could be a valuable source to allocate resources to jobs in order to optimize their use. The objective of this research is to take advantage of this information for planning, limiting the scope to ML / DM algorithms, in order to improve the execution times with respect to existing schedulers. The aim is to improve Hadoop job schedulers, seeking to optimize the execution times of machine learning and data mining algorithms in Clusters.Facultad de Informátic
- …