A Game-Theoretic Approach for Runtime Capacity Allocation in MapReduce
Nowadays many companies have large amounts of raw, unstructured data
available. Among Big Data enabling technologies, a central place is held by
the MapReduce framework and, in particular, by its open source
implementation, Apache Hadoop. For cost effectiveness, a common approach
entails sharing server clusters among multiple users. The underlying
infrastructure should provide every user with a fair share of computational
resources, ensuring that Service Level Agreements (SLAs) are met and avoiding
waste. In this paper we consider two mathematical programming problems that
model the optimal allocation of computational resources in a Hadoop 2.x
cluster, with the aim of developing new capacity allocation techniques that
guarantee better performance in shared data centers. Our goal is to achieve a
substantial reduction in power consumption while respecting the deadlines
stated in the SLAs and avoiding the penalties associated with job rejections.
The core of this approach is a distributed algorithm for runtime capacity
allocation, based on Game Theory models and techniques, that mimics the
MapReduce dynamics by means of interacting players, namely the central
Resource Manager and the Class Managers.
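As a rough illustration of how such interacting players might negotiate capacity, the sketch below pits Class Managers, each choosing the cheapest container count that meets its SLA deadline, against a Resource Manager that scales demands down when the cluster is oversubscribed. The cost model, numbers, and all names are illustrative assumptions, not the paper's actual game formulation.

```python
# Hypothetical sketch: Class Managers pick the fewest containers meeting
# their SLA deadline; the Resource Manager scales demands proportionally
# when the cluster is oversubscribed. The cost model is an assumption.
import math

CLUSTER_CAPACITY = 100    # total containers available in the cluster
ENERGY_COST = 1.0         # cost per allocated container (illustrative)
SLA_PENALTY = 50.0        # penalty for missing an SLA deadline

# each job class: total work units and an SLA deadline (arbitrary units)
jobs = [
    {"work": 400, "deadline": 10},
    {"work": 900, "deadline": 15},
    {"work": 250, "deadline": 5},
]

def class_manager_demand(job):
    """Best response of a Class Manager: the fewest containers that
    still finish the work before the deadline (minimizes energy)."""
    return math.ceil(job["work"] / job["deadline"])

def job_cost(job, containers):
    """Energy cost plus a penalty if the SLA deadline is missed."""
    finish_time = job["work"] / max(containers, 1)
    penalty = SLA_PENALTY if finish_time > job["deadline"] else 0.0
    return ENERGY_COST * containers + penalty

demands = [class_manager_demand(j) for j in jobs]
if sum(demands) > CLUSTER_CAPACITY:
    # Resource Manager arbitration: proportional scaling of demands
    total = sum(demands)
    allocation = [d * CLUSTER_CAPACITY // total for d in demands]
else:
    allocation = demands

for job, a in zip(jobs, allocation):
    print(f"containers={a:3d}  cost={job_cost(job, a):6.1f}")
```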
Learning Scheduling Algorithms for Data Processing Clusters
Efficiently scheduling data processing jobs on distributed compute clusters
requires complex algorithms. Current systems, however, use simple generalized
heuristics and ignore workload characteristics, since developing and tuning a
scheduling policy for each workload is infeasible. In this paper, we show that
modern machine learning techniques can generate highly efficient policies
automatically. Our system, Decima, uses reinforcement learning (RL) and neural networks to
learn workload-specific scheduling algorithms without any human instruction
beyond a high-level objective such as minimizing average job completion time.
Off-the-shelf RL techniques, however, cannot handle the complexity and scale of
the scheduling problem. To build Decima, we had to develop new representations
for jobs' dependency graphs, design scalable RL models, and invent RL training
methods for dealing with continuous stochastic job arrivals. Our prototype
integration with Spark on a 25-node cluster shows that Decima improves the
average job completion time over hand-tuned scheduling heuristics by at least
21%, achieving up to a 2x improvement during periods of high cluster load.
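To make the RL angle concrete, here is a toy REINFORCE-style sketch that learns to order stages by a linear score over hand-picked features. Decima's actual design (graph neural networks over job DAGs, training under continuous stochastic arrivals) is far richer; everything below, from the features to the reward, is an assumption for illustration.

```python
# Toy policy-gradient (REINFORCE) sketch for learning a scheduling policy.
# Illustrative only: Decima embeds whole job DAGs with graph neural
# networks; here each stage is just a 2-feature vector scored linearly.
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)          # weights over [stage runtime, fan-out] features

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

def run_episode(theta):
    """Schedule 5 random stages one at a time on a single executor;
    reward is the negative sum of stage completion times."""
    feats_all = rng.uniform(1.0, 10.0, size=(5, 2))
    remaining = list(range(5))
    steps, clock, reward = [], 0.0, 0.0
    while remaining:
        feats = feats_all[remaining]
        probs = softmax(feats @ theta)
        i = rng.choice(len(remaining), p=probs)
        steps.append((feats, probs, i))
        clock += feats_all[remaining[i], 0]   # feature 0 = stage runtime
        reward -= clock                       # penalize late completions
        remaining.pop(i)
    return steps, reward

baseline = 0.0
for _ in range(2000):
    steps, reward = run_episode(theta)
    baseline += 0.05 * (reward - baseline)    # running-average baseline
    for feats, probs, i in steps:
        grad_logp = feats[i] - probs @ feats  # grad of log softmax prob
        theta += 1e-4 * (reward - baseline) * grad_logp

print("learned weights:", theta)  # expect a negative weight on runtime,
                                  # i.e., shortest-job-first behavior
```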
Redividing the Cake
A heterogeneous resource, such as a land estate, is already divided among
several agents in an unfair way. It should be re-divided among the agents in a
way that balances fairness with ownership rights. We present re-division
protocols that attain various trade-off points between fairness and ownership
rights, in settings that differ in the geometric constraints on the
allotments: (a) no geometric constraints; (b) connectivity: the cake is a
one-dimensional interval and each piece must be a contiguous interval; (c)
rectangularity: the cake is a two-dimensional rectangle or rectilinear
polygon and the pieces should be rectangles; (d) convexity: the cake is a
two-dimensional convex polygon and the pieces should be convex.
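For intuition on the connected setting, the sketch below implements the classic Dubins-Spanier moving-knife idea for dividing an interval into contiguous pieces: each agent marks the leftmost point worth 1/n of the whole cake to them, and the smallest mark wins that piece. This is standard cake-cutting machinery under assumed piecewise-constant valuations, not the paper's re-division protocol, which must additionally respect prior ownership.

```python
# Dubins-Spanier-style proportional division of [0, 1] into contiguous
# pieces (classic cake-cutting machinery, not this paper's re-division
# protocol). Valuations are assumed piecewise-constant step functions.

def make_value(density):
    """Return v(a, b): integral of a step-function density over [a, b].
    `density` is a list of (lo, hi, height) segments."""
    def value(a, b):
        return sum(h * max(0.0, min(b, hi) - max(a, lo))
                   for (lo, hi, h) in density)
    return value

def mark(value, left, target, eps=1e-9):
    """Bisect for the leftmost x with value(left, x) >= target."""
    lo, hi = left, 1.0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if value(left, mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi

# three agents with different (hypothetical) step-function densities
agents = {
    "A": make_value([(0.0, 0.5, 2.0)]),    # only values the left half
    "B": make_value([(0.0, 1.0, 1.0)]),    # uniform over the whole cake
    "C": make_value([(0.2, 1.0, 1.25)]),   # places no value on [0, 0.2]
}

left, n = 0.0, len(agents)
while len(agents) > 1:
    # each remaining agent marks the point worth 1/n of their total value
    marks = {name: mark(v, left, v(0.0, 1.0) / n) for name, v in agents.items()}
    winner = min(marks, key=marks.get)
    print(f"{winner} takes [{left:.3f}, {marks[winner]:.3f}]")
    left = marks[winner]
    del agents[winner]
(last_agent,) = agents
print(f"{last_agent} takes [{left:.3f}, 1.000]")
```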
Our re-division protocols have implications for another problem, the
price of fairness: the loss of social welfare caused by fairness
requirements. Each protocol implies an upper bound on the price of fairness
under the respective geometric constraints.
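For reference, the price of fairness is standardly defined as the worst-case ratio between the best social welfare attainable by any allocation and the best attainable by a fair one (a textbook definition from the fair-division literature, not a result of this paper):

```latex
% Price of fairness: how much social welfare the fairness requirement
% can cost in the worst case over problem instances.
\[
  \mathrm{PoF} \;=\; \sup_{\text{instances}}
    \frac{\displaystyle\max_{A \in \mathcal{A}} \mathrm{SW}(A)}
         {\displaystyle\max_{A \in \mathcal{A}_{\mathrm{fair}}} \mathrm{SW}(A)}
\]
% where \mathcal{A} is the set of all allocations,
% \mathcal{A}_{fair} those satisfying the fairness requirement,
% and SW(A) is the (utilitarian) social welfare of allocation A.
```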
Comment: Extended IJCAI 2018 version. Previous name: "How to Re-Divide a Cake Fairly".