114,905 research outputs found
ERA: A Framework for Economic Resource Allocation for the Cloud
Cloud computing has reached significant maturity from a systems perspective,
but currently deployed solutions rely on rather basic economics mechanisms that
yield suboptimal allocation of the costly hardware resources. In this paper we
present Economic Resource Allocation (ERA), a complete framework for scheduling
and pricing cloud resources, aimed at increasing the efficiency of cloud
resources usage by allocating resources according to economic principles. The
ERA architecture carefully abstracts the underlying cloud infrastructure,
enabling the development of scheduling and pricing algorithms independently of
the concrete lower-level cloud infrastructure and independently of its
concerns. Specifically, ERA is designed as a flexible layer that can sit on top
of any cloud system and interfaces with both the cloud resource manager and
with the users who reserve resources to run their jobs. The jobs are scheduled
based on prices that are dynamically calculated according to the predicted
demand. Additionally, ERA provides a key internal API to pluggable algorithmic
modules that include scheduling, pricing and demand prediction. We provide a
proof-of-concept software and demonstrate the effectiveness of the architecture
by testing ERA over both public and private cloud systems -- Azure Batch of
Microsoft and Hadoop/YARN. A broader intent of our work is to foster
collaborations between economics and system communities. To that end, we have
developed a simulation platform via which economics and system experts can test
their algorithmic implementations
Multi-core job submission and grid resource scheduling for ATLAS AthenaMP
AthenaMP is the multi-core implementation of the ATLAS software framework and allows the efficient sharing of memory pages between multiple threads of execution. This has now been validated for production and delivers a significant reduction on the overall application memory footprint with negligible CPU overhead. Before AthenaMP can be routinely run on the LHC Computing Grid it must be determined how the computing resources available to ATLAS can best exploit the notable improvements delivered by switching to this multi-process model. A study into the effectiveness and scalability of AthenaMP in a production environment will be presented. Best practices for configuring the main LRMS implementations currently used by grid sites will be identified in the context of multi-core scheduling optimisation
Learning Scheduling Algorithms for Data Processing Clusters
Efficiently scheduling data processing jobs on distributed compute clusters
requires complex algorithms. Current systems, however, use simple generalized
heuristics and ignore workload characteristics, since developing and tuning a
scheduling policy for each workload is infeasible. In this paper, we show that
modern machine learning techniques can generate highly-efficient policies
automatically. Decima uses reinforcement learning (RL) and neural networks to
learn workload-specific scheduling algorithms without any human instruction
beyond a high-level objective such as minimizing average job completion time.
Off-the-shelf RL techniques, however, cannot handle the complexity and scale of
the scheduling problem. To build Decima, we had to develop new representations
for jobs' dependency graphs, design scalable RL models, and invent RL training
methods for dealing with continuous stochastic job arrivals. Our prototype
integration with Spark on a 25-node cluster shows that Decima improves the
average job completion time over hand-tuned scheduling heuristics by at least
21%, achieving up to 2x improvement during periods of high cluster load
- …