A Novel Workload Allocation Strategy for Batch Jobs
The distribution of computational tasks across a diverse set of geographically distributed heterogeneous resources is a critical issue in the realisation of true computational grids. Conventionally, workload allocation algorithms are divided into static and dynamic approaches. Whilst dynamic approaches frequently outperform static schemes, they usually require the collection and processing of detailed system information at frequent intervals, a task that can be both time-consuming and unreliable in the real world. This paper introduces a novel workload allocation algorithm for optimally distributing the workload produced by the arrival of batches of jobs. Results show that, for the arrival of batches of jobs, this workload allocation algorithm outperforms other commonly used algorithms in the static case. A hybrid scheduling approach (using this workload allocation algorithm), where information about the speed of computational resources is inferred from previously completed jobs, is then introduced, and the efficiency of this approach is demonstrated using a real-world computational grid. These results are compared to the same workload allocation algorithm used in the static case, and it can be seen that this hybrid approach comprehensively outperforms the static approach.
Efficient Resource Matching in Heterogeneous Grid Using Resource Vector
In this paper, a method for efficient scheduling to obtain optimum job
throughput in a distributed campus grid environment is presented. Traditional
job schedulers determine job scheduling using user and job resource attributes.
User attributes are related to current usage, historical usage, user priority
and project access. Job resource attributes mainly comprise soft
requirements (compilers, libraries) and hard requirements such as memory,
storage and interconnect. A job scheduler dispatches a job to a resource if the
job's hard and soft requirements are met by that resource. In the current
scenario, if a resource becomes unavailable during execution of a job,
schedulers are left with limited options, namely re-queuing the job or
migrating it to a different resource. Both options are expensive in terms of
data and compute time. These situations can be avoided if an often ignored
factor, the availability time of a resource in a grid environment, is
considered. We propose a resource rank approach, in which jobs are dispatched
to the resource that has the highest rank among all resources that match the
job's requirements. The results show that our approach can increase the
throughput of many serial / monolithic jobs.
Comment: 10 pages
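The matching-then-ranking idea described in this abstract can be illustrated with a minimal sketch. The abstract does not give the rank formula, so everything here is an assumption made for illustration: the `Resource` and `Job` fields, the `matches` test, and in particular the choice to use the expected remaining availability (`available_hours`) directly as the rank.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Resource:
    name: str
    memory_gb: int            # hard attribute offered by the resource
    software: Set[str]        # soft attributes (compilers, libraries)
    available_hours: float    # hypothetical: expected remaining availability


@dataclass
class Job:
    name: str
    memory_gb: int            # hard requirement
    software: Set[str]        # soft requirements


def matches(job: Job, res: Resource) -> bool:
    """A resource matches if it meets the job's hard and soft requirements."""
    return res.memory_gb >= job.memory_gb and job.software <= res.software


def rank(res: Resource) -> float:
    """Assumed rank: longer expected availability means a higher rank."""
    return res.available_hours


def dispatch(job: Job, resources: List[Resource]) -> Optional[Resource]:
    """Return the highest-ranked matching resource, or None if none match."""
    candidates = [r for r in resources if matches(job, r)]
    return max(candidates, key=rank) if candidates else None


resources = [
    Resource("node-a", 16, {"gcc"}, 2.0),
    Resource("node-b", 32, {"gcc", "mpi"}, 10.0),
    Resource("node-c", 64, {"gcc"}, 48.0),
]
chosen = dispatch(Job("serial-job", 24, {"gcc"}), resources)
print(chosen.name if chosen else "no matching resource")
```

In this sketch, a job that only needs `gcc` and 24 GB is sent to the matching resource expected to stay available longest, which is the situation the abstract argues reduces re-queuing and migration.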
A Tale of Two Data-Intensive Paradigms: Applications, Abstractions, and Architectures
Scientific problems that depend on processing large amounts of data require
overcoming challenges in multiple areas: managing large-scale data
distribution, co-placement and scheduling of data with compute resources, and
storing and transferring large volumes of data. We analyze the ecosystems of
the two prominent paradigms for data-intensive applications, hereafter referred
to as the high-performance computing and the Apache-Hadoop paradigm. We propose
a basis, common terminology and functional factors upon which to analyze the
two approaches of both paradigms. We discuss the concept of "Big Data Ogres"
and their facets as a means of understanding and characterizing the most common
application workloads found across the two paradigms. We then discuss the
salient features of the two paradigms, and compare and contrast the two
approaches. Specifically, we examine common implementations and approaches of these
paradigms, shed light upon the reasons for their current "architecture" and
discuss some typical workloads that utilize them. In spite of the significant
software distinctions, we believe there is architectural similarity. We discuss
the potential integration of different implementations, across the different
levels and components. Our comparison progresses from a fully qualitative
examination of the two paradigms, to a semi-quantitative methodology. We use a
simple and broadly used Ogre (K-means clustering), characterize its performance
on a range of representative platforms, covering several implementations from
both paradigms. Our experiments provide an insight into the relative strengths
of the two paradigms. We propose that the set of Ogres will serve as a
benchmark to evaluate the two paradigms along different dimensions.
Comment: 8 pages, 2 figures
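For readers unfamiliar with the K-means workload named in this abstract, the following is a minimal single-machine sketch of the standard Lloyd iteration. It is not the authors' benchmark code and does not reflect any of the platforms they measured; the function names and the choice to seed the centroids with the first `k` points are assumptions made to keep the example deterministic.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(p: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
    )


def kmeans(points: List[Point], k: int, iterations: int = 20):
    """Lloyd's algorithm; initial centroids are the first k points (assumed)."""
    centroids = list(points[:k])
    for _ in range(iterations):
        # Assignment step: group points by their nearest centroid.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:   # converged: assignments no longer change
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The assignment step is independent per point and the update step is a reduction per cluster, which is why the same computation maps naturally onto both the message-passing and the map-reduce style implementations the abstract compares.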