2,617 research outputs found
Bulk Scheduling with the DIANA Scheduler
Results from the research and development of a Data Intensive and Network
Aware (DIANA) scheduling engine, to be used primarily for data intensive
sciences such as physics analysis, are described. In Grid analyses, tasks can
involve thousands of computing, data handling, and network resources. The
central problem in the scheduling of these resources is the coordinated
management of computation and data at multiple locations and not just data
replication or movement. However, this can prove to be a rather costly
operation and efficient sing can be a challenge if compute and data resources
are mapped without considering network costs. We have implemented an adaptive
algorithm within the so-called DIANA Scheduler which takes into account data
location and size, network performance and computation capability in order to
enable efficient global scheduling. DIANA is a performance-aware and
economy-guided Meta Scheduler. It iteratively allocates each job to the site
that is most likely to produce the best performance as well as optimizing the
global queue for any remaining jobs. Therefore it is equally suitable whether a
single job is being submitted or bulk scheduling is being performed. Results
indicate that considerable performance improvements can be gained by adopting
the DIANA scheduling approach.Comment: 12 pages, 11 figures. To be published in the IEEE Transactions in
Nuclear Science, IEEE Press. 200
DIANA Scheduling Hierarchies for Optimizing Bulk Job Scheduling
The use of meta-schedulers for resource management in large-scale distributed
systems often leads to a hierarchy of schedulers. In this paper, we discuss why
existing meta-scheduling hierarchies are sometimes not sufficient for Grid
systems due to their inability to re-organise jobs already scheduled locally.
Such a job re-organisation is required to adapt to evolving loads which are
common in heavily used Grid infrastructures. We propose a peer-to-peer
scheduling model and evaluate it using case studies and mathematical modelling.
We detail the DIANA (Data Intensive and Network Aware) scheduling algorithm and
its queue management system for coping with the load distribution and for
supporting bulk job scheduling. We demonstrate that such a system is beneficial
for dynamic, distributed and self-organizing resource management and can assist
in optimizing load or job distribution in complex Grid infrastructures.Comment: 8 pages, 9 figures. Presented at the 2nd IEEE Int Conference on
eScience & Grid Computing. Amsterdam Netherlands, December 200
DIANA Scheduling Hierarchies for Optimizing Bulk Job Scheduling
The use of meta-schedulers for resource management in large-scale distributed systems often leads to a hierarchy of schedulers. In this paper, we discuss why existing meta-scheduling hierarchies are sometimes not sufficient for Grid systems due to their inability to re-organise jobs already scheduled locally. Such a job re-organisation is required to adapt to evolving loads which are common in heavily used Grid infrastructures. We propose a peer-topeer scheduling model and evaluate it using case studies and mathematical modelling. We detail the DIANA (Data Intensive and Network Aware) scheduling algorithm and its queue management system for coping with the load distribution and for supporting bulk job scheduling. We demonstrate that such a system is beneficial for dynamic, distributed and self-organizing resource management and can assist in optimizing load or job distribution in complex Grid infrastructures
HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges
High Performance Computing (HPC) clouds are becoming an alternative to
on-premise clusters for executing scientific applications and business
analytics services. Most research efforts in HPC cloud aim to understand the
cost-benefit of moving resource-intensive applications from on-premise
environments to public cloud platforms. Industry trends show hybrid
environments are the natural path to get the best of the on-premise and cloud
resources---steady (and sensitive) workloads can run on on-premise resources
and peak demand can leverage remote resources in a pay-as-you-go manner.
Nevertheless, there are plenty of questions to be answered in HPC cloud, which
range from how to extract the best performance of an unknown underlying
platform to what services are essential to make its usage easier. Moreover, the
discussion on the right pricing and contractual models to fit small and large
users is relevant for the sustainability of HPC clouds. This paper brings a
survey and taxonomy of efforts in HPC cloud and a vision on what we believe is
ahead of us, including a set of research challenges that, once tackled, can
help advance businesses and scientific discoveries. This becomes particularly
relevant due to the fast increasing wave of new HPC applications coming from
big data and artificial intelligence.Comment: 29 pages, 5 figures, Published in ACM Computing Surveys (CSUR
Towards In-Transit Analytics for Industry 4.0
Industry 4.0, or Digital Manufacturing, is a vision of inter-connected
services to facilitate innovation in the manufacturing sector. A fundamental
requirement of innovation is the ability to be able to visualise manufacturing
data, in order to discover new insight for increased competitive advantage.
This article describes the enabling technologies that facilitate In-Transit
Analytics, which is a necessary precursor for Industrial Internet of Things
(IIoT) visualisation.Comment: 8 pages, 10th IEEE International Conference on Internet of Things
(iThings-2017), Exeter, UK, 201
Passive network awareness as a means for improved grid scheduling
Grids enable sharing resources of heterogeneous nature and administration. In such distributed systems, the network is usually taken for granted which is potentially problematic due to the complexity and unpredictability of public networks that typically underlie grids. This article introduces GridMAP, a mechanism for considering the network state for enhancing grid scheduling. Network measurements are collected in a passive manner from a user-centric vantage point. This mechanism has been evaluated on a production e-science grid infrastructure, with results showing the ability of GridMAP to improve grid scheduling with minimal network, computational and deployment overheads
A queueing theory approach to Pareto-optimal bags-of-tasks scheduling on clouds
Cloud hosting services offer computing resources which can
scale along with the needs of users. When access to data is limited by
the network capacity this scalability also becomes limited. To
investigate the impact of this limitation we focus on bags{of{tasks where task
data is stored outside the cloud and has to be transferred across the network before task execution can commence. The existing bags-of-tasks estimation tools are not able to provide accurate estimates in such a case. We introduce a queuing{network inspired model which successfully
models the limited network resources. Based on the Mean{Value Analysis of this model we derive an efficient procedure that results with an estimate of the makespan and the executions costs for a given configuration of cloud virtual machines. We compare the calculated Pareto set with measurements performed in a number of experiments for real-world bags-of-tasks and validate the proposed model and the accuracy of the estimated configurations
- …