3,202 research outputs found
HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges
High Performance Computing (HPC) clouds are becoming an alternative to
on-premise clusters for executing scientific applications and business
analytics services. Most research efforts in HPC cloud aim to understand the
cost-benefit of moving resource-intensive applications from on-premise
environments to public cloud platforms. Industry trends show hybrid
environments are the natural path to get the best of the on-premise and cloud
resources---steady (and sensitive) workloads can run on on-premise resources
and peak demand can leverage remote resources in a pay-as-you-go manner.
Nevertheless, there are plenty of questions to be answered in HPC cloud, which
range from how to extract the best performance of an unknown underlying
platform to what services are essential to make its usage easier. Moreover, the
discussion on the right pricing and contractual models to fit small and large
users is relevant for the sustainability of HPC clouds. This paper brings a
survey and taxonomy of efforts in HPC cloud and a vision on what we believe is
ahead of us, including a set of research challenges that, once tackled, can
help advance businesses and scientific discoveries. This becomes particularly
relevant due to the fast increasing wave of new HPC applications coming from
big data and artificial intelligence.Comment: 29 pages, 5 figures, Published in ACM Computing Surveys (CSUR
PaPaS: A Portable, Lightweight, and Generic Framework for Parallel Parameter Studies
The current landscape of scientific research is widely based on modeling and
simulation, typically with complexity in the simulation's flow of execution and
parameterization properties. Execution flows are not necessarily
straightforward since they may need multiple processing tasks and iterations.
Furthermore, parameter and performance studies are common approaches used to
characterize a simulation, often requiring traversal of a large parameter
space. High-performance computers offer practical resources at the expense of
users handling the setup, submission, and management of jobs. This work
presents the design of PaPaS, a portable, lightweight, and generic workflow
framework for conducting parallel parameter and performance studies. Workflows
are defined using parameter files based on keyword-value pairs syntax, thus
removing from the user the overhead of creating complex scripts to manage the
workflow. A parameter set consists of any combination of environment variables,
files, partial file contents, and command line arguments. PaPaS is being
developed in Python 3 with support for distributed parallelization using SSH,
batch systems, and C++ MPI. The PaPaS framework will run as user processes, and
can be used in single/multi-node and multi-tenant computing systems. An example
simulation using the BehaviorSpace tool from NetLogo and a matrix multiply
using OpenMP are presented as parameter and performance studies, respectively.
The results demonstrate that the PaPaS framework offers a simple method for
defining and managing parameter studies, while increasing resource utilization.Comment: 8 pages, 6 figures, PEARC '18: Practice and Experience in Advanced
Research Computing, July 22--26, 2018, Pittsburgh, PA, US
Scheduling in Grid Computing Environment
Scheduling in Grid computing has been active area of research since its
beginning. However, beginners find very difficult to understand related
concepts due to a large learning curve of Grid computing. Thus, there is a need
of concise understanding of scheduling in Grid computing area. This paper
strives to present concise understanding of scheduling and related
understanding of Grid computing system. The paper describes overall picture of
Grid computing and discusses important sub-systems that enable Grid computing
possible. Moreover, the paper also discusses concepts of resource scheduling
and application scheduling and also presents classification of scheduling
algorithms. Furthermore, the paper also presents methodology used for
evaluating scheduling algorithms including both real system and simulation
based approaches. The presented work on scheduling in Grid containing concise
understandings of scheduling system, scheduling algorithm, and scheduling
methodology would be very useful to users and researchersComment: Fourth International Conference on Advanced Computing & Communication
Technologies (ACCT), 201
EPOBF: Energy Efficient Allocation of Virtual Machines in High Performance Computing Cloud
Cloud computing has become more popular in provision of computing resources
under virtual machine (VM) abstraction for high performance computing (HPC)
users to run their applications. A HPC cloud is such cloud computing
environment. One of challenges of energy efficient resource allocation for VMs
in HPC cloud is tradeoff between minimizing total energy consumption of
physical machines (PMs) and satisfying Quality of Service (e.g. performance).
On one hand, cloud providers want to maximize their profit by reducing the
power cost (e.g. using the smallest number of running PMs). On the other hand,
cloud customers (users) want highest performance for their applications. In
this paper, we focus on the scenario that scheduler does not know global
information about user jobs and user applications in the future. Users will
request shortterm resources at fixed start times and non interrupted durations.
We then propose a new allocation heuristic (named Energy-aware and Performance
per watt oriented Bestfit (EPOBF)) that uses metric of performance per watt to
choose which most energy-efficient PM for mapping each VM (e.g. maximum of MIPS
per Watt). Using information from Feitelson's Parallel Workload Archive to
model HPC jobs, we compare the proposed EPOBF to state of the art heuristics on
heterogeneous PMs (each PM has multicore CPU). Simulations show that the EPOBF
can reduce significant total energy consumption in comparison with state of the
art allocation heuristics.Comment: 10 pages, in Procedings of International Conference on Advanced
Computing and Applications, Journal of Science and Technology, Vietnamese
Academy of Science and Technology, ISSN 0866-708X, Vol. 51, No. 4B, 201
Learning Scheduling Algorithms for Data Processing Clusters
Efficiently scheduling data processing jobs on distributed compute clusters
requires complex algorithms. Current systems, however, use simple generalized
heuristics and ignore workload characteristics, since developing and tuning a
scheduling policy for each workload is infeasible. In this paper, we show that
modern machine learning techniques can generate highly-efficient policies
automatically. Decima uses reinforcement learning (RL) and neural networks to
learn workload-specific scheduling algorithms without any human instruction
beyond a high-level objective such as minimizing average job completion time.
Off-the-shelf RL techniques, however, cannot handle the complexity and scale of
the scheduling problem. To build Decima, we had to develop new representations
for jobs' dependency graphs, design scalable RL models, and invent RL training
methods for dealing with continuous stochastic job arrivals. Our prototype
integration with Spark on a 25-node cluster shows that Decima improves the
average job completion time over hand-tuned scheduling heuristics by at least
21%, achieving up to 2x improvement during periods of high cluster load
Scalable dimensioning of resilient Lambda Grids
This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution and sharing with colleagues. Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited. In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier’s archiving and manuscript policies are encouraged to visit
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also helps to provide an easy way for new practitioners to understand
this complex area of research.Comment: 46 pages, 16 figures, Technical Repor
Hybrid Meta-heuristic Algorithms for Static and Dynamic Job Scheduling in Grid Computing
The term ’grid computing’ is used to describe an infrastructure that connects geographically
distributed computers and heterogeneous platforms owned by multiple organizations
allowing their computational power, storage capabilities and other resources to be selected
and shared. Allocating jobs to computational grid resources in an efficient manner is one
of the main challenges facing any grid computing system; this allocation is called job
scheduling in grid computing. This thesis studies the application of hybrid meta-heuristics
to the job scheduling problem in grid computing, which is recognized as being one of
the most important and challenging issues in grid computing environments. Similar to
job scheduling in traditional computing systems, this allocation is known to be an NPhard
problem. Meta-heuristic approaches such as the Genetic Algorithm (GA), Variable
Neighbourhood Search (VNS) and Ant Colony Optimisation (ACO) have all proven their
effectiveness in solving different scheduling problems. However, hybridising two or more
meta-heuristics shows better performance than applying a stand-alone approach. The new
high level meta-heuristic will inherit the best features of the hybridised algorithms, increasing
the chances of skipping away from local minima, and hence enhancing the overall
performance. In this thesis, the application of VNS for the job scheduling problem in grid
computing is introduced. Four new neighbourhood structures, together with a modified
local search, are proposed. The proposed VNS is hybridised using two meta-heuristic
methods, namely GA and ACO, in loosely and strongly coupled fashions, yielding four
new sequential hybrid meta-heuristic algorithms for the problem of static and dynamic
single-objective independent batch job scheduling in grid computing. For the static version
of the problem, several experiments were carried out to analyse the performance of the
proposed schedulers in terms of minimising the makespan using well known benchmarks.
The experiments show that the proposed schedulers achieved impressive results compared
to other traditional, heuristic and meta-heuristic approaches selected from the bibliography.
To model the dynamic version of the problem, a simple simulator, which uses
the rescheduling technique, is designed and new problem instances are generated, by
using a well-known methodology, to evaluate the performance of the proposed hybrid
schedulers. The experimental results show that the use of rescheduling provides significant
improvements in terms of the makespan compared to other non-rescheduling approaches
Recommended from our members
Personal mobile grids with a honeybee inspired resource scheduler
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.The overall aim of the thesis has been to introduce Personal Mobile Grids (PMGrids)
as a novel paradigm in grid computing that scales grid infrastructures to mobile devices and extends grid entities to individual personal users. In this thesis, architectural designs as well as simulation models for PM-Grids are developed.
The core of any grid system is its resource scheduler. However, virtually all current conventional grid schedulers do not address the non-clairvoyant scheduling problem, where job information is not available before the end of execution. Therefore, this thesis proposes a honeybee inspired resource scheduling heuristic for PM-Grids (HoPe) incorporating a radical approach to grid resource scheduling to tackle this problem. A detailed design and implementation of HoPe with a decentralised self-management and adaptive policy are initiated.
Among the other main contributions are a comprehensive taxonomy of grid systems as well as a detailed analysis of the honeybee colony and its nectar acquisition process (NAP), from the resource scheduling perspective, which have not been presented in any previous work, to the best of our knowledge.
PM-Grid designs and HoPe implementation were evaluated thoroughly through a strictly controlled empirical evaluation framework with a well-established heuristic in high throughput computing, the opportunistic scheduling heuristic (OSH), as a benchmark algorithm. Comparisons with optimal values and worst bounds are conducted to gain a clear insight into HoPe behaviour, in terms of stability, throughput, turnaround time and speedup, under different running conditions of number of jobs and grid scales.
Experimental results demonstrate the superiority of HoPe performance where it
has successfully maintained optimum stability and throughput in more than 95%
of the experiments, with HoPe achieving three times better than the OSH under
extremely heavy loads. Regarding the turnaround time and speedup, HoPe has
effectively achieved less than 50% of the turnaround time incurred by the OSH, while doubling its speedup in more than 60% of the experiments.
These results indicate the potential of both PM-Grids and HoPe in realising futuristic grid visions. Therefore considering the deployment of PM-Grids in real life scenarios and the utilisation of HoPe in other parallel processing and high throughput computing systems are recommended
Metaheuristic Based Scheduling Meta-Tasks in Distributed Heterogeneous Computing Systems
Scheduling is a key problem in distributed heterogeneous computing systems in order to benefit from the large computing capacity of such systems and is an NP-complete problem. In this paper, we present a metaheuristic technique, namely the Particle Swarm Optimization (PSO) algorithm, for this problem. PSO is a population-based search algorithm based on the simulation of the social behavior of bird flocking and fish schooling. Particles fly in problem search space to find optimal or near-optimal solutions. The scheduler aims at minimizing makespan, which is the time when finishes the latest task. Experimental studies show that the proposed method is more efficient and surpasses those of reported PSO and GA approaches for this problem
- …