Search CORE

3,202 research outputs found

HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges

Author: Buyya Rajkumar
Calheiros Rodrigo N.
Cunha Renato L. F.
Netto Marco A. S.
Rodrigues Eduardo R.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018
Field of study

High Performance Computing (HPC) clouds are becoming an alternative to on-premise clusters for executing scientific applications and business analytics services. Most research efforts in HPC cloud aim to understand the cost-benefit of moving resource-intensive applications from on-premise environments to public cloud platforms. Industry trends show hybrid environments are the natural path to get the best of the on-premise and cloud resources---steady (and sensitive) workloads can run on on-premise resources and peak demand can leverage remote resources in a pay-as-you-go manner. Nevertheless, there are plenty of questions to be answered in HPC cloud, which range from how to extract the best performance of an unknown underlying platform to what services are essential to make its usage easier. Moreover, the discussion on the right pricing and contractual models to fit small and large users is relevant for the sustainability of HPC clouds. This paper brings a survey and taxonomy of efforts in HPC cloud and a vision on what we believe is ahead of us, including a set of research challenges that, once tackled, can help advance businesses and scientific discoveries. This becomes particularly relevant due to the fast increasing wave of new HPC applications coming from big data and artificial intelligence.Comment: 29 pages, 5 figures, Published in ACM Computing Surveys (CSUR

arXiv.org e-Print Archive

Western Sydney ResearchDirect

PaPaS: A Portable, Lightweight, and Generic Framework for Parallel Parameter Studies

Author: Day Judy
Lenhart Suzanne
Peterson Gregory D.
Ponce Eduardo
Stephenson Brittany
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 25/07/2018
Field of study

The current landscape of scientific research is widely based on modeling and simulation, typically with complexity in the simulation's flow of execution and parameterization properties. Execution flows are not necessarily straightforward since they may need multiple processing tasks and iterations. Furthermore, parameter and performance studies are common approaches used to characterize a simulation, often requiring traversal of a large parameter space. High-performance computers offer practical resources at the expense of users handling the setup, submission, and management of jobs. This work presents the design of PaPaS, a portable, lightweight, and generic workflow framework for conducting parallel parameter and performance studies. Workflows are defined using parameter files based on keyword-value pairs syntax, thus removing from the user the overhead of creating complex scripts to manage the workflow. A parameter set consists of any combination of environment variables, files, partial file contents, and command line arguments. PaPaS is being developed in Python 3 with support for distributed parallelization using SSH, batch systems, and C++ MPI. The PaPaS framework will run as user processes, and can be used in single/multi-node and multi-tenant computing systems. An example simulation using the BehaviorSpace tool from NetLogo and a matrix multiply using OpenMP are presented as parameter and performance studies, respectively. The results demonstrate that the PaPaS framework offers a simple method for defining and managing parameter studies, while increasing resource utilization.Comment: 8 pages, 6 figures, PEARC '18: Practice and Experience in Advanced Research Computing, July 22--26, 2018, Pittsburgh, PA, US

arXiv.org e-Print Archive

Crossref

Scheduling in Grid Computing Environment

Author: Prajapati Harshadkumar B.
Shah Vipul A.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 15/07/2014
Field of study

Scheduling in Grid computing has been active area of research since its beginning. However, beginners find very difficult to understand related concepts due to a large learning curve of Grid computing. Thus, there is a need of concise understanding of scheduling in Grid computing area. This paper strives to present concise understanding of scheduling and related understanding of Grid computing system. The paper describes overall picture of Grid computing and discusses important sub-systems that enable Grid computing possible. Moreover, the paper also discusses concepts of resource scheduling and application scheduling and also presents classification of scheduling algorithms. Furthermore, the paper also presents methodology used for evaluating scheduling algorithms including both real system and simulation based approaches. The presented work on scheduling in Grid containing concise understandings of scheduling system, scheduling algorithm, and scheduling methodology would be very useful to users and researchersComment: Fourth International Conference on Advanced Computing & Communication Technologies (ACCT), 201

arXiv.org e-Print Archive

Crossref

EPOBF: Energy Efficient Allocation of Virtual Machines in High Performance Computing Cloud

Author: A Beloglazov
A Beloglazov
A Beloglazov
I Takouna
R Buyya
RN Calheiros
S Albers
V Mauch
Publication venue
Publication date: 14/09/2014
Field of study

Cloud computing has become more popular in provision of computing resources under virtual machine (VM) abstraction for high performance computing (HPC) users to run their applications. A HPC cloud is such cloud computing environment. One of challenges of energy efficient resource allocation for VMs in HPC cloud is tradeoff between minimizing total energy consumption of physical machines (PMs) and satisfying Quality of Service (e.g. performance). On one hand, cloud providers want to maximize their profit by reducing the power cost (e.g. using the smallest number of running PMs). On the other hand, cloud customers (users) want highest performance for their applications. In this paper, we focus on the scenario that scheduler does not know global information about user jobs and user applications in the future. Users will request shortterm resources at fixed start times and non interrupted durations. We then propose a new allocation heuristic (named Energy-aware and Performance per watt oriented Bestfit (EPOBF)) that uses metric of performance per watt to choose which most energy-efficient PM for mapping each VM (e.g. maximum of MIPS per Watt). Using information from Feitelson's Parallel Workload Archive to model HPC jobs, we compare the proposed EPOBF to state of the art heuristics on heterogeneous PMs (each PM has multicore CPU). Simulations show that the EPOBF can reduce significant total energy consumption in comparison with state of the art allocation heuristics.Comment: 10 pages, in Procedings of International Conference on Advanced Computing and Applications, Journal of Science and Technology, Vietnamese Academy of Science and Technology, ISSN 0866-708X, Vol. 51, No. 4B, 201

arXiv.org e-Print Archive

Crossref

Learning Scheduling Algorithms for Data Processing Clusters

Author: Abadi Martín
Addanki Ravichandra
Dai Hanjun
Finn Chelsea
Ghodsi Ali
Gog Ionel
Grandl Robert
Greensmith Evan
Hindman Benjamin
Kingma Diederik P
Mao Hongzi
Mao Hongzi
Marcus Ryan
Mirhoseini Azalia
Mirhoseini Azalia
Pinto Lerrel
Schulman John
Spark Apache
Sutton S.
Weaver Lex
Zaharia Matei
Publication venue
Publication date: 21/08/2019
Field of study

Efficiently scheduling data processing jobs on distributed compute clusters requires complex algorithms. Current systems, however, use simple generalized heuristics and ignore workload characteristics, since developing and tuning a scheduling policy for each workload is infeasible. In this paper, we show that modern machine learning techniques can generate highly-efficient policies automatically. Decima uses reinforcement learning (RL) and neural networks to learn workload-specific scheduling algorithms without any human instruction beyond a high-level objective such as minimizing average job completion time. Off-the-shelf RL techniques, however, cannot handle the complexity and scale of the scheduling problem. To build Decima, we had to develop new representations for jobs' dependency graphs, design scalable RL models, and invent RL training methods for dealing with continuous stochastic job arrivals. Our prototype integration with Spark on a 25-node cluster shows that Decima improves the average job completion time over hand-tuned scheduling heuristics by at least 21%, achieving up to 2x improvement during periods of high cluster load

arXiv.org e-Print Archive

Crossref

DSpace@MIT

Scalable dimensioning of resilient Lambda Grids

Author: De Leenheer Marc
De Turck Filip
Demeester Piet
Dhoedt Bart
Thysebaert Pieter
Volckaert Bruno
Publication venue
Publication date: 01/01/2007
Field of study

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution and sharing with colleagues. Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited. In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier’s archiving and manuscript policies are encouraged to visit

CiteSeerX

Ghent University Academic Bibliography

A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

Author: Buyya Rajkumar
Ramamohanarao Kotagiri
Venugopal Srikumar
Publication venue
Publication date: 10/06/2005
Field of study

Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also helps to provide an easy way for new practitioners to understand this complex area of research.Comment: 46 pages, 16 figures, Technical Repor

arXiv.org e-Print Archive

CiteSeerX

University of Melbourne Institutional Repository

Hybrid Meta-heuristic Algorithms for Static and Dynamic Job Scheduling in Grid Computing

Author: Younis Muhanad Tahrir
Publication venue: Faculty of Technology
Publication date: 01/09/2018
Field of study

The term ’grid computing’ is used to describe an infrastructure that connects geographically distributed computers and heterogeneous platforms owned by multiple organizations allowing their computational power, storage capabilities and other resources to be selected and shared. Allocating jobs to computational grid resources in an efficient manner is one of the main challenges facing any grid computing system; this allocation is called job scheduling in grid computing. This thesis studies the application of hybrid meta-heuristics to the job scheduling problem in grid computing, which is recognized as being one of the most important and challenging issues in grid computing environments. Similar to job scheduling in traditional computing systems, this allocation is known to be an NPhard problem. Meta-heuristic approaches such as the Genetic Algorithm (GA), Variable Neighbourhood Search (VNS) and Ant Colony Optimisation (ACO) have all proven their effectiveness in solving different scheduling problems. However, hybridising two or more meta-heuristics shows better performance than applying a stand-alone approach. The new high level meta-heuristic will inherit the best features of the hybridised algorithms, increasing the chances of skipping away from local minima, and hence enhancing the overall performance. In this thesis, the application of VNS for the job scheduling problem in grid computing is introduced. Four new neighbourhood structures, together with a modified local search, are proposed. The proposed VNS is hybridised using two meta-heuristic methods, namely GA and ACO, in loosely and strongly coupled fashions, yielding four new sequential hybrid meta-heuristic algorithms for the problem of static and dynamic single-objective independent batch job scheduling in grid computing. For the static version of the problem, several experiments were carried out to analyse the performance of the proposed schedulers in terms of minimising the makespan using well known benchmarks. The experiments show that the proposed schedulers achieved impressive results compared to other traditional, heuristic and meta-heuristic approaches selected from the bibliography. To model the dynamic version of the problem, a simple simulator, which uses the rescheduling technique, is designed and new problem instances are generated, by using a well-known methodology, to evaluate the performance of the proposed hybrid schedulers. The experimental results show that the use of rescheduling provides significant improvements in terms of the makespan compared to other non-rescheduling approaches

De Montfort University Open Research Archive

Recommended from our members

Personal mobile grids with a honeybee inspired resource scheduler

Author: Kurdi Heba Abdullataif
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2010
Field of study

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.The overall aim of the thesis has been to introduce Personal Mobile Grids (PMGrids) as a novel paradigm in grid computing that scales grid infrastructures to mobile devices and extends grid entities to individual personal users. In this thesis, architectural designs as well as simulation models for PM-Grids are developed. The core of any grid system is its resource scheduler. However, virtually all current conventional grid schedulers do not address the non-clairvoyant scheduling problem, where job information is not available before the end of execution. Therefore, this thesis proposes a honeybee inspired resource scheduling heuristic for PM-Grids (HoPe) incorporating a radical approach to grid resource scheduling to tackle this problem. A detailed design and implementation of HoPe with a decentralised self-management and adaptive policy are initiated. Among the other main contributions are a comprehensive taxonomy of grid systems as well as a detailed analysis of the honeybee colony and its nectar acquisition process (NAP), from the resource scheduling perspective, which have not been presented in any previous work, to the best of our knowledge. PM-Grid designs and HoPe implementation were evaluated thoroughly through a strictly controlled empirical evaluation framework with a well-established heuristic in high throughput computing, the opportunistic scheduling heuristic (OSH), as a benchmark algorithm. Comparisons with optimal values and worst bounds are conducted to gain a clear insight into HoPe behaviour, in terms of stability, throughput, turnaround time and speedup, under different running conditions of number of jobs and grid scales. Experimental results demonstrate the superiority of HoPe performance where it has successfully maintained optimum stability and throughput in more than 95% of the experiments, with HoPe achieving three times better than the OSH under extremely heavy loads. Regarding the turnaround time and speedup, HoPe has effectively achieved less than 50% of the turnaround time incurred by the OSH, while doubling its speedup in more than 60% of the experiments. These results indicate the potential of both PM-Grids and HoPe in realising futuristic grid visions. Therefore considering the deployment of PM-Grids in real life scenarios and the utilisation of HoPe in other parallel processing and high throughput computing systems are recommended

Brunel University Research Archive

Metaheuristic Based Scheduling Meta-Tasks in Distributed Heterogeneous Computing Systems

Author: Abraham
Ajith Abraham
Braun
Fernandez-Baca
Foster
Goldberg
Hesam Izakian
Kirkpatrick
Macheswaran
Page
Salman
Václav Snášel
Xhafa
Xhafa
Xhafa
Xhafa
Publication venue: Molecular Diversity Preservation International (MDPI)
Publication date: 01/01/2009
Field of study

Scheduling is a key problem in distributed heterogeneous computing systems in order to benefit from the large computing capacity of such systems and is an NP-complete problem. In this paper, we present a metaheuristic technique, namely the Particle Swarm Optimization (PSO) algorithm, for this problem. PSO is a population-based search algorithm based on the simulation of the social behavior of bird flocking and fish schooling. Particles fly in problem search space to find optimal or near-optimal solutions. The scheduler aims at minimizing makespan, which is the time when finishes the latest task. Experimental studies show that the proposed method is more efficient and surpasses those of reported PSO and GA approaches for this problem

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

DSpace at VSB Technical University of Ostrava

NORA - Norwegian Open Research Archives