
78 research outputs found

Probabilistic grid scheduling based on job statistics and monitoring information

Author: Lazarevic A.
Publication venue: 'Queen Mary University of London'
Publication date: 01/01/2005

This transfer thesis presents a novel, probabilistic approach to scheduling applications on computational Grids based on their historical behaviour, the current state of the Grid, and predictions of the future execution times and resource utilisation of such applications. The work lays a foundation for enabling a more intuitive, user-friendly and effective scheduling technique termed deadline scheduling. Initial work has established the motivation and requirements for a more efficient Grid scheduler, able to adaptively handle the dynamic nature of the Grid resources and submitted workload. Preliminary scheduler research identified the need for detailed monitoring of Grid resources at the process level, and for a tool to simulate the non-deterministic behaviour and statistical properties of Grid applications. A simulation tool, GridLoader, has been developed to enable modelling of application loads similar to a number of typical Grid applications. GridLoader is able to simulate CPU utilisation, memory allocation and network transfers according to limits set through command line parameters or a configuration file. Its specific strength is in achieving set resource utilisation targets in a probabilistic manner, thus creating a dynamic environment suitable for testing the scheduler’s adaptability and its prediction algorithm. To enable highly granular monitoring of Grid applications, a monitoring framework based on the Ganglia Toolkit was developed and tested. The suite is able to collect resource usage information of individual Grid applications, integrate it into a standard XML-based information flow, provide visualisation through a Web portal, and export data into a format suitable for off-line analysis. The thesis also presents an initial investigation of the utilisation of the University College London Central Computing Cluster facility running Sun Grid Engine middleware.
The feasibility of basic prediction concepts based on historical information and process meta-data has been successfully established, and possible scheduling improvements using such predictions have been identified. The thesis is structured as follows: Section 1 introduces Grid computing and its major concepts; Section 2 presents open research issues and the specific focus of the author’s research; Section 3 gives a survey of the related literature, schedulers, monitoring tools and simulation packages; Section 4 presents the platform for the author’s work, the Self-Organising Grid Resource management project; Sections 5 and 6 give detailed accounts of the monitoring framework and simulation tool developed; Section 7 presents the initial data analysis, while Section 8.4 concludes the thesis with appendices and references.

UCL Discovery

A Scheduling Algorithm for Computational Grids that Minimizes Centralized Processing in Genome Assembly of Next-Generation Sequencing Data

Author: Abelém Antônio Jorge Gomes
Azevedo Vasco
Bol Erick
Cerdeira Louise Teixeira
Lima Jakelyne
Schneider Maria Paula Cruz
Silva Artur
Publication venue: Frontiers Research Foundation
Publication date: 01/01/2012

Improvements in genome sequencing techniques have resulted in the generation of huge volumes of data. As a consequence of this progress, the genome assembly stage demands even more computational power, since the incoming sequence files contain large amounts of data. To speed up the process, it is often necessary to distribute the workload among a group of machines. However, this requires hardware and software solutions specially configured for this purpose. Grid computing tries to simplify this process of aggregating resources, but does not always offer the best possible performance, due to the heterogeneity and decentralized management of its resources. Thus, it is necessary to develop software that takes these peculiarities into account. To this end, we developed an algorithm aimed at optimizing the operation of the de novo assembly software ABySS in grids. We ran ABySS with and without the algorithm we developed in the grid simulator SimGrid. Tests showed that our algorithm is viable, flexible, and scalable even in a heterogeneous environment, and that it improved the genome assembly time in computational grids without changing assembly quality.

Crossref

Directory of Open Access Journals

PubMed Central

Frontiers - Publisher Connector

BIGhybrid: A Simulator for MapReduce Applications in Hybrid Distributed Infrastructures Validated with the Grid5000 Experimental Platform

Author: Anjos Julio
Fedak Gilles
Geyer Claudio
Publication venue: 'Wiley'
Publication date: 01/01/2015

Cloud computing has increasingly been used as a platform for running large business and data processing applications. Conversely, Desktop Grids have been successfully employed in a wide range of projects, because they are able to take advantage of a large number of resources provided free of charge by volunteers. A hybrid infrastructure created from the combination of Cloud and Desktop Grid infrastructures can provide a low-cost and scalable solution for Big Data analysis. Although frameworks like MapReduce have been designed to exploit commodity hardware, taking advantage of a hybrid infrastructure poses significant challenges due to its large resource heterogeneity and high churn rate. This paper proposes BIGhybrid, a simulator for two existing classes of MapReduce runtime environments: BitDew-MapReduce, designed for Desktop Grids, and BlobSeer-Hadoop, designed for Cloud computing. The goal is to carry out accurate simulations of MapReduce executions in a hybrid infrastructure composed of Cloud computing and Desktop Grid resources. This work describes the principles of the simulator and its validation with the Grid5000 experimental platform. With BIGhybrid, developers can investigate and evaluate new algorithms to enable MapReduce to be executed in hybrid infrastructures, including topics such as resource allocation and data splitting. Concurrency and Computation: Practice and Experienc

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

Static Scheduling Strategies for Heterogeneous Systems

Author: Beaumont Olivier
Legrand Arnaud
Robert Yves
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 21/02/2012

In this paper, we consider static scheduling techniques for heterogeneous systems, such as clusters and grids. We successively deal with minimum makespan scheduling, divisible load scheduling and steady-state scheduling. Finally, we discuss the limitations of static scheduling approaches.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Grid Infrastructure for Satellite Data Processing in Ukraine

Author: Ilin Mykola
Korbakov Mykhailo
Kravchenko Oleksii
Kussul Nataliia
Pasechnik Volodymyr
Rudakova Alina
Shelestov Andrii
Skakun Serhiy
Publication venue: Institute of Information Theories and Applications FOI ITHEA
Publication date: 01/01/2008

This paper discusses the conceptual foundations for developing Grid systems aimed at satellite data processing. The state of the art in the development of such Grid systems is analyzed, and a model of a Grid system for satellite data processing is proposed. The experience gained in developing the Grid system for satellite data processing at the Space Research Institute of NASU-NSAU is also discussed.

Bulgarian Digital Mathematics Library at IMI-BAS

Handling Very Large Platforms with the New SimGrid Platform Description Formalism

Author: Frincu Marc-Eduard
Quinson Martin
Suter Frédéric
Publication venue: HAL CCSD
Publication date: 01/01/2008

Simulation of parallel and distributed applications is a very active research field, as simulation allows for repeatable results, makes it possible to explore various platform scenarios at will, is not as labor-intensive or as costly as running experiments on a real platform, and often makes it possible to run enormous numbers of experiments quickly. Although many simulation toolkits exist, most of them struggle to describe the large-scale platforms that are currently deployed. In this technical report we focus on the platform description format used by the SimGrid toolkit and present how we modified its DTD to tackle the scaling issues induced by large platforms. Experiments show a reduction by a factor of 6,600 in the size of the XML file for a multi-cluster platform comprising 1,300 hosts. We also extend the existing DTD to integrate the description of other features, such as attaching arbitrary properties to resources or introducing randomness in the platform description, and to express the dynamic nature of a platform directly in the XML file.

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Energy-Efficient Management of Data Center Resources for Cloud Computing: A Vision, Architectural Elements, and Open Challenges

Author: Abawajy Jemal
Beloglazov Anton
Buyya Rajkumar
Publication date: 01/01/2010

Cloud computing is offering utility-oriented IT services to users worldwide. Based on a pay-as-you-go model, it enables hosting of pervasive applications from consumer, scientific, and business domains. However, data centers hosting Cloud applications consume huge amounts of energy, contributing to high operational costs and a large carbon footprint. Therefore, we need Green Cloud computing solutions that can not only save energy for the environment but also reduce operational costs. This paper presents a vision, challenges, and architectural elements for energy-efficient management of Cloud computing environments. We focus on the development of dynamic resource provisioning and allocation algorithms that consider the synergy between various data center infrastructures (i.e., the hardware, power units, cooling and software), and holistically work to boost data center energy efficiency and performance. In particular, this paper proposes (a) architectural principles for energy-efficient management of Clouds; (b) energy-efficient resource allocation policies and scheduling algorithms considering quality-of-service expectations and device power usage characteristics; and (c) a novel software technology for energy-efficient management of Clouds. We have validated our approach by conducting a rigorous performance evaluation study using the CloudSim toolkit. The results demonstrate that the Cloud computing model has immense potential, as it offers significant performance gains with regard to response time and cost saving under dynamic workload scenarios. Comment: 12 pages, 5 figures, Proceedings of the 2010 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA 2010), Las Vegas, USA, July 12-15, 201

arXiv.org e-Print Archive

CiteSeerX

Deakin Research Online

Scheduling for Large Scale Distributed Computing Systems: Approaches and Performance Evaluation Issues

Author: Legrand Arnaud
Publication venue: HAL CCSD
Publication date: 02/11/2015

Although our everyday life and society now depend heavily on communication infrastructures and computation infrastructures, scientists and engineers have always been among the main consumers of computing power. This document provides a coherent overview of the research I have conducted in the last 15 years and which targets the management and performance evaluation of large scale distributed computing infrastructures such as clusters, grids, desktop grids, volunteer computing platforms, ... when used for scientific computing. In the first part of this document, I present how I have addressed scheduling problems arising on distributed platforms (like computing grids) with a particular emphasis on heterogeneity and multi-user issues, hence in connection with game theory. Most of these problems are relaxed from a classical combinatorial optimization formulation into a continuous form, which makes it easy to account for key platform characteristics such as heterogeneity or complex topology while providing efficient practical and distributed solutions. The second part presents my main contributions to the SimGrid project, which is a simulation toolkit for building simulators of distributed applications (originally designed for scheduling algorithm evaluation purposes). It comprises a unified presentation of how the questions of validation and scalability have been addressed in SimGrid as well as thoughts on specific challenges related to methodological aspects and to the application of SimGrid to the HPC context.

Thèses en Ligne

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

QoS based workflow scheduling on heterogeneous resources

Author: Hamid Arabnejad
Publication date: 29/04/2016

Repositório Aberto da Universidade do Porto