Search CORE

896 research outputs found

Multi-criteria and satisfaction oriented scheduling for hybrid distributed computing infrastructures

Author: Fedak Gilles
Litan Cristian
Moca Mircea
Silaghi Gheorghe,
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016
Field of study

International audienceAssembling and simultaneously using different types of distributed computing infrastructures (DCI) like Grids and Clouds is an increasingly common situation. Because infrastructures are characterized by different attributes such as price, performance, trust, greenness, the task scheduling problem becomes more complex and challenging. In this paper we present the design for a fault-tolerant and trust-aware scheduler, which allows to execute Bag-of-Tasks applications on elastic and hybrid DCI, following user-defined scheduling strategies. Our approach, named Promethee scheduler, combines a pull-based scheduler with multi-criteria Promethee decision making algorithm. Because multi-criteria scheduling leads to the multiplication of the possible scheduling strategies, we propose SOFT, a methodology that allows to find the optimal scheduling strategies given a set of application requirements. The validation of this method is performed with a simulator that fully implements the Promethee scheduler and recreates an hybrid DCI environment including Internet Desktop Grid, Cloud and Best Effort Grid based on real failure traces. A set of experiments shows that the Promethee scheduler is able to maximize user satisfaction expressed accordingly to three distinct criteria: price, expected completion time and trust, while maximizing the infrastructure useful employment from the resources owner point of view. Finally, we present an optimization which bounds the computation time of the Promethee algorithm, making realistic the possible integration of the scheduler to a wide range of resource management software

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

Advances in Grid Computing

Author
Publication venue: 'IntechOpen'
Publication date: 20/04/2021
Field of study

This book approaches the grid computing with a perspective on the latest achievements in the field, providing an insight into the current research trends and advances, and presenting a large range of innovative research papers. The topics covered in this book include resource and data management, grid architectures and development, and grid-enabled applications. New ideas employing heuristic methods from swarm intelligence or genetic algorithm and quantum encryption are considered in order to explain two main aspects of grid computing: resource management and data management. The book addresses also some aspects of grid computing that regard architecture and development, and includes a diverse range of applications for grid computing, including possible human grid computing system, simulation of the fusion reaction, ubiquitous healthcare service provisioning and complex water systems

Directory of Open Access Books (DOAB)

A Framework for Approximate Optimization of BoT Application Deployment in Hybrid Cloud Environment

Author: Hoseinyfarahabady Mohammadreza
Publication venue: Faculty of Engineering and Information Technologies, School of Information Technologies
Publication date: 01/01/2015
Field of study

We adopt a systematic approach to investigate the efficiency of near-optimal deployment of large-scale CPU-intensive Bag-of-Task applications running on cloud resources with the non-proportional cost to performance ratios. Our analytical solutions perform in both known and unknown running time of the given application. It tries to optimize users' utility by choosing the most desirable tradeoff between the make-span and the total incurred expense. We propose a schema to provide a near-optimal deployment of BoT application regarding users' preferences. Our approach is to provide user with a set of Pareto-optimal solutions, and then she may select one of the possible scheduling points based on her internal utility function. Our framework can cope with uncertainty in the tasks' execution time using two methods, too. First, an estimation method based on a Monte Carlo sampling called AA algorithm is presented. It uses the minimum possible number of sampling to predict the average task running time. Second, assuming that we have access to some code analyzer, code profiling or estimation tools, a hybrid method to evaluate the accuracy of each estimation tool in certain interval times for improving resource allocation decision has been presented. We propose approximate deployment strategies that run on hybrid cloud. In essence, proposed strategies first determine either an estimated or an exact optimal schema based on the information provided from users' side and environmental parameters. Then, we exploit dynamic methods to assign tasks to resources to reach an optimal schema as close as possible by using two methods. A fast yet simple method based on First Fit Decreasing algorithm, and a more complex approach based on the approximation solution of the transformed problem into a subset sum problem. Extensive experiment results conducted on a hybrid cloud platform confirm that our framework can deliver a near optimal solution respecting user's utility function

Sydney eScholarship

Recommended from our members

Model-based resource management for fine-grained services

Author: Gias Alim Ul
Publication venue: Computing, Imperial College London
Publication date: 01/08/2022
Field of study

The emergence of DevOps has changed the way modern distributed software systems are developed. Architectures decomposed in fine-grained services, such as microservices or function-as-a-service (FaaS), are now widespread across many organizations. From a resource management perspective, although the systems built with such architectures have many benefits, there are still research challenges that need further attention. In this study, we have focused on three such challenges, each concerning a specific system resource: compute, memory, or storage. Firstly, we focus on scaling the capacity of microservices at runtime. Here, the challenge is to design an autoscaler that can decide between vertical and horizontal scaling options to distribute the CPU capacity. Secondly, we focus on estimating the required capacity of an on-premises FaaS platform such that the service level agreements (SLAs) for function response times are satisfied. The challenge here is to address the cold start dilemma, i.e., that a cold start delays a function response but reduces the memory consumption. Thus, we must find a limit of cold starts such that the memory-consumption remains in-check while satisfying the SLAs. Finally, we focus on the storage management for distributed tracing targeted at microservices. The volume of such traces generated in a data center can be in the scale of tens of terabytes per day, but only a small fraction of these traces is useful for troubleshooting. The objective then is to sample only the useful traces. The key to addressing all these challenges is first, modeling the dynamics concerning the resources and subsequently, leveraging the model in a resource controller. To address the first challenge, we have developed an autoscaler ATOM that leverages layered queueing network (LQN) models to take its scaling decisions. Our experiment, with a real-life application, shows that ATOM produces 30-37% better results than the baseline autoscalers. For the second challenge, we have developed COCOA, a cold start aware capacity planner. COCOA utilizes M/M/k setup and LQN models to assess the cold start scenario and estimate the required capacity. We show with simulation that COCOA can reduce over-provisioning by over 70% compared to the availability aware approaches. Finally, addressing the third challenge, we propose SampleHST, a trace sampler that works under a storage budget constraint. SampleHST relies on either bag of words or graph-based models to represent a trace and groups similar traces using online clustering to perform sampling. We have evaluated the performance of SampleHST using data from both literature and production, which shows it produces 1.2x to 19x better results than the state-of-the-art.Open Acces

City Research Online

Spiral - Imperial College Digital Repository

Energy and performance-aware scheduling and shut-down models for efficient cloud-computing data centers.

Author: Fernández Cerero Damián
Publication venue
Publication date: 10/12/2018
Field of study

This Doctoral Dissertation, presented as a set of research contributions, focuses on resource efficiency in data centers. This topic has been faced mainly by the development of several energy-efficiency, resource managing and scheduling policies, as well as the simulation tools required to test them in realistic cloud computing environments. Several models have been implemented in order to minimize energy consumption in Cloud Computing environments. Among them: a) Fifteen probabilistic and deterministic energy-policies which shut-down idle machines; b) Five energy-aware scheduling algorithms, including several genetic algorithm models; c) A Stackelberg game-based strategy which models the concurrency between opposite requirements of Cloud-Computing systems in order to dynamically apply the most optimal scheduling algorithms and energy-efficiency policies depending on the environment; and d) A productive analysis on the resource efficiency of several realistic cloud–computing environments. A novel simulation tool called SCORE, able to simulate several data-center sizes, machine heterogeneity, security levels, workload composition and patterns, scheduling strategies and energy-efficiency strategies, was developed in order to test these strategies in large-scale cloud-computing clusters. As results, more than fifty Key Performance Indicators (KPI) show that more than 20% of energy consumption can be reduced in realistic high-utilization environments when proper policies are employed.Esta Tesis Doctoral, que se presenta como compendio de artículos de investigación, se centra en la eficiencia en la utilización de los recursos en centros de datos de internet. Este problema ha sido abordado esencialmente desarrollando diferentes estrategias de eficiencia energética, gestión y distribución de recursos, así como todas las herramientas de simulación y análisis necesarias para su validación en entornos realistas de Cloud Computing. Numerosas estrategias han sido desarrolladas para minimizar el consumo energético en entornos de Cloud Computing. Entre ellos: 1. Quince políticas de eficiencia energética, tanto probabilísticas como deterministas, que apagan máquinas en estado de espera siempre que sea posible; 2. Cinco algoritmos de distribución de tareas que tienen en cuenta el consumo energético, incluyendo varios modelos de algoritmos genéticos; 3. Una estrategia basada en la teoría de juegos de Stackelberg que modela la competición entre diferentes partes de los centros de datos que tienen objetivos encontrados. Este modelo aplica dinámicamente las estrategias de distribución de tareas y las políticas de eficiencia energética dependiendo de las características del entorno; y 4. Un análisis productivo sobre la eficiencia en la utilización de recursos en numerosos escenarios de Cloud Computing. Una nueva herramienta de simulación llamada SCORE se ha desarrollado para analizar las estrategias antes mencionadas en clústers de Cloud Computing de grandes dimensiones. Los resultados obtenidos muestran que se puede conseguir un ahorro de energía superior al 20% en entornos realistas de alta utilización si se emplean las estrategias de eficiencia energética adecuadas. SCORE es open source y puede simular diferentes centros de datos con, entre otros muchos, los siguientes parámetros: Tamaño del centro de datos; heterogeneidad de los servidores; tipo, composición y patrones de carga de trabajo, estrategias de distribución de tareas y políticas de eficiencia energética, así como tres gestores de recursos centralizados: Monolítico, Two-level y Shared-state. Como resultados, esta herramienta de simulación arroja más de 50 Key Performance Indicators (KPI) de rendimiento general, de distribucin de tareas y de energía.Premio Extraordinario de Doctorado U

idUS. Depósito de Investigación Universidad de Sevilla

Application Agreement and Integration Services

Author: Driscoll Kevin R.
Hall Brendan
Schweiker Kevin
Publication venue
Publication date
Field of study

Application agreement and integration services are required by distributed, fault-tolerant, safety critical systems to assure required performance. An analysis of distributed and hierarchical agreement strategies are developed against the backdrop of observed agreement failures in fielded systems. The documented work was performed under NASA Task Order NNL10AB32T, Validation And Verification of Safety-Critical Integrated Distributed Systems Area 2. This document is intended to satisfy the requirements for deliverable 5.2.11 under Task 4.2.2.3. This report discusses the challenges of maintaining application agreement and integration services. A literature search is presented that documents previous work in the area of replica determinism. Sources of non-deterministic behavior are identified and examples are presented where system level agreement failed to be achieved. We then explore how TTEthernet services can be extended to supply some interesting application agreement frameworks. This document assumes that the reader is familiar with the TTEthernet protocol. The reader is advised to read the TTEthernet protocol standard [1] before reading this document. This document does not re-iterate the content of the standard

NASA Technical Reports Server

SHARING WITH LIVE MIGRATION ENERGY OPTIMIZATION TASK SCHEDULER FOR CLOUD COMPUTING DATACENTRES

Author: Alshathri Samah
Publication venue: 'University of Plymouth'
Publication date: 01/01/2020
Field of study

The use of cloud computing is expanding, and it is becoming the driver for innovation in all companies to serve their customers around the world. A big attention was drawn to the huge energy that was consumed within those datacentres recently neglecting the energy consumption in the rest of the cloud components. Therefore, the energy consumption should be reduced to minimize performance losses, achieve the target battery lifetime, satisfy performance requirements, minimize power consumption, minimize the CO2 emissions, maximize the profit, and maximize resource utilization. Reducing power consumption in the cloud computing datacentres can be achieved by many ways such as managing or utilizing the resources, controlling redundancy, relocating datacentres, improvement of applications or dynamic voltage and frequency scaling. One of the most efficient ways to reduce power is to use a scheduling technique that will find the best task execution order based on the users demands and with the minimum execution time and cloud resources. It is quite a challenge in cloud environment to design an effective and an efficient task scheduling technique which is done based on the user requirements. The scheduling process is not an easy task because within the datacentre there is dissimilar hardware with different capacities and, to improve the resource utilization, an efficient scheduling algorithm must be applied on the incoming tasks to achieve efficient computing resource allocating and power optimization. The scheduler must maintain the balance between the Quality of Service and fairness among the jobs so that the efficiency may be increased. The aim of this project is to propose a novel method for optimizing energy usage in cloud computing environments that satisfy the Quality of Service (QoS) and the regulations of the Service Level Agreement (SLA). Applying a power- and resource-optimised scheduling algorithm will assist to control and improve the process of mapping between the datacentre servers and the incoming tasks and achieve the optimal deployment of the data centre resources to achieve good computing efficiency, network load minimization and reducing the energy consumption in the datacentre. This thesis explores cloud computing energy aware datacentre structures with diverse scheduling heuristics and propose a novel job scheduling technique with sharing and live migration based on file locality (SLM) aiming to maximize efficiency and save power consumed in the datacentre due to bandwidth usage utilization, minimizing the processing time and the system total make span. The propose SLM energy efficient scheduling strategy have four basic algorithms: 1) Job Classifier, 2) SLM job scheduler, 3) Dual fold VM virtualization and 4) VM threshold margins and consolidation. The SLM job classifier worked on categorising the incoming set of user requests to the datacentre in to two different queues based on these requests type and the source file needed to process them. The processing time of each job fluctuate based on the job type and the number of instructions for each job. The second algorithm, which is the SLM scheduler algorithm, dispatch jobs from both queues according to job arrival time and control the allocation process to the most appropriate and available VM based on job similarity according to a predefined synchronized job characteristic table (SJC). The SLM scheduler uses a replicated host’s infrastructure to save the wasted idle hosts energy by maximizing the basic host’s utilization as long as the system can deal with workflow while setting replicated hosts on off mode. The third SLM algorithm, the dual fold VM algorithm, divide the active VMs in to a top and low level slots to allocate similar jobs concurrently which maximize the host utilization at high workload and reduce the total make span. The VM threshold margins and consolidation algorithm set an upper and lower threshold margin as a trigger for VMs consolidation and load balancing process among running VMs, and deploy a continuous provisioning of overload and underutilize VMs detection scheme to maintain and control the system workload balance. The consolidation and load balancing is achieved by performing a series of dynamic live migrations which provides auto-scaling for the servers with in the datacentres. This thesis begins with cloud computing overview then preview the conceptual cloud resources management strategies with classification of scheduling heuristics. Following this, a Competitive analysis of energy efficient scheduling algorithms and related work is presented. The novel SLM algorithm is proposed and evaluated using the CloudSim toolkit under number of scenarios, then the result compared to Particle Swarm Optimization algorithm (PSO) and Ant Colony Algorithm (ACO) shows a significant improvement in the energy usage readings levels and total make span time which is the total time needed to finish processing all the tasks

Plymouth Electronic Archive and Research Library

FIT4Green - Energy aware ICT Optimization Policies

Author: Basmadjian Robert
Bunse Christian
Georgiadou Vasiliki
Giuliani Giovanni
Klingert Sonja
Lovasz Gergo
Majanen Mikko
Publication venue: 'Airiti Press, Inc.'
Publication date: 01/01/2010
Field of study

MAnnheim DOCument Server

VTT Research System

Contributions to High-Throughput Computing Based on the Peer-to-Peer Paradigm

Author: Pérez Miguel Carlos
Publication venue
Publication date: 18/06/2015
Field of study

XII, 116 p.This dissertation focuses on High Throughput Computing (HTC) systems and how to build a working HTC system using Peer-to-Peer (P2P) technologies. The traditional HTC systems, designed to process the largest possible number of tasks per unit of time, revolve around a central node that implements a queue used to store and manage submitted tasks. This central node limits the scalability and fault tolerance of the HTC system. A usual solution involves the utilization of replicas of the master node that can replace it. This solution is, however, limited by the number of replicas used. In this thesis, we propose an alternative solution that follows the P2P philosophy: a completely distributed system in which all worker nodes participate in the scheduling tasks, and with a physically distributed task queue implemented on top of a P2P storage system. The fault tolerance and scalability of this proposal is, therefore, limited only by the number of nodes in the system. The proper operation and scalability of our proposal have been validated through experimentation with a real system. The data availability provided by Cassandra, the P2P data management framework used in our proposal, is analysed by means of several stochastic models. These models can be used to make predictions about the availability of any Cassandra deployment, as well as to select the best possible con guration of any Cassandra system. In order to validate the proposed models, an experimentation with real Cassandra clusters is made, showing that our models are good descriptors of Cassandra's availability. Finally, we propose a set of scheduling policies that try to solve a common problem of HTC systems: re-execution of tasks due to a failure in the node where the task was running, without additional resource misspending. In order to reduce the number of re-executions, our proposals try to nd good ts between the reliability of nodes and the estimated length of each task. An extensive simulation-based experimentation shows that our policies are capable of reducing the number of re-executions, improving system performance and utilization of nodes

Archivo Digital para la Docencia y la Investigación