5 research outputs found

    A delay-based dynamic scheduling algorithm for bag-of-task workflows with stochastic task execution times in clouds

    Full text link
    [EN] Bag-of-Tasks (BoT) workflows are widespread in many big data analysis fields. However, there are very few cloud resource provisioning and scheduling algorithms tailored for BoT workflows. Furthermore, existing algorithms fail to consider the stochastic task execution times of BoT workflows which leads to deadline violations and increased resource renting costs. In this paper, we propose a dynamic cloud resource provisioning and scheduling algorithm which aims to fulfill the workflow deadline by using the sum of task execution time expectation and standard deviation to estimate real task execution times. A bag-based delay scheduling strategy and a single-type based virtual machine interval renting method are presented to decrease the resource renting cost. The proposed algorithm is evaluated using a cloud simulator ElasticSim which is extended from CloudSim. The results show that the dynamic algorithm decreases the resource renting cost while guaranteeing the workflow deadline compared to the existing algorithms. (C) 2017 Elsevier B.V. All rights reserved.The authors would like to thank the reviewers for their constructive and useful comments. This work is supported by the National Natural Science Foundation of China (Grant No. 61602243 and 61572127), the Natural Science Foundation ofJiangsu Province (Grant No. BK20160846), Jiangsu Key Laboratory of Image and Video Understanding for Social Safety (Nanjing University of Science and Technology, Grant No. 30916014107), the Fundamental Research Funds for the Central University (Grant No. 30916015104). Ruben Ruiz is partially supported by the Spanish Ministry of Economy and Competitiveness, under the project "SCHEYARD" (No. DP12015-65895-R) co-financed by FEDER funds.Cai, Z.; Li, X.; Ruiz GarcĂ­a, R.; Li, Q. (2017). A delay-based dynamic scheduling algorithm for bag-of-task workflows with stochastic task execution times in clouds. Future Generation Computer Systems. 71:57-72. https://doi.org/10.1016/j.future.2017.01.020S57727

    Resource-Constrained Scheduling of Stochastic Tasks With Unknown Probability Distribution

    Get PDF
    This work introduces scheduling strategies to maximize the expected numberof independent tasks that can be executed on a cloud platform within a given budgetand under a deadline constraint. Task execution times are not known before execution;instead, the only information available to the scheduler is that they obey some (unknown)probability distribution. The scheduler needs to acquire some information before decidingfor a cutting threshold: instead of allowing all tasks to run until completion, one maywant to interrupt long-running tasks at some point. In addition, the cutting thresholdmay be reevaluated as new information is acquired when the execution progresses further.This works presents several strategies to determine a good cutting threshold, and to decidewhen to re-evaluate it. In particular, we use the Kaplan-Meier estimator to account fortasks that are still running when making a decision. The efficiency of our strategies isassessed through an extensive set of simulations with various budget and deadline values,and ranging over 14 probability distributions.Ce travail présente des stratégies d’ordonnancement permettant de maximiser le nombre attendu de tâches indépendantes pouvant être exécutées sur une plateforme de type cloud avec un budget donné et une contrainte de date limite. Le temps d’exécution des tâches est inconnu, on sait seulement qu’ils obéissent à une distribution de probabilité (inconnue). L’ordonnanceur peut décider à tout moment d’interrompre l’exécution d’une tâche (longue) en cours d’exécution et d’en lancer une nouvelle, mais le budget déjà utilisé pour la tâche interrompue est perdu. Le seuil d’interruption d’une tâche peut être recalculé au fur et à mesure que l’exécution progresse globalement. Ce travail présente plusieurs stratégies pour déterminer un bon seuil d’interruption, et pour décider quand le ré-évaluer. Nous utilisons l’estimateur de Kaplan-Meier pour prendre en compte les tâches en cours d’exécution au moment où la décision est prise. L’efficacité de nos stratégies est évaluée via un vaste ensemble de simulations, avec diverses valeurs de budget et de date limite, et portant sur 14 distributions de probabilit

    Allocating MapReduce workflows with deadlines to heterogeneous servers in a cloud data center

    Full text link
    [EN] Total profit is one of the most important factors to be considered from the perspective of resource providers. In this paper, an original MapReduce workflow scheduling with deadline and data locality is proposed to maximize total profit of resource providers. A new workflow conversion based on dynamic programming and ChainMap/ChainReduce is designed to decrease transmission times among MapReduce jobs of workflows. A new deadline division considering execution time, float time and job level is proposed to obtain better deadlines of MapReduce jobs in workflows. With the adapted replica strategy in MapReduce workflow, a new task scheduling is proposed to improve data locality which assigns tasks to servers with the earliest completion time in order to ensure resource providers obtain more profit. Experimental results show that the proposed heuristic results in larger total profit than other adopted algorithms.This work is supported by the National Key Research and Development Program of China (No. 2017YFB1400801), the National Natural Science Foundation of China (Nos. 61872077, 61832004) and Collaborative Innovation Center of Wireless Communications Technology. Rubén Ruiz is partly supported by the Spanish Ministry of Science, Innovation, and Universities, under the project ¿OPTEP-Port Terminal Operations Optimization¿ (No. RTI2018-094940-B-I00) financed with FEDER funds¿.Wang, J.; Li, X.; Ruiz García, R.; Xu, H.; Chu, D. (2020). Allocating MapReduce workflows with deadlines to heterogeneous servers in a cloud data center. Service Oriented Computing and Applications. 14(2):101-118. https://doi.org/10.1007/s11761-020-00290-1S101118142Zaharia M, Chowdhury M, Franklin M et al (2010) Spark: cluster computing with working sets. In: Usenix conference on hot topics in cloud computing, pp 1765–1773Li L, Ma Z, Liu L et al (2013) Hadoop-based ARIMA algorithm and its application in weather forecast. Int J Database Theory Appl 6(5):119–132Xun Y, Zhang J, Qin X (2017) FiDoop: parallel mining of frequent itemsets using MapReduce. IEEE Trans Syst Man Cybern Syst 46(3):313–325Wang Y, Shi W (2014) Budget-driven scheduling algorithms for batches of MapReduce jobs in heterogeneous clouds. IEEE Trans Cloud Comput 2(3):306–319Tiwari N, Sarkar S, Bellur U et al (2015) Classification framework of MapReduce scheduling algorithms. ACM Comput Surv 47(3):1–49Bu Y, Howe B, Balazinska M et al (2012) The HaLoop approach to large-scale iterative data analysis. VLDB J 21(2):169–190Gunarathne T, Zhang B, Wu T et al (2013) Scalable parallel computing on clouds using Twister4Azure iterative MapReduce. Future Gener Comput Syst 29(4):1035–1048Zhang Y, Gao Q, Gao L et al (2012) iMapReduce: a distributed computing framework for iterative computation. J Grid Comput 10(1):47–68Dong X, Wang Y, Liao H (2011) Scheduling mixed real-time and non-real-time applications in MapReduce environment. In: International conference on parallel and distributed systems, pp 9–16Tang Z, Zhou J, Li K et al (2013) A MapReduce task scheduling algorithm for deadline constraints. Clust Comput 16(4):651–662Zhang W, Rajasekaran S, Wood T et al (2014) MIMP: deadline and interference aware scheduling of Hadoop virtual machines. In: International symposium on cluster, cloud and grid computing, pp 394–403Teng F, Magoulès F, Yu L et al (2014) A novel real-time scheduling algorithm and performance analysis of a MapReduce-based cloud. J Supercomput 69(2):739–765Palanisamy B, Singh A, Liu L (2015) Cost-effective resource provisioning for MapReduce in a cloud. IEEE Trans Parallel Distrib Syst 26(5):1265–1279Hashem I, Anuar N, Marjani M et al (2018) Multi-objective scheduling of MapReduce jobs in big data processing. Multimed Tools Appl 77(8):9979–9994Xu X, Tang M, Tian Y (2017) QoS-guaranteed resource provisioning for cloud-based MapReduce in dynamical environments. Future Gener Comput Syst 78(1):18–30Li H, Wei X, Fu Q et al (2014) MapReduce delay scheduling with deadline constraint. Concurr Comput Pract Exp 26(3):766–778Polo J, Becerra Y, Carrera D et al (2013) Deadline-based MapReduce workload management. IEEE Trans Netw Serv Manag 10(2):231–244Chen C, Lin J, Kuo S (2018) MapReduce scheduling for deadline-constrained jobs in heterogeneous cloud computing systems. IEEE Trans Cloud Comput 6(1):127–140Kao Y, Chen Y (2016) Data-locality-aware MapReduce real-time scheduling framework. J Syst Softw 112:65–77Bok K, Hwang J, Lim J et al (2017) An efficient MapReduce scheduling scheme for processing large multimedia data. Multimed Tools Appl 76(16):1–24Chen Y, Borthakur D, Borthakur D et al (2012) Energy efficiency for large-scale MapReduce workloads with significant interactive analysis. In: ACM european conference on computer systems, pp 43–56Mashayekhy L, Nejad M, Grosu D et al (2015) Energy-aware scheduling of MapReduce jobs for big data applications. IEEE Trans Parallel Distrib Syst 26(10):2720–2733Lei H, Zhang T, Liu Y et al (2015) SGEESS: smart green energy-efficient scheduling strategy with dynamic electricity price for data center. J Syst Softw 108:23–38Oliveira D, Ocana K, Baiao F et al (2012) A provenance-based adaptive scheduling heuristic for parallel scientific workflows in clouds. J Grid Comput 10(3):521–552Li S, Hu S, Abdelzaher T (2015) The packing server for real-time scheduling of MapReduce workflows. In: IEEE real-time and embedded technology and applications symposium, pp 51–62Cai Z, Li X, Ruiz R et al (2017) A delay-based dynamic scheduling algorithm for bag-of-task workflows with stochastic task execution times in clouds. Future Gener Comput Syst 71:57–72Cai Z, Li X, Ruiz R (2017) Resource provisioning for task-batch based workflows with deadlines in public clouds. IEEE Trans Cloud Comput. https://doi.org/10.1109/TCC.2017.2663426Cai Z, Li X, Gupta J (2016) Heuristics for provisioning services to workflows in XaaS clouds. IEEE Trans Serv Comput 9(2):250–263Li X, Cai Z (2017) Elastic resource provisioning for cloud workflow applications. IEEE Trans Autom Sci Eng 14(2):1195–1210Tang Z, Liu M, Ammar A et al (2014) An optimized MapReduce workflow scheduling algorithm for heterogeneous computing. J Supercomput 72(6):1–21Xu C, Yang J, Yin K et al (2017) Optimal construction of virtual networks for cloud-based MapReduce workflows. Comput Netw 112:194–207Chiara S, Danilo A, Gianpaolo C et al (2013) Optimizing service selection and allocation in situational computing applications. IEEE Trans Serv Comput 6(3):414–428Baresi L, Elisabetta D, Carlo G et al (2007) A framework for the deployment of adaptable web service compositions. Serv Oriented Comput Appl 1(1):75–91Lim H, Herodotou H, Babu S (2012) Stubby: a transformation-based optimizer for MapReduce workflows. VLDB Endow 5(11):1196–1207Ke H, Li P, Guo S et al (2016) On traffic-aware partition and aggregation in MapReduce for big data applications. IEEE Trans Parallel Distrib Syst 27(3):818–828Yu W, Wang Y, Que X et al (2015) Virtual shuffling for efficient data movement in MapReduce. IEEE Trans Comput 64(2):556–568Chowdhury M, Zaharia M, Ma J et al (2011) Managing data transfers in computer clusters with orchestra. ACM SIGCOMM Comput Commun 41(4):98–109Guo D, Xie J, Zhou X et al (2015) Exploiting efficient and scalable shuffle transfers in future data center network. IEEE Trans Parallel Distrib Syst 26(4):997–1009Li D, Yu Y, He W et al (2015) Willow: saving data center network energy for network-limited flows. IEEE Trans Parallel Distrib Syst 26(9):2610–2620Tan J, Meng X, Zhang L (2013) Coupling task progress for MapReduce resource-aware scheduling. In: IEEE INFOCOM, pp 1618–1626Hammoud M, Rehman M, Sakr M (2012) Center-of-gravity reduce task scheduling to lower MapReduce network traffic. In: International conference on cloud computing, pp 49–58Guo Z, Fox G, Zhou M et al (2012) Improving resource utilization in MapReduce. In: International conference on cluster computing, pp 402–410Fischer M, Su X, Yin Y (2010) Assigning tasks for efficiency in Hadoop. In: Proceedings of the 22nd ACM symposium on parallelism in algorithms and architectures, pp 30–39Zhu Y, Jiang Y, Wu W et al (2014) Minimizing makespan and total completion time in MapReduce-like systems. In: IEEE INFOCOM, pp 2166–2174Kavulya S, Tan J, Gandhi R et al (2010) An analysis of traces from a production MapReduce cluster. In: IEEE/ACM international conference on cluster, cloud and grid computing, pp 94–103Abrishami S, Naghibzadeh M, Epema D (2013) Deadline-constrained workflow scheduling algorithms for Infrastructure as a Service clouds. Future Gener Comput Syst 29(1):158–169Fernando B, Edmundo R (2010) Towards the scheduling of multiple workflows on computational grids. J Grid Comput 8(3):419–441Tiwari N, Sarkar S, Bellur U et al (2015) Classification framework of MapReduce scheduling algorithms. ACM Comput Surv 47(3):1–38Verma A, Cherkasova L, Campbell R (2013) Orchestrating an ensemble of MapReduce jobs for minimizing their makespan. IEEE Trans Dependable Secur Comput 10(5):314–327Heintz B, Chandra A, Sitaraman R et al (2017) End-to-end optimization for geo-distributed MapReduce. IEEE Trans Cloud Comput 4(3):293–306Chen L, Li X (2018) Cloud workflow scheduling with hybrid resource provisioning. J Supercomput 74(12):6529–6553Li X, Jiang T, Ruiz R (2016) Heuristics for periodical batch job scheduling in a MapReduce computing framework. Inf Sci 326:119–133Vanhoucheabcd M, Maenhout B, Tavares L (2008) An evaluation of the adequacy of project network generators with systematically sampled networks. Eur J Oper Res 187(2):511–52

    SHARING WITH LIVE MIGRATION ENERGY OPTIMIZATION TASK SCHEDULER FOR CLOUD COMPUTING DATACENTRES

    Get PDF
    The use of cloud computing is expanding, and it is becoming the driver for innovation in all companies to serve their customers around the world. A big attention was drawn to the huge energy that was consumed within those datacentres recently neglecting the energy consumption in the rest of the cloud components. Therefore, the energy consumption should be reduced to minimize performance losses, achieve the target battery lifetime, satisfy performance requirements, minimize power consumption, minimize the CO2 emissions, maximize the profit, and maximize resource utilization. Reducing power consumption in the cloud computing datacentres can be achieved by many ways such as managing or utilizing the resources, controlling redundancy, relocating datacentres, improvement of applications or dynamic voltage and frequency scaling. One of the most efficient ways to reduce power is to use a scheduling technique that will find the best task execution order based on the users demands and with the minimum execution time and cloud resources. It is quite a challenge in cloud environment to design an effective and an efficient task scheduling technique which is done based on the user requirements. The scheduling process is not an easy task because within the datacentre there is dissimilar hardware with different capacities and, to improve the resource utilization, an efficient scheduling algorithm must be applied on the incoming tasks to achieve efficient computing resource allocating and power optimization. The scheduler must maintain the balance between the Quality of Service and fairness among the jobs so that the efficiency may be increased. The aim of this project is to propose a novel method for optimizing energy usage in cloud computing environments that satisfy the Quality of Service (QoS) and the regulations of the Service Level Agreement (SLA). Applying a power- and resource-optimised scheduling algorithm will assist to control and improve the process of mapping between the datacentre servers and the incoming tasks and achieve the optimal deployment of the data centre resources to achieve good computing efficiency, network load minimization and reducing the energy consumption in the datacentre. This thesis explores cloud computing energy aware datacentre structures with diverse scheduling heuristics and propose a novel job scheduling technique with sharing and live migration based on file locality (SLM) aiming to maximize efficiency and save power consumed in the datacentre due to bandwidth usage utilization, minimizing the processing time and the system total make span. The propose SLM energy efficient scheduling strategy have four basic algorithms: 1) Job Classifier, 2) SLM job scheduler, 3) Dual fold VM virtualization and 4) VM threshold margins and consolidation. The SLM job classifier worked on categorising the incoming set of user requests to the datacentre in to two different queues based on these requests type and the source file needed to process them. The processing time of each job fluctuate based on the job type and the number of instructions for each job. The second algorithm, which is the SLM scheduler algorithm, dispatch jobs from both queues according to job arrival time and control the allocation process to the most appropriate and available VM based on job similarity according to a predefined synchronized job characteristic table (SJC). The SLM scheduler uses a replicated host’s infrastructure to save the wasted idle hosts energy by maximizing the basic host’s utilization as long as the system can deal with workflow while setting replicated hosts on off mode. The third SLM algorithm, the dual fold VM algorithm, divide the active VMs in to a top and low level slots to allocate similar jobs concurrently which maximize the host utilization at high workload and reduce the total make span. The VM threshold margins and consolidation algorithm set an upper and lower threshold margin as a trigger for VMs consolidation and load balancing process among running VMs, and deploy a continuous provisioning of overload and underutilize VMs detection scheme to maintain and control the system workload balance. The consolidation and load balancing is achieved by performing a series of dynamic live migrations which provides auto-scaling for the servers with in the datacentres. This thesis begins with cloud computing overview then preview the conceptual cloud resources management strategies with classification of scheduling heuristics. Following this, a Competitive analysis of energy efficient scheduling algorithms and related work is presented. The novel SLM algorithm is proposed and evaluated using the CloudSim toolkit under number of scenarios, then the result compared to Particle Swarm Optimization algorithm (PSO) and Ant Colony Algorithm (ACO) shows a significant improvement in the energy usage readings levels and total make span time which is the total time needed to finish processing all the tasks
    corecore