486 research outputs found

    Workflow Scheduling Techniques and Algorithms in IaaS Cloud: A Survey

    In the modern era, workflows are adopted as a powerful and attractive paradigm for expressing and solving a variety of applications, such as scientific computing, data-intensive computing, and big data applications like MapReduce and Hadoop. These complex applications are described using high-level representations in workflow methods. With the emerging model of cloud computing technology, scheduling in the cloud has become an important research topic. Consequently, the workflow scheduling problem has been studied extensively over the past few years, from homogeneous clusters and grids to the most recent paradigm, cloud computing. The challenges that need to be addressed lie in task-resource mapping, QoS requirements, resource provisioning, performance fluctuation, failure handling, resource scheduling, and data storage. This work presents a complete study of resource provisioning and scheduling algorithms in the cloud environment, focusing on Infrastructure as a Service (IaaS). We provide a comprehensive understanding of existing scheduling techniques and an insight into research challenges that are possible future directions for researchers.

    Performance optimization and energy efficiency of big-data computing workflows

    Next-generation e-science is producing colossal amounts of data, now frequently termed Big Data, on the order of terabytes at present and petabytes or even exabytes in the foreseeable future. These scientific applications typically feature data-intensive workflows comprised of moldable parallel computing jobs, such as MapReduce, with intricate inter-job dependencies. The granularity of task partitioning in each moldable job of such big data workflows has a significant impact on workflow completion time, energy consumption, and financial cost if executed in clouds, which remains largely unexplored. This dissertation conducts an in-depth investigation into the properties of moldable jobs and provides an experiment-based validation of the performance model where the total workload of a moldable job increases along with the degree of parallelism. Furthermore, this dissertation conducts rigorous research on workflow execution dynamics in resource sharing environments and explores the interactions between workflow mapping and task scheduling on various computing platforms. A workflow optimization architecture is developed to seamlessly integrate three interrelated technical components, i.e., resource allocation, job mapping, and task scheduling. Cloud computing provides a cost-effective computing platform for big data workflows where moldable parallel computing models are widely applied to meet stringent performance requirements. Based on the moldable parallel computing performance model, a big-data workflow mapping model is constructed and a workflow mapping problem is formulated to minimize workflow makespan under a budget constraint in public clouds.
This dissertation shows this problem to be strongly NP-complete and designs i) a fully polynomial-time approximation scheme for a special case with a pipeline-structured workflow executed on virtual machines of a single class, and ii) a heuristic for a generalized problem with an arbitrary directed acyclic graph-structured workflow executed on virtual machines of multiple classes. The performance superiority of the proposed solution is illustrated by extensive simulation-based results in Hadoop/YARN in comparison with existing workflow mapping models and algorithms. Considering that large-scale workflows for big data analytics have become a main consumer of energy in data centers, this dissertation also delves into the problem of static workflow mapping to minimize the dynamic energy consumption of a workflow request under a deadline constraint in Hadoop clusters, which is shown to be strongly NP-hard. A fully polynomial-time approximation scheme is designed for a special case with a pipeline-structured workflow on a homogeneous cluster and a heuristic is designed for the generalized problem with an arbitrary directed acyclic graph-structured workflow on a heterogeneous cluster. This problem is further extended to a dynamic version with deadline-constrained MapReduce workflows to minimize dynamic energy consumption in Hadoop clusters. This dissertation proposes a semi-dynamic online scheduling algorithm based on adaptive task partitioning to reduce dynamic energy consumption while meeting performance requirements from a global perspective, and also develops corresponding system modules for algorithm implementation in the Hadoop ecosystem. 
The performance superiority of the proposed solutions in terms of dynamic energy saving and deadline missing rate is illustrated by extensive simulation results in comparison with existing algorithms, and further validated through real-life workflow implementation and experiments using the Oozie workflow engine in Hadoop/YARN systems.
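
The makespan-under-budget objective described in this abstract can be illustrated with a minimal sketch. It is not the dissertation's model: the function name `evaluate_schedule` and its parameters are hypothetical, data transfer times are ignored, tasks sharing a VM are assumed to run in topological order, and cost is assumed to be runtime times an hourly price.

```python
from graphlib import TopologicalSorter


def evaluate_schedule(durations, deps, mapping, vm_price):
    """Return (makespan, cost) for one task-to-VM mapping of a DAG workflow.

    durations: task -> runtime in hours on its mapped VM (assumed known)
    deps:      task -> set of predecessor tasks (every task must be a key)
    mapping:   task -> VM name
    vm_price:  VM name -> price per hour (assumed billing model)
    """
    finish = {}                               # finish time of each task
    vm_free = {vm: 0.0 for vm in vm_price}    # time at which each VM is idle
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(deps).static_order():
        vm = mapping[task]
        ready = max((finish[p] for p in deps[task]), default=0.0)
        start = max(ready, vm_free[vm])       # wait for inputs and for the VM
        finish[task] = start + durations[task]
        vm_free[vm] = finish[task]
    makespan = max(finish.values())
    cost = sum(durations[t] * vm_price[mapping[t]] for t in durations)
    return makespan, cost


# Diamond-shaped workflow: A feeds B and C, which both feed D.
deps = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
durations = {"A": 1, "B": 2, "C": 3, "D": 1}
mapping = {"A": "vm1", "B": "vm1", "C": "vm2", "D": "vm1"}
vm_price = {"vm1": 1.0, "vm2": 2.0}

makespan, cost = evaluate_schedule(durations, deps, mapping, vm_price)
budget = 12.0
print(makespan, cost, cost <= budget)
```

A scheduler in this setting would search over `mapping` for the smallest makespan among mappings whose cost stays within `budget`; the sketch only evaluates a single candidate.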

    Deadline Constrained Cloud Computing Resources Scheduling through an Ant Colony System Approach

    Cloud computing resource scheduling is essential for executing workflows in the cloud platform because it relates to both execution time and execution cost. In this paper, we adopt a model that optimizes the execution cost while meeting deadline constraints. In solving this problem, we propose an Improved Ant Colony System (IACS) approach featuring two novel strategies. Firstly, a dynamic heuristic strategy is used to calculate a heuristic value during an evolutionary process by taking the workflow topological structure into consideration. Secondly, a double search strategy is used to initialize the pheromone and calculate the heuristic value according to the execution time at the beginning, and to initialize the pheromone and calculate the heuristic value according to the execution cost after a feasible solution is found. Therefore, the proposed IACS is adaptive to the search environment and to different objectives. We have conducted extensive experiments based on workflows with different scales and different cloud resources. We compare the results with a particle swarm optimization (PSO) approach and a dynamic objective genetic algorithm (DOGA) approach. Experimental results show that IACS is able to find better solutions with a lower cost than both PSO and DOGA on various scheduling scales and deadline conditions.
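
For readers unfamiliar with Ant Colony System, the sketch below shows the generic ACS pseudo-random proportional rule an ant might use to pick a resource for a task. It is the textbook rule, not the IACS of this paper. The helper `double_search_heuristic` is a hypothetical, loose reading of the "double search" idea (inverse execution time before a feasible solution exists, inverse execution cost afterwards), and `q0` and `beta` are illustrative values.

```python
import random


def acs_choose(pheromone, heuristic, q0=0.9, beta=2.0, rng=random):
    """Pick a candidate index with the ACS pseudo-random proportional rule.

    With probability q0 the best-scoring candidate is taken (exploitation);
    otherwise a candidate is sampled in proportion to its score (exploration).
    Scores are pheromone * heuristic**beta and are assumed to be positive.
    """
    scores = [t * (h ** beta) for t, h in zip(pheromone, heuristic)]
    indices = range(len(scores))
    if rng.random() < q0:
        return max(indices, key=scores.__getitem__)
    return rng.choices(indices, weights=scores, k=1)[0]


def double_search_heuristic(times, costs, feasible_found):
    """Hypothetical heuristic: favor fast resources first, cheap ones later."""
    basis = costs if feasible_found else times
    return [1.0 / x for x in basis]


# Three candidate resources for one task (made-up numbers).
times = [4.0, 1.0, 2.0]    # execution time on each resource
costs = [1.0, 5.0, 2.0]    # execution cost on each resource
pheromone = [1.0, 1.0, 1.0]

before = acs_choose(pheromone, double_search_heuristic(times, costs, False), q0=1.0)
after = acs_choose(pheromone, double_search_heuristic(times, costs, True), q0=1.0)
print(before, after)
```

With `q0=1.0` the choice is purely greedy, so the example picks the fastest resource before a feasible solution is found and the cheapest one afterwards.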

    Reliable and efficient webserver management for task scheduling in edge-cloud platform

    The development in the field of cloud webserver management for the execution of the workflow and meeting the quality-of-service (QoS) prerequisites in a distributed cloud environment has been a challenging task. Though, internet of things (IoT) of work presented for the scheduling of the workflow in a heterogeneous cloud environment. Moreover, the rapid development in the field of cloud computing, such as edge-cloud computing, creates new methods to schedule the workflow in a heterogeneous cloud environment to process different tasks like IoT, event-driven applications, and different network applications. The current methods used for workflow scheduling have failed to provide better trade-offs to meet reliable performance with minimal delay. In this paper, a novel web server resource management framework, namely the reliable and efficient webserver management (REWM) framework, is presented for the edge-cloud environment. The experiment is conducted on complex bioinformatic workflows; the results show a significant reduction of cost and energy by the proposed REWM in comparison with the standard webserver management methodology.

    A Budget-constrained Time and Reliability Optimization BAT Algorithm for Scheduling Workflow Applications in Clouds

    Effective scheduling is one of the key concerns while executing workflows in the cloud environment. Workflow scheduling in clouds refers to the mapping of workflow tasks to the cloud resources to optimize some objective function. In this paper, we apply a recently developed meta-heuristic method called the BAT algorithm to solve the multi-objective problem of workflow scheduling in clouds that minimizes the execution time and maximizes the reliability while keeping the budget within a user-specified limit. The results are compared with the basic randomized evolutionary algorithm (BREA), which uses a greedy approach to allocate resources to the workflow tasks on the basis of low cost, high reliability, and improved execution time machines. It is clear from the experimental results that the BAT algorithm performs better than the basic randomized evolutionary algorithm.
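
The three quantities this abstract trades off (execution time, reliability, and budget) can be illustrated with a small sketch. It does not implement the BAT algorithm or the paper's model. It assumes tasks run sequentially and fail independently, so schedule reliability is the product of per-task success probabilities, and it ranks candidates by time first and reliability second. All names and numbers are hypothetical.

```python
import math


def schedule_metrics(tasks):
    """tasks: list of (runtime, cost, success_probability), run sequentially.

    Returns (total_time, total_cost, reliability), where reliability is the
    product of the per-task success probabilities (independence assumed).
    """
    total_time = sum(t for t, _, _ in tasks)
    total_cost = sum(c for _, c, _ in tasks)
    reliability = math.prod(p for _, _, p in tasks)
    return total_time, total_cost, reliability


def best_within_budget(candidates, budget):
    """Return the best candidate schedule whose cost fits the budget.

    Ranking is an illustrative choice: shortest time first, then highest
    reliability. Returns None when no candidate is affordable.
    """
    feasible = [c for c in candidates if schedule_metrics(c)[1] <= budget]
    if not feasible:
        return None

    def rank(candidate):
        time, _, reliability = schedule_metrics(candidate)
        return (time, -reliability)

    return min(feasible, key=rank)


fast_but_costly = [(1.0, 6.0, 0.99), (1.0, 6.0, 0.99)]
balanced = [(2.0, 3.0, 0.9), (1.0, 2.0, 0.8)]
slow_and_cheap = [(4.0, 1.0, 0.95), (3.0, 1.0, 0.95)]

choice = best_within_budget([fast_but_costly, balanced, slow_and_cheap], budget=6.0)
print(schedule_metrics(choice))
```

A meta-heuristic such as the one in the paper would generate and refine the candidate schedules; this sketch covers only how a candidate might be scored against the budget.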