7 research outputs found

    Workflow Scheduling Techniques and Algorithms in IaaS Cloud: A Survey

    Workflows have been adopted as a powerful and attractive paradigm for expressing and solving a variety of applications, such as scientific computing, data-intensive computing, and big data applications built on MapReduce and Hadoop. These complex applications are described using high-level representations in workflow methods. With the emergence of cloud computing, scheduling in the cloud has become an important research topic. Consequently, the workflow scheduling problem has been studied extensively over the past few years, from homogeneous clusters and grids to the most recent paradigm, cloud computing. The challenges that need to be addressed lie in task-resource mapping, QoS requirements, resource provisioning, performance fluctuation, failure handling, resource scheduling, and data storage. This work presents a complete study of resource provisioning and scheduling algorithms in cloud environments, focusing on Infrastructure as a Service (IaaS). We provide a comprehensive understanding of existing scheduling techniques and offer insight into research challenges that point to possible future directions for researchers.

    Performance optimization and energy efficiency of big-data computing workflows

    Next-generation e-science is producing colossal amounts of data, now frequently termed Big Data, on the order of terabytes at present and petabytes or even exabytes in the foreseeable future. These scientific applications typically feature data-intensive workflows composed of moldable parallel computing jobs, such as MapReduce, with intricate inter-job dependencies. The granularity of task partitioning in each moldable job of such big data workflows has a significant impact on workflow completion time, energy consumption, and, if executed in clouds, financial cost; this impact remains largely unexplored. This dissertation conducts an in-depth investigation into the properties of moldable jobs and provides an experiment-based validation of the performance model in which the total workload of a moldable job increases along with the degree of parallelism. Furthermore, this dissertation studies workflow execution dynamics in resource-sharing environments and explores the interactions between workflow mapping and task scheduling on various computing platforms. A workflow optimization architecture is developed to integrate three interrelated technical components: resource allocation, job mapping, and task scheduling. Cloud computing provides a cost-effective computing platform for big data workflows, where moldable parallel computing models are widely applied to meet stringent performance requirements. Based on the moldable parallel computing performance model, a big-data workflow mapping model is constructed and a workflow mapping problem is formulated to minimize workflow makespan under a budget constraint in public clouds.
    This dissertation shows this problem to be strongly NP-complete and designs i) a fully polynomial-time approximation scheme for a special case with a pipeline-structured workflow executed on virtual machines of a single class, and ii) a heuristic for a generalized problem with an arbitrary directed acyclic graph-structured workflow executed on virtual machines of multiple classes. The performance superiority of the proposed solution is illustrated by extensive simulation-based results in Hadoop/YARN in comparison with existing workflow mapping models and algorithms. Considering that large-scale workflows for big data analytics have become a main consumer of energy in data centers, this dissertation also addresses the problem of static workflow mapping to minimize the dynamic energy consumption of a workflow request under a deadline constraint in Hadoop clusters, which is shown to be strongly NP-hard. A fully polynomial-time approximation scheme is designed for a special case with a pipeline-structured workflow on a homogeneous cluster, and a heuristic is designed for the generalized problem with an arbitrary directed acyclic graph-structured workflow on a heterogeneous cluster. This problem is further extended to a dynamic version with deadline-constrained MapReduce workflows to minimize dynamic energy consumption in Hadoop clusters. This dissertation proposes a semi-dynamic online scheduling algorithm based on adaptive task partitioning to reduce dynamic energy consumption while meeting performance requirements from a global perspective, and also develops corresponding system modules for algorithm implementation in the Hadoop ecosystem.
    The performance superiority of the proposed solutions in terms of dynamic energy saving and deadline miss rate is illustrated by extensive simulation results in comparison with existing algorithms, and further validated through real-life workflow implementation and experiments using the Oozie workflow engine in Hadoop/YARN systems.
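The objective named in this abstract, workflow makespan under a budget constraint, can be made concrete with a minimal sketch. It is not the dissertation's model: the function `evaluate_mapping`, the task names, the run times, and the hourly prices are all hypothetical, tasks sharing a VM are assumed to run in topological order, and a VM is assumed to be billed only for its busy time.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def evaluate_mapping(durations, deps, vm_of, price_per_hour):
    """Return (makespan_hours, cost) for one task-to-VM mapping.

    durations:      task -> run time in hours on its assigned VM
    deps:           task -> set of predecessor tasks (a DAG)
    vm_of:          task -> VM identifier
    price_per_hour: VM identifier -> hourly price (assumed billing model)
    """
    finish = {}   # task -> finish time
    vm_free = {}  # VM -> time at which it becomes idle
    for task in TopologicalSorter(deps).static_order():
        # A task starts once all predecessors are done and its VM is idle.
        ready = max((finish[p] for p in deps.get(task, ())), default=0.0)
        start = max(ready, vm_free.get(vm_of[task], 0.0))
        finish[task] = start + durations[task]
        vm_free[vm_of[task]] = finish[task]
    makespan = max(finish.values())
    cost = sum(durations[t] * price_per_hour[vm_of[t]] for t in finish)
    return makespan, cost


# Hypothetical diamond-shaped workflow: a -> {b, c} -> d
durations = {"a": 1.0, "b": 2.0, "c": 1.5, "d": 1.0}
deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
vm_of = {"a": "vm1", "b": "vm1", "c": "vm2", "d": "vm1"}
price_per_hour = {"vm1": 0.10, "vm2": 0.20}

makespan, cost = evaluate_mapping(durations, deps, vm_of, price_per_hour)
budget = 1.0  # hypothetical budget
print(makespan, round(cost, 2), cost <= budget)
```

A mapping algorithm would search over `vm_of` assignments and keep the lowest-makespan mapping whose cost stays within the budget; this sketch only evaluates a single candidate.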

    Allocation optimale multicontraintes des workflows aux ressources d’un environnement Cloud Computing

    Cloud Computing is increasingly recognized as a new way to use computing, storage, and network services on demand in a transparent and efficient way. In this thesis, we address the problem of workflow scheduling on the distributed, heterogeneous infrastructures of Cloud Computing. Existing workflow scheduling approaches for the Cloud mainly focus on the bi-objective optimization of makespan and cost. In this thesis, we propose new workflow scheduling algorithms based on metaheuristics. Our algorithms are able to handle more than two QoS (Quality of Service) metrics, namely makespan, cost, reliability, availability, and, in the case of physical resources, energy. In addition, they address several constraints according to the requirements specified in the SLA (Service Level Agreement). Our algorithms have been evaluated by simulation, using (1) synthetic workflows and real-world scientific workflows with different structures as applications, and (2) the characteristics of Amazon EC2 services as Cloud resources. The results show the effectiveness of our algorithms when dealing with multiple QoS metrics. Our algorithms produce one or more solutions, some of which outperform the solution produced by the HEFT heuristic on all the QoS metrics considered, including makespan, for which HEFT is supposed to give good results.
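For readers unfamiliar with the HEFT baseline mentioned above, the following minimal sketch computes its task-prioritization step, the upward rank: a task's average computation cost plus the largest value, over its successors, of average communication cost plus the successor's rank. It covers only the ranking phase, not HEFT's processor-selection phase, and it is not taken from the thesis. The function name `upward_ranks` and all task names and cost values are hypothetical.

```python
from functools import lru_cache


def upward_ranks(avg_cost, succ, avg_comm):
    """Compute HEFT upward ranks for every task of a DAG.

    avg_cost: task -> average computation cost across resources
    succ:     task -> list of successor tasks
    avg_comm: (task, successor) -> average communication cost
    """
    @lru_cache(maxsize=None)
    def rank(task):
        # Exit tasks have no successors, so their rank is their own cost.
        tail = max(
            (avg_comm.get((task, s), 0.0) + rank(s) for s in succ.get(task, ())),
            default=0.0,
        )
        return avg_cost[task] + tail

    return {t: rank(t) for t in avg_cost}


# Hypothetical diamond-shaped workflow: a -> {b, c} -> d
avg_cost = {"a": 2.0, "b": 3.0, "c": 1.0, "d": 2.0}
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
avg_comm = {("a", "b"): 1.0, ("a", "c"): 2.0, ("b", "d"): 1.0, ("c", "d"): 1.0}

ranks = upward_ranks(avg_cost, succ, avg_comm)
# HEFT schedules tasks in decreasing order of upward rank.
order = sorted(ranks, key=ranks.get, reverse=True)
print(ranks, order)
```

Because a task's rank always exceeds that of each of its successors when costs are positive, sorting by decreasing rank yields a valid topological order.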

    Bridging a Gap Between Research and Production: Contributions to Scheduling and Simulation

    Large-scale distributed computing infrastructures (e.g., data centers, grids, or clouds) are used by scientists from various domains to produce outstanding research results, such as the discovery of the Higgs boson in High Energy Physics. These infrastructures are also studied by Computer Scientists to produce their own set of scientific results. Ideally, a virtuous circle should exist between Domain and Computer Scientists, with the former raising challenges that the latter could address. Unfortunately, on many occasions a gap exists that prevents such an ideal collaboration from taking hold. This habilitation covers research conducted in the fields of scheduling and simulation that contributes to filling this gap. It discusses the conditions necessary to achieve this goal and details concrete initiatives in this endeavor.