    Autonomous resource-aware scheduling of large-scale media workflows

    The media processing and distribution industry generally requires considerable resources to execute the various tasks and workflows that constitute its business processes. These processes are often tied to critical constraints such as strict deadlines. A key issue is how to use the available computational, storage and network resources efficiently in order to cope with the high workload. Optimizing resource usage is vital not only to scalability, but also to the level of QoS (e.g. responsiveness or prioritization) that can be provided. We designed an autonomous platform for scheduling and workflow-to-resource assignment that takes the different requirements and constraints into account. This paper presents the workflow scheduling algorithms, which consider the state and characteristics of the computational, network and storage resources. The performance of these algorithms is evaluated in detail in the context of a European media processing and distribution use case.
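
    The abstract does not reproduce the algorithms themselves, so the following is only a minimal sketch of resource-aware, deadline-ordered workflow-to-resource assignment; all class names, fields and the cost estimate are illustrative assumptions, not the paper's method.

```python
# Minimal sketch of resource-aware workflow-to-resource assignment.
# All names and the cost model are illustrative, not the paper's algorithms.
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    cpu_free: float      # available compute capacity (normalised units)
    storage_free: float  # available storage (GB)
    bandwidth: float     # network bandwidth towards the resource (MB/s)

@dataclass
class Task:
    name: str
    cpu_need: float
    storage_need: float
    input_size: float    # input data to transfer (MB)
    deadline: float      # seconds from now

def schedule(tasks, resources):
    """Greedy assignment: most urgent task first, placed on the feasible
    resource with the lowest estimated (transfer + compute) time."""
    plan = {}
    for task in sorted(tasks, key=lambda t: t.deadline):
        feasible = [r for r in resources
                    if r.cpu_free >= task.cpu_need
                    and r.storage_free >= task.storage_need]
        if not feasible:
            raise RuntimeError(f"no resource can host {task.name}")
        best = min(feasible,
                   key=lambda r: task.input_size / r.bandwidth
                                 + task.cpu_need / r.cpu_free)
        best.cpu_free -= task.cpu_need          # reserve capacity
        best.storage_free -= task.storage_need
        plan[task.name] = best.name
    return plan
```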

    Comparison of the Execution Step Strategies on Scheduling Data-intensive Workflows on IaaS Cloud Platforms

    The IaaS platforms of the Cloud hold promise for executing parallel applications, particularly data-intensive scientific workflows. An important challenge for users executing scientific workflows on these platforms is to strike the right trade-off between the execution time of the workflow and the cost of using the platform. In a previous article, we proposed an efficient approach that assists the user in finding this compromise. This approach requires an algorithm that minimizes the execution time of the workflow once the platform configuration is set; the algorithm we proposed in that study outperformed the HEFT algorithm. In this article, we compare two different strategies for executing a workflow after its offline scheduling by this algorithm. The first strategy allows some ready tasks to execute earlier than higher-priority tasks that become ready later due to data transfer times. This strategy is justified by the fact that, although our scheduling algorithm attempts to minimize data transfers between tasks running on different virtual machines, it does not include data transfer times in the planned execution dates of the workflow's tasks. The second strategy strictly adheres to the predetermined order among tasks scheduled on the same virtual machine. Our evaluations show that the best execution strategy depends on the characteristics of the workflow. For each evaluated workflow, our results demonstrate that our scheduling algorithm combined with the best execution strategy surpasses HEFT. The best strategy should therefore be determined experimentally through realistic simulations, such as the ones we conduct here with the WRENCH framework, before running the simulations that search for the best compromise between cost and execution time of a workflow on an IaaS platform.
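
    To make the difference between the two strategies concrete, here is a small illustrative sketch of how a virtual machine might pick its next task under each policy; the queue contents and function names are hypothetical (the paper itself compares the strategies in WRENCH simulations).

```python
# Sketch contrasting the two execution strategies on one VM's task queue.
# Queue contents and function names are hypothetical; the paper evaluates
# the real strategies in WRENCH simulations.

def next_task_strict(queue, ready):
    """Strategy 2: respect the planned order; if the head of the queue is
    not ready yet (e.g. its inputs are still transferring), the VM waits."""
    head = queue[0]
    return head if head in ready else None

def next_task_relaxed(queue, ready):
    """Strategy 1: let any ready task run ahead of higher-priority tasks
    whose input transfers have not completed."""
    for task in queue:              # queue is ordered by scheduled priority
        if task in ready:
            return task
    return None

queue = ["t1", "t2", "t3"]          # planned execution order on this VM
ready = {"t2", "t3"}                # t1's input data is still in transfer

print(next_task_strict(queue, ready))   # None  -> the VM idles
print(next_task_relaxed(queue, ready))  # 't2'  -> the VM stays busy
```

    Under the relaxed policy the VM never idles while any of its scheduled tasks is ready, which is precisely the opportunity the first strategy exploits when input transfers delay higher-priority tasks.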

    Deadline-Budget constrained Scheduling Algorithm for Scientific Workflows in a Cloud Environment

    Recently, cloud computing has gained popularity among e-Science environments as a high-performance computing platform. From the system's viewpoint, applications can be submitted by users at any moment in time and with distinct QoS requirements. To achieve a higher rate of successful applications meeting their QoS demands, an effective resource allocation (scheduling) strategy between a workflow's tasks and the available resources is required. Several algorithms have been proposed for QoS workflow scheduling, but most of them use search-based strategies that generally have a high time complexity, making them less useful in realistic scenarios. In this paper, we present a heuristic scheduling algorithm with quadratic time complexity that considers two important constraints for QoS-based workflow scheduling, time and cost, named Deadline-Budget Workflow Scheduling (DBWS) for cloud environments. Performance evaluation on some well-known scientific workflows shows that the DBWS algorithm satisfies both constraints with a higher success rate than the current state-of-the-art heuristic-based approaches.
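
    The abstract does not detail the DBWS heuristic itself, so the sketch below only illustrates the generic deadline-budget trade-off such a heuristic navigates: for each task, pick the cheapest service that still meets the task's share of the deadline. The service list and cost model are assumptions made for illustration.

```python
# Hedged sketch of the generic deadline-budget trade-off; the actual DBWS
# heuristic is not described in this abstract, and the service list and
# cost model below are assumptions made for illustration.

def pick_service(task_work, sub_deadline, services):
    """services: list of (speed, cost_per_second) pairs. Return the
    cheapest service whose execution time fits the task's sub-deadline,
    or None if that share of the deadline cannot be met."""
    feasible = [(speed, cost) for speed, cost in services
                if task_work / speed <= sub_deadline]
    if not feasible:
        return None
    # Cost of running the task = execution time * price per second.
    return min(feasible, key=lambda sc: (task_work / sc[0]) * sc[1])

services = [(1.0, 0.02), (2.0, 0.05), (4.0, 0.12)]   # (speed, $/s)
print(pick_service(task_work=100.0, sub_deadline=60.0, services=services))
# -> (2.0, 0.05): meets the sub-deadline at lower cost than the fastest VM
```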

    Data Replication and Its Alignment with Fault Management in the Cloud Environment

    Nowadays, exponential data growth has become one of the major challenges all over the world. It can cause a series of negative impacts such as network overloading, high system complexity, and inadequate data security. Cloud computing was developed as a novel paradigm to alleviate massive data processing challenges with its on-demand services and distributed architecture. Data replication has been proposed to distribute the data access load strategically by creating multiple copies of the data at multiple cloud data centres. A replica-applied cloud environment not only achieves lower response times, higher data availability, and a more balanced resource load, but also protects the cloud environment against upcoming faults. A reactive fault tolerance strategy is also required to handle faults once they have occurred. Consequently, data replication strategies should be aligned with reactive fault tolerance strategies to achieve a complete management chain in the cloud environment. In this thesis, a data replication and fault management framework is proposed to establish decentralised, overarching management of the cloud environment. Three data replication strategies are first proposed based on this framework. A replica creation strategy is proposed to reduce the total cost by jointly considering data dependency and access frequency in the replica creation decision-making process. In addition, a cloud-map-oriented and cost-efficiency-driven replica creation strategy is proposed to achieve the optimal cost reduction per replica in the cloud environment. Local and remote data relationships are further analysed by introducing two novel data dependency types according to data location: Within-DataCentre Data Dependency and Between-DataCentre Data Dependency. Furthermore, a network-performance-based replica selection strategy is proposed to avoid potential network overloading and to increase the number of concurrently running instances.
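
    The sketch below gives a toy version of two decision points described above (replica creation driven by access frequency and data dependency, and replica selection driven by network performance); the scoring formula, site names and metrics are invented for illustration and are not the thesis's cost model.

```python
# Toy sketch: replica creation scoring and network-aware replica selection.
# The formula and data are invented for illustration, not the thesis's model.

def replica_creation_score(access_freq, dependency_weight, transfer_cost):
    """Higher score -> stronger candidate for a new replica: frequently
    accessed, strongly dependent data that is cheap to copy."""
    return (access_freq * dependency_weight) / transfer_cost

def select_replica(replicas):
    """replicas: list of (site, latency_ms, free_bandwidth_mbps) tuples.
    Prefer low latency and high spare bandwidth to avoid overloading links."""
    return min(replicas, key=lambda r: r[1] / max(r[2], 1e-9))

sites = [("dc-eu", 40.0, 800.0), ("dc-us", 120.0, 950.0), ("dc-ap", 65.0, 300.0)]
print(select_replica(sites))            # -> ('dc-eu', 40.0, 800.0)
```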