
    Scheduling techniques for efficient execution of stream workflows in cloud environments

    Advancements in Internet of Things (IoT) technology have led to the development of advanced applications and services that rely on data generated from an enormous number of connected devices such as sensors, mobile devices and smart cars. These applications process and analyse such data as it arrives to unleash the potential of live analytics. As the world moves toward full automation, current IoT applications and services are categorised as data-driven workflows that integrate multiple analytical components. Examples of these workflow applications are smart farming, smart retail and smart transportation. Such a workflow application, also known as a stream workflow, is one type of big data workflow application and is gradually becoming a viable approach for solving increasingly complex real-time data computation problems. Cloud computing, which provides on-demand and elastic resources, is an ideal technology for executing stream workflow applications, but it raises additional challenges stemming from the locations of data sources and end users' requirements for data processing and decision-making deadlines.

    Existing research in this domain focuses on the streaming operator graph generated by stream processing platforms. This graph differs from a stream workflow: the operator graph has a single data source and one end operator, whereas a stream workflow has multiple input data sources and multiple output streams. Moreover, most of those works investigated only one type of runtime change for the streaming operator graph, namely data fluctuation, leaving the structural changes that may happen at runtime unstudied. Given the heterogeneity and dynamic behaviour of stream workflows, these applications have unique features that give the scheduling problem different assumptions and optimisation goals from the placement problem for streaming operator graphs.
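The distinction drawn above can be made concrete with a small sketch: a stream workflow modelled as a directed graph with multiple input data sources and multiple output streams, unlike an operator graph with one source and one sink. All service names here are hypothetical and purely illustrative; they are not taken from the thesis.

```python
# Illustrative sketch (not from the thesis): a stream workflow as a DAG
# whose sources are external data feeds and whose sinks are output streams.
from collections import defaultdict

class StreamWorkflow:
    def __init__(self):
        self.edges = defaultdict(list)   # service -> downstream services
        self.indegree = defaultdict(int)  # service -> number of upstream streams

    def add_stream(self, producer, consumer):
        self.edges[producer].append(consumer)
        self.indegree[consumer] += 1
        self.indegree.setdefault(producer, 0)

    def sources(self):
        # services with no upstream streams: the multiple input data sources
        return [s for s, d in self.indegree.items() if d == 0]

    def sinks(self):
        # services with no consumers: the multiple output streams
        return [s for s in self.indegree if not self.edges[s]]

# Hypothetical smart-transportation workflow with two feeds and two outputs.
wf = StreamWorkflow()
wf.add_stream("sensor_feed", "filter")
wf.add_stream("gps_feed", "filter")
wf.add_stream("filter", "aggregate")
wf.add_stream("aggregate", "dashboard")
wf.add_stream("aggregate", "alerting")
```

A streaming operator graph, by contrast, would have exactly one element in each of `sources()` and `sinks()`.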
As a consequence, executing stream workflow applications in a cloud environment requires advanced scheduling techniques that address the aforementioned challenges and handle the different runtime changes that may occur during execution. To this end, the Multicloud environment approach opens the door toward enhancing the execution of workflow applications by leveraging various clouds to exploit data locality and deployment flexibility. Thus, the problem of scheduling a stream workflow in a Multicloud environment while meeting users' real-time data analysis requirements needs to be investigated. In this thesis, we leverage the Multicloud environment approach to design novel scheduling techniques that efficiently schedule the outsourcing of stream workflow applications over various cloud infrastructures while minimising execution cost. We also design dynamic scheduling techniques that continuously manage resources to handle structural and non-structural changes at runtime, maintaining user-defined performance requirements at minimal execution cost. In summary, this thesis makes the following concrete contributions:
• A comprehensive state-of-the-art survey that analyses big data workflow orchestration issues spanning three different levels (workflow, data and cloud), providing a research taxonomy of core requirements, challenges, and current tools, techniques and research prototypes.
• A simulation toolkit, named IoTSim-Stream, to model and simulate stream workflow applications in cloud computing environments.
• Two scheduling algorithms that generate scheduling plans at deployment time to execute stream workflows efficiently on cloud infrastructures at minimal monetary cost.
• A two-phase adaptive scheduling technique that addresses the scheduling of stream workflows under runtime data fluctuations while guaranteeing real-time performance requirements and minimising monetary cost.
• A pluggable dynamic scheduling technique that manages cloud resources over time to handle structural changes of a stream workflow at runtime in a cost-effective manner, along with three plugin scheduling methods.
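To give a flavour of the deployment-time scheduling contribution, the following is a minimal sketch of cost-minimising placement: each workflow service is greedily assigned the cheapest VM type, across clouds, whose capacity meets the service's processing demand. The VM catalogue, prices and demands are invented for illustration; the thesis's actual algorithms are more sophisticated than this greedy rule.

```python
# Hypothetical per-hour VM catalogue: (cloud, vm_type, hourly_cost, capacity in MB/s).
VM_CATALOGUE = [
    ("cloud_a", "small", 0.05, 50),
    ("cloud_a", "large", 0.20, 250),
    ("cloud_b", "small", 0.04, 40),
    ("cloud_b", "large", 0.18, 220),
]

def place(services):
    """Greedy sketch: services maps service name -> required rate (MB/s)."""
    plan, total_cost = {}, 0.0
    for svc, demand in services.items():
        # candidate VMs whose capacity satisfies the service's demand
        feasible = [(cost, cloud, vm) for cloud, vm, cost, cap in VM_CATALOGUE
                    if cap >= demand]
        cost, cloud, vm = min(feasible)  # cheapest feasible VM across clouds
        plan[svc] = (cloud, vm)
        total_cost += cost
    return plan, total_cost

# Two hypothetical services with different throughput demands.
plan, cost = place({"filter": 45, "aggregate": 200})
```

Here "filter" lands on cloud_a's small VM (cloud_b's small is cheaper but too small), while "aggregate" lands on cloud_b's large VM, illustrating how a Multicloud catalogue lets the scheduler trade providers off per service.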