
    Scale-Out Algorithm For Apache Storm In SaaS Environment

    The main appeal of the Cloud lies in its cost-effective and flexible access to computing power. Apache Storm is a framework for processing streaming data. In our work we explore the possibility of offering Apache Storm as a software service. Further, we take advantage of the cgroups feature in Storm to divide the computing power of worker machines into smaller units that can be offered to users. We hypothesize that the compute bounds placed on the cgroups can be used to approximate the state of the workflow. We discuss the limitations of the current schedulers in facilitating this type of approximation, since they distribute resources in arbitrary ways. We implement a new custom scheduler that gives the user more explicit control over how resources are distributed to the components of the workflow. We further build a simple model to approximate the current state of the workflow and to predict its future state under changes in resource allocation. We propose a scale-out algorithm to increase the throughput of the workflow; it uses the predictive model to measure the effects of many candidate allocations before choosing one. We analyze the strengths and drawbacks of the Stela algorithm and design a complementary algorithm. We show that the two algorithms complement each other's strengths and drawbacks, and that the combined algorithm provides throughput-maximizing allocations for a much larger set of scenarios. We implement the algorithm as a standalone scheduler and evaluate the strategy through experiments on a physical Apache Storm cluster and through software simulations for a set of workflows. Adviser: Ying L
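The idea of scoring many candidate allocations with a predictive model before committing to one can be illustrated with a small greedy sketch. It is not the thesis's algorithm or Stela; it assumes a hypothetical bottleneck model in which each component of a linear pipeline has a per-unit processing rate, its capacity scales linearly with the number of cgroup compute units it receives, and the slowest component caps the workflow's throughput. The names `predict_throughput`, `scale_out`, `rates`, and `units` are illustrative only.

```python
from typing import Dict, Tuple


def predict_throughput(rates: Dict[str, float], units: Dict[str, int]) -> float:
    """Hypothetical predictive model: tuples/s of a linear pipeline.

    Assumes a component's capacity is rate_per_unit * units, and that the
    slowest component (the bottleneck) caps the throughput of the whole flow.
    """
    return min(rates[c] * units[c] for c in rates)


def _score(rates: Dict[str, float], alloc: Dict[str, int], component: str) -> Tuple[float, float]:
    """Score the candidate allocation that gives `component` one more unit."""
    trial = dict(alloc)
    trial[component] += 1
    predicted = predict_throughput(rates, trial)
    # Tie-break: prefer the component with the lowest current capacity.
    current_capacity = rates[component] * alloc[component]
    return (predicted, -current_capacity)


def scale_out(rates: Dict[str, float], units: Dict[str, int], extra_units: int) -> Dict[str, int]:
    """Greedily hand out `extra_units` new compute units, one at a time.

    Before each assignment, every candidate allocation is evaluated with the
    predictive model, and the best-scoring candidate is chosen.
    """
    alloc = dict(units)  # copy so the caller's allocation is left unchanged
    for _ in range(extra_units):
        best = max(rates, key=lambda c: _score(rates, alloc, c))
        alloc[best] += 1
    return alloc
```

For example, with per-unit rates of 100, 50 and 25 tuples/s for a spout and two bolts, each starting with one unit, `scale_out(rates, units, 3)` assigns the extra units so that the predicted throughput rises from 25 to 75 tuples/s. The bottleneck model deliberately ignores fan-out, operator selectivity, and network costs, which a real predictive model for Storm topologies would need to account for.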

    Cost-efficient enactment of stream processing topologies

    The continuous increase of unbounded streaming data poses several challenges to established data stream processing engines. One of the most important is the cost-efficient enactment of stream processing topologies under changing data volumes. Varying data volumes impose different loads on stream processing systems, whose resource provisioning therefore needs to be continuously updated at runtime. Earlier approaches already allow resource provisioning at the level of virtual machines (VMs), but this only permits coarse-grained provisioning strategies. Building on recent advances in containerized software systems, we have designed a cost-efficient resource provisioning approach and integrated it into the runtime of the Vienna ecosystem for elastic stream processing. Our approach aims to maximize the resource usage of VMs obtained from cloud providers. It releases processing capabilities only at the end of a VM's minimal leasing duration, instead of releasing them eagerly as soon as possible, as threshold-based approaches do. This strategy improves service level agreement compliance by up to 25% and reduces operational cost by up to 36%.
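The difference between eager, threshold-based release and lease-aware release can be sketched in a few lines. This is not the Vienna ecosystem's implementation; it assumes that VMs are billed in a fixed, repeating lease period, that some other (hypothetical) component has already decided how many VMs are surplus, and that a surplus VM is released only when its already-paid lease period is about to end, within a hypothetical `release_window`. The names `VM`, `remaining_lease`, and `select_vms_to_release` are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VM:
    name: str
    leased_at: float  # seconds since the start of the deployment (hypothetical clock)


def remaining_lease(vm: VM, now: float, lease_period: float) -> float:
    """Time left in the VM's current, already-paid lease period.

    Assumes billing repeats in fixed periods of `lease_period` seconds.
    """
    elapsed = now - vm.leased_at
    return lease_period - (elapsed % lease_period)


def select_vms_to_release(
    vms: List[VM],
    now: float,
    surplus: int,
    lease_period: float,
    release_window: float,
) -> List[VM]:
    """Pick at most `surplus` VMs to release, but only near the end of their lease.

    An eager threshold-based policy would release `surplus` VMs immediately;
    here a surplus VM keeps serving until its paid-for period is nearly over.
    """
    if surplus <= 0:
        return []
    candidates = [
        vm for vm in vms if remaining_lease(vm, now, lease_period) <= release_window
    ]
    # Release the VMs whose paid time runs out soonest first.
    candidates.sort(key=lambda vm: remaining_lease(vm, now, lease_period))
    return candidates[:surplus]
```

With a one-hour lease period and a two-minute window, three VMs leased at 0 s, 1000 s and 3000 s, and two surplus VMs at `now=3550`, only the VM leased at 0 s is released; the others keep their capacity available until their own lease periods near completion. Keeping that already-paid capacity online is what lets such a policy absorb later load increases without leasing new VMs.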