7,425 research outputs found

    Integrating Scale Out and Fault Tolerance in Stream Processing using Operator State Management

    Get PDF
    As users of big data applications expect fresh results, we witness a new breed of stream processing systems (SPS) that are designed to scale to large numbers of cloud-hosted machines. Such systems face new challenges: (i) to benefit from the pay-as-you-go model of cloud computing, they must scale out on demand, acquiring additional virtual machines (VMs) and parallelising operators when the workload increases; (ii) failures are common with deployments on hundreds of VMs - systems must be fault-tolerant with fast recovery times, yet low per-machine overheads. An open question is how to achieve these two goals when stream queries include stateful operators, which must be scaled out and recovered without affecting query results. Our key idea is to expose internal operator state explicitly to the SPS through a set of state management primitives. Based on them, we describe an integrated approach for dynamic scale out and recovery of stateful operators. Externalised operator state is checkpointed periodically by the SPS and backed up to upstream VMs. The SPS identifies individual operator bottlenecks and automatically scales them out by allocating new VMs and partitioning the check-pointed state. At any point, failed operators are recovered by restoring checkpointed state on a new VM and replaying unprocessed tuples. We evaluate this approach with the Linear Road Benchmark on the Amazon EC2 cloud platform and show that it can scale automatically to a load factor of L=350 with 50 VMs, while recovering quickly from failures. Copyright © 2013 ACM

    Adaptive Energy-aware Scheduling of Dynamic Event Analytics across Edge and Cloud Resources

    Full text link
    The growing deployment of sensors as part of Internet of Things (IoT) is generating thousands of event streams. Complex Event Processing (CEP) queries offer a useful paradigm for rapid decision-making over such data sources. While often centralized in the Cloud, the deployment of capable edge devices on the field motivates the need for cooperative event analytics that span Edge and Cloud computing. Here, we identify a novel problem of query placement on edge and Cloud resources for dynamically arriving and departing analytic dataflows. We define this as an optimization problem to minimize the total makespan for all event analytics, while meeting energy and compute constraints of the resources. We propose 4 adaptive heuristics and 3 rebalancing strategies for such dynamic dataflows, and validate them using detailed simulations for 100 - 1000 edge devices and VMs. The results show that our heuristics offer O(seconds) planning time, give a valid and high quality solution in all cases, and reduce the number of query migrations. Furthermore, rebalance strategies when applied in these heuristics have significantly reduced the makespan by around 20 - 25%.Comment: 11 pages, 7 figure

    Model-driven Scheduling for Distributed Stream Processing Systems

    Full text link
    Distributed Stream Processing frameworks are being commonly used with the evolution of Internet of Things(IoT). These frameworks are designed to adapt to the dynamic input message rate by scaling in/out.Apache Storm, originally developed by Twitter is a widely used stream processing engine while others includes Flink, Spark streaming. For running the streaming applications successfully there is need to know the optimal resource requirement, as over-estimation of resources adds extra cost.So we need some strategy to come up with the optimal resource requirement for a given streaming application. In this article, we propose a model-driven approach for scheduling streaming applications that effectively utilizes a priori knowledge of the applications to provide predictable scheduling behavior. Specifically, we use application performance models to offer reliable estimates of the resource allocation required. Further, this intuition also drives resource mapping, and helps narrow the estimated and actual dataflow performance and resource utilization. Together, this model-driven scheduling approach gives a predictable application performance and resource utilization behavior for executing a given DSPS application at a target input stream rate on distributed resources.Comment: 54 page

    Performance-oriented Cloud Provisioning: Taxonomy and Survey

    Full text link
    Cloud computing is being viewed as the technology of today and the future. Through this paradigm, the customers gain access to shared computing resources located in remote data centers that are hosted by cloud providers (CP). This technology allows for provisioning of various resources such as virtual machines (VM), physical machines, processors, memory, network, storage and software as per the needs of customers. Application providers (AP), who are customers of the CP, deploy applications on the cloud infrastructure and then these applications are used by the end-users. To meet the fluctuating application workload demands, dynamic provisioning is essential and this article provides a detailed literature survey of dynamic provisioning within cloud systems with focus on application performance. The well-known types of provisioning and the associated problems are clearly and pictorially explained and the provisioning terminology is clarified. A very detailed and general cloud provisioning classification is presented, which views provisioning from different perspectives, aiding in understanding the process inside-out. Cloud dynamic provisioning is explained by considering resources, stakeholders, techniques, technologies, algorithms, problems, goals and more.Comment: 14 pages, 3 figures, 3 table

    Effectiveness of segment routing technology in reducing the bandwidth and cloud resources provisioning times in network function virtualization architectures

    Get PDF
    Network Function Virtualization is a new technology allowing for a elastic cloud and bandwidth resource allocation. The technology requires an orchestrator whose role is the service and resource orchestration. It receives service requests, each one characterized by a Service Function Chain, which is a set of service functions to be executed according to a given order. It implements an algorithm for deciding where both to allocate the cloud and bandwidth resources and to route the SFCs. In a traditional orchestration algorithm, the orchestrator has a detailed knowledge of the cloud and network infrastructures and that can lead to high computational complexity of the SFC Routing and Cloud and Bandwidth resource Allocation (SRCBA) algorithm. In this paper, we propose and evaluate the effectiveness of a scalable orchestration architecture inherited by the one proposed within the European Telecommunications Standards Institute (ETSI) and based on the functional separation of an NFV orchestrator in Resource Orchestrator (RO) and Network Service Orchestrator (NSO). Each cloud domain is equipped with an RO whose task is to provide a simple and abstract representation of the cloud infrastructure. These representations are notified of the NSO that can apply a simplified and less complex SRCBA algorithm. In addition, we show how the segment routing technology can help to simplify the SFC routing by means of an effective addressing of the service functions. The scalable orchestration solution has been investigated and compared to the one of a traditional orchestrator in some network scenarios and varying the number of cloud domains. We have verified that the execution time of the SRCBA algorithm can be drastically reduced without degrading the performance in terms of cloud and bandwidth resource costs
    corecore