
    Cloud Services Brokerage: A Survey and Research Roadmap

    A Cloud Services Brokerage (CSB) acts as an intermediary between cloud service providers (e.g., Amazon and Google) and cloud service end users, providing a number of value-adding services. CSBs as a research topic are in their infancy. The goal of this paper is to provide a concise survey of existing CSB technologies in a variety of areas and to highlight a roadmap that details five future opportunities for research. Comment: Paper published in the 8th IEEE International Conference on Cloud Computing (CLOUD 2015).

    A Bag-of-Tasks Scheduler Tolerant to Temporal Failures in Clouds

    Cloud platforms have emerged as a prominent environment for executing high performance computing (HPC) applications, providing on-demand resources as well as scalability. They usually offer different classes of Virtual Machines (VMs) with different guarantees in terms of availability and volatility, provisioning the same resource through multiple pricing models. For instance, in the Amazon EC2 cloud, the user pays per hour for on-demand VMs, while spot VMs are unused instances available at a lower price. Despite the monetary advantages, a spot VM can be terminated, stopped, or hibernated by EC2 at any moment. Using both hibernation-prone spot VMs (for the sake of cost) and on-demand VMs, we propose in this paper a static scheduling approach for HPC applications composed of independent tasks (bag-of-tasks) with deadline constraints. If a spot VM hibernates and does not resume within a time that guarantees the application's deadline, a temporal failure takes place. Our scheduling thus aims at minimizing the monetary cost of bag-of-tasks applications in the EC2 cloud while respecting the application's deadline and avoiding temporal failures. To this end, our algorithm statically creates two scheduling maps: (i) the first one contains, for each task, its starting time and the VM on which the task should execute (i.e., an available spot or on-demand VM with the current lowest price); (ii) the second one contains, for each task allocated to a spot VM in the first map, its starting time and the on-demand VM on which it should be executed to meet the application deadline and so avoid a temporal failure. The latter is used whenever the hibernation period of a spot VM exceeds a time limit. Performance results from simulation with task execution traces, the configuration of Amazon EC2 VM classes, and VM market history confirm the effectiveness of our scheduling and show that it tolerates temporal failures.
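The two-map idea described in this abstract can be illustrated with a small sketch. This is not the authors' algorithm: it is a simplified greedy illustration, and the `VM` class, the `build_maps` function, the prices, and the task durations are all hypothetical. It assumes every VM is free at time 0 and that a backup slot on an on-demand VM is reserved after that VM's primary work.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    kind: str     # "spot" or "on-demand"
    price: float  # hypothetical price per time unit


def build_maps(tasks, vms, deadline):
    """Return (primary, backup); each maps task -> (vm_name, start_time).

    tasks maps a task id to its duration. Greedy illustration only.
    """
    free_at = {vm.name: 0.0 for vm in vms}  # next free time of each VM

    # Primary map: cheapest VM that can still finish the task by the deadline.
    primary = {}
    for task, duration in tasks.items():
        fits = [vm for vm in vms if free_at[vm.name] + duration <= deadline]
        if not fits:
            raise ValueError(f"task {task!r} cannot meet the deadline")
        vm = min(fits, key=lambda v: (v.price, free_at[v.name]))
        primary[task] = (vm.name, free_at[vm.name])
        free_at[vm.name] += duration

    # Backup map: every task placed on a spot VM also gets an on-demand slot
    # that still meets the deadline, to be used if the spot VM hibernates.
    kind = {vm.name: vm.kind for vm in vms}
    on_demand = [vm for vm in vms if vm.kind == "on-demand"]
    backup = {}
    for task, (vm_name, _start) in primary.items():
        if kind[vm_name] != "spot":
            continue
        duration = tasks[task]
        fits = [vm for vm in on_demand if free_at[vm.name] + duration <= deadline]
        if not fits:
            raise ValueError(f"no on-demand backup slot for task {task!r}")
        vm = min(fits, key=lambda v: free_at[v.name])
        backup[task] = (vm.name, free_at[vm.name])
        free_at[vm.name] += duration
    return primary, backup
```

With one cheap spot VM and one on-demand VM, all tasks that fit before the deadline land on the spot VM in the primary map, and the backup map reserves matching on-demand slots. Tightening the deadline makes the sketch raise an error instead of producing a schedule that could miss it.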

    Towards auto-scaling in the cloud: online resource allocation techniques

    Cloud computing provides easy access to computing resources. Customers can acquire and release resources at any time. However, it is not trivial to determine when and how many resources to allocate. Many applications running in the cloud face workload changes that affect their resource demand. The first thought is to plan capacity either for the average load or for the peak load. In the first case less cost is incurred, but performance suffers when the peak load occurs. The second case wastes money, since resources remain underutilized most of the time. Therefore, there is a need for more sophisticated resource provisioning techniques that can automatically scale application resources according to workload demand and performance constraints. Large cloud providers such as Amazon, Microsoft, and RightScale provide auto-scaling services. However, without proper configuration and testing, such services can do more harm than good. In this work I investigate application-specific online resource allocation techniques that dynamically adapt to the incoming workload, minimize the cost of virtual resources, and meet user-specified performance objectives.
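The trade-off between planning for average load and planning for peak load can be made concrete with a small calculation. The sketch below is only an illustration under stated assumptions: the demand trace, the per-VM capacity, and the function names are hypothetical, and `ideal_plan` is an idealized baseline that knows each hour's demand in advance, not an auto-scaling technique from the work above.

```python
import math
from statistics import mean


def static_plan(demand, target, vm_capacity):
    """Run the same number of VMs every hour, sized for `target` load."""
    n = math.ceil(target / vm_capacity)
    return [n] * len(demand)


def ideal_plan(demand, vm_capacity):
    """Idealized scaling: exactly enough VMs for each hour's demand."""
    return [math.ceil(d / vm_capacity) for d in demand]


def evaluate(demand, vms_per_hour, vm_capacity):
    """Return (VM-hours paid for, total demand left unserved)."""
    vm_hours = sum(vms_per_hour)
    unserved = sum(
        max(0, d - n * vm_capacity) for d, n in zip(demand, vms_per_hour)
    )
    return vm_hours, unserved


# Hypothetical hourly demand (requests/s) with one short spike.
demand = [100, 100, 100, 500, 100, 100]
vm_capacity = 100  # requests/s one VM can serve (assumed)

average = evaluate(demand, static_plan(demand, mean(demand), vm_capacity), vm_capacity)
peak = evaluate(demand, static_plan(demand, max(demand), vm_capacity), vm_capacity)
ideal = evaluate(demand, ideal_plan(demand, vm_capacity), vm_capacity)
```

For this trace, sizing for the average pays for 12 VM-hours but leaves 300 requests/s unserved during the spike, sizing for the peak serves everything at 30 VM-hours, and the idealized per-hour plan serves everything at 10 VM-hours.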

    Parqua: Online Reconfigurations in Virtual Ring-Based NoSQL Systems

    The performance of key-value/NoSQL storage systems is closely tied to the choice of (primary) key for the database table. As application (e.g., business) requirements change over time, and in order to fine-tune the performance of the database to the real query workload, system administrators need to change the primary key of the table. The primary key change is a specific example of a broader class of reconfiguration operations that affect a lot of data all at once. In industry deployments of key-value/NoSQL stores, such reconfigurations are known to be a major pain point. We seek to support reconfiguration operations in key-value/NoSQL storage systems in an automated, online, and efficient manner, i.e., quickly and without interrupting the serving of incoming reads and writes. Our previous work, titled Morphus, tackled the online reconfiguration problem for sharded NoSQL stores like MongoDB. However, Morphus is inapplicable to ring-based key-value/NoSQL systems (like Cassandra, Riak, and Voldemort) because these rely on a virtual ring (and often consistent hashing). This makes the problem more constrained. In this paper we propose a system called Parqua, which imbues ring-based key-value/NoSQL stores with the ability to perform reconfiguration operations in an online and efficient manner. We present the design and implementation of Parqua. We have integrated Parqua into Apache Cassandra. Experiments based on our cluster deployments show that during reconfiguration Parqua maintains high availability, with only a small impact on read and write latencies. NSF CNS 1319527, NSF CNS 1409416, NSF CCF 0964471, and AFOSR/AFRL FA8750-11-2-0084.

    A Pattern-Language for Self-Healing Internet-of-Things Systems

    Internet-of-Things systems are assemblies of highly distributed and heterogeneous parts that, in orchestration, work to provide valuable services to end users in many scenarios. These systems depend on the correct operation of sensors, actuators, and third-party services, and the failure of a single one can hinder the proper functioning of the whole system. This makes error detection and recovery of paramount importance, yet they are often overlooked. By drawing inspiration from other research areas, such as cloud, embedded, and mission-critical systems, we present a set of patterns for self-healing IoT systems. We discuss how their implementation can improve system reliability by providing error detection, error recovery, and health mechanisms maintenance. (c) 2020 ACM