383 research outputs found

    A Reliable and Cost-Efficient Auto-Scaling System for Web Applications Using Heterogeneous Spot Instances

    Full text link
    Cloud providers sell their idle capacity on markets through an auction-like mechanism to increase their return on investment. The instances sold in this way are called spot instances. In spite that spot instances are usually 90% cheaper than on-demand instances, they can be terminated by provider when their bidding prices are lower than market prices. Thus, they are largely used to provision fault-tolerant applications only. In this paper, we explore how to utilize spot instances to provision web applications, which are usually considered availability-critical. The idea is to take advantage of differences in price among various types of spot instances to reach both high availability and significant cost saving. We first propose a fault-tolerant model for web applications provisioned by spot instances. Based on that, we devise novel auto-scaling polices for hourly billed cloud markets. We implemented the proposed model and policies both on a simulation testbed for repeatable validation and Amazon EC2. The experiments on the simulation testbed and the real platform against the benchmarks show that the proposed approach can greatly reduce resource cost and still achieve satisfactory Quality of Service (QoS) in terms of response time and availability

    Rolling Window Time Series Prediction Using MapReduce

    Get PDF
    Prediction of time series data is an important application in many domains. Despite their inherent advantages, traditional databases and MapReduce methodology are not ideally suited for this type of processing due to dependencies introduced by the sequential nature of time series. In this thesis a novel framework is presented to facilitate retrieval and rolling window prediction of irregularly sampled large-scale time series data. By introducing a new index pool data structure, processing of time series can be efficiently parallelised. The proposed framework is implemented in R programming environment and utilises Hadoop to support parallelisation and fault tolerance. A systematic multi-predictor selection model is designed and applied, in order to choose the best-fit algorithm for different circumstances. Additionally, the boosting method is deployed as a post-processing to further optimise the predictive results. Experimental results on a cloud-based platform indicate that the proposed framework scales linearly up to 32-nodes, and performs efficiently with a relatively optimised prediction

    Open Source Big Data Platforms and Tools: An Analysis

    Get PDF
    Big data is attracting an excessive amount of interest in the IT and academic sectors. On a regular basis, computer and digital industries generate more data than they have space to store. In the current situation, five billion people have their own mobile phone, and over two billion people are linked globally to exchange various types of data. By 2020, it is estimated that about fifty billion people will be connected to the internet. During2020, data generation, use, and sharing would be forty-four times higher than in previous years. A variety of sectors and organizations are using big data to manage various operations. As a result, a thorough examination of big data's benefits, drawbacks, meaning, and characteristics is needed. The primary goal of this research is to gather information on the various open-source big data tools and platforms that are used by various organizations. In this paper we use a three perspective methodology to identify the strength and weaknesses of the workflow in a open source big data arena. This helps to establish a pipeline of workflow events for both researcher and entrepreneur decision making

    Feedback-Based Resource Allocation in MapReduce-Based Systems

    Get PDF

    An efficient cloud scheduler design supporting preemptible instances

    Full text link
    Maximizing resource utilization by performing an efficient resource provisioning is a key factor for any cloud provider: commercial actors can maximize their revenues, whereas scientific and non-commercial providers can maximize their infrastructure utilization. Traditionally, batch systems have allowed data centers to fill their resources as much as possible by using backfilling and similar techniques. However, in an IaaS cloud, where virtual machines are supposed to live indefinitely, or at least as long as the user is able to pay for them, these policies are not easily implementable. In this work we present a new scheduling algorithm for IaaS providers that is able to support preemptible instances, that can be stopped by higher priority requests without introducing large modifications in the current cloud schedulers. This scheduler enables the implementation of new cloud usage and payment models that allow more efficient usage of the resources and potential new revenue sources for commercial providers. We also study the correctness and the performace overhead of the proposed scheduler agains existing solutions

    Adaptive Big Data Pipeline

    Get PDF
    Over the past three decades, data has exponentially evolved from being a simple software by-product to one of the most important companies’ assets used to understand their customers and foresee trends. Deep learning has demonstrated that big volumes of clean data generally provide more flexibility and accuracy when modeling a phenomenon. However, handling ever-increasing data volumes entail new challenges: the lack of expertise to select the appropriate big data tools for the processing pipelines, as well as the speed at which engineers can take such pipelines into production reliably, leveraging the cloud. We introduce a system called Adaptive Big Data Pipelines: a platform to automate data pipelines creation. It provides an interface to capture the data sources, transformations, destinations and execution schedule. The system builds up the cloud infrastructure, schedules and fine-tunes the transformations, and creates the data lineage graph. This system has been tested on data sets of 50 gigabytes, processing them in just a few minutes without user intervention.ITESO, A. C

    Resource Provisioning Exploiting Cost and Performance Diversity within IaaS Cloud Providers

    Get PDF
    IaaS platforms such as Amazon EC2 allow clients access to massive computational power in the form of instances. Amazon hosts three different instance purchasing options, each with its own SLA covering pricing and availability. Amazon also offers access to a number of geographical regions, zones, and instance types to select from. In this thesis, the problem of utilizing Spot and On-Demand instances is analyzed and two approaches are presented in order to exploit the cost and performance diversity among different instance types and availability zones, and among the Spot markets they represent. We first develop RAMP, a framework designed to calculate the expected profit of using a specific Spot or On-Demand instance through an evaluation of instance reliability. RAMP is extended to develop RAMC-DC, a framework designed to allocate the most cost effective instance through strategies that facilitate interchangeability of instances among short jobs, reliability of instances among long jobs, and a comparison of the estimated costs of possible allocations. RAMC-DC achieves fault tolerance through comparisons of the price dynamics across instance types and availability zones, and through an examination of three basic checkpointing methods. Evaluations demonstrate that both frameworks take a large step toward low-volatility, high cost-efficiency resource provisioning. While achieving early-termination rates as low as 2.2%, RAMP can completely offset the total cost when charging the user just 17.5% of the On-Demand price. Moreover, the increases in profit resulting from relatively small additional charges to users are notably high, i.e., 100% profit compared to the resource provisioning cost with 35% of the equivalent On-Demand price. RAMC-DC can maintain deadline breaches below 1.8% of all jobs, achieve both early-termination and deadline breach rates as low as 0.5% of all jobs, and lowers total costs by between 80% and 87% compared to using only On-Demand instances
    corecore