Analyzing Spark Performance on Spot Instances

Author: Tian Jiannan
Publication venue: ScholarWorks@UMass Amherst
Publication date: 27/10/2017

Amazon Spot Instances provide inexpensive capacity for high-performance computing. By bidding on spare Amazon Elastic Compute Cloud (Amazon EC2) capacity, users can obtain discounts of up to 90%. In exchange for the low cost, spot instances reduce the reliability of the computing environment, because the provider can revoke them abruptly depending on supply and demand, and higher-priority customers are served first. To achieve high performance on instances with compromised reliability, Spark is used to run jobs. In this thesis, a wide set of Spark experiments is conducted to study its performance on spot instances. Without stateful replication, Spark suffers from cascading rollback and is forced to regenerate lost states repeatedly. This drawback motivates a discussion of the trade-off between comparatively slow checkpointing and regeneration on rollback, and inspires us to apply multiple fault-tolerance schemes. Spark is shown to finish a job only when the revocation rate is within a suitable range. To validate and evaluate our work, a prototype and a simulator are designed and implemented. Based on real historical price records, we study how various checkpoint write frequencies and bid levels affect performance. In a case study, experiments show that our techniques can lead to ~20% shorter completion time and ~25% lower costs than cases without them. Compared with running jobs on full-price instances, the absolute cost saving can be ~70%.

ScholarWorks@UMass Amherst
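The thesis above weighs comparatively slow checkpointing against regenerating lost state on rollback. One classic first-order way to reason about that trade-off (not necessarily the model used in the thesis) is Young's approximation for the checkpoint interval, τ ≈ √(2CM), where C is the time to write one checkpoint and M is the mean time between revocations. The sketch assumes revocations arrive independently at a constant average rate; the function names and the example numbers (60 s per checkpoint, one revocation every 6 hours on average) are hypothetical.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbr_s: float) -> float:
    """Young's first-order optimal checkpoint interval, sqrt(2 * C * M).

    checkpoint_cost_s: time to write one checkpoint (C), in seconds.
    mtbr_s: mean time between revocations (M), in seconds (assumed constant).
    """
    if checkpoint_cost_s <= 0 or mtbr_s <= 0:
        raise ValueError("checkpoint cost and mean time between revocations must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbr_s)


def overhead_fraction(interval_s: float, checkpoint_cost_s: float, mtbr_s: float) -> float:
    """First-order overhead per second of useful work: time spent writing
    checkpoints (C / tau) plus work expected to be redone after a revocation
    (on average half an interval, i.e. tau / (2 * M))."""
    return checkpoint_cost_s / interval_s + interval_s / (2.0 * mtbr_s)


if __name__ == "__main__":
    C, M = 60.0, 6 * 3600.0  # hypothetical: 60 s per checkpoint, one revocation per 6 h
    tau = young_interval(C, M)
    for t in (tau / 4, tau, tau * 4):
        print(f"interval {t / 60:6.1f} min -> overhead {overhead_fraction(t, C, M):.3f}")
```

Checkpointing much more often than τ wastes time on writes, while checkpointing much less often wastes time on regeneration after revocations; the overhead is lowest near τ.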

A Reliable and Cost-Efficient Auto-Scaling System for Web Applications Using Heterogeneous Spot Instances

Author: Buyya Rajkumar
Calheiros Rodrigo N.
Qu Chenhao
Publication venue: Elsevier BV
Publication date: 01/01/2016

Cloud providers sell their idle capacity on markets through an auction-like mechanism to increase their return on investment. The instances sold in this way are called spot instances. Although spot instances are usually 90% cheaper than on-demand instances, the provider can terminate them when their bid prices fall below market prices. As a result, they are largely used only to provision fault-tolerant applications. In this paper, we explore how to use spot instances to provision web applications, which are usually considered availability-critical. The idea is to exploit price differences among various types of spot instances to achieve both high availability and significant cost savings. We first propose a fault-tolerant model for web applications provisioned by spot instances. Based on that model, we devise novel auto-scaling policies for hourly billed cloud markets. We implemented the proposed model and policies both on a simulation testbed, for repeatable validation, and on Amazon EC2. Experiments on the simulation testbed and the real platform against the benchmarks show that the proposed approach can greatly reduce resource cost while still achieving satisfactory Quality of Service (QoS) in terms of response time and availability.

arXiv.org e-Print Archive

Western Sydney ResearchDirect
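The paper above spreads a web application across several spot instance types so that a price spike in one market does not take the whole service down. The following heuristic is a hypothetical sketch of that idea, not the paper's fault-tolerant model or auto-scaling policy: it provisions k instance types so that each type alone covers demand / (k - 1), which means the remaining types still meet demand if every instance of any single type is revoked at once. Instance names, capacities, and prices are invented for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SpotType:
    name: str
    capacity: float  # requests/s one instance can serve (hypothetical unit)
    price: float     # current spot price per hour (hypothetical)


def plan(types: List[SpotType], demand: float) -> Optional[Tuple[float, Dict[str, int]]]:
    """Cheapest plan (under the equal-share heuristic) that still meets `demand`
    after losing all instances of any one type. Returns (hourly cost, {type: count}),
    or None if fewer than two types are available."""
    best = None
    for k in range(2, len(types) + 1):
        quota = demand / (k - 1)  # capacity each chosen type must provide on its own
        costed = []
        for t in types:
            n = math.ceil(quota / t.capacity)
            costed.append((n * t.price, t.name, n))
        costed.sort()  # cheapest way for each type to cover the quota
        chosen = costed[:k]
        total = sum(cost for cost, _, _ in chosen)
        if best is None or total < best[0]:
            best = (total, {name: n for _, name, n in chosen})
    return best


if __name__ == "__main__":
    market = [SpotType("small", 10, 0.05), SpotType("medium", 20, 0.12), SpotType("large", 40, 0.20)]
    print(plan(market, demand=40))
```

Splitting capacity evenly across types is a simplification; a real policy would also weigh each market's price volatility and billing granularity.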

Rolling Window Time Series Prediction Using MapReduce

Author: Li Lei
Publication venue: Faculty of Engineering and Information Technologies, School of Electrical and Information Engineering
Publication date: 01/01/2015

Prediction of time series data is an important application in many domains. Despite their inherent advantages, traditional databases and the MapReduce methodology are not ideally suited to this type of processing because of the dependencies introduced by the sequential nature of time series. In this thesis, a novel framework is presented to facilitate retrieval and rolling-window prediction of irregularly sampled, large-scale time series data. By introducing a new index pool data structure, time series processing can be efficiently parallelised. The proposed framework is implemented in the R programming environment and uses Hadoop to support parallelisation and fault tolerance. A systematic multi-predictor selection model is designed and applied in order to choose the best-fit algorithm for different circumstances. Additionally, boosting is applied as a post-processing step to further optimise the predictive results. Experimental results on a cloud-based platform indicate that the proposed framework scales linearly up to 32 nodes and performs efficiently while producing relatively optimised predictions.

Sydney eScholarship
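Rolling-window prediction over irregularly sampled data means each forecast uses the observations inside a trailing time window rather than a fixed number of points. The Python sketch below is a hypothetical stand-in for the thesis's R/Hadoop framework: it precomputes each window's index bounds with binary search (a simple analogue of an index structure, not the thesis's index pool), after which every window can be evaluated independently, here with a thread pool in place of Hadoop. The trailing mean is a placeholder predictor; multi-predictor selection and boosting are not shown.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import List, Optional, Tuple


def window_bounds(times: List[float], width: float) -> List[Tuple[int, int]]:
    """For each position i, the half-open slice [lo, i) of earlier observations
    with times[i] - width <= t < times[i]. `times` must be sorted ascending."""
    return [(bisect_left(times, t - width), i) for i, t in enumerate(times)]


def mean_forecast(values: List[float], bound: Tuple[int, int]) -> Optional[float]:
    """Placeholder predictor: mean of the window, or None if the window is empty."""
    lo, hi = bound
    window = values[lo:hi]
    return sum(window) / len(window) if window else None


def rolling_forecast(times, values, width, workers=4):
    """Windows are independent once their bounds are known, so they can be
    evaluated in parallel (a thread pool stands in for a cluster here)."""
    bounds = window_bounds(times, width)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(partial(mean_forecast, values), bounds))


if __name__ == "__main__":
    times = [0, 1, 2.5, 3, 7, 8]  # irregular sampling with a gap between 3 and 7
    values = [1, 2, 3, 4, 5, 6]
    print(rolling_forecast(times, values, width=3))
```

The empty window at time 7 shows why irregular sampling needs explicit handling: a fixed-count window would silently reach back across the gap.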

Open Source Big Data Platforms and Tools: An Analysis

Author: Benlachmi Yassine
Hasnaoui Moulay Lahcen
Publication venue: IAES Indonesia Section
Publication date: 29/09/2021

Big data is attracting a great deal of interest in the IT and academic sectors. Computer and digital industries routinely generate more data than they have space to store. Currently, five billion people have their own mobile phone, and over two billion people are connected globally to exchange various types of data. It has been estimated that about fifty billion devices would be connected to the internet by 2020, and that in 2020 data generation, use, and sharing would be forty-four times higher than in previous years. Many sectors and organizations use big data to manage various operations. A thorough examination of big data's benefits, drawbacks, meaning, and characteristics is therefore needed. The primary goal of this research is to gather information on the various open-source big data tools and platforms used by organizations. In this paper we use a three-perspective methodology to identify the strengths and weaknesses of workflows in the open-source big data arena. This helps establish a pipeline of workflow events to support decision-making by both researchers and entrepreneurs.

Indonesian Journal of Electrical Engineering and Informatics (IJEEI)

Orchestrating the Deployment of Computations in the Cloud with Conductor

Author: Bhatotia Pramod
Post Ansley
Rodrigues Rodrigo
Wieder Alexander
Publication date: 01/01/2012

Edinburgh Research Explorer

MPG.PuRe

Feedback-Based Resource Allocation in MapReduce-Based Systems

Publication venue: Hindawi Limited
Publication date: 01/01/2016

Crossref

An efficient cloud scheduler design supporting preemptible instances

Author: Fernández-del-Castillo Enol
García Álvaro López
Plasencia Isabel Campos
Publication venue: Elsevier BV
Publication date: 01/01/2019

Maximizing resource utilization through efficient resource provisioning is a key factor for any cloud provider: commercial actors can maximize their revenues, whereas scientific and non-commercial providers can maximize their infrastructure utilization. Traditionally, batch systems have allowed data centers to fill their resources as much as possible by using backfilling and similar techniques. However, in an IaaS cloud, where virtual machines are supposed to live indefinitely, or at least as long as the user is able to pay for them, these policies are not easily implementable. In this work we present a new scheduling algorithm for IaaS providers that supports preemptible instances, which can be stopped by higher-priority requests, without introducing large modifications to current cloud schedulers. This scheduler enables new cloud usage and payment models that allow more efficient use of resources and potential new revenue sources for commercial providers. We also study the correctness and the performance overhead of the proposed scheduler against existing solutions.

arXiv.org e-Print Archive

Digital.CSIC
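The central decision such a scheduler has to make is which preemptible instances to stop when a normal request does not fit. The sketch below is a hypothetical, simplified illustration of that decision, not the algorithm from the paper: it first tries to place the request without preemption, and otherwise picks the host where stopping preemptible VMs (smallest first) frees enough CPUs while preempting the fewest CPUs in total. VM and host names, and the CPU-only resource model, are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VM:
    name: str
    cpus: int
    preemptible: bool


@dataclass
class Host:
    name: str
    cpus: int
    vms: List[VM] = field(default_factory=list)

    def free(self) -> int:
        return self.cpus - sum(vm.cpus for vm in self.vms)


def schedule(hosts: List[Host], request: VM) -> Optional[List[str]]:
    """Place `request`; return names of preempted VMs, or None if it cannot be placed."""
    # 1. Plain placement: first host with enough free CPUs, no preemption needed.
    for host in hosts:
        if host.free() >= request.cpus:
            host.vms.append(request)
            return []
    # 2. Preemptible requests never evict other instances.
    if request.preemptible:
        return None
    # 3. Pick the host where stopping preemptible VMs (smallest first) frees
    #    enough CPUs while preempting the fewest CPUs overall.
    best = None  # (preempted_cpus, host, victims)
    for host in hosts:
        need = request.cpus - host.free()
        victims, freed = [], 0
        for vm in sorted((v for v in host.vms if v.preemptible), key=lambda v: v.cpus):
            if freed >= need:
                break
            victims.append(vm)
            freed += vm.cpus
        if freed >= need and (best is None or freed < best[0]):
            best = (freed, host, victims)
    if best is None:
        return None
    _, host, victims = best
    for vm in victims:
        host.vms.remove(vm)  # a real cloud would stop or terminate the VM here
    host.vms.append(request)
    return [vm.name for vm in victims]


if __name__ == "__main__":
    h1 = Host("h1", 8, [VM("a", 4, False), VM("p1", 2, True), VM("p2", 2, True)])
    h2 = Host("h2", 8, [VM("b", 2, False), VM("p3", 6, True)])
    print(schedule([h1, h2], VM("r", 3, False)))
```

Smallest-first eviction is only a heuristic; it can overshoot the CPUs actually needed, which is why hosts are compared by the total CPUs preempted.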

Adaptive Big Data Pipeline

Author: Orozco-GómezSerrano Aldo
Publication venue: ITESO, A.C.
Publication date: 01/09/2020

Over the past three decades, data has evolved exponentially from a simple software by-product into one of companies' most important assets, used to understand their customers and foresee trends. Deep learning has demonstrated that large volumes of clean data generally provide more flexibility and accuracy when modeling a phenomenon. However, handling ever-increasing data volumes entails new challenges: the lack of expertise to select the appropriate big data tools for processing pipelines, and the speed at which engineers can reliably take such pipelines into production in the cloud. We introduce a system called Adaptive Big Data Pipelines: a platform that automates the creation of data pipelines. It provides an interface to capture the data sources, transformations, destinations, and execution schedule. The system builds the cloud infrastructure, schedules and fine-tunes the transformations, and creates the data lineage graph. The system has been tested on data sets of 50 gigabytes, processing them in just a few minutes without user intervention.

Repositorio Institucional del ITESO
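A data lineage graph records, for each dataset, which upstream datasets it was derived from; the same graph also yields a valid execution order for the transformations. The minimal sketch below uses Python's standard `graphlib` module (Python 3.9+) on a hypothetical pipeline; the dataset names are invented, and nothing here reflects how the Adaptive Big Data Pipelines system actually stores lineage.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical pipeline: each dataset maps to the datasets it is derived from.
PIPELINE: Dict[str, Set[str]] = {
    "raw_orders": set(),
    "raw_customers": set(),
    "clean_orders": {"raw_orders"},
    "joined": {"clean_orders", "raw_customers"},
    "warehouse_table": {"joined"},
}


def execution_order(graph: Dict[str, Set[str]]) -> List[str]:
    """Order in which transformations can run so every input exists first.
    Raises graphlib.CycleError if the pipeline contains a cycle."""
    return list(TopologicalSorter(graph).static_order())


def lineage(graph: Dict[str, Set[str]], node: str) -> Set[str]:
    """All upstream datasets that `node` was (transitively) derived from."""
    seen: Set[str] = set()
    stack = list(graph.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, ()))
    return seen


if __name__ == "__main__":
    print(execution_order(PIPELINE))
    print(sorted(lineage(PIPELINE, "warehouse_table")))
```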

Resource Provisioning Exploiting Cost and Performance Diversity within IaaS Cloud Providers

Author: Leslie Luke Marius
Publication venue: Faculty of Engineering and Information Technologies, School of Information Technologies
Publication date: 01/01/2013

IaaS platforms such as Amazon EC2 give clients access to massive computational power in the form of instances. Amazon offers three different instance purchasing options, each with its own SLA covering pricing and availability. Amazon also offers a number of geographical regions, zones, and instance types to select from. In this thesis, the problem of using Spot and On-Demand instances is analyzed, and two approaches are presented to exploit the cost and performance diversity among different instance types and availability zones, and among the Spot markets they represent. We first develop RAMP, a framework designed to calculate the expected profit of using a specific Spot or On-Demand instance through an evaluation of instance reliability. RAMP is then extended into RAMC-DC, a framework designed to allocate the most cost-effective instance through strategies that facilitate interchangeability of instances among short jobs, reliability of instances among long jobs, and a comparison of the estimated costs of possible allocations. RAMC-DC achieves fault tolerance by comparing price dynamics across instance types and availability zones, and by examining three basic checkpointing methods. Evaluations demonstrate that both frameworks take a large step toward low-volatility, highly cost-efficient resource provisioning. While achieving early-termination rates as low as 2.2%, RAMP can completely offset the total cost when charging the user just 17.5% of the On-Demand price. Moreover, the increases in profit resulting from relatively small additional charges to users are notably high, i.e., 100% profit relative to the resource provisioning cost when charging 35% of the equivalent On-Demand price. RAMC-DC can keep deadline breaches below 1.8% of all jobs, achieve both early-termination and deadline-breach rates as low as 0.5% of all jobs, and lower total costs by 80% to 87% compared with using only On-Demand instances.

Sydney eScholarship
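RAMP weighs an instance's price against its reliability to estimate expected profit. The sketch below is a deliberately simplified, hypothetical model in that spirit, not RAMP itself: a job runs on a spot instance; with probability p_term the instance is terminated early, a fraction of the spot hours has already been paid for, and the whole job is rerun on an On-Demand instance. The break-even charge is the expected cost expressed as a fraction of the On-Demand price. All prices and probabilities are invented inputs.

```python
def expected_cost(spot_price, od_price, hours, p_term, wasted_fraction=0.5):
    """Expected provider cost of one job under the simplified model:
    with probability 1 - p_term the job finishes on spot; otherwise the provider
    pays for the spot hours used before termination plus a full On-Demand rerun."""
    if not 0.0 <= p_term <= 1.0:
        raise ValueError("p_term must be a probability")
    success = (1.0 - p_term) * spot_price * hours
    failure = p_term * (spot_price * hours * wasted_fraction + od_price * hours)
    return success + failure


def expected_profit(charge_fraction, spot_price, od_price, hours, p_term, wasted_fraction=0.5):
    """Revenue from charging the user `charge_fraction` of the On-Demand price, minus expected cost."""
    revenue = charge_fraction * od_price * hours
    return revenue - expected_cost(spot_price, od_price, hours, p_term, wasted_fraction)


def break_even_fraction(spot_price, od_price, hours, p_term, wasted_fraction=0.5):
    """Smallest fraction of the On-Demand price whose revenue covers the expected cost."""
    return expected_cost(spot_price, od_price, hours, p_term, wasted_fraction) / (od_price * hours)


if __name__ == "__main__":
    # Hypothetical inputs: spot at 30% of the On-Demand price, 2.2% early terminations.
    args = dict(spot_price=0.03, od_price=0.10, hours=10, p_term=0.022)
    f = break_even_fraction(**args)
    print(f"break-even charge: {f:.1%} of the On-Demand price")
    print(f"profit at 2x break-even: {expected_profit(2 * f, **args):.4f}")
```

In this toy model a lower termination probability or a cheaper spot market pushes the break-even charge down, which is the intuition behind pricing an instance by its reliability.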