982 research outputs found

An Empirical Analysis of Scheduling Techniques for Real-Time Cloud-Based Data Processing

Author: Lee Insup
Loo Boon Thau
Phan Linh T.X.
Zhang Zhuoyao
Zheng Qi
Publication venue: ScholarlyCommons
Publication date: 01/12/2011

In this paper, we explore the challenges and needs of current cloud infrastructures in order to better support cloud-based data-intensive applications that are not only latency-sensitive but also require strong timing guarantees. These applications have strict deadlines (e.g., to perform time-dependent mission-critical tasks or to complete real-time control decisions using a human-in-the-loop), and deadline misses are undesirable. To highlight the challenges in this space, we provide a case study of the online scheduling of MapReduce jobs executed by Hadoop. Our evaluations on Amazon EC2 show that the existing Hadoop scheduler is ill-equipped to handle jobs with deadlines. However, by adapting existing multiprocessor scheduling techniques for the cloud environment, we observe significant performance improvements in minimizing missed deadlines and tardiness. Based on our case study, we discuss a range of challenges in this domain posed by virtualization and scale, and propose our research agenda centered around the application of advanced real-time scheduling techniques in the cloud environment.
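
The two metrics named in this abstract, missed deadlines and tardiness, can be illustrated with a small sketch. The abstract does not say which multiprocessor scheduling techniques were adapted, so earliest-deadline-first (EDF) is used here purely as an assumed stand-in, compared against FIFO on a single non-preemptive slot. The `Job` fields, the `simulate` and `summarize` helpers, and the job values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    release: float   # arrival time (hypothetical units)
    runtime: float   # execution time on the single slot
    deadline: float  # absolute deadline


def simulate(jobs, key):
    """Run jobs non-preemptively on one slot.

    Whenever the slot is free, the released job that minimizes `key`
    runs next. Returns a dict mapping job name to tardiness, where
    tardiness = max(0, completion_time - deadline).
    """
    pending = sorted(jobs, key=lambda j: j.release)
    clock = 0.0
    tardiness = {}
    while pending:
        ready = [j for j in pending if j.release <= clock]
        if not ready:
            # Slot is idle: jump ahead to the next arrival.
            clock = min(j.release for j in pending)
            continue
        job = min(ready, key=key)
        pending.remove(job)
        clock += job.runtime
        tardiness[job.name] = max(0.0, clock - job.deadline)
    return tardiness


def summarize(tardiness):
    """Return (number of missed deadlines, total tardiness)."""
    missed = sum(1 for t in tardiness.values() if t > 0)
    return missed, sum(tardiness.values())


# Hypothetical workload: one long job with a loose deadline arrives
# together with two short jobs that have tight deadlines.
jobs = [Job("A", 0, 4, 20), Job("B", 0, 2, 3), Job("C", 0, 3, 6)]

fifo = summarize(simulate(jobs, key=lambda j: j.release))
edf = summarize(simulate(jobs, key=lambda j: j.deadline))
print("FIFO (missed, total tardiness):", fifo)
print("EDF  (missed, total tardiness):", edf)
```

On this toy workload the deadline-unaware order delays both short jobs behind the long one, while the deadline-aware order meets every deadline. It says nothing about the behavior measured on Amazon EC2 in the paper.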

ScholarlyCommons@Penn

Cost-Effective Resource Provisioning for MapReduce in a Cloud

Author: Liu L
Palanisamy B
Singh A
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2014

This paper presents a new MapReduce cloud service model, Cura, for provisioning cost-effective MapReduce services in a cloud. In contrast to existing MapReduce cloud services such as a generic compute cloud or a dedicated MapReduce cloud, Cura has a number of unique benefits. First, Cura is designed to provide a cost-effective solution to efficiently handle MapReduce production workloads that have a significant number of interactive jobs. Second, unlike existing services that require customers to decide the resources to be used for the jobs, Cura leverages MapReduce profiling to automatically create the best cluster configuration for the jobs. While the existing models allow only a per-job resource optimization for the jobs, Cura implements a globally efficient resource allocation scheme that significantly reduces the resource usage cost in the cloud. Third, Cura leverages unique optimization opportunities when dealing with workloads that can withstand some slack. By effectively multiplexing the available cloud resources among the jobs based on the job requirements, Cura achieves significantly lower resource usage costs for the jobs. Cura's core resource management schemes include cost-aware resource provisioning, VM-aware scheduling and online virtual machine reconfiguration. Our experimental results using Facebook-like workload traces show that our techniques lead to more than 80 percent reduction in the cloud compute infrastructure cost with up to 65 percent reduction in job response times.

Crossref

D-Scholarship@Pitt

Multi-Objective Scientific-Workflow Scheduling With Data Movement Awareness in Cloud

Author: Bouvry Pascal
Lavagnananda Kittichai
Wangsom Peerasak
Publication date: 01/01/2019

Because it serves several purposes simultaneously, running scientific workflows in dynamic environments such as cloud computing has become a multi-objective scheduling problem. Among these purposes, Cost and Makespan are probably the two most primitive objectives. Another critical factor in a large-scale scientific workflow is the tremendous amount of data handled during execution. Therefore, this work also includes Data Movement as an additional objective, as it has a major impact on network utilization and energy consumption in the network equipment of a cloud data center. In considering these three objectives, this work proposes a framework for scheduling solutions which combines a new node clustering technique for the Directed Acyclic Graph (DAG) model, known as Multilevel Dependent Node Clustering (MDNC), with the multi-objective optimization algorithm Extreme Nondominated Sorting Genetic Algorithm-III (E-NSGA-III). E-NSGA-III is a recent extension of the Nondominated Sorting Genetic Algorithm (NSGA-III). Five well-known scientific workflows, CyberShake, Epigenomics, LIGO, Montage, and SIPHT, are selected as testbeds, while the commonly known Hypervolume is chosen as the performance metric. In this work, MDNC is also experimented with both NSGA-III. Comparisons among three approaches (E-NSGA-III alone, E-NSGA-III with Peer-to-Peer clustering, and E-NSGA-III with MDNC) are carried out. The superiority of the proposed framework among them and its limitations are discussed.
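
Nondominated sorting, which the NSGA family is named after, rests on Pareto dominance over the objectives. The sketch below applies it to the three objectives named in this abstract (cost, makespan, data movement), all treated as values to minimize. It shows only the dominance filter, not E-NSGA-III or MDNC; the function names and the candidate schedule values are hypothetical.

```python
def dominates(a, b):
    """True if `a` Pareto-dominates `b` under minimization.

    `a` must be no worse than `b` in every objective and strictly
    better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate schedules as (cost, makespan, data_movement).
schedules = [
    (10, 50, 5),
    (12, 40, 6),
    (11, 55, 7),  # worse than (10, 50, 5) in all three objectives
    (9, 60, 4),
    (12, 40, 8),  # same cost and makespan as (12, 40, 6), more data moved
]

front = pareto_front(schedules)
print(front)
```

The surviving schedules are the trade-offs a multi-objective scheduler would choose among. A metric such as Hypervolume then scores how much of the objective space a front like this covers.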

Open Repository and Bibliography - Luxembourg

Move Fast and Meet Deadlines: Fine-grained Real-time Stream Processing with Cameo

Author: Gupta Indranil
Mai Luo
Potharaju Rahul
Venkataraman Shivaram
Xu Le
Publication date: 06/10/2020

Resource provisioning in multi-tenant stream processing systems faces the dual challenges of keeping resource utilization high (without over-provisioning) and ensuring performance isolation. In our common production use cases, where streaming workloads have to meet latency targets and avoid breaching service-level agreements, existing solutions are incapable of handling the wide variability of user needs. Our framework, called Cameo, uses fine-grained stream processing (inspired by actor computation models) and is able to provide high resource utilization while meeting latency targets. Cameo dynamically calculates and propagates priorities of events based on user latency targets and query semantics. Experiments on Microsoft Azure show that, compared to the state of the art, the Cameo framework: i) reduces query latency by 2.7X in single-tenant settings, ii) reduces query latency by 4.6X in multi-tenant scenarios, and iii) weathers transient spikes of workload.
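
The idea of prioritizing events by user latency targets can be sketched with a priority queue. The abstract does not give Cameo's actual priority computation or how query semantics enter it, so the rule below (priority equals arrival time plus latency target, earliest first) is an assumption made only for illustration. The `DeadlineQueue` class and the event names are hypothetical.

```python
import heapq
import itertools


class DeadlineQueue:
    """Min-heap of events ordered by an assumed deadline.

    deadline = arrival + latency_target (illustrative rule only).
    """

    def __init__(self):
        self._heap = []
        # Tie-breaker so events with equal deadlines pop in insertion order.
        self._counter = itertools.count()

    def push(self, event, arrival, latency_target):
        deadline = arrival + latency_target
        heapq.heappush(self._heap, (deadline, next(self._counter), event))

    def pop(self):
        """Remove and return (event, deadline) for the most urgent event."""
        deadline, _, event = heapq.heappop(self._heap)
        return event, deadline

    def __len__(self):
        return len(self._heap)


# Hypothetical events from three tenants with different latency targets.
queue = DeadlineQueue()
queue.push("batch-report", arrival=0.0, latency_target=5.0)
queue.push("fraud-alert", arrival=1.0, latency_target=0.5)
queue.push("dashboard", arrival=0.5, latency_target=2.0)

order = [queue.pop()[0] for _ in range(len(queue))]
print(order)
```

An event that arrives later but has a tight target is served before earlier events with loose targets, which is the intuition behind latency-driven prioritization in a shared system.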

arXiv.org e-Print Archive

Edinburgh Research Explorer