A Game-Theoretic Approach for Runtime Capacity Allocation in MapReduce
Nowadays many companies have large amounts of raw, unstructured data available.
Among Big Data enabling technologies, a central place is held by the MapReduce
framework and, in particular, by its open-source implementation, Apache Hadoop.
For cost-effectiveness reasons, a common approach entails sharing server
clusters among multiple users. The underlying infrastructure should provide
every user with a fair share of computational resources, ensuring that Service
Level Agreements (SLAs) are met and avoiding waste. In this paper we consider
two mathematical programming problems that model the optimal allocation of
computational resources in a Hadoop 2.x cluster, with the aim of developing new
capacity allocation techniques that guarantee better performance in shared data
centers. Our goal is a substantial reduction in power consumption while
respecting the deadlines stated in the SLAs and avoiding the penalties
associated with job rejections. The core of this approach is a distributed
algorithm for runtime capacity allocation, based on Game Theory models and
techniques, that mimics the MapReduce dynamics by means of interacting players,
namely the central Resource Manager and the Class Managers.
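The interplay between Class Managers and the Resource Manager can be sketched as a one-shot toy allocation game. The function names and the simple performance model below are our own, not the paper's: each Class Manager demands the slots it needs to meet its deadline, and the Resource Manager grants demands outright when capacity suffices, scaling them down proportionally otherwise.

```python
from math import ceil

def required_slots(work, deadline):
    """Class Manager: slots needed to finish `work` task-seconds by `deadline`."""
    return ceil(work / deadline)

def allocate(total_slots, demands):
    """Resource Manager: grant demands, scaling down proportionally if needed."""
    total = sum(demands.values())
    if total <= total_slots:
        return dict(demands)
    return {c: total_slots * d / total for c, d in demands.items()}

# Two job classes: (remaining work in task-seconds, deadline in seconds).
jobs = {"A": (600, 60), "B": (300, 60)}
demands = {c: required_slots(w, d) for c, (w, d) in jobs.items()}
shares = allocate(100, demands)  # capacity suffices: A gets 10 slots, B gets 5
```

The actual algorithm iterates between the players at runtime; this sketch only shows a single round of demand and allocation.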
D-SPACE4Cloud: A Design Tool for Big Data Applications
Recent years have seen a steep rise in data generation worldwide, along with
the development and widespread adoption of several software projects targeting
the Big Data paradigm. Many companies currently engage in Big Data analytics as
part of their core business activities; nonetheless, there are no tools and
techniques to support the design of the underlying hardware configuration
backing such systems. In particular, the focus of this report is on
Cloud-deployed clusters, which represent a cost-effective alternative to
on-premises installations. We propose a novel tool implementing a battery of
optimization and prediction techniques, integrated so as to efficiently assess
several alternative resource configurations and determine the minimum-cost
cluster deployment satisfying QoS constraints. Further, an experimental
campaign conducted on real systems shows the validity and relevance of the
proposed method.
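The design-space search the abstract describes can be illustrated with a minimal sketch. The VM types, prices, and the perfectly-parallel performance model below are invented for illustration; the tool's actual prediction techniques are far more sophisticated.

```python
def predict_time(vm_speed, n_vms, work):
    """Crude performance model: `work` units split perfectly across VMs."""
    return work / (vm_speed * n_vms)

def cheapest_deployment(vm_types, work, deadline, max_vms=64):
    """Return the (type, count, cost) meeting the deadline at minimum cost."""
    best = None
    for name, (speed, hourly_cost) in vm_types.items():
        for n in range(1, max_vms + 1):
            if predict_time(speed, n, work) <= deadline:
                cost = n * hourly_cost
                if best is None or cost < best[2]:
                    best = (name, n, cost)
                break  # more VMs of this type only cost more
    return best

# Hypothetical catalogue: name -> (relative speed, hourly cost in $).
vm_types = {"small": (1.0, 0.05), "large": (4.0, 0.22)}
best = cheapest_deployment(vm_types, work=3600, deadline=600)
```

Here six "small" VMs just meet the deadline and undercut two "large" ones, so the search returns the cheaper small-VM deployment.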
OS-Assisted Task Preemption for Hadoop
This work introduces a new task preemption primitive for Hadoop that allows
tasks to be suspended and resumed by exploiting memory management mechanisms
readily available in modern operating systems. Our technique fills the gap
between the two extreme cases of killing tasks (which wastes work) and waiting
for their completion (which introduces latency): experimental results indicate
superior performance and very small overheads compared to existing
alternatives.
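The core idea of suspend/resume can be sketched with standard POSIX signals: a stopped process consumes no CPU, and under memory pressure the OS can page its memory out through ordinary virtual-memory management. This is only an illustration of the mechanism, not the paper's Hadoop integration.

```python
import os, signal, subprocess, sys, time

# Launch a long-running process as a stand-in for a Hadoop task.
task = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])

os.kill(task.pid, signal.SIGSTOP)   # suspend: no CPU use; the OS may swap
                                    # the task's pages out if memory is tight
time.sleep(0.1)                     # ...a higher-priority task runs here...
os.kill(task.pid, signal.SIGCONT)   # resume exactly where it left off

task.terminate()                    # clean up the example task
task.wait()
```

Compared with killing the task, no completed work is lost; compared with waiting, the higher-priority task gets resources immediately.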
Scheduling Data Intensive Workloads through Virtualization on MapReduce based Clouds
MapReduce has become a popular programming model for running data-intensive
applications on the cloud. Completion time goals, or deadlines, of MapReduce
jobs set by users are becoming crucial in existing cloud-based data processing
environments like Hadoop. There is a conflict between scheduling MapReduce jobs
to meet deadlines and preserving "data locality" (assigning tasks to nodes that
contain their input data): to meet a deadline, a task may be scheduled on a
node without local input data, causing an expensive data transfer from a remote
node. In this paper, a novel scheduler is proposed to address this problem,
based primarily on a dynamic resource reconfiguration approach. It has two
components: 1) a Resource Predictor, which dynamically determines the number of
Map/Reduce slots every job requires to meet its completion time guarantee; 2) a
Resource Reconfigurator, which adjusts CPU resources without violating users'
completion time goals by dynamically growing or shrinking individual VMs, so as
to maximize data locality and resource utilization among the active jobs. The
proposed scheduler has been evaluated against the Fair Scheduler on a virtual
cluster built on a physical cluster of 20 machines. The results demonstrate an
increase of about 12% in job throughput.
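The Resource Predictor idea can be sketched as a back-of-the-envelope calculation: given the pending tasks, the average task duration, and the time left until the deadline, estimate how many slots the job needs. The function name and model below are ours, not the paper's.

```python
from math import ceil

def required_slots(pending_tasks, avg_task_time, time_left):
    """Slots needed so the pending tasks fit in the remaining time."""
    if time_left <= 0:
        raise ValueError("deadline already passed")
    waves = max(1, time_left // avg_task_time)  # task waves that still fit
    return ceil(pending_tasks / waves)

# 120 pending map tasks of ~30 s each, 5 minutes to the deadline:
# 10 waves of tasks fit, so 12 slots suffice.
slots = required_slots(120, 30, 300)
```

A Resource Reconfigurator would then resize VMs so that each job holds at least this many slots, preferring nodes that hold the job's input data.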