2,060 research outputs found
A Game-Theoretic Approach for Runtime Capacity Allocation in MapReduce
Nowadays many companies have available large amounts of raw, unstructured
data. Among Big Data enabling technologies, a central place is held by the
MapReduce framework and, in particular, by its open source implementation,
Apache Hadoop. For cost effectiveness considerations, a common approach entails
sharing server clusters among multiple users. The underlying infrastructure
should provide every user with a fair share of computational resources,
ensuring that Service Level Agreements (SLAs) are met and avoiding waste. In
this paper we consider two mathematical programming problems that model the
optimal allocation of computational resources in a Hadoop 2.x cluster with the
aim to develop new capacity allocation techniques that guarantee better
performance in shared data centers. Our goal is to get a substantial reduction
of power consumption while respecting the deadlines stated in the SLAs and
avoiding penalties associated with job rejections. The core of this approach is
a distributed algorithm for runtime capacity allocation, based on Game Theory
models and techniques, that mimics the MapReduce dynamics by means of
interacting players, namely the central Resource Manager and Class Managers
Parallel detrended fluctuation analysis for fast event detection on massive PMU data
(c) 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
Phasor measurement units (PMUs) are being rapidly deployed in power grids due to their high sampling rates and synchronized measurements. The devices' high data reporting rates present major computational challenges in the requirement to process potentially massive volumes of data, in addition to new issues surrounding data storage. Fast algorithms capable of processing massive volumes of data are now required in the field of power systems. This paper presents a novel parallel detrended fluctuation analysis (PDFA) approach for fast event detection on massive volumes of PMU data, taking advantage of a cluster computing platform. The PDFA algorithm is evaluated using data from installed PMUs on the transmission system of Great Britain from the aspects of speedup, scalability, and accuracy. The speedup of the PDFA in computation is initially analyzed through Amdahl's Law. A revision to the law is then proposed, suggesting enhancements to its capability to analyze the performance gain in computation when parallelizing data intensive applications in a cluster computing environment
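As a point of reference for the speedup analysis mentioned in this abstract, the classic form of Amdahl's Law can be sketched in a few lines of Python. This is only the textbook law, speedup = 1 / ((1 - p) + p / n); it is not the revised law the paper proposes, and the function name `amdahl_speedup` and its parameters are illustrative assumptions rather than anything taken from the paper.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Classic Amdahl's Law: upper bound on speedup with `workers` processors.

    parallel_fraction is the share of the runtime that can be parallelized (0..1).
    NOTE: hypothetical helper for illustration; not the paper's revised law.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is unaffected; the parallel part shrinks by a factor of `workers`.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(n, round(amdahl_speedup(0.9, n), 3))
```

Under this model the speedup is capped at 1 / (1 - p) regardless of cluster size, which is one reason a data-intensive workload may call for a different analysis.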
Power Management in Heterogeneous MapReduce Cluster
The growing expense of power in data centers, as compared to operation costs, has been a concern for the past several decades. It has been predicted that, without intervention, the energy cost will soon outgrow the infrastructure and operation costs. Therefore, it is of great importance to make data center clusters more energy efficient, which is critical for avoiding system overheating and failures. In addition, energy inefficiency causes not only the loss of capital but also environmental pollution. Various Power Management (PM) strategies have been developed over the years to make systems more energy efficient and to counteract the sharply rising cost of electricity. However, it is still a challenge to make a system both power efficient and computation efficient due to many underlying system constraints.
In this thesis, we investigate a Power Management technique for heterogeneous MapReduce clusters that also maintains the required system QoS (Quality of Service). For a cluster that supports MapReduce jobs, it is necessary to develop a PM technique that also considers data availability. We develop our PM strategy by exploiting the fact that the servers in the system are underutilized most of the time. Hence, we first develop a model of our testbed and study how the server utilization levels affect the power consumption and the system throughput. With the established models, we formulate and solve the power optimization problem for heterogeneous MapReduce clusters, where we control the server utilization levels intelligently to minimize the total power consumption.
We have conducted simulations and shown the power savings achieved using our PM technique. We then validate some of our simulation results by running experiments in a real testbed. Our simulation and experimental data show that our PM strategy works well for heterogeneous MapReduce clusters that consist of both power-efficient and power-inefficient servers.
Adviser: Ying L
Multi-Objective Big Data Optimization with jMetal and Spark
Big Data Optimization is the term used to refer to optimization problems which have to manage very large amounts of data. In this paper, we focus on the parallelization of metaheuristics with the Apache Spark cluster computing system for solving multi-objective Big Data Optimization problems. Our purpose is to study the influence of accessing data stored in the Hadoop File System (HDFS) in each evaluation step of a metaheuristic and to provide a software tool to solve these kinds of problems. This tool combines the jMetal multi-objective optimization framework with Apache Spark. We have carried out experiments to measure the performance of the proposed parallel infrastructure in an environment based on virtual machines in a local cluster comprising up to 100 cores. We obtained interesting results for computational effort and propose guidelines to face multi-objective Big Data Optimization problems.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
Building Wavelet Histograms on Large Data in MapReduce
MapReduce is becoming the de facto framework for storing and processing
massive data, due to its excellent scalability, reliability, and elasticity. In
many MapReduce applications, obtaining a compact accurate summary of data is
essential. Among various data summarization tools, histograms have proven to be
particularly important and useful for summarizing data, and the wavelet
histogram is one of the most widely used histograms. In this paper, we
investigate the problem of building wavelet histograms efficiently on large
datasets in MapReduce. We measure the efficiency of the algorithms by both
end-to-end running time and communication cost. We demonstrate that straightforward
adaptations of existing exact and approximate methods for building wavelet
histograms to MapReduce clusters are highly inefficient. To that end, we design
new algorithms for computing exact and approximate wavelet histograms and
discuss their implementation in MapReduce. We illustrate our techniques in
Hadoop, and compare to baseline solutions with extensive experiments performed
in a heterogeneous Hadoop cluster of 16 nodes, using large real and synthetic
datasets, up to hundreds of gigabytes. The results suggest significant (often
orders of magnitude) performance improvement achieved by our new algorithms.
Comment: VLDB201
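To make the notion of a wavelet histogram concrete, the following is a minimal single-machine sketch in Python: it applies an orthonormal Haar transform to a frequency vector, keeps the k largest-magnitude coefficients, and reconstructs an approximation. It is not the MapReduce algorithms the paper designs; the function names (`haar_transform`, `inverse_haar`, `top_k`) and the sample data are assumptions made for illustration, and the input length is assumed to be a power of two.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_transform(values):
    """Orthonormal Haar decomposition of a frequency vector (length 2^m)."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = [float(v) for v in values]
    length = n
    while length > 1:
        half = length // 2
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        coeffs[:length] = averages + details  # coarse part first, then details
        length = half
    return coeffs


def inverse_haar(coeffs):
    """Invert haar_transform, rebuilding the (approximate) frequency vector."""
    values = list(coeffs)
    length = 1
    while length < len(values):
        merged = []
        for a, d in zip(values[:length], values[length:2 * length]):
            merged.append((a + d) / SQRT2)
            merged.append((a - d) / SQRT2)
        values[:2 * length] = merged
        length *= 2
    return values


def top_k(coeffs, k):
    """Keep the k largest-magnitude coefficients and zero out the rest."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


if __name__ == "__main__":
    freq = [4, 4, 4, 4, 0, 0, 8, 8]  # hypothetical key frequencies
    summary = top_k(haar_transform(freq), 3)
    print([round(v, 3) for v in inverse_haar(summary)])
```

Because the transform is orthonormal, keeping the largest-magnitude coefficients minimizes the squared reconstruction error for a given k, which is why top-k selection is the usual way to build such a summary.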
Evolutionary Neural Network Based Energy Consumption Forecast for Cloud Computing
The success of Hadoop, an open-source
framework for massively parallel and distributed computing, is
expected to drive energy consumption of cloud data centers to
new highs as service providers continue to add new
infrastructure, services and capabilities to meet the market
demands. While current research on data center airflow
management, HVAC (Heating, Ventilation and Air
Conditioning) system design, workload distribution and
optimization, and energy efficient computing hardware and
software are all contributing to improved energy efficiency,
energy forecasting in cloud computing remains a challenge. This
paper reports an evolutionary computation based modeling
and forecasting approach to this problem. In particular, an
evolutionary neural network is developed and structurally
optimized to forecast the energy load of a cloud data center.
The results, both in terms of forecasting speed and accuracy,
suggest that the evolutionary neural network approach to
energy consumption forecasting for cloud computing is highly
promising