Scalable and Distributed Resource Management Protocols for Cloud and Big Data Clusters
Cloud data centers require an operating system to manage resources and satisfy operational requirements and management objectives. The growing popularity of cloud services has produced a new spectrum of services with sophisticated workload and resource management requirements. Data centers are also growing through the addition of various types of hardware to accommodate the ever-increasing requests of users. Nowadays a large percentage of cloud resources execute data-intensive applications, which exhibit continuously fluctuating workloads and need specific resource management. To this end, cluster computing frameworks are shifting towards distributed resource management for better scalability and faster decision making. Such systems benefit from the parallelization of control and are resilient to failures. Throughout this thesis we investigate algorithms, protocols and techniques to address these challenges in large-scale data centers. We introduce a distributed resource management framework that consolidates virtual machines onto as few servers as possible to reduce the energy consumption of the data center and hence decrease the costs of cloud providers. This framework can characterize the workload of virtual machines and hence efficiently handle the trade-off between energy consumption and the Service Level Agreements (SLAs) of customers. The algorithm is highly scalable, requires low maintenance cost under dynamic workloads, and tries to minimize virtual machine migration costs. We also introduce a scalable and distributed probe-based scheduling algorithm for Big Data analytics frameworks. This algorithm can efficiently address the problem of job heterogeneity in workloads, which has appeared after increasing the level of parallelism in jobs. The algorithm is massively scalable and can significantly reduce average job completion times in comparison with the state of the art. Finally, we propose a probabilistic fault-tolerance technique as part of the scheduling algorithm.
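To make the idea of probe-based scheduling concrete, the following is a minimal Python sketch of the generic "probe a few random workers, pick the least loaded" technique (often called power-of-d-choices). It is an illustration under stated assumptions, not the thesis's algorithm: the function name `probe_schedule`, the `queue_lengths` list, and the default of two probes are all hypothetical.

```python
import random


def probe_schedule(queue_lengths, num_probes=2, rng=random):
    """Place one task on the least-loaded of `num_probes` randomly probed workers.

    `queue_lengths` is a mutable list where index i holds the number of
    tasks queued on worker i (a hypothetical, simplified cluster state).
    Returns the index of the chosen worker.
    """
    # Probe a small random subset instead of inspecting every worker,
    # which keeps each scheduling decision cheap and decentralized.
    probed = rng.sample(range(len(queue_lengths)), num_probes)
    # Choose the probed worker with the shortest queue.
    target = min(probed, key=lambda worker: queue_lengths[worker])
    queue_lengths[target] += 1
    return target


if __name__ == "__main__":
    queues = [0] * 10
    for _ in range(100):
        probe_schedule(queues)
    print(queues)
```

Because each decision touches only a few workers, many schedulers can run in parallel without a shared global view, which is the scalability argument the abstract alludes to.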
Autonomic system for optimal resource management in cloud environments
University of Technology Sydney. Faculty of Engineering and Information Technology. Cloud computing is a large-scale distributed computing paradigm driven by economies of scale, in which a pool of abstracted, virtualized, dynamically scalable, managed computing power, storage, platforms, and services is delivered on demand to external customers over the Internet. Given the scarcity of resources in cloud environments and fluctuating customer demands, cloud providers need to balance their resource load and utilization, and to automatically allocate scarce resources to services in an optimal way, so that they deliver high-performance physical and virtual resources and meet Service Level Agreement (SLA) criteria while minimizing their cost.
This study proposes an Autonomic System for Optimal Resource Management (AS-ORM) that addresses three main topics of resource management in the cloud environment: (1) resource estimation, (2) resource discovery and selection, and (3) resource allocation. A fuzzy Workload Prediction (WP) sub-system and a Multi-Objective Task Scheduling optimization (MOTS) sub-system are developed to cover the first two topics. The WP sub-system estimates Virtual Machines’ (VMs’) workload and resource utilization, and predicts Physical Machines’ (PMs’) hotspots. The MOTS sub-system determines the optimal pattern for scheduling tasks over VMs, considering task transfer time, task execution cost and time, the length of the VMs’ task queues, and power consumption.
To optimize the third topic in resource management, resource allocation, this study investigates VM migration, the current solution for optimizing the allocation of physical resources to VMs and for load balancing among PMs. VM migration has been applied to system load balancing in cloud environments through memory transfer, suspend/resume migration, or live migration, with the aim of minimizing VM downtime and maximizing resource utilization. However, the migration process is costly in both time and resources: it requires large files or memory pages to be transferred and consumes a large amount of power and memory on the origin and destination PMs, especially for storage VM migration. This process also leads to VM downtime or slowdown. To deal with these shortcomings, a Fuzzy Predictable Task-based System Load Balancing (FP-TBSLB) sub-system is developed that avoids VM migration and achieves system load balancing by transferring extra workload from a poorly performing VM to other compatible VMs with more capacity. To reduce the time taken even further and optimize load balancing over a cloud cluster, the FP-TBSLB sub-system applies the WP sub-system not only to predict the performance of VMs, but also to determine a set of appropriate VMs that have the potential to execute the extra workload imposed on the poorly performing VMs. In addition, the FP-TBSLB sub-system employs the MOTS sub-system to migrate the extra workload of poorly performing VMs to the compatible VMs.
The AS-ORM system is evaluated using a VMware vSphere-based private cloud environment with the VMware ESXi hypervisor. The evaluation results show the benefit of the AS-ORM in reducing the time taken for the load balancing process compared to traditional approaches. The application of this system has the added advantage that the VMs will not be slowed down during the migration process. The system also achieves a significant reduction in memory usage, execution time, job makespan and power consumption. Therefore, the AS-ORM dramatically increases VM performance and reduces service response time. The AS-ORM can be applied in the hypervisor layer to optimize resource management and load balancing, which boosts the Quality of Service (QoS) expected by cloud customers.
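The task-based alternative to VM migration described in this abstract can be sketched roughly as follows. This is a simplified greedy illustration, not the FP-TBSLB sub-system itself: it omits the fuzzy prediction and multi-objective scheduling, and the names `rebalance_tasks`, `vm_tasks`, `capacity`, and the 0.8 utilization threshold are assumptions made for the example.

```python
def rebalance_tasks(vm_tasks, capacity, threshold=0.8):
    """Move tasks off overloaded VMs instead of migrating the VMs themselves.

    vm_tasks:  dict mapping VM name -> list of task loads (mutated in place)
    capacity:  dict mapping VM name -> total capacity in the same units
    threshold: fraction of capacity above which a VM counts as overloaded
               (a hypothetical policy value)
    Returns a list of (task_load, source_vm, destination_vm) moves.
    """
    moves = []
    load = {vm: sum(tasks) for vm, tasks in vm_tasks.items()}
    limit = {vm: threshold * capacity[vm] for vm in vm_tasks}

    for src in sorted(vm_tasks):
        # Try the smallest tasks first; iterate over a sorted copy so the
        # original list can be modified safely inside the loop.
        for task in sorted(vm_tasks[src]):
            if load[src] <= limit[src]:
                break  # source VM is no longer overloaded
            candidates = [
                vm for vm in vm_tasks
                if vm != src and load[vm] + task <= limit[vm]
            ]
            if not candidates:
                break  # no VM can absorb even the smallest remaining task
            # Send the task to the VM with the most remaining headroom.
            dst = max(candidates, key=lambda vm: limit[vm] - load[vm])
            vm_tasks[src].remove(task)
            vm_tasks[dst].append(task)
            load[src] -= task
            load[dst] += task
            moves.append((task, src, dst))
    return moves


if __name__ == "__main__":
    tasks = {"vm1": [30, 30, 30], "vm2": [10], "vm3": [40]}
    caps = {"vm1": 100, "vm2": 100, "vm3": 100}
    print(rebalance_tasks(tasks, caps))
```

Only task descriptors change hands here, so no memory pages or disk images need to be copied between physical machines, which is the cost the abstract attributes to VM migration.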
A Big Data Analyzer for Large Trace Logs
The current generation of Internet-based services is typically hosted on large
data centers that take the form of warehouse-size structures housing tens of
thousands of servers. Continued availability of a modern data center is the
result of a complex orchestration among many internal and external actors
including computing hardware, multiple layers of intricate software, networking
and storage devices, electrical power and cooling plants. During the course of
their operation, many of these components produce large amounts of data in the
form of event and error logs that are essential not only for identifying and
resolving problems but also for improving data center efficiency and
management. Most of these activities would benefit significantly from data
analytics techniques to exploit hidden statistical patterns and correlations
that may be present in the data. The sheer volume of data to be analyzed makes
uncovering these correlations and patterns a challenging task. This paper
presents BiDAl, a prototype Java tool for log-data analysis that incorporates
several Big Data technologies in order to simplify the task of extracting
information from data traces produced by large clusters and server farms. BiDAl
provides the user with several analysis languages (SQL, R and Hadoop MapReduce)
and storage backends (HDFS and SQLite) that can be freely mixed and matched so
that a custom tool for a specific task can be easily constructed. BiDAl has a
modular architecture so that it can be extended with other backends and
analysis languages in the future. In this paper we present the design of BiDAl
and describe our experience using it to analyze publicly-available traces from
Google data clusters, with the goal of building a realistic model of a complex
data center. Comment: 26 pages, 10 figures
The management of academic workloads: full report on findings
The pressures on UK higher education (from explicit
competition and growth in student numbers, to severe
regulatory demands) are greater than ever, and have
resulted in a steady increase in measures taken by
universities to actively manage their finances and overall
quality. These pressures are also likely to have impacted on staff and, indeed, recent large surveys in the sector have indicated that almost half of respondents find their
workloads unmanageable. Against this background it would
seem logical that the emphasis on institutional interventions to improve finance and quality should be matched by similar attention given to the allocation of workloads to staff, and a focus on how best to utilise people’s time - the single biggest resource available within universities.
Thus the aim of this piece of research was to focus on the
processes and practices surrounding the allocation of staff
workloads within higher education. Ten diverse organisations were selected for study: six universities in the UK, two overseas universities and two non-higher-education (but knowledge-intensive) organisations. In each, a cross-section of staff was selected, and in-depth interviews carried out. A total of 59 such interviews were carried out across the ten organisations. By identifying typical practices, as well as interesting alternatives, views on the various strengths and weaknesses of each of their workload allocation approaches were collated, and associated factors requiring attention identified. Through an extensive process of analysis, approaches which promoted more equitable loads for individuals, and which might provide synergies for institutions, were also investigated.
Dynamic Resource Management in Clouds: A Probabilistic Approach
Dynamic resource management has become an active area of research in the
Cloud Computing paradigm. The cost of resources varies significantly depending on
how they are configured for use. Hence efficient management of resources is of
prime interest to both Cloud Providers and Cloud Users. In this work we suggest
a probabilistic resource provisioning approach that can be exploited as the
input of a dynamic resource management scheme. Using a Video on Demand use case
to justify our claims, we propose an analytical model, inspired by standard
models developed for epidemic spreading, to represent sudden and intense
workload variations. We show that the resulting model satisfies a Large
Deviation Principle that statistically characterizes extreme rare events, such
as the ones produced by "buzz/flash crowd effects" that may cause workload
overflow in the VoD context. This analysis provides valuable insight into the
abnormal system behaviors that can be expected. We exploit the information obtained
using the Large Deviation Principle for the proposed Video on Demand use-case
for defining policies (Service Level Agreements). We believe these policies for
elastic resource provisioning and usage may be of some interest to all
stakeholders in the emerging context of cloud networking. Comment: IEICE Transactions on Communications (2012). arXiv admin note:
substantial text overlap with arXiv:1209.515
Managing Dynamic Enterprise and Urgent Workloads on Clouds Using Layered Queuing and Historical Performance Models
The automatic allocation of enterprise workload to resources can be enhanced by being able to make what-if response time predictions whilst different allocations are being considered. We experimentally investigate an historical and a layered queuing performance model and show how they can provide a good level of support for a dynamic-urgent cloud environment. Using this we define, implement and experimentally investigate the effectiveness of a prediction-based cloud workload and resource management algorithm. Based on these experimental analyses we: (i) comparatively evaluate the layered queuing and historical techniques; (ii) evaluate the effectiveness of the management algorithm in different operating scenarios; and (iii) provide guidance on using prediction-based workload and resource management.
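As a rough illustration of a "what-if" response time prediction, the sketch below compares candidate allocations using the textbook M/M/1 mean response time, R = 1 / (mu - lambda). This is a deliberately simpler stand-in for the layered queuing and historical models that the paper evaluates; the function names and the example server rates are hypothetical.

```python
import math


def predict_response_time(arrival_rate, service_rate):
    """Mean response time of an M/M/1 queue: R = 1 / (mu - lambda).

    Returns infinity when the server would be saturated
    (arrival rate at or above service rate).
    """
    if arrival_rate >= service_rate:
        return math.inf
    return 1.0 / (service_rate - arrival_rate)


def best_allocation(servers, extra_rate):
    """Pick the server whose predicted response time is lowest
    after adding `extra_rate` requests per second to it.

    servers: dict mapping server name -> (current_arrival_rate, service_rate)
    """
    return min(
        servers,
        key=lambda name: predict_response_time(
            servers[name][0] + extra_rate, servers[name][1]
        ),
    )


if __name__ == "__main__":
    # Hypothetical servers: (current arrivals/s, service capacity/s)
    servers = {"a": (5.0, 10.0), "b": (2.0, 4.0)}
    print(best_allocation(servers, extra_rate=1.0))
```

The same decision loop applies when the predictor is swapped for a richer model: each candidate allocation is scored by its predicted response time before any workload is actually moved.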