
    Scalable and Distributed Resource Management Protocols for Cloud and Big Data Clusters

    Cloud data centers require an operating system to manage resources and satisfy operational requirements and management objectives. The growing popularity of cloud services has given rise to a new spectrum of services with sophisticated workload and resource management requirements. Data centers are also growing through the addition of various types of hardware to accommodate ever-increasing user demand. Today a large share of cloud resources run data-intensive applications, which exhibit continuously fluctuating workloads and need specialized resource management. To this end, cluster computing frameworks are shifting towards distributed resource management for better scalability and faster decision making. Such systems benefit from parallelized control and are resilient to failures. Throughout this thesis we investigate algorithms, protocols and techniques that address these challenges in large-scale data centers. We introduce a distributed resource management framework that consolidates virtual machines onto as few servers as possible, reducing data center energy consumption and hence the cost to cloud providers. The framework characterizes the workload of virtual machines and thus efficiently handles the trade-off between energy consumption and customers' Service Level Agreements (SLAs). The algorithm is highly scalable, incurs low maintenance cost under dynamic workloads, and minimizes virtual machine migration costs. We also introduce a scalable, distributed probe-based scheduling algorithm for Big Data analytics frameworks. This algorithm efficiently addresses job heterogeneity in workloads, which has emerged as the level of parallelism within jobs has increased. The algorithm is massively scalable and significantly reduces average job completion times compared with the state of the art. Finally, we propose a probabilistic fault-tolerance technique as part of the scheduling algorithm.
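
    To make the consolidation idea concrete, here is a minimal first-fit-decreasing bin-packing sketch in Python. It is an illustration only, not the distributed protocol developed in the thesis: the single-dimension CPU model and all names (VM, Server, consolidate) are assumptions made for this example.

```python
from dataclasses import dataclass, field

@dataclass
class VM:
    name: str
    cpu: float  # normalized CPU demand (fraction of one server's capacity)

@dataclass
class Server:
    capacity: float = 1.0
    vms: list = field(default_factory=list)

    def load(self) -> float:
        return sum(vm.cpu for vm in self.vms)

    def fits(self, vm: VM) -> bool:
        return self.load() + vm.cpu <= self.capacity

def consolidate(vms: list[VM]) -> list[Server]:
    """First-fit decreasing: pack VMs onto as few servers as possible."""
    servers: list[Server] = []
    for vm in sorted(vms, key=lambda v: v.cpu, reverse=True):
        # Place each VM on the first server with room; open a new one if none fits.
        target = next((s for s in servers if s.fits(vm)), None)
        if target is None:
            target = Server()
            servers.append(target)
        target.vms.append(vm)
    return servers

vms = [VM("a", 0.6), VM("b", 0.3), VM("c", 0.5), VM("d", 0.4)]
print(len(consolidate(vms)), "servers used")  # 2 servers instead of 4
```

    Sorting demands in decreasing order before packing is a standard heuristic that keeps the server count close to optimal; a real consolidation framework would also weigh migration costs and SLA risk, as the abstract notes.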

    Autonomic system for optimal resource management in cloud environments

    University of Technology Sydney, Faculty of Engineering and Information Technology.
    Cloud computing is a large-scale distributed computing paradigm driven by economies of scale, in which a pool of abstracted, virtualized, dynamically scalable, managed computing power, storage, platforms, and services is delivered on demand to external customers over the Internet. Given the scarcity of resources in cloud environments and fluctuating customer demands, cloud providers need to balance resource load and utilization, and automatically allocate scarce resources to services in an optimal way, delivering high-performance physical and virtual resources and meeting Service Level Agreement (SLA) criteria while minimizing cost. This study proposes an Autonomic System for Optimal Resource Management (AS-ORM) that addresses three main topics of resource management in the cloud environment: (1) resource estimation, (2) resource discovery and selection, and (3) resource allocation.

    A fuzzy Workload Prediction (WP) sub-system and a Multi-Objective Task Scheduling optimization (MOTS) sub-system are developed to cover the first two topics. The WP sub-system estimates Virtual Machines' (VMs') workload and resource utilization, and predicts hotspots on Physical Machines (PMs). The MOTS sub-system determines the optimal pattern for scheduling tasks over VMs, considering task transfer time, task execution cost/time, the length of each VM's task queue, and power consumption. For the third topic, resource allocation, this study investigates VM migration, the current solution for optimizing the allocation of physical resources to VMs and for load balancing among PMs. VM migration has been applied to system load balancing in cloud environments via memory transfer, suspend/resume migration, or live migration, with the aim of minimizing VM downtime and maximizing resource utilization. However, the migration process is both time-consuming and costly: it requires large files or memory pages to be transferred, and it consumes a great deal of power and memory on the origin and destination PMs, especially for storage VM migration. The process also causes VM downtime or slowdown. To address these shortcomings, a Fuzzy Predictable Task-based System Load Balancing (FP-TBSLB) sub-system is developed that avoids VM migration altogether and achieves system load balancing by transferring excess workload from a poorly performing VM to other compatible VMs with spare capacity. To reduce the time required still further and optimize load balancing over a cloud cluster, the FP-TBSLB sub-system applies the WP sub-system not only to predict the performance of VMs but also to determine a set of appropriate VMs with the potential to execute the excess workload imposed on poorly performing VMs. In addition, the FP-TBSLB sub-system employs the MOTS sub-system to migrate the excess workload of poorly performing VMs to the compatible VMs.

    The AS-ORM system is evaluated in a VMware vSphere based private cloud environment with the VMware ESXi hypervisor. The evaluation results show the benefit of the AS-ORM in reducing the time taken by the load balancing process compared to traditional approaches, with the added advantage that VMs are not slowed down during the migration process. The system also achieves significant reductions in memory usage, execution time, job makespan and power consumption. The AS-ORM therefore dramatically increases VM performance and reduces service response time. It can be applied in the hypervisor layer to optimize resource management and load balancing, boosting the Quality of Service (QoS) expected by cloud customers.
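
    The core FP-TBSLB idea, shedding surplus tasks from an overloaded VM rather than migrating the VM itself, can be sketched as follows. This is a toy illustration under assumed utilization thresholds (OVERLOAD, HEADROOM) and a hypothetical rebalance helper; the actual sub-system relies on fuzzy workload prediction and multi-objective scheduling rather than fixed cutoffs.

```python
OVERLOAD = 0.85   # utilization above this marks a VM as poorly performing
HEADROOM = 0.70   # receiving VMs are only filled up to this level

def rebalance(utilization: dict[str, float]) -> list[tuple[str, str, float]]:
    """Plan transfers of surplus load from overloaded VMs to compatible ones.

    Returns (source, destination, amount) triples, where amounts are
    fractions of one VM's capacity.
    """
    moves = []
    donors = {v: u - OVERLOAD for v, u in utilization.items() if u > OVERLOAD}
    receivers = {v: HEADROOM - u for v, u in utilization.items() if u < HEADROOM}
    for src, surplus in donors.items():
        # Fill the emptiest compatible VMs first.
        for dst in sorted(receivers, key=receivers.get, reverse=True):
            if surplus <= 0:
                break
            amount = min(surplus, receivers[dst])
            if amount > 0:
                moves.append((src, dst, round(amount, 3)))
                surplus -= amount
                receivers[dst] -= amount
    return moves

print(rebalance({"vm1": 0.95, "vm2": 0.40, "vm3": 0.60}))
# [('vm1', 'vm2', 0.1)] -- vm1 sheds 0.10 of load; no VM is migrated
```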

    The management of academic workloads: summary report


    A Big Data Analyzer for Large Trace Logs

    The current generation of Internet-based services is typically hosted in large data centers that take the form of warehouse-size structures housing tens of thousands of servers. Continued availability of a modern data center is the result of a complex orchestration among many internal and external actors, including computing hardware, multiple layers of intricate software, networking and storage devices, and electrical power and cooling plants. During the course of their operation, many of these components produce large amounts of data in the form of event and error logs that are essential not only for identifying and resolving problems but also for improving data center efficiency and management. Most of these activities would benefit significantly from data analytics techniques that exploit hidden statistical patterns and correlations in the data; the sheer volume of data to be analyzed makes uncovering them a challenging task. This paper presents BiDAl, a prototype Java tool for log-data analysis that incorporates several Big Data technologies to simplify the task of extracting information from data traces produced by large clusters and server farms. BiDAl provides the user with several analysis languages (SQL, R and Hadoop MapReduce) and storage backends (HDFS and SQLite) that can be freely mixed and matched, so that a custom tool for a specific task can be constructed easily. BiDAl has a modular architecture, so it can be extended with other backends and analysis languages in the future. In this paper we present the design of BiDAl and describe our experience using it to analyze publicly available traces from Google data clusters, with the goal of building a realistic model of a complex data center.
    Comment: 26 pages, 10 figures
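
    The mix-and-match design described above can be pictured as two narrow plugin interfaces, one for storage backends and one for analyzers. BiDAl itself is written in Java; the sketch below uses Python purely to illustrate the pattern, and every class and method name in it (StorageBackend, Analyzer, analyze) is a hypothetical stand-in, not BiDAl's API.

```python
from abc import ABC, abstractmethod

class StorageBackend(ABC):
    @abstractmethod
    def read(self, table: str) -> list[dict]: ...

class Analyzer(ABC):
    @abstractmethod
    def run(self, rows: list[dict]) -> dict: ...

class InMemoryBackend(StorageBackend):
    """Stands in for the HDFS/SQLite backends of the real tool."""
    def __init__(self, tables: dict[str, list[dict]]):
        self.tables = tables
    def read(self, table: str) -> list[dict]:
        return self.tables[table]

class MeanAnalyzer(Analyzer):
    """Stands in for an SQL/R/MapReduce analysis step."""
    def __init__(self, field: str):
        self.field = field
    def run(self, rows: list[dict]) -> dict:
        values = [r[self.field] for r in rows]
        return {f"mean_{self.field}": sum(values) / len(values)}

def analyze(backend: StorageBackend, analyzer: Analyzer, table: str) -> dict:
    # Any backend can be paired with any analyzer: the mix-and-match property.
    return analyzer.run(backend.read(table))

backend = InMemoryBackend({"tasks": [{"cpu": 0.2}, {"cpu": 0.4}]})
print(analyze(backend, MeanAnalyzer("cpu"), "tasks"))  # {'mean_cpu': 0.3...}
```

    Keeping the interfaces this narrow is what makes the architecture extensible: adding a new backend or analysis language means implementing one class, with no change to existing pairings.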

    The management of academic workloads: full report on findings

    The pressures on UK higher education (from explicit competition and growth in student numbers to severe regulatory demands) are greater than ever, and have resulted in a steady increase in measures taken by universities to actively manage their finances and overall quality. These pressures are also likely to have affected staff; indeed, recent large surveys in the sector indicate that almost half of respondents find their workloads unmanageable. Against this background it would seem logical that the emphasis on institutional interventions to improve finance and quality should be matched by similar attention to the allocation of workloads to staff, and a focus on how best to utilise people's time, the single biggest resource available within universities. The aim of this research was therefore to examine the processes and practices surrounding the allocation of staff workloads within higher education. Ten diverse organisations were selected for study: six universities in the UK, two overseas universities and two non-higher-education (but knowledge-intensive) organisations. In each, a cross-section of staff was selected and in-depth interviews carried out, 59 in total across the ten organisations. By identifying typical practices, as well as interesting alternatives, views on the strengths and weaknesses of each workload allocation approach were collated, and associated factors requiring attention were identified. Through an extensive process of analysis, approaches which promoted more equitable loads for individuals, and which might provide synergies for institutions, were also investigated.

    Dynamic Resource Management in Clouds: A Probabilistic Approach

    Dynamic resource management has become an active area of research in the Cloud Computing paradigm. The cost of resources varies significantly depending on how they are configured and used, so efficient resource management is of prime interest to both Cloud Providers and Cloud Users. In this work we propose a probabilistic resource provisioning approach that can be exploited as the input of a dynamic resource management scheme. Using a Video on Demand (VoD) use case to justify our claims, we propose an analytical model, inspired by standard models of epidemic spreading, to represent sudden and intense workload variations. We show that the resulting model satisfies a Large Deviation Principle that statistically characterizes extremely rare events, such as those produced by "buzz/flash crowd effects" that may cause workload overflow in the VoD context. This analysis provides valuable insight into the abnormal behaviors a system can be expected to exhibit. We exploit the information obtained from the Large Deviation Principle in the proposed VoD use case to define policies (Service Level Agreements). We believe these policies for elastic resource provisioning and usage may be of interest to all stakeholders in the emerging context of cloud networking.
    Comment: IEICE Transactions on Communications (2012). arXiv admin note: substantial text overlap with arXiv:1209.515
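
    To make the Large Deviation Principle concrete: for i.i.d. workload increments, Cramér's theorem estimates the probability that the average load over n slots exceeds a level a as exp(-n·I(a)), where I is the Legendre transform of the log-moment generating function. The sketch below evaluates this for a toy Bernoulli arrival model; the distribution and the numbers p, a, n are assumptions made for illustration, not the paper's VoD model.

```python
import math

def rate_function(a: float, p: float) -> float:
    """Cramér rate I(a) for Bernoulli(p) increments.

    For Bernoulli variables this is the relative entropy D(a || p).
    Valid for 0 < p < a < 1.
    """
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

# Probability that the mean load over n slots exceeds a, when requests
# arrive with probability p per slot: P ~ exp(-n * I(a)).
p, a, n = 0.2, 0.5, 200
print(math.exp(-n * rate_function(a, p)))  # ~4e-20: a rare "buzz" event
```

    The point of such tail estimates is that overflow probabilities decay exponentially in n, which is what allows rare "buzz" events to be priced into provisioning policies and SLAs rather than ignored.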

    Managing Dynamic Enterprise and Urgent Workloads on Clouds Using Layered Queuing and Historical Performance Models

    The automatic allocation of enterprise workload to resources can be enhanced by the ability to make what-if response time predictions while different allocations are being considered. We experimentally investigate a historical and a layered queuing performance model and show how they can provide a good level of support for a dynamic-urgent cloud environment. Using these models we define, implement and experimentally investigate the effectiveness of a prediction-based cloud workload and resource management algorithm. Based on these experimental analyses we: i.) comparatively evaluate the layered queuing and historical techniques; ii.) evaluate the effectiveness of the management algorithm in different operating scenarios; and iii.) provide guidance on using prediction-based workload and resource management.
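
    A minimal sketch of the historical side of such prediction: maintain an exponentially weighted moving average of observed response times per resource pool and route new work to the pool predicted to respond fastest. The smoothing factor and the class and method names here are assumptions made for illustration; the paper's historical and layered queuing models are considerably more elaborate.

```python
class HistoricalPredictor:
    """Exponentially weighted moving average of observed response times."""

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha                      # weight given to new observations
        self.estimates: dict[str, float] = {}

    def observe(self, resource: str, response_time: float) -> None:
        prev = self.estimates.get(resource, response_time)
        self.estimates[resource] = (1 - self.alpha) * prev + self.alpha * response_time

    def best(self) -> str:
        """Resource pool with the lowest predicted response time."""
        return min(self.estimates, key=self.estimates.get)

pred = HistoricalPredictor()
for rt in (120, 150, 130):
    pred.observe("pool-a", rt)
for rt in (200, 90, 95):
    pred.observe("pool-b", rt)
print(pred.best())  # pool-a: its smoothed history predicts faster responses
```

    This is the "what-if" step in miniature: before committing an allocation, the manager consults the predictor and compares candidate placements instead of allocating blindly.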