
    Autonomic Resource Management for a Cluster that Executes Batch Jobs

    Resource management of large-scale clusters is traditionally done manually. Servers are usually over-provisioned to meet the peak demand of the workload. It is widely known that manual provisioning is error-prone and inefficient. These problems can be addressed by the use of autonomic clusters that manage their own resources. In such clusters, server nodes are dynamically allocated based on the system's performance goals. In this thesis, we develop heuristic algorithms for the dynamic provisioning of a cluster that executes batch jobs with a shared completion deadline. External factors that may affect the decision to use servers during a certain time period are modeled as a time-varying cost function. The provisioning goal is to ensure that all jobs are completed on time while minimizing the total cost of server usage. Five resource provisioning heuristics that adapt to changing workload are presented, and their merit is evaluated by simulation. In our simulation, the job arrival rate is time-dependent, which captures the typical job profile of a batch environment. Our results show that heuristics that take the cost function into consideration perform better than the others.
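
    A minimal sketch of the cost-aware idea described in this abstract, not the thesis' actual heuristics: given the remaining work, a shared deadline, a per-slot server cost, and a per-slot cap on usable servers, greedily place work into the cheapest time slots so that everything finishes on time. The function name `plan_servers` and the slot model are illustrative assumptions.

```python
def plan_servers(remaining_work, deadline_slot, cost_per_slot, max_servers):
    """Return servers to run in each slot [0, deadline_slot)."""
    slots = list(range(deadline_slot))
    # Visit slots from cheapest to most expensive.
    slots.sort(key=lambda t: cost_per_slot[t])
    allocation = [0] * deadline_slot
    for t in slots:
        if remaining_work <= 0:
            break
        servers = min(max_servers, remaining_work)  # 1 server-slot = 1 unit of work
        allocation[t] = servers
        remaining_work -= servers
    if remaining_work > 0:
        raise ValueError("deadline infeasible with the given capacity")
    return allocation

# Example: 10 units of work, 5 slots until the deadline, at most 4 servers per slot.
print(plan_servers(10, 5, cost_per_slot=[3, 1, 2, 5, 1], max_servers=4))
# -> [0, 4, 2, 0, 4]: work lands in the cheap slots (indices 1, 4, then 2)
```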

    A Survey on Automatic Parameter Tuning for Big Data Processing Systems

    Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of configuration parameters controlling parallelism, I/O behavior, memory settings, and compression. Improper parameter settings can cause significant performance degradation and stability issues. However, regular users and even expert administrators struggle to understand and tune them to achieve good performance. We investigate existing approaches to parameter tuning for both batch and stream data processing systems and classify them into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We summarize the pros and cons of each approach and raise some open research problems for automatic parameter tuning. Peer reviewed.
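
    A minimal sketch of the simplest category the survey names, rule-based tuning; the rules and derived values here are assumptions for illustration, not recommendations from the paper. It derives a few Spark parallelism and memory settings from node resources.

```python
def rule_based_config(cores_per_node, mem_gb_per_node, num_nodes):
    # Common rule of thumb: cap executor cores to avoid HDFS client contention.
    executor_cores = min(5, cores_per_node)
    executors_per_node = max(1, cores_per_node // executor_cores)
    # Divide node memory among executors and leave some headroom for overhead.
    executor_mem_gb = max(1, mem_gb_per_node // executors_per_node - 1)
    return {
        "spark.executor.cores": executor_cores,
        "spark.executor.instances": executors_per_node * num_nodes,
        "spark.executor.memory": f"{executor_mem_gb}g",
        "spark.default.parallelism": executor_cores * executors_per_node * num_nodes * 2,
    }

print(rule_based_config(cores_per_node=16, mem_gb_per_node=64, num_nodes=10))
```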

    DEPAS: A Decentralized Probabilistic Algorithm for Auto-Scaling

    The dynamic provisioning of virtualized resources offered by cloud computing infrastructures allows applications deployed in a cloud environment to automatically increase and decrease the amount of resources they use. This capability is called auto-scaling, and its main purpose is to automatically adjust the scale of the system that is running the application so as to satisfy the varying workload with minimum resource utilization. The need for auto-scaling is particularly important during workload peaks, in which applications may need to scale up to extremely large-scale systems. Both the research community and the main cloud providers have already developed auto-scaling solutions. However, most research solutions are centralized and not suitable for managing large-scale systems; moreover, cloud providers' solutions are bound to the limitations of a specific provider in terms of resource prices, availability, reliability, and connectivity. In this paper we propose DEPAS, a decentralized probabilistic auto-scaling algorithm integrated into a P2P architecture that is cloud-provider independent, thus allowing the auto-scaling of services over multiple cloud infrastructures at the same time. Our simulations, which are based on real service traces, show that our approach is capable of: (i) keeping the overall utilization of all the instantiated cloud resources in a target range, and (ii) maintaining service response times close to those obtained with optimal centralized auto-scaling approaches. Comment: Submitted to Springer Computing.
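
    A minimal sketch of a decentralized probabilistic scaling rule in the spirit of this abstract, not the published DEPAS formulas: each node looks only at its locally estimated utilization and, with a probability that grows with the distance from a target range, requests one more instance or removes itself. The target range and gain are illustrative assumptions.

```python
import random

TARGET_LOW, TARGET_HIGH, GAIN = 0.5, 0.8, 1.0

def local_decision(local_utilization):
    """Return +1 (spawn an instance), -1 (terminate self), or 0 (do nothing)."""
    if local_utilization > TARGET_HIGH:
        p_up = min(1.0, GAIN * (local_utilization - TARGET_HIGH) / TARGET_HIGH)
        return +1 if random.random() < p_up else 0
    if local_utilization < TARGET_LOW:
        p_down = min(1.0, GAIN * (TARGET_LOW - local_utilization) / TARGET_LOW)
        return -1 if random.random() < p_down else 0
    return 0

# Each node runs the same rule independently; because decisions are probabilistic,
# only a fraction of the overloaded nodes scale up at once, which keeps the
# aggregate capacity change smooth without any central controller.
print([local_decision(u) for u in (0.95, 0.92, 0.6, 0.3)])
```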

    Autonomic Management of Large Clusters and Their Integration into the Grid

    We present a framework for the coordinated, autonomic management of multiple clusters in a compute center and their integration into a Grid environment. Site autonomy and the automation of administrative tasks are prime aspects of this framework. The system behavior is continuously monitored in a steering cycle and appropriate actions are taken to resolve any problems. All presented components have been implemented in the course of the EU project DataGrid: the Lemon monitoring components, the FT fault-tolerance mechanism, the quattor system for software installation and configuration, the RMS job and resource management system, and the Gridification scheme that integrates clusters into the Grid.
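
    A minimal sketch of the monitor/steer control loop described above, purely illustrative: the components named in the abstract (Lemon, FT, quattor, RMS) expose their own interfaces, and the hooks `read_metrics`, `detect_problems`, and `apply_action` here are hypothetical.

```python
import time

def steering_cycle(read_metrics, detect_problems, apply_action, period_s=60):
    """Continuously monitor the site and steer it back to a healthy state."""
    while True:
        metrics = read_metrics()          # e.g. per-node load, daemon status
        for problem in detect_problems(metrics):
            apply_action(problem)         # e.g. restart a daemon, drain a node
        time.sleep(period_s)
```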

    H2O: An Autonomic, Resource-Aware Distributed Database System

    This paper presents the design of an autonomic, resource-aware distributed database which enables data to be backed up and shared without complex manual administration. The database, H2O, is designed to make use of unused resources on workstation machines. Creating and maintaining highly available, replicated database systems can be difficult for untrained users and costly for IT departments. H2O reduces the need for manual administration by autonomically replicating data and load-balancing across machines in an enterprise. Provisioning hardware to run a database system can be unnecessarily costly, as most organizations already possess large quantities of idle resources in workstation machines. H2O is designed to utilize this unused capacity by using resource availability information to place data and plan queries over workstation machines that are already being used for other tasks. This paper discusses the requirements for such a system and presents the design and implementation of H2O. Comment: Presented at SICSA PhD Conference 2010 (http://www.sicsaconf.org/).
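
    A minimal sketch of resource-aware replica placement in the spirit of this abstract, not H2O's actual policy: rank candidate workstations by currently idle capacity and place the replicas on the least-loaded machines. The `Machine` record, the threshold, and the ranking criteria are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Machine:
    name: str
    idle_cpu: float      # fraction of CPU currently unused, 0..1
    free_disk_gb: float

def place_replicas(machines, n_replicas, min_disk_gb=10.0):
    """Pick machines for replicas based on currently available resources."""
    candidates = [m for m in machines if m.free_disk_gb >= min_disk_gb]
    # Prefer machines with the most idle CPU, then the most free disk.
    candidates.sort(key=lambda m: (m.idle_cpu, m.free_disk_gb), reverse=True)
    return [m.name for m in candidates[:n_replicas]]

pool = [Machine("ws-01", 0.9, 120), Machine("ws-02", 0.2, 500), Machine("ws-03", 0.7, 80)]
print(place_replicas(pool, n_replicas=2))   # -> ['ws-01', 'ws-03']
```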

    Cluster-Wide Context Switch of Virtualized Jobs

    Clusters are mostly used through Resource Management Systems (RMS) with a static allocation of resources for a bounded amount of time. Such approaches are known to be insufficient for an efficient use of clusters. To provide a finer-grained RMS, job preemption, migration, and dynamic allocation of resources are required. However, due to the complexity of developing and using such mechanisms, advanced scheduling strategies have rarely been deployed. This trend is currently evolving thanks to the migration and preemption capabilities of Virtual Machines (VMs). However, although manipulating jobs composed of VMs makes it possible to change the state of the jobs according to the scheduling objective, changing the state and the location of numerous VMs at each decision is tedious and degrades the overall performance. In addition to implementing the scheduling policy, developers have to ensure the feasibility of the actions while executing them in the most efficient way. In this paper, we argue that such an operation is independent from the policy itself and can be addressed through a generic mechanism, the cluster-wide context switch. Thanks to it, developers can implement sophisticated algorithms to schedule jobs without handling the issues related to their manipulation; they only focus on the implementation of their algorithm to select the jobs to run, while the cluster-wide context switch system performs the actions necessary to switch from the current situation to the new one. As a proof of concept, we evaluate the interest of the cluster-wide context switch through a sample scheduler that executes jobs as early as possible, even partially, with regard to their current resource requirements and their priority.
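
    A minimal sketch of the policy/mechanism split described above, not the paper's implementation: the policy only picks which virtualized jobs should be running, and a generic "context switch" routine then derives the VM-level actions needed to go from the current state to the target one. The job records and the action vocabulary are illustrative assumptions.

```python
def select_jobs(jobs, capacity):
    """Policy: run the highest-priority jobs that fit in the cluster."""
    chosen, used = [], 0
    for job in sorted(jobs, key=lambda j: j["priority"], reverse=True):
        if used + job["vms"] <= capacity:
            chosen.append(job["name"])
            used += job["vms"]
    return set(chosen)

def context_switch(running, target):
    """Mechanism: turn the difference between states into VM-level actions."""
    actions = [("suspend", j) for j in running - target]
    actions += [("resume", j) for j in target - running]
    return actions

jobs = [{"name": "A", "vms": 4, "priority": 2},
        {"name": "B", "vms": 3, "priority": 5},
        {"name": "C", "vms": 2, "priority": 1}]
target = select_jobs(jobs, capacity=6)                 # -> {'B', 'C'}
print(context_switch(running={"A"}, target=target))    # suspend A, resume B and C
```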

    Resource Management and Scheduling for Big Data Applications in Cloud Computing Environments

    This chapter presents the software architectures of big data processing platforms. It provides in-depth knowledge of the resource management techniques involved in deploying big data processing systems on cloud environments. It starts from the very basics and gradually introduces the core components of resource management, which we have divided into multiple layers. It covers state-of-the-art practices and research in SLA-based resource management, with a specific focus on job scheduling mechanisms. Comment: 27 pages, 9 figures.

    Autonomic Management of Application Workflows on Hybrid Computing Infrastructure
