
    Colocation-aware Resource Management for Distributed Parallel Applications in Consolidated Clusters

    Department of Computer Science and Engineering. Consolidated clusters, which run various distributed parallel applications such as big data frameworks, machine learning applications, and scientific applications to solve complex problems in a wide range of fields, are now in common use. Resource providers allow applications with different characteristics to execute together in order to utilize their resources efficiently. Scheduling applications onto resources raises several important issues. When applications share the same resources, interference between them affects their performance. Performance can improve or degrade depending on which resources are used for execution, based on the characteristics of both the applications and the resources. Characteristics and resource requirements of applications can constrain their placement, and these constraints can extend to constraints between applications. These issues must be considered to manage resources efficiently and improve application performance. In this thesis, we study how to manage resources efficiently while scheduling distributed parallel applications in consolidated clusters. First, we present a holistic VM placement technique for distributed parallel applications in heterogeneous virtual clusters, aiming to maximize the efficiency of the cluster and consequently reduce costs for service providers and users. We analyze the effects of resource heterogeneity, different VM configurations, and interference between VMs on the performance of distributed parallel applications, and propose a placement technique that uses a machine learning algorithm to estimate the runtime of a distributed parallel application. Second, we present a two-level scheduling algorithm, which distributes applications to platforms and then maps tasks to each node. We analyze the platform and co-runner affinities of loosely-coupled applications and use them in scheduling decisions.
Third, we study constraint-aware VM placement in heterogeneous clusters. We present a model of VM placement constraints and constraint-aware VM placement algorithms. We analyze the effect of VM placement constraints, and evaluate the performance of the algorithms over various settings with simulation and with experiments in a small cluster. Finally, we propose an interference-aware resource management system for CNN models in a GPU cluster. We analyze the effects of interference between CNN models. We then propose techniques to mitigate the slowdown from interference for a target model and to predict the performance of CNN models when they are co-located. We propose a heuristic algorithm to schedule CNN models, and evaluate the techniques and the algorithm through experiments in a GPU cluster.
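The abstract's first contribution, estimating runtime as a function of resource heterogeneity and co-runner interference and then placing the application where the estimate is lowest, can be sketched as follows. This is a minimal illustration, not the thesis's actual model: the runtime formula, the factor values, and the candidate names are all made-up assumptions standing in for a learned estimator.

```python
# Hypothetical sketch of interference- and heterogeneity-aware placement:
# for each application, pick the candidate placement with the lowest
# estimated runtime. The estimator below is a toy stand-in for the
# machine-learning model described in the abstract.

def estimate_runtime(base_runtime, hetero_factor, interference_factor):
    """Toy model: base runtime scaled by hardware speed (hetero_factor)
    and inflated by co-runner interference (interference_factor)."""
    return base_runtime * hetero_factor * (1.0 + interference_factor)

def place(app, candidates):
    """Greedily choose the candidate with the lowest runtime estimate.

    candidates: list of (name, hetero_factor, interference_factor).
    """
    return min(
        candidates,
        key=lambda c: estimate_runtime(app["base_runtime"], c[1], c[2]),
    )

app = {"name": "wordcount", "base_runtime": 120.0}
candidates = [
    ("fast-node-busy", 0.8, 0.6),  # fast hardware, heavy interference: 153.6 s
    ("slow-node-idle", 1.2, 0.0),  # slower hardware, no interference: 144.0 s
]
print(place(app, candidates)[0])  # prints "slow-node-idle"
```

The point of the sketch is the trade-off the abstract describes: a nominally faster node can lose to a slower idle one once interference is priced into the estimate.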

    Adaptive runtime techniques for power and resource management on multi-core systems

    Energy-related costs are among the major contributors to the total cost of ownership of data centers and high-performance computing (HPC) clusters. As a result, future data centers must be energy-efficient to meet the continuously increasing computational demand. Constraining the power consumption of the servers is a widely used approach for managing energy costs and complying with power delivery limitations. In tandem, virtualization has become a common practice, as virtualization reduces hardware and power requirements by enabling consolidation of multiple applications onto a smaller set of physical resources. However, administration and management of data center resources have become more complex due to the growing number of virtualized servers installed in data centers. Therefore, designing autonomous and adaptive energy efficiency approaches is crucial to achieve sustainable and cost-efficient operation in data centers. Many modern data centers running enterprise workloads successfully implement energy efficiency approaches today. However, the nature of multi-threaded applications, which are becoming more common in all computing domains, brings additional design and management challenges. Tackling these challenges requires a deeper understanding of the interactions between the applications and the underlying hardware nodes. Although cluster-level management techniques bring significant benefits, node-level techniques provide more visibility into application characteristics, which can then be used to further improve the overall energy efficiency of the data centers. This thesis proposes adaptive runtime power and resource management techniques on multi-core systems. It demonstrates that taking the multi-threaded workload characteristics into account during management significantly improves the energy efficiency of the server nodes, which are the basic building blocks of data centers.
The key distinguishing features of this work are as follows: We implement the proposed runtime techniques on state-of-the-art commodity multi-core servers and show that their energy efficiency can be significantly improved by (1) taking multi-threaded application specific characteristics into account while making resource allocation decisions, (2) accurately tracking dynamically changing power constraints by using low-overhead application-aware runtime techniques, and (3) coordinating dynamic adaptive decisions at various layers of the computing stack, specifically at system and application levels. Our results show that efficient resource distribution under power constraints yields energy savings of up to 24% compared to existing approaches, along with the ability to meet power constraints 98% of the time for a diverse set of multi-threaded applications.
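The abstract's point (1), application-aware resource distribution under a power constraint, can be illustrated with a minimal sketch: split a node's power budget in fixed increments, giving each increment to the application with the highest marginal throughput gain. The throughput curves, application names, and step size below are made-up assumptions, not the thesis's actual techniques.

```python
# Hypothetical sketch of greedy marginal-gain power budgeting across
# co-located applications. Assumes concave (diminishing-returns)
# throughput curves, which is what makes the greedy split sensible.

import math

def allocate_power(total_budget, gain, step=5):
    """gain maps app -> callable(watts) -> throughput at that power cap."""
    alloc = {app: 0 for app in gain}
    budget = total_budget
    while budget >= step:
        # Give the next power increment to the app that benefits most.
        best = max(
            gain,
            key=lambda a: gain[a](alloc[a] + step) - gain[a](alloc[a]),
        )
        alloc[best] += step
        budget -= step
    return alloc

curves = {
    "cpu-bound":    lambda w: 10 * math.sqrt(w),   # keeps scaling with power
    "memory-bound": lambda w: 40 * math.log1p(w),  # saturates early
}
print(allocate_power(100, curves))
```

The design choice mirrors the abstract's claim: a characteristics-blind even split would waste watts on the memory-bound application after its curve flattens, while the greedy split redirects them to where throughput still grows.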

    CloudScope: diagnosing and managing performance interference in multi-tenant clouds

    © 2015 IEEE. Virtual machine consolidation is attractive in cloud computing platforms for several reasons including reduced infrastructure costs, lower energy consumption and ease of management. However, the interference between co-resident workloads caused by virtualization can violate the service level objectives (SLOs) that the cloud platform guarantees. Existing solutions to minimize interference between virtual machines (VMs) are mostly based on comprehensive micro-benchmarks or online training which makes them computationally intensive. In this paper, we present CloudScope, a system for diagnosing interference for multi-tenant cloud systems in a lightweight way. CloudScope employs a discrete-time Markov Chain model for the online prediction of performance interference of co-resident VMs. It uses the results to optimally (re)assign VMs to physical machines and to optimize the hypervisor configuration, e.g. the CPU share it can use, for different workloads. We have implemented CloudScope on top of the Xen hypervisor and conducted experiments using a set of CPU, disk, and network intensive workloads and a real system (MapReduce). Our results show that CloudScope interference prediction achieves an average error of 9%. The interference-aware scheduler improves VM performance by up to 10% compared to the default scheduler. In addition, the hypervisor reconfiguration can improve network throughput by up to 30%.
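The core of a discrete-time Markov chain prediction like the one the abstract describes is computing the chain's steady-state distribution and using it to weight per-state costs. The sketch below shows only that generic mechanism, with an invented three-state contention chain and invented slowdown values; it is not CloudScope's actual model or parameters.

```python
# Hypothetical sketch of Markov-chain interference prediction: states are
# contention levels of a shared resource, and the steady-state distribution
# weights a per-state slowdown into one expected slowdown figure.

def steady_state(P, iters=1000):
    """Power iteration pi <- pi P on a row-stochastic transition matrix."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# States: low / medium / high contention (transition rates are made up).
P = [
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.3, 0.6],
]
slowdown = [1.0, 1.3, 2.0]  # assumed relative runtime in each state

pi = steady_state(P)
expected = sum(p * s for p, s in zip(pi, slowdown))
print(f"expected slowdown: {expected:.2f}x")  # prints "expected slowdown: 1.40x"
```

A scheduler can then compare such expected-slowdown figures across candidate hosts and (re)assign VMs to the least harmful placement, which is the use CloudScope's abstract describes for its predictions.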

    Resource Management and Scheduling for Big Data Applications in Cloud Computing Environments

    This chapter presents the software architectures of big data processing platforms. It provides in-depth knowledge of the resource management techniques involved in deploying big data processing systems on cloud environments. It starts from the very basics and gradually introduces the core components of resource management, which we have divided into multiple layers. It covers state-of-the-art practices and research on SLA-based resource management, with a specific focus on job scheduling mechanisms. Comment: 27 pages, 9 figures
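One of the simplest SLA-based job scheduling mechanisms a survey like this covers is earliest-deadline-first (EDF), where each job's SLA is a completion deadline and jobs are dispatched in deadline order. The sketch below illustrates only that generic mechanism; the job names and deadlines are invented for illustration.

```python
# Hypothetical sketch of SLA-driven EDF dispatch ordering: jobs whose
# SLA deadline is nearest are dispatched first.

import heapq

def edf_order(jobs):
    """Return job names in the order an EDF scheduler would dispatch them.

    jobs: dict mapping job name -> SLA deadline in seconds from now.
    """
    heap = [(deadline, name) for name, deadline in jobs.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

jobs = {"etl-batch": 300, "ad-hoc-query": 60, "ml-training": 600}
print(edf_order(jobs))  # prints ['ad-hoc-query', 'etl-batch', 'ml-training']
```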