
    Workload Prediction for Efficient Performance Isolation and System Reliability

    In large-scale distributed systems, such as multi-tier storage systems and cloud data centers, resource sharing among workloads brings multiple benefits while introducing many performance challenges. The key to effective workload multiplexing is accurate workload prediction. This thesis focuses on how to capture the salient characteristics of real-world workloads in order to develop workload prediction methods and to drive scheduling and resource allocation policies, so as to achieve efficient and timely resource isolation among applications. In a multi-tier storage system, high-priority user work is often multiplexed with low-priority background work, which raises the challenge of striking a balance between maintaining user performance and maximizing the amount of finished background work. This thesis proposes two resource isolation policies based on different workload prediction methods: one Markovian model-based and the other neural network-based. These policies aim, via workload prediction, to discover opportune times to schedule background work with minimal impact on user performance. Trace-driven simulations verify the efficiency of the two proposed resource isolation policies. The Markovian model-based policy successfully schedules the background work in the appropriate periods with small impact on user performance. The neural network-based policy adaptively schedules user and background work, consistently meeting both performance requirements. This thesis also proposes an accurate yet efficient neural network-based prediction method for data center usage series, called PRACTISE. Unlike traditional neural networks for time series prediction, PRACTISE selects the most informative features from the past observations of the time series itself. Tests on a large set of usage series from production data centers demonstrate the accuracy (i.e., low prediction error) and efficiency (i.e., low time cost) of PRACTISE. The superior usage prediction also enables proactive resource management in highly virtualized cloud data centers. This thesis analyzes performance tickets in cloud data centers and proposes an active sizing algorithm, named ATM, that predicts usage workloads and re-allocates capacity to workloads to avoid VM performance tickets. Moreover, driven by cheap prediction of usage tails, this thesis also presents TailGuard, which dynamically clones VMs among co-located boxes in order to efficiently reduce the performance violations of physical boxes in cloud data centers.
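
    The abstract does not give the models' internals, but the Markovian idea can be illustrated with a toy two-state chain over busy/idle periods learned from a trace; `estimate_transitions`, `schedule_background`, the trace, and the 0.8 threshold below are all illustrative assumptions, not the thesis's code:

    ```python
    # Minimal sketch: a two-state Markov model over "busy"/"idle" periods
    # estimated from an I/O trace, used to decide when background work can
    # run with low risk of disturbing user requests.
    from collections import defaultdict

    def estimate_transitions(states):
        """Estimate P(next | current) from a sequence like ['busy', 'idle', ...]."""
        counts = defaultdict(lambda: defaultdict(int))
        for cur, nxt in zip(states, states[1:]):
            counts[cur][nxt] += 1
        return {cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
                for cur, nxts in counts.items()}

    def schedule_background(current_state, transitions, threshold=0.8):
        """Run background work only if the predicted idle probability is high."""
        p_idle = transitions.get(current_state, {}).get('idle', 0.0)
        return p_idle >= threshold

    trace = ['busy', 'busy', 'idle', 'idle', 'idle', 'busy', 'idle', 'idle']
    T = estimate_transitions(trace)
    print(schedule_background('idle', T))  # False: P(idle -> idle) is 0.75 here
    ```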

    Influence of Resource Sharing on Performance

    Resource sharing occurs when multiple active processes or software components compete for system resources, which influences the observed performance compared to an individual execution. Isolated benchmarking of the durations of key operations used to solve performance prediction models may therefore yield imprecise results. Resource sharing also occurs between the measured code and the benchmark infrastructure that obtains and stores samples, imposing an indirect overhead. This thesis quantifies the effects of sharing on performance for several resources that are often shared, namely processor caches and file systems. The highest possible performance impact of cache sharing is determined by synthetic benchmarks. The impact on practical code and its dependency on a number of factors, such as cache thrashing frequency and intensity, is then determined by experiments with existing implementations of the FFT and LZW algorithms and a video stream processing application. Effects of file system sharing are measured by experiments that read and write multiple files simultaneously. For both resources, situations with significant performance impact of sharing have been observed. Based on the results of the experiments, several suggestions for dealing with the overhead of the performance monitoring infrastructure are proposed, and the applicability of the performed experiments is discussed.
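
    As a rough illustration of the synthetic-benchmark approach, the sketch below times a fixed workload alone and then alongside a process that walks a large buffer to thrash the shared cache; the buffer size, workload, and warm-up delay are arbitrary assumptions, not the thesis's benchmarks:

    ```python
    # Sketch: measure cache-sharing impact by running a fixed workload once
    # in isolation and once while a co-located process thrashes the cache.
    import multiprocessing as mp
    import time

    def thrash(stop, size=64 * 1024 * 1024):
        buf = bytearray(size)             # larger than a typical last-level cache
        while not stop.is_set():
            for i in range(0, size, 64):  # touch one byte per cache line
                buf[i] = (buf[i] + 1) & 0xFF

    def workload():
        data = list(range(1_000_000))
        t0 = time.perf_counter()
        s = sum(data)                     # cache-sensitive reduction
        return time.perf_counter() - t0

    if __name__ == "__main__":
        baseline = workload()
        stop = mp.Event()
        p = mp.Process(target=thrash, args=(stop,))
        p.start()
        time.sleep(0.5)                   # let the thrasher warm up
        shared = workload()
        stop.set()
        p.join()
        print(f"alone: {baseline:.4f}s  with thrasher: {shared:.4f}s")
    ```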

    DECENTRALIZED RESOURCE ORCHESTRATION FOR HETEROGENEOUS GRIDS

    Modern desktop machines now use multi-core CPUs to deliver improved performance. However, achieving high performance on multi-core machines without optimized software support is difficult even on a single machine, because contention for shared resources can make it hard to exploit multiple computing resources efficiently. Moreover, more diverse and heterogeneous hardware platforms (e.g., general-purpose GPUs and Cell processors) have emerged and begun to impact grid computing. Given that heterogeneity and diversity are a major trend going forward, grid computing must support these environmental changes. In this dissertation, I design and evaluate a decentralized resource management scheme to exploit heterogeneous computing resources effectively. I propose resource management algorithms that can efficiently utilize a diverse computational environment, including multiple symmetric computing entities and heterogeneous multi-computing entities, and achieve good load balancing and high total system throughput. Moreover, I propose expressive resource description techniques to accommodate more heterogeneous environments, allowing incoming jobs with complex requirements to be matched to available resources. First, I develop decentralized resource management frameworks and job scheduling schemes to exploit multi-core nodes in peer-to-peer grids. I present two new load-balancing schemes that explicitly account for resource sharing and contention across multiple cores within a single machine, and propose a simple performance prediction model that can represent a continuum of resource sharing among the cores of a CPU. Second, I provide scalable resource discovery and load-balancing techniques to accommodate nodes with many types of computing elements, such as multi-core CPUs and GPUs, in a peer-to-peer grid architecture. My scheme takes into account diverse aspects of heterogeneous nodes to maximize overall system throughput as well as minimize messaging costs, without sacrificing the failure resilience provided by the underlying peer-to-peer overlay network. Finally, I propose an expressive resource discovery method to support multi-attribute, range-based job constraints, for which the common approach of using simple attribute indexes does not suffice, as range-based constraints may be satisfied by more than a single value. I design a compact ID-based representation for resource characteristics and integrate this representation into the decentralized resource discovery framework. Through extensive simulation experiments, I show that my schemes match heterogeneous jobs to heterogeneous resources both effectively (good matches are found, load is balanced) and efficiently (the new functionality imposes little overhead).
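
    The compact ID-based encoding is not described in the abstract, so the following sketch illustrates only the matchmaking semantics it supports, multi-attribute range constraints checked against resource descriptions, using made-up node and job data:

    ```python
    # Sketch of multi-attribute, range-based matchmaking: each job constraint
    # is a (min, max) range per attribute; None means unbounded above.
    resources = [
        {"id": "n1", "cores": 8,  "gpus": 1, "mem_gb": 32},
        {"id": "n2", "cores": 16, "gpus": 0, "mem_gb": 64},
    ]

    job = {"cores": (4, None), "gpus": (1, None), "mem_gb": (16, 48)}

    def matches(res, constraints):
        """True if every attribute of the resource falls within the job's range."""
        for attr, (lo, hi) in constraints.items():
            v = res.get(attr, 0)
            if v < lo or (hi is not None and v > hi):
                return False
        return True

    print([r["id"] for r in resources if matches(r, job)])  # ['n1']
    ```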

    TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments

    Deep neural networks (DNNs) have become core computation components within low-latency Function as a Service (FaaS) prediction pipelines, including image recognition, object detection, natural language processing, speech synthesis, and personalized recommendation pipelines. Cloud computing, as the de-facto backbone of modern computing infrastructure for both enterprise and consumer applications, has to be able to handle user-defined pipelines of diverse DNN inference workloads while maintaining isolation and latency guarantees and minimizing resource waste. The current solution for guaranteeing isolation within FaaS is suboptimal, suffering from "cold start" latency. A major cause of this inefficiency is the need to move large amounts of model data within and across servers. We propose TrIMS as a novel solution to address these issues. Our proposed solution consists of a persistent model store across the GPU, CPU, local storage, and cloud storage hierarchy, an efficient resource management layer that provides isolation, and a succinct set of application APIs and container technologies for easy and transparent integration with FaaS, Deep Learning (DL) frameworks, and user code. We demonstrate our solution by interfacing TrIMS with the Apache MXNet framework, achieving up to 24x speedup in latency for image classification models, up to 210x speedup for large models, and up to 8x improvement in system throughput.
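
    The abstract outlines, but does not detail, the persistent model store; the sketch below shows one plausible reading of a tiered lookup with promotion toward the GPU, where the tier names, `TieredModelStore`, and `load_from_cloud` are illustrative assumptions rather than the TrIMS API:

    ```python
    # Sketch of a tiered model store: look a model up in the fastest tier
    # first and, on a hit in a slower tier, promote it toward the GPU.
    from collections import OrderedDict

    class TieredModelStore:
        TIERS = ["gpu", "cpu", "disk", "cloud"]

        def __init__(self):
            self.tiers = {t: OrderedDict() for t in self.TIERS}

        def get(self, name, load_from_cloud):
            for i, tier in enumerate(self.TIERS):
                if name in self.tiers[tier]:
                    model = self.tiers[tier][name]
                    for faster in self.TIERS[:i]:   # promote toward the GPU
                        self.tiers[faster][name] = model
                    return model
            model = load_from_cloud(name)           # cold miss: fetch once
            for tier in self.TIERS:
                self.tiers[tier][name] = model
            return model

    store = TieredModelStore()
    m = store.get("resnet50", load_from_cloud=lambda n: f"<weights:{n}>")
    m = store.get("resnet50", load_from_cloud=lambda n: None)  # now a GPU hit
    ```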

    Transparent resource sharing framework for internet services on handheld devices

    Handheld devices have limited processing power and a short battery lifetime. As a result, computationally intensive applications cannot run properly or cause the device to run out of battery too early. Additionally, Internet-based service providers targeting these mobile devices lack information to estimate the remaining battery autonomy and have no view on the availability of idle resources in the neighborhood of the handheld device. These battery-related issues create an opportunity for Internet providers to broaden their role and start managing energy aspects of battery-driven mobile devices inside the home. In this paper, we propose an energy-aware resource-sharing framework that enables Internet access providers to delegate (a part of) a client application from a handheld device to idle resources in the LAN, in a way that is transparent to the end user. The key component is the resource-sharing service, hosted on the LAN gateway, which can be remotely queried and managed by the Internet access provider. The service includes a battery model to predict the remaining battery lifetime. We describe the concept of resource-sharing-as-a-service, which allows users of handheld devices to subscribe to the resource-sharing service. In a proof of concept, we evaluate the delay of offloading a client application to an idle computer and study the impact on battery autonomy as a function of the CPU cycles that can be offloaded.
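
    As a hedged illustration of the kind of decision such a framework must make (not the paper's battery model), the sketch below compares the energy of running a task locally against shipping it over the LAN; every constant and parameter name in it is an assumption:

    ```python
    # Sketch: offload a task when the energy to transfer it is lower than
    # the energy to compute it locally on the handheld device.
    def should_offload(cpu_cycles, local_power_w, cpu_hz,
                       transfer_bytes, net_power_w, bandwidth_bps):
        local_energy = local_power_w * (cpu_cycles / cpu_hz)              # joules
        offload_energy = net_power_w * (transfer_bytes * 8 / bandwidth_bps)
        return offload_energy < local_energy

    # A compute-heavy task with a small payload favors offloading:
    print(should_offload(cpu_cycles=5e9, local_power_w=2.0, cpu_hz=1e9,
                         transfer_bytes=200_000, net_power_w=1.0,
                         bandwidth_bps=50e6))  # True
    ```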

    Topology-aware GPU scheduling for learning workloads in cloud environments

    Recent advances in hardware, such as systems with multiple GPUs and their availability in the cloud, are enabling deep learning in various domains including health care, autonomous vehicles, and the Internet of Things. Multi-GPU systems exhibit complex connectivity among GPUs and between GPUs and CPUs. Workload schedulers must consider hardware topology and workload communication requirements in order to allocate CPU and GPU resources for optimal execution time and improved utilization in shared cloud environments. This paper presents a new topology-aware workload placement strategy to schedule deep learning jobs on multi-GPU systems. The placement strategy is evaluated with a prototype on a POWER8 machine with Tesla P100 cards, showing speedups of up to ≈1.30x compared to state-of-the-art strategies; the proposed algorithm achieves this result by allocating GPUs that satisfy workload requirements while preventing interference. Additionally, a large-scale simulation shows that the proposed strategy provides higher resource utilization and performance in cloud systems.
    This project is supported by the IBM/BSC Technology Center for Supercomputing collaboration agreement. It has also received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 639595). It is also partially supported by the Ministry of Economy of Spain under contract TIN2015-65316-P and Generalitat de Catalunya under contract 2014SGR1051, by the ICREA Academia program, and by the BSC-CNS Severo Ochoa program (SEV-2015-0493). We thank our IBM Research colleagues Alaa Youssef and Asser Tantawi for the valuable discussions. We also thank SC17 committee member Blair Bethwaite of Monash University for his constructive feedback on earlier drafts of this paper.
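
    The paper's placement algorithm is not reproduced in the abstract; the sketch below only conveys the topology-aware idea, scoring candidate GPU subsets by pairwise link bandwidth, with an invented bandwidth matrix rather than measured P100 numbers:

    ```python
    # Sketch: pick the free GPU subset with the highest total pairwise
    # interconnect bandwidth for a multi-GPU job.
    from itertools import combinations

    # bw[i][j]: relative link bandwidth between GPU i and GPU j
    bw = [
        [0, 4, 1, 1],   # GPUs 0-1 share a fast link (e.g., same socket)
        [4, 0, 1, 1],
        [1, 1, 0, 4],   # GPUs 2-3 likewise
        [1, 1, 4, 0],
    ]

    def place(job_gpus, free_gpus):
        """Choose the free GPU subset with the highest total pairwise bandwidth."""
        def score(subset):
            return sum(bw[a][b] for a, b in combinations(subset, 2))
        return max(combinations(sorted(free_gpus), job_gpus), key=score)

    print(place(2, free_gpus={0, 1, 2}))  # (0, 1): the well-connected pair
    ```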

    CloudScope: diagnosing and managing performance interference in multi-tenant clouds

    Virtual machine consolidation is attractive in cloud computing platforms for several reasons, including reduced infrastructure costs, lower energy consumption, and ease of management. However, the interference between co-resident workloads caused by virtualization can violate the service level objectives (SLOs) that the cloud platform guarantees. Existing solutions to minimize interference between virtual machines (VMs) are mostly based on comprehensive micro-benchmarks or online training, which makes them computationally intensive. In this paper, we present CloudScope, a system for diagnosing interference in multi-tenant cloud systems in a lightweight way. CloudScope employs a discrete-time Markov chain model for online prediction of the performance interference of co-resident VMs. It uses the results to optimally (re)assign VMs to physical machines and to optimize the hypervisor configuration, e.g., the CPU share it can use, for different workloads. We have implemented CloudScope on top of the Xen hypervisor and conducted experiments using a set of CPU-, disk-, and network-intensive workloads and a real system (MapReduce). Our results show that CloudScope interference prediction achieves an average error of 9%. The interference-aware scheduler improves VM performance by up to 10% compared to the default scheduler. In addition, the hypervisor reconfiguration can improve network throughput by up to 30%.
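
    A minimal sketch of a discrete-time Markov chain predictor in the spirit of CloudScope's model is shown below; the three interference states and the transition matrix are illustrative assumptions, not values from the paper:

    ```python
    # Sketch: predict the distribution over interference levels a few time
    # steps ahead from a row-stochastic transition matrix.
    import numpy as np

    states = ["low", "medium", "high"]   # interference levels
    P = np.array([
        [0.7, 0.2, 0.1],
        [0.3, 0.5, 0.2],
        [0.1, 0.3, 0.6],
    ])

    def predict(current, steps=1):
        """Distribution over interference levels `steps` ticks ahead."""
        v = np.zeros(len(states))
        v[states.index(current)] = 1.0
        return v @ np.linalg.matrix_power(P, steps)

    dist = predict("low", steps=3)
    print(dict(zip(states, dist.round(3))))
    # If P(high) crosses a threshold, a scheduler could migrate the VM.
    ```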