
    COST-EFFICIENT RESOURCE PROVISIONING FOR CLOUD-ENABLED SCHEDULERS

    Over the last decade, public cloud platforms have rapidly become the de facto computing platform for our society. To support a wide range of users and their diverse applications, public cloud platforms offer the same VMs under many purchasing options that differ in cost, performance, availability, and time commitment. Popular purchasing options include on-demand, reserved, and transient VMs. Reserved VMs require long time commitments, whereas users can acquire and release on-demand (and transient) VMs at any time. While transient VMs cost significantly less than on-demand VMs, platforms may revoke them at any time. In general, the stronger the commitment, i.e., the longer and less flexible it is, the lower the price. However, longer and less flexible commitments can increase cloud costs if future workloads cannot utilize the VMs users committed to buying. Interestingly, this wide range of purchasing options provides opportunities for cost savings, yet large cloud customers often find it challenging to choose the right mix of purchasing options that minimizes their long-term costs while retaining the ability to adjust capacity up and down in response to workload variations. Thus, optimizing cloud costs requires users to select a mix of VM purchasing options based on their short- and long-term expectations of workload utilization. Notably, hybrid clouds combine multiple VM purchasing options, or private clusters with public cloud VMs, to optimize cloud costs based on workload expectations. In this thesis, we address the challenge of choosing a mix of different VM purchasing options for large cloud customers, thereby optimizing their cloud costs. To this end, we make the following contributions: (i) design and implement a container orchestration platform (using Kubernetes) that optimizes the cost of executing mixed interactive and batch workloads on cloud platforms using on-demand and transient VMs; (ii) develop simple analytical models of different straggler mitigation techniques to better understand the cost of synchronization in distributed machine learning workloads, and compare their cost and performance on on-demand and transient VMs; (iii) design multiple policies that optimize long-term cloud costs by selecting a mix of VM purchasing options based on short- and long-term expectations of workload utilization (with no job waiting); (iv) introduce the concept of a waiting policy for cloud-enabled schedulers and show that provisioning long-term resources (e.g., reserved VMs) to optimize cloud costs depends on it; and (v) design and implement speculative execution and ML-based waiting-time prediction (for waiting policies) to show that optimizing job waiting in the cloud is possible without accurate job runtime predictions.
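
    As a rough illustration of the reserved-versus-on-demand trade-off the abstract describes, the sketch below (my own toy example with made-up prices, not the thesis's actual algorithm) picks how many reserved VMs to buy from a historical demand trace: a reserved VM pays off only if it is busy more than the break-even fraction of hours implied by the reserved/on-demand price ratio.

    # Toy policy sketch: how many reserved VMs to commit to, given an
    # hourly demand trace (VM counts). Prices are illustrative only.
    def reserved_count(demand_trace, reserved_rate=0.06, ondemand_rate=0.10):
        break_even = reserved_rate / ondemand_rate  # utilization needed to pay off
        hours = len(demand_trace)
        k = 0
        # Reserve up to the demand level that is exceeded often enough
        # to keep each reserved VM busy past its break-even utilization.
        while sum(d > k for d in demand_trace) / hours > break_even:
            k += 1
        return k  # buy k reserved VMs; burst above k on on-demand/transient VMs

    demand = [3, 5, 8, 4, 6, 9, 2, 5, 7, 6] * 100  # toy hourly demand
    print(reserved_count(demand))  # -> 5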

    Transiency-driven Resource Management for Cloud Computing Platforms

    Modern distributed server applications are hosted on enterprise or cloud data centers that provide computing, storage, and networking capabilities to these applications. These applications are built on the implicit assumption that the underlying servers will be stable and normally available, barring occasional faults. In many emerging scenarios, however, data centers and clouds only provide transient, rather than continuous, availability of their servers. Transiency in modern distributed systems arises in many contexts, such as green data centers powered by intermittent renewable sources, and cloud platforms that provide lower-cost transient servers which can be unilaterally revoked by the cloud operator. Transient computing resources are increasingly important, and existing fault-tolerance and resource management techniques are inadequate for transient servers because applications typically assume continuous resource availability. This thesis presents research in distributed systems design that treats transiency as a first-class design principle. I show that combining transiency-specific fault-tolerance mechanisms with resource management policies suited to application characteristics and requirements can yield significant cost and performance benefits. These mechanisms and policies have been implemented and prototyped as part of software systems that allow a wide range of applications, such as interactive services and distributed data processing, to be deployed on transient servers, and can reduce cloud computing costs by up to 90%. This thesis makes contributions to four areas of computer systems research: transiency-specific fault-tolerance, resource allocation, abstractions, and resource reclamation. To reduce the impact of transient server revocations, I develop two fault-tolerance techniques tailored to transient server characteristics and application requirements. For interactive applications, I build a derivative cloud platform that masks revocations by transparently moving application state between servers of different types. Similarly, for distributed data processing applications, I investigate the use of application-level periodic checkpointing to reduce the performance impact of server revocations. To manage and reduce the risk of server revocations, I investigate the use of server portfolios that allow transient resource allocation to be tailored to application requirements. Finally, I investigate how resource providers (such as cloud platforms) can provide transient resource availability without revocation, by exploring alternative resource reclamation techniques. I develop resource deflation, wherein a server's resources are fractionally reclaimed, allowing the application to continue execution, albeit with fewer resources. Resource deflation generalizes revocation, and the deflation mechanisms and cluster-wide policies can yield both high cluster utilization and low application performance degradation.
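
    For the periodic-checkpointing idea, a back-of-the-envelope sketch (using the classic Young/Daly first-order approximation, not the thesis's own models) shows how a checkpoint interval can be matched to a transient server's revocation rate:

    import math

    # Young/Daly approximation: the interval that balances checkpoint
    # overhead against expected lost work on revocation is roughly
    # sqrt(2 * checkpoint_cost * mean_time_to_revocation).
    def checkpoint_interval(ckpt_seconds, mttr_seconds):
        return math.sqrt(2 * ckpt_seconds * mttr_seconds)

    # e.g., 30 s checkpoints, revocations every ~2 hours on average
    print(checkpoint_interval(30, 2 * 3600) / 60)  # ~11 minutes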

    Carbon Containers: A System-level Facility for Managing Application-level Carbon Emissions

    To reduce their environmental impact, cloud datacenters are increasingly focused on optimizing applications' carbon-efficiency, or work done per mass of carbon emitted. To facilitate such optimizations, we present Carbon Containers, a simple system-level facility, which extends prior work on power containers, that automatically regulates applications' carbon emissions in response to variations in both their workload's intensity and their energy's carbon-intensity. Specifically, Carbon Containers enable applications to specify a maximum carbon emissions rate (in g·CO2e/hr), and then transparently enforce this rate via a combination of vertical scaling, container migration, and suspend/resume while maximizing either energy-efficiency or performance. Carbon Containers are especially useful for applications that (i) must continue running even during high-carbon periods, and (ii) execute in regions with few variations in carbon-intensity. These low-variability regions also tend to have high average carbon-intensity, which increases the importance of regulating carbon emissions. We implement a Carbon Containers prototype by extending Linux Containers to incorporate the mechanisms above and evaluate it using real workload traces and carbon-intensity data from multiple regions. We compare Carbon Containers with prior work that regulates carbon emissions by suspending/resuming applications during high/low carbon periods. We show that Carbon Containers are more carbon-efficient and improve performance while maintaining similar carbon emissions.
    Comment: ACM Symposium on Cloud Computing (SoCC)
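
    The unit conversion at the heart of such an enforcement loop is simple; a minimal sketch (with hypothetical helper names, not the prototype's actual interface) is:

    # Given a carbon budget in gCO2e/hr and the grid's current carbon
    # intensity in gCO2e/kWh, the allowed power draw follows directly;
    # the container is then scaled (or suspended) to stay under it.
    def allowed_power_watts(budget_g_per_hr, intensity_g_per_kwh):
        # [g/hr] / [g/kWh] = [kWh/hr] = kW; convert to watts
        return 1000.0 * budget_g_per_hr / intensity_g_per_kwh

    def enforce(budget, intensity, min_useful_watts=10.0):
        cap = allowed_power_watts(budget, intensity)
        if cap < min_useful_watts:
            return ("suspend", 0.0)  # too carbon-intensive to run usefully
        return ("scale_to", cap)     # set vertical-scaling limits to meet cap

    print(enforce(budget=50.0, intensity=400.0))  # -> ('scale_to', 125.0)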

    Using Workload Prediction and Federation to Increase Cloud Utilization

    The widespread adoption of cloud computing has changed how large-scale computing infrastructure is built and managed. Infrastructure-as-a-Service (IaaS) clouds consolidate separate workloads onto a shared platform and provide a consistent quality of service by overprovisioning capacity. This additional capacity, however, remains idle for extended periods of time and represents a drag on system efficiency. The smaller scale of private IaaS clouds compared to public clouds exacerbates overprovisioning inefficiencies, as opportunities for workload consolidation in private clouds are limited. Federation and cycle-harvesting capabilities from computational grids help to improve efficiency, but to date have seen only limited adoption in the cloud due to a fundamental mismatch between the usage models of grids and clouds. Computational grids provide high throughput of queued batch jobs on a best-effort basis and enforce user priorities through dynamic job preemption, while IaaS clouds provide immediate feedback to user requests and make ahead-of-time guarantees about resource availability. We present a novel method to enable workload federation across IaaS clouds that overcomes this mismatch between grid and cloud usage models and improves system efficiency while also offering availability guarantees. We develop a new method for faster-than-realtime simulation of IaaS clouds to make predictions about system utilization, and leverage this method to estimate the future availability of preemptible resources in the cloud. We then use these estimates to perform careful admission control and provide ahead-of-time bounds on the preemption probability of federated jobs executing on preemptible resources. Finally, we build an end-to-end prototype that addresses practical issues of workload federation and evaluate the prototype's efficacy using real-world traces from big data and compute-intensive production workloads.
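
    As a rough illustration of the admission-control idea (a toy Monte Carlo stand-in for the faster-than-realtime simulator, with made-up capacity traces), a federated job would be admitted on preemptible capacity only if its simulated preemption probability stays under a bound:

    import random

    def simulate_preempted(job_hours, idle_capacity_trace):
        # Toy stand-in for the cloud simulator: the job is preempted if
        # idle capacity ever drops to zero during its runtime window.
        start = random.randrange(len(idle_capacity_trace) - job_hours)
        return any(c == 0 for c in idle_capacity_trace[start:start + job_hours])

    def admit(job_hours, trace, bound=0.05, trials=1000):
        hits = sum(simulate_preempted(job_hours, trace) for _ in range(trials))
        return hits / trials <= bound  # ahead-of-time preemption bound

    trace = [random.choice([0, 1, 2, 3]) for _ in range(10_000)]
    print(admit(job_hours=4, trace=trace))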

    Application-Aware Resource Management for Cloud Platforms

    Cloud computing has become increasingly popular in recent years. The benefits of cloud platforms include ease of application deployment, a pay-as-you-go model, and the ability to scale resources up or down based on an application's workload. Today's cloud platforms are being used to host increasingly complex distributed and parallel applications. The main premise of this thesis is that application-aware resource management techniques are better suited to distributed cloud applications than a systems-level, one-size-fits-all approach. In this thesis, I study cloud-based resource management techniques, with a particular emphasis on how application-aware approaches can be used to improve system resource utilization and enhance applications' performance and cost. I first study always-on interactive applications that run on transient cloud servers such as Amazon spot instances. I show that by combining techniques like nested virtualization, live migration, and lazy restoration with intelligent bidding strategies, it is feasible to provide high availability to such applications while significantly reducing cost. I next study how to improve the performance of parallel data processing applications like Hadoop and Spark that run in the cloud. I argue that network I/O contention in Hadoop can impact application throughput, and implement a collaborative application-aware network and task scheduler using software-defined networking. By combining flow scheduling with task scheduling, our system can effectively avoid network contention and improve Hadoop's performance. I then investigate similar issues in Spark and find that task scheduling is more important for Spark jobs. I propose a network-aware task scheduling method that can adaptively schedule tasks for different types of jobs without system tuning and significantly improve Spark's performance. Finally, I study how to deploy network functions in the cloud. Specifically, I focus on comparing different methods of chaining network functions. By carrying out an empirical evaluation of two deployment methods, we identify the advantages and disadvantages of each. Our results suggest that tenant-centric placement provides lower latencies, while the service-centric approach is more flexible for reconfiguration and capacity scaling.
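
    A minimal sketch of a network-aware placement rule in this spirit (my own simplification, not the scheduler built in the thesis): prefer executors holding the task's input locally, and among those pick the one with the least-utilized uplink, so remote reads avoid already-congested links.

    def pick_node(task_input_hosts, link_utilization):
        # link_utilization: node -> current uplink utilization in [0, 1]
        local = [n for n in task_input_hosts if n in link_utilization]
        if local:
            return min(local, key=link_utilization.get)  # data-local, least busy
        return min(link_utilization, key=link_utilization.get)

    util = {"node1": 0.9, "node2": 0.2, "node3": 0.5}
    print(pick_node(["node1", "node3"], util))  # -> node3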

    System Support for Managing Risk in Cloud Computing Platforms

    Cloud platforms sell computing to applications for a price. However, by precisely defining and controlling the service-level characteristics of cloud servers, they expose applications to a number of implicit risks throughout the application's lifecycle. For example, a user's request for a server may be denied, leading to rejection risk; an allocated resource may be withdrawn, resulting in revocation risk; an acquired cloud server's price may rise relative to others, causing price risk; and a cloud server's performance may vary due to external factors, triggering valuation risk. Though these risks are implicit, the costs they impose on applications are not. While some risks exist in all Infrastructure-as-a-Service offerings, they are most pronounced in an emerging category called transient cloud servers. Since transient servers are carved out of instantaneous idle cloud capacity, they exhibit two distinct features: (i) revocations that are intentional, frequent, and come with advance warning, and (ii) prices that are low on average but vary across time and location. Thus, despite enabling inexpensive access to at-scale computing, transient cloud servers expose applications to risks at a scale unseen in past platforms. Unfortunately, current-generation system software is not designed to handle these risks, which in turn results in inconsistent performance, unexpected failures, missed savings, and slower adoption. In this dissertation, we elevate risk management to a first-class system design principle. Our goal is to identify the risks, quantify their costs, and explicitly manage them for applications deployed on cloud platforms. Toward that goal, we adapt and extend concepts from finance and economics to propose a new system design approach called financializing cloud computing. By treating cloud resources as investments, and by quantifying the cost of their risks, financialization enables system software to manage risk-reward trade-offs explicitly and autonomously. We demonstrate the utility of our approach via four contributions: (i) mitigating revocation risk with insurance policies, (ii) reducing price risk through active trading, (iii) eliminating uncertainty risk by index tracking, and (iv) minimizing a server's valuation risk via asset pricing. We conclude by observing that diversity and asymmetry in the creation and consumption of cloud compute resources are on the rise, and that financialization can be effectively employed to manage their complexity and risks.
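
    To make the financial framing concrete, a toy risk-adjusted comparison (illustrative numbers, not the dissertation's pricing model) treats a transient server's effective cost as its price plus the expected cost of revocation:

    def risk_adjusted_cost(price_per_hr, revocation_rate_per_hr, loss_on_revoke):
        # Expected revocation cost per hour = revocation rate * loss per event
        return price_per_hr + revocation_rate_per_hr * loss_on_revoke

    spot = risk_adjusted_cost(0.03, revocation_rate_per_hr=0.02, loss_on_revoke=1.5)
    ondemand = risk_adjusted_cost(0.10, 0.0, 0.0)  # no revocation risk
    print("spot" if spot < ondemand else "on-demand", round(spot, 3))  # spot 0.06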

    Performance Evaluation of Serverless Applications and Infrastructures

    Context. Cloud computing has become the de facto standard for deploying modern web-based software systems, which makes its performance crucial to the efficient functioning of many applications. However, the unabated growth of established cloud services, such as Infrastructure-as-a-Service (IaaS), and the emergence of new serverless services, such as Function-as-a-Service (FaaS), have led to an unprecedented diversity of cloud services with different performance characteristics. Measuring these characteristics is difficult in dynamic cloud environments due to performance variability in large-scale distributed systems with limited observability.
    Objective. This thesis aims to enable reproducible performance evaluation of serverless applications and their underlying cloud infrastructure.
    Method. A combination of literature review and empirical research established a consolidated view of serverless applications and their performance. New solutions were developed through engineering research and used to conduct performance benchmarking field experiments in cloud environments.
    Findings. The review of 112 FaaS performance studies from academic and industrial sources found a strong focus on a single cloud platform using artificial micro-benchmarks, and discovered that most studies do not follow reproducibility principles for cloud experimentation. Characterizing 89 serverless applications revealed that they are most commonly used for short-running tasks with low data volume and bursty workloads. A novel trace-based serverless application benchmark shows that external service calls often dominate the median end-to-end latency and cause long tail latency. The latency breakdown analysis further identifies performance challenges of serverless applications, such as long delays through asynchronous function triggers, substantial runtime initialization for cold starts, increased performance variability under bursty workloads, and heavily provider-dependent performance characteristics. The evaluation of different cloud benchmarking methodologies shows that only selected micro-benchmarks are suitable for estimating application performance, that performance variability depends on the resource type, and that batch testing on the same instance with repetitions should be used for reliable performance testing.
    Conclusions. The insights of this thesis can guide practitioners in building performance-optimized serverless applications and researchers in reproducibly evaluating cloud performance using suitable execution methodologies and different benchmark types.
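
    A minimal sketch of the cold-versus-warm measurement behind such benchmarks (the endpoint is a placeholder; a real study needs many repetitions and fresh function instances to force cold starts):

    import time, urllib.request

    def invoke(url):
        t0 = time.perf_counter()
        urllib.request.urlopen(url).read()  # end-to-end HTTP-triggered call
        return time.perf_counter() - t0

    URL = "https://example.com/my-function"  # placeholder FaaS endpoint
    cold = invoke(URL)                       # first call: likely a cold start
    warm = [invoke(URL) for _ in range(20)]  # subsequent calls: warm path
    print(f"cold={cold:.3f}s warm_median={sorted(warm)[10]:.3f}s")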

    Design Space Exploration and Resource Management of Multi/Many-Core Systems

    The increasing demand for processing more applications and data on computing platforms has led to a reliance on multi-/many-core chips, as they facilitate parallel processing. However, these platforms are also expected to be energy-efficient and reliable, and to perform secure computations, in the interest of the whole community. This book provides perspectives on the aforementioned aspects from leading researchers, in terms of state-of-the-art contributions and upcoming trends.

    Dynamic shared memory architecture, systems, and optimizations for high performance and secure virtualized cloud

    Dynamic memory consolidation is an important enabler for high-performance virtual machine (VM) execution in the virtualized cloud. Efficient just-in-time memory balancing requires three core capabilities: (i) detecting memory pressure across VMs hosted on a physical machine; (ii) allocating memory to the respective VMs; and (iii) enabling fast recovery once newly allocated memory becomes available at the high-pressure VMs. Although balloon driver technology facilitates the second task, it remains difficult to accurately predict VM memory demands at affordable overhead, especially under unpredictable and changing workloads. Furthermore, no prior study has analyzed the effect of a VM's slow response to newly available memory due to paging-based application recovery. In this dissertation research, I have made four original contributions to dynamic shared memory management in terms of architecture, systems, and optimizations for improving VM execution performance and security. First, we designed and developed MemPipe, a shared-memory inter-VM communication channel for fast inter-VM network I/O. MemPipe increases shared memory utilization by adaptively adjusting the shared memory size according to workload demands. It also reduces inter-VM network communication overhead by directly copying packets from the sender VM's user space to the shared memory area. Second, we developed iBalloon, a lightweight and transparent prediction-based facility that enables automated or semi-automated ballooning with more customizable, accurate, and efficient memory balancing policies among VMs. Third, we developed MemFlex, a novel shared-memory swapping facility that can effectively utilize host idle memory through a hybrid memory swap-out model and a fast swap-in optimization. Fourth, we introduced SecureStack, a kernel-backed tool that prevents sensitive data on the function stack from being illegally accessed by untrusted functions. SecureStack introduces three procedures to protect, restore, and clear the stack in a reliable and low-cost manner. It is highly transparent to users and does not introduce any new vulnerability into the existing system. These research developments are packaged into MemLego, a new memory management framework for memory-centric computing in the big data era.
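
    As a rough illustration of the prediction idea behind automated ballooning (my own minimal version, not iBalloon's actual policy), one can track each VM's memory demand with an exponentially weighted moving average and set the balloon target to that estimate plus headroom:

    def balloon_target(samples_mb, alpha=0.3, headroom=1.2):
        est = samples_mb[0]
        for s in samples_mb[1:]:
            est = alpha * s + (1 - alpha) * est  # EWMA of observed demand
        return est * headroom                    # leave room for bursts

    print(round(balloon_target([800, 850, 1200, 1100, 900])))  # -> 1148 MB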