6 research outputs found

    Client-side Flash Caching for Cloud Systems

    Interpretability of AI in Computer Systems and Public Policy

    Advances in Artificial Intelligence (AI) have led to spectacular innovations and sophisticated systems for tasks once thought to be achievable only by humans. Examples include playing chess and Go, face and voice recognition, driving vehicles, and more. In recent years, the impact of AI has moved beyond offering mere predictive models into building interpretable models that appeal to human logic and intuition, because they ensure transparency and simplicity and can be used to make meaningful decisions in real-world applications. A second trend in AI is characterized by important advancements in the realm of causal reasoning. Identifying causal relationships is an important aspect of scientific endeavors in a variety of fields. Causal models and Bayesian inference can help us gain better domain-specific insight and make better data-driven decisions because of their interpretability. The main objective of this dissertation was to adapt theoretically sound, AI-based, interpretable data-analytic approaches to solve domain-specific problems in the two unrelated fields of Storage Systems and Public Policy. For the first task, we considered the well-studied cache replacement problem in computing systems, which can be modeled as a variant of the well-known Multi-Armed Bandit (MAB) problem with delayed feedback and decaying costs, and developed an algorithm called EXP4-DFDC. We proved theoretically that EXP4-DFDC exhibits an important feature called vanishing regret. Based on the theoretical analysis, we designed a machine-learning algorithm called ALeCaR, with adaptive hyperparameters. We used extensive experiments on a wide range of workloads to show that ALeCaR performed better than LeCaR, the best machine-learning algorithm for cache replacement at that time. We concluded that reinforcement learning can offer an outstanding approach for implementing cache management policies. For the second task, we used Bayesian networks to analyze the service request data from three 311 centers providing non-emergency services in the cities of Miami-Dade, New York City, and San Francisco. Using a causal inference approach, this study investigated the presence of inequities in the quality of 311 services to neighborhoods with varying demographics and socioeconomic status. We concluded that the services provided by the local governments showed no detectable biases on the basis of race, ethnicity, or socioeconomic status.
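    A minimal sketch may help make the MAB framing concrete: treat two classic policies (LRU and LFU) as bandit experts, pick evictions according to their current weights, and deliver delayed feedback by penalizing the expert whose eviction is later re-referenced. The class name, fixed learning rate, and discount value below are illustrative assumptions; this is not the published EXP4-DFDC or ALeCaR algorithm, whose contribution is precisely adapting such hyperparameters.

```python
import random
from collections import OrderedDict, defaultdict

class BanditCache:
    """Sketch: cache replacement as a two-expert bandit (LRU vs. LFU)."""

    def __init__(self, capacity, learning_rate=0.45, discount=0.95):
        self.capacity = capacity
        self.lr = learning_rate          # reward scale for the better expert (illustrative)
        self.discount = discount         # decaying cost for the worse expert (illustrative)
        self.weights = {"lru": 0.5, "lfu": 0.5}
        self.cache = OrderedDict()       # insertion/access order tracks recency
        self.freq = defaultdict(int)     # access counts for the LFU expert
        self.history = {}                # evicted key -> expert that chose it
                                         # (a real implementation would bound this)

    def _evict(self):
        lru_victim = next(iter(self.cache))                       # least recently used
        lfu_victim = min(self.cache, key=lambda k: self.freq[k])  # least frequently used
        expert = "lru" if random.random() < self.weights["lru"] else "lfu"
        victim = lru_victim if expert == "lru" else lfu_victim
        del self.cache[victim]
        self.history[victim] = expert    # feedback arrives only on re-reference

    def access(self, key):
        self.freq[key] += 1
        if key in self.cache:
            self.cache.move_to_end(key)  # refresh recency
            return True                  # hit
        if key in self.history:          # delayed feedback: that eviction was a mistake
            bad = self.history.pop(key)
            good = "lfu" if bad == "lru" else "lru"
            self.weights[good] *= (1 + self.lr)
            self.weights[bad] *= self.discount
            total = sum(self.weights.values())
            self.weights = {k: w / total for k, w in self.weights.items()}
        if len(self.cache) >= self.capacity:
            self._evict()
        self.cache[key] = None
        return False                     # miss

cache = BanditCache(capacity=3)
hits = sum(cache.access(k) for k in [1, 2, 3, 1, 4, 1, 2, 5, 1])
```

    Per the abstract, ALeCaR's advance over a fixed-parameter scheme like this one is adjusting the learning rate and discount adaptively, with EXP4-DFDC supplying the vanishing-regret guarantee.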

    Flash Caching for Cloud Computing Systems

    As the size of cloud systems and the number of hosted virtual machines (VMs) rapidly grow, the scalability of shared VM storage systems becomes a serious issue. Client-side flash-based caching has the potential to improve the performance of cloud VM storage by employing flash storage available on the VM hosts to exploit the locality inherent in VM IOs. However, there are several challenges to the effective use of flash caching in cloud systems. First, cache configurations such as size, write policy, metadata persistence, and RAID level have significant impacts on flash caching. Second, the typical capacity of flash devices is limited compared to the dataset size of consolidated VMs. Finally, flash devices wear out and face serious endurance issues, which are aggravated by their use for caching. This dissertation presents research that addresses these problems of cloud flash caching in three aspects. First, it presents a thorough study of different cache configurations, including a new cache-optimized RAID configuration, using a large amount of long-term traces collected from real-world public and private clouds. Second, it studies an on-demand flash cache management solution for meeting VM cache demands and minimizing device wear-out. It uses a new cache demand model, the Reuse Working Set (RWS), to capture the data with good temporal locality, and uses the RWS size (RWSS) to model a workload's cache demand. Finally, to handle situations where a cache is insufficient for the VMs' demands, it employs dynamic cache migration to balance cache load across hosts by live-migrating cached data along with the VMs. The results show that the cache-optimized RAID improves performance by 137% without sacrificing reliability, compared to traditional RAID. The RWSS-based on-demand cache allocation reduces a workload's cache usage by 78% and lowers the amount of writes sent to the cache device by 40%, compared to traditional working-set-based cache allocation. When on-demand cache allocation is combined with dynamic cache migration for 12 concurrent VMs, the results show a 28% higher hit ratio and 28% lower 90th-percentile IO latency, compared to the case without cache allocation.
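    As a rough illustration of the RWSS idea, the sketch below estimates cache demand from a block trace by counting only blocks re-referenced within a window, rather than every block touched as the classic working set does; single-access blocks gain nothing from caching, so excluding them shrinks allocations and flash writes. The fixed-size windows and the peak-window provisioning rule are assumptions for illustration, not the dissertation's exact estimator.

```python
from collections import Counter

def reuse_working_set_size(trace, window):
    """Sketch: RWSS = number of distinct blocks referenced more than
    once within a window (reused blocks), per fixed-size trace window."""
    samples = []
    for start in range(0, len(trace), window):
        counts = Counter(trace[start:start + window])
        samples.append(sum(1 for c in counts.values() if c > 1))
    return max(samples, default=0)       # provision for the busiest window

# Blocks 3 and 5 are reused; 1, 2, and 4 are touched once, so the
# classic working set size is 5 but the RWSS is only 2.
print(reuse_working_set_size([1, 3, 3, 2, 5, 5, 3, 4], window=8))  # -> 2
```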

    On I/O Performance and Cost Efficiency of Cloud Storage: A Client's Perspective

    Cloud storage has gained increasing popularity in the past few years. In cloud storage, data are stored in the service provider's data centers; users access data via the network and pay fees based on their service usage. For such a new storage model, our prior wisdom and optimization schemes for conventional storage may not remain valid or applicable to the emerging cloud storage. In this dissertation, we focus on understanding and optimizing the I/O performance and cost efficiency of cloud storage from a client's perspective. We first conduct a comprehensive study to gain insight into the I/O performance behaviors of cloud storage from the client side. Through extensive experiments, we have obtained several critical findings and useful implications for system optimization. We then design a client cache framework, called Pacaca, to further improve the end-to-end performance of cloud storage. Pacaca seamlessly integrates parallelized prefetching and cost-aware caching by utilizing the parallelism potential and object correlations of cloud storage. In addition to improving system performance, we have also made efforts to reduce the monetary cost of using cloud storage services by proposing a latency- and cost-aware client caching scheme, called GDS-LC, which can achieve two optimization goals for using cloud storage services: low access latency and low monetary cost. Our experimental results show that our proposed client-side solutions significantly outperform traditional methods. Our study contributes to inspiring the community to reconsider system optimization methods in the cloud environment, especially for the purpose of integrating cloud storage into the current storage stack as a primary storage layer.
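    For intuition about latency- and cost-aware client caching, here is a minimal GreedyDual-Size-style sketch in the spirit of GDS-LC: an object's eviction priority combines its fetch latency and per-request monetary cost, normalized by size and aged by a global inflation clock. The class name, the alpha/beta weights, and the single-queue design are illustrative assumptions, not a reproduction of the actual GDS-LC scheme.

```python
class GreedyDualCache:
    """Sketch: GreedyDual-Size-style eviction where an object's value
    reflects both fetch latency and monetary cost, normalized by size."""

    def __init__(self, capacity_bytes, alpha=1.0, beta=1.0):
        self.capacity = capacity_bytes
        self.alpha = alpha               # weight on latency (illustrative)
        self.beta = beta                 # weight on monetary cost (illustrative)
        self.clock = 0.0                 # inflates as objects are evicted
        self.objects = {}                # key -> (priority, size_bytes)
        self.used = 0

    def _priority(self, latency_s, cost_usd, size):
        return self.clock + (self.alpha * latency_s + self.beta * cost_usd) / size

    def access(self, key, size, latency_s, cost_usd):
        if key in self.objects:          # hit: refresh priority at the current clock
            self.objects[key] = (self._priority(latency_s, cost_usd, size), size)
            return True
        while self.used + size > self.capacity and self.objects:
            victim = min(self.objects, key=lambda k: self.objects[k][0])
            vprio, vsize = self.objects.pop(victim)
            self.clock = vprio           # ages every remaining object implicitly
            self.used -= vsize
        self.objects[key] = (self._priority(latency_s, cost_usd, size), size)
        self.used += size
        return False
```

    Objects that are small, slow, or expensive to re-fetch thus survive longer, which is the shared intuition behind the two optimization goals named in the abstract.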

    Datacenter Architectures for the Microservices Era

    Modern internet services are shifting away from single-binary, monolithic services into numerous loosely-coupled microservices that interact via Remote Procedure Calls (RPCs), to improve the programmability, reliability, manageability, and scalability of cloud services. Computer system designers face many new challenges with microservice-based architectures, as individual RPCs/tasks last only a few microseconds in most microservices. In this dissertation, I seek to address the most notable challenges that arise from the dissimilarities between modern microservice-based and classic monolithic cloud services, and design novel server architectures and runtime systems that enable efficient execution of µs-scale microservices on modern hardware. In the first part of my dissertation, I address the problem of Killer Microseconds, which refers to µs-scale “holes” in CPU schedules caused by stalls to access fast I/O devices or by brief idle times between requests in high-throughput µs-scale microservices. Whereas modern computing platforms can efficiently hide ns-scale and ms-scale stalls through micro-architectural techniques and OS context switching, they lack efficient support for hiding the latency of µs-scale stalls. In chapter II, I propose Duplexity, a heterogeneous server architecture that employs aggressive multithreading to hide the latency of killer microseconds without sacrificing the Quality-of-Service (QoS) of latency-sensitive microservices. Duplexity achieves 1.9× higher core utilization and 2.7× lower iso-throughput 99th-percentile tail latency, on average, over an SMT-based server design. In chapters III-IV, I comprehensively investigate the problem of tail latency in the context of microservices and address multiple aspects of it. First, in chapter III, I characterize the tail latency behavior of microservices and provide general guidelines for optimizing computer systems from a queuing perspective to minimize tail latency. Queuing is a major contributor to end-to-end tail latency, wherein nominal tasks are enqueued behind rare, long ones due to Head-of-Line (HoL) blocking. Next, in chapter IV, I introduce Q-Zilla, a scheduling framework that tackles tail latency from a queuing perspective, and CoreZilla, a microarchitectural instantiation of the framework. Q-Zilla is composed of the ServerQueue Decoupled Size-Interval Task Assignment (SQD-SITA) scheduling algorithm and the Express-lane Simultaneous Multithreading (ESMT) microarchitecture, which together address HoL blocking by providing an “express lane” for short tasks, protecting them from queuing behind rare, long ones. By combining the ESMT microarchitecture and the SQD-SITA scheduling algorithm, CoreZilla improves tail latency over a conventional SMT core with 2, 4, and 8 contexts by 2.25×, 3.23×, and 4.38× on average, respectively, and outperforms a theoretical 32-core scale-up organization by 12%, on average, with 8 contexts. Finally, in chapters V-VI, I investigate the tail latency problem of microservices from a cluster-level, rather than server-level, perspective. Whereas Service Level Objectives (SLOs) define end-to-end latency targets for the entire service to ensure user satisfaction, with microservice-based applications it is unclear how to scale individual microservices when end-to-end SLOs are violated or the service is underutilized. I introduce Parslo, an analytical framework for partial SLO allocation in virtualized cloud microservices. Parslo takes a microservice graph as input and employs a Gradient Descent-based approach to allocate “partial SLOs” to different microservice nodes, enabling independent auto-scaling of individual microservices. Parslo achieves the optimal solution, minimizing the total cost for the entire service deployment, and is applicable to general microservice graphs.

    Ph.D. dissertation, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/167978/1/miramir_1.pd
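    To illustrate the kind of partial-SLO allocation Parslo performs, the sketch below splits an end-to-end latency budget across a chain of microservices by projected gradient descent, under the assumed, purely illustrative cost model cost_i = work_i / slo_i (a tighter sub-SLO requires proportionally more replicas). The function name, cost model, and hyperparameters are hypothetical; Parslo's actual formulation handles general microservice graphs, not just chains.

```python
import numpy as np

def allocate_partial_slos(work, slo_total_ms, lr=100.0, steps=5000):
    """Sketch: minimize sum(work_i / slo_i) subject to
    sum(slo_i) == slo_total_ms via projected gradient descent."""
    work = np.asarray(work, dtype=float)
    slo = np.full(len(work), slo_total_ms / len(work))    # even split to start
    for _ in range(steps):
        grad = -work / slo**2                             # d(total cost)/d(slo_i)
        slo -= lr * grad                                  # descend on cost
        slo -= (slo.sum() - slo_total_ms) / len(slo)      # re-project onto budget
        slo = np.clip(slo, 1e-3, None)                    # keep sub-SLOs positive
    return slo

# Three-stage chain; heavier stages should receive larger sub-SLOs.
# (Closed form for this cost model: slo_i proportional to sqrt(work_i).)
print(allocate_partial_slos([1.0, 4.0, 9.0], slo_total_ms=100.0))
# ~> [16.7, 33.3, 50.0]
```

    The sketch's projection step keeps the sub-SLOs summing to the end-to-end budget, which is the constraint that makes the per-node targets independently enforceable by auto-scalers.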