
    Service Boosters: Library Operating Systems For The Datacenter

    Cloud applications are taking an increasingly important place in our technology and economic landscape. Consequently, they are subject to stringent performance requirements. High tail latency, measured as percentiles at the tail of the response time distribution, is a threat to these requirements. As little as 0.01% slow requests in one microservice can significantly degrade performance for the entire application. The conventional wisdom is that application-awareness is crucial to design optimized performance management systems, but comes at the cost of maneuverability. Consequently, existing execution environments are often general-purpose and ignore important application features such as the architecture of request processing pipelines or the type of requests being served. These one-size-fits-all solutions are missing crucial information to identify and remove sources of high tail latency. This thesis aims to develop a lightweight execution environment exploiting application semantics to optimize tail performance for cloud services. This system, dubbed Service Boosters, is a library operating system exposing application structure and semantics to the underlying resource management stack. Using Service Boosters, programmers use a generic programming model to build, declare, and annotate their request processing pipeline, while performance engineers can program advanced management strategies. Using Service Boosters, I present three systems, FineLame, Perséphone, and DeDoS, that exploit application awareness to provide real-time anomaly detection, tail-tolerant RPC scheduling, and resource harvesting. FineLame leverages awareness of the request processing pipeline to deploy monitoring and anomaly detection probes. Using these, FineLame can detect abnormal requests in-flight whenever they depart from the expected behavior and alert other resource management modules. Perséphone exploits an understanding of request types to dynamically allocate resources to each type and forbid pathological head-of-line blocking from heavy-tailed workloads, without the need for interrupts. Perséphone is a low-overhead solution well suited for microsecond-scale workloads. Finally, DeDoS can identify overloaded components and dynamically scale them, harvesting only the resources needed to quench the overload. Service Boosters is a powerful framework to handle tail latency in the datacenter. Service Boosters clearly separates the roles of application development and performance engineering, proposing a general purpose application programming model while enabling the development of specialized resource management modules such as Perséphone and DeDoS.

    A cross-stack, network-centric architectural design for next-generation datacenters

    This thesis proposes a full-stack, cross-layer datacenter architecture based on in-network computing and near-memory processing paradigms. The proposed datacenter architecture is built atop two principles: (1) utilizing commodity, off-the-shelf hardware (i.e., processor, DRAM, and network devices) with minimal changes to their architecture, and (2) providing a standard interface to programmers for using the novel hardware. More specifically, the proposed datacenter architecture enables a smart network adapter to collectively compress/decompress data exchanged between distributed DNN training nodes and to assist the operating system in performing aggressive processor power management. It also deploys specialized memory modules in the servers, capable of performing general-purpose computation and network connectivity. This thesis unlocks the potential of hardware and operating system co-design in architecting application-transparent, near-data processing hardware for improving datacenter performance, energy efficiency, and scalability. We evaluate the proposed datacenter architecture using a combination of full-system simulation, FPGA prototyping, and real-system experiments.

    Cloud-scale VM Deflation for Running Interactive Applications On Transient Servers

    Transient computing has become popular in public cloud environments for running delay-insensitive batch and data processing applications at low cost. Since transient cloud servers can be revoked at any time by the cloud provider, they are considered unsuitable for running interactive applications such as web services. In this paper, we present VM deflation as an alternative mechanism to server preemption for reclaiming resources from transient cloud servers under resource pressure. Using real traces from top-tier cloud providers, we show the feasibility of using VM deflation as a resource reclamation mechanism for interactive applications in public clouds. We show how current hypervisor mechanisms can be used to implement VM deflation and present cluster deflation policies for resource management of transient and on-demand cloud VMs. Experimental evaluation of our deflation system on a Linux cluster shows that microservice-based applications can be deflated by up to 50% with negligible performance overhead. Our cluster-level deflation policies allow overcommitment levels as high as 50%, with less than a 1% decrease in application throughput, and can enable cloud platforms to increase revenue by 30%. Comment: To appear at ACM HPDC 202

    An Experimental Evaluation of Datacenter Workloads On Low-Power Embedded Micro Servers

    This paper presents a comprehensive evaluation of an ultra-low power cluster built upon Intel Edison based micro servers. The improved performance and high energy efficiency of micro servers have driven both academia and industry to explore the possibility of replacing conventional brawny servers with a larger swarm of embedded micro servers. Existing attempts mostly focus on mobile-class micro servers, whose capacities are similar to mobile phones. We, on the other hand, target sensor-class micro servers, which are originally intended for use in wearable technologies, sensor networks, and the Internet of Things. Although sensor-class micro servers have much less capacity, they are touted for minimal power consumption (< 1 Watt), which opens new possibilities for achieving higher energy efficiency in datacenter workloads. Our systematic evaluation of the Edison cluster and comparisons to conventional brawny clusters involve careful workload selection and laborious parameter tuning, which ensure maximum server utilization and thus fair comparisons. Results show that the Edison cluster achieves up to 3.5× improvement in work-done-per-joule for web service applications and data-intensive MapReduce jobs. In terms of scalability, the Edison cluster scales linearly in the throughput of web service workloads, and also shows satisfactory scalability for MapReduce workloads despite coordination overhead. This research was supported in part by NSF grant 13-20209.

    Improving the performance of Virtualized Network Services based on NFV and SDN

    Network Functions Virtualisation (NFV) proposes to move all the traditional network appliances, which require dedicated physical machines, onto a virtualised environment (e.g., a virtual machine). In this way, many of the physical devices currently present in the infrastructure are replaced with standard high-volume servers, which could be located in datacenters, at the edge of the network, and on end-user premises. This enables a reduction of the required physical resources thanks to the use of virtualization technologies, already used in cloud computing, and allows services to be more dynamic and scalable. However, unlike traditional cloud applications, which are rather demanding in terms of CPU power, network applications are mostly I/O bound, hence the virtualization technologies in use (either standard VM-based or lightweight ones) need to be improved to maximize network performance. A series of Virtual Network Functions (VNFs) can be connected to each other thanks to Software-Defined Networking (SDN) technologies (e.g., OpenFlow) to create a Network Function Forwarding Graph (NF-FG) that processes the network traffic in the configured order of the graph. Using NF-FGs it is possible to create arbitrary chains of services, and transparently configure different virtualized network services, which can be dynamically instantiated and rearranged depending on the requested service and its requirements. However, the above virtualized technologies are rather demanding in terms of hardware resources (mainly CPU and memory), which may have a non-negligible impact on the cost of providing the services according to this paradigm. This thesis will investigate this problem, proposing a set of solutions that enable the novel NFV paradigm to be efficiently used, hence being able to guarantee both flexibility and efficiency in future network services.

    Hardening High-Assurance Security Systems with Trusted Computing

    We are living in the time of the digital revolution in which the world we know changes beyond recognition every decade. The positive aspect is that these changes also drive progress in the quality and availability of digital assets crucial for our societies. To name a few examples, these are broadly available communication channels allowing quick exchange of knowledge over long distances, systems controlling the automatic sharing and distribution of renewable energy in international power grid networks, easily accessible applications for early disease detection enabling self-examination without burdening the health service, or governmental systems assisting citizens to settle official matters without leaving their homes. Unfortunately, however, digitalization also opens opportunities for malicious actors to threaten our societies if they gain control over these assets after successfully exploiting vulnerabilities in the complex computing systems building them. Protecting these systems, which are called high-assurance security systems, is therefore of utmost importance. For decades, humanity has struggled to find methods to protect high-assurance security systems. The advancements in the computing systems security domain led to the popularization of hardware-assisted security techniques, nowadays available in commodity computers, that opened perspectives for building more sophisticated defense mechanisms at lower costs. However, none of these techniques is a silver bullet. Each one targets particular use cases, suffers from limitations, and is vulnerable to specific attacks. I argue that some of these techniques are synergistic and help overcome limitations and mitigate specific attacks when used together. My reasoning is supported by regulations that legally bind high-assurance security systems' owners to provide strong security guarantees. These requirements can be fulfilled with the help of diverse technologies that have been standardized in recent years.
In this thesis, I introduce new techniques for hardening high-assurance security systems that execute in remote execution environments, such as public and hybrid clouds. I implemented these techniques as part of a framework that provides technical assurance that high-assurance security systems execute in a specific data center, on top of a trustworthy operating system, in a virtual machine controlled by a trustworthy hypervisor or in strong isolation from other software. I demonstrated the practicality of my approach by leveraging the framework to harden real-world applications, such as machine learning applications in the eHealth domain. The evaluation shows that the framework is practical. It induces low performance overhead (<6%), supports software updates, requires no changes to the legacy application's source code, and can be tailored to individual trust boundaries with the help of security policies. The framework consists of a decentralized monitoring system that offers better scalability than traditional centralized monitoring systems. Each monitored machine runs a piece of code that verifies that the machine's integrity and geolocation conform to the given security policy. This piece of code, which serves as a trusted anchor on that machine, executes inside the trusted execution environment, i.e., Intel SGX, to protect itself from the untrusted host, and uses trusted computing techniques, such as trusted platform module, secure boot, and integrity measurement architecture, to attest to the load-time and runtime integrity of the surrounding operating system running on a bare metal machine or inside a virtual machine. The trusted anchor implements my novel, formally proven protocol, enabling detection of the TPM cuckoo attack. 
The framework also implements a key distribution protocol that, depending on the individual security requirements, shares cryptographic keys only with high-assurance security systems executing in the predefined security settings, i.e., inside the trusted execution environments or inside the integrity-enforced operating system. Such an approach is particularly appealing in the context of machine learning systems where some algorithms, like machine learning model training, require temporary access to large computing power. These algorithms can execute inside a dedicated, trusted data center at higher performance because they are not limited by security features required in the shared execution environment. The evaluation of the framework showed that training of a machine learning model using real-world datasets achieved 0.96x native performance execution on the GPU and a speedup of up to 1560x compared to the state-of-the-art SGX-based system. Finally, I tackled the problem of software updates, which make the operating system's integrity monitoring unreliable due to false positives, i.e., software updates move the updated system to an unknown (untrusted) state that is reported as an integrity violation. I solved this problem by introducing a proxy to a software repository that sanitizes software packages so that they can be safely installed. The sanitization consists of predicting and certifying the future operating system's state (after the specific updates are installed). The evaluation of this approach showed that it supports 99.76% of the packages available in Alpine Linux main and community repositories. The framework proposed in this thesis is a step forward in verifying and enforcing that high-assurance security systems execute in an environment compliant with regulations.
I anticipate that the framework might be further integrated with industry-standard security information and event management tools as well as other security monitoring mechanisms to provide a comprehensive solution for hardening high-assurance security systems.

    Doctor of Philosophy

    With the explosion of chip transistor counts, the semiconductor industry has struggled with ways to continue scaling computing performance in line with historical trends. In recent years, the de facto solution to utilize excess transistors has been to increase the size of the on-chip data cache, allowing fast access to an increased portion of main memory. These large caches allowed the continued scaling of single thread performance, which had not yet reached the limit of instruction level parallelism (ILP). As we approach the potential limits of parallelism within a single threaded application, new approaches such as chip multiprocessors (CMP) have become popular for scaling performance utilizing thread level parallelism (TLP). This dissertation identifies the operating system as a ubiquitous area where single threaded performance and multithreaded performance have often been ignored by computer architects. We propose that novel hardware and OS co-design has the potential to significantly improve current chip multiprocessor designs, enabling increased performance and improved power efficiency. We show that the operating system contributes a nontrivial overhead to even the most computationally intense workloads and that this OS contribution grows to a significant fraction of total instructions when executing several common applications found in the datacenter. We demonstrate that architectural improvements have had little to no effect on the performance of the OS over the last 15 years, leaving ample room for improvements. We specifically consider three potential solutions to improve OS execution on modern processors. First, we consider the potential of a separate operating system processor (OSP) operating concurrently with general purpose processors (GPP) in a chip multiprocessor organization, with several specialized structures acting as efficient conduits between these processors.
Second, we consider the potential of segregating existing caching structures to decrease cache interference between the OS and application. Third, we propose that there are components within the OS itself that should be refactored to be both multithreaded and cache topology aware, which, in turn, improves the performance and scalability of many-threaded applications.