
    Enabling Hyperscale Web Services

    Modern web services such as social media, online messaging, web search, video streaming, and online banking often support billions of users, requiring data centers that scale to hundreds of thousands of servers, i.e., hyperscale. The world expects hyperscale computing to drive ever more futuristic applications such as virtual reality, self-driving cars, conversational AI, and the Internet of Things. This dissertation presents technologies that will enable tomorrow's web services to meet those expectations.

    The key challenge in enabling hyperscale web services arises from two important trends. First, hyperscale computing has undergone a radical shift over the past few years due to unprecedented growth in data, users, and web service software functionality. Second, modern hardware can no longer support this growth, owing to the decline of hardware performance scaling. To enable this new hyperscale era, hardware architects must become more aware of hyperscale software needs, and software researchers can no longer expect unlimited hardware performance scaling. In short, systems researchers can no longer follow the traditional approach of building each layer of the systems stack separately; they must instead rethink the synergy between the software and hardware worlds from the ground up. This dissertation establishes such a synergy, bridging the software and hardware worlds with solutions that span the systems stack: software designed to be aware of new hardware constraints, and hardware architected to efficiently support new hyperscale software requirements.

    The work spans two broad thrusts. In the software thrust, this dissertation contributes μSuite, the first open-source benchmark suite of web services built with a new hyperscale software paradigm, used in academia and industry to study hyperscale behaviors. It then uses μSuite to study software threading implications in light of today's hardware reality, identifying new insights in the age-old research area of software threading. Driven by these insights, it demonstrates how threading models must be redesigned at hyperscale, presenting an automated approach and tool, μTune, that makes intelligent run-time threading decisions.

    In the hardware thrust, this dissertation architects both commodity and custom hardware to efficiently support hyperscale software requirements. It first characterizes commodity hardware's shortcomings, revealing insights that influenced commercial CPU designs. Based on these insights, it presents an approach and tool, SoftSKU, that enables cheap commodity hardware to efficiently support new hyperscale software paradigms, improving the efficiency of real-world web services that serve billions of users, saving millions of dollars, and meaningfully reducing the global carbon footprint. It also presents a hardware-software co-design, μNotify, that redesigns commodity hardware with minimal modifications, using existing hardware mechanisms more intelligently to overcome new hyperscale overheads.
    Next, this dissertation characterizes how custom hardware must be designed at hyperscale, informing industry-academia benchmarking efforts, commercial hardware changes, and improved software development. Based on this characterization's insights, it presents Accelerometer, an analytical model that estimates the gains from hardware customization; multiple hyperscale enterprises and hardware vendors use Accelerometer to make well-informed hardware decisions.

    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/169802/1/akshitha_1.pd
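    The abstract does not reproduce Accelerometer's formulation. As a rough illustration of the kind of question such an analytical model answers, the sketch below estimates offload gains with a simple Amdahl-style model extended with a fixed offload cost; the function and its parameters are illustrative assumptions, not Accelerometer's actual model.

```c
#include <stdio.h>

/*
 * Toy Amdahl-style estimate of the speedup from offloading work to an
 * accelerator -- an illustration of the kind of question an analytical
 * model like Accelerometer answers, not its actual formulation.
 *
 *   f        : fraction of execution time that can be offloaded (0..1)
 *   s        : accelerator speedup on the offloaded portion (> 1)
 *   overhead : fixed offload cost (dispatch, data movement), expressed
 *              as a fraction of the original execution time
 */
static double offload_speedup(double f, double s, double overhead)
{
    double accelerated_time = (1.0 - f) + f / s + overhead;
    return 1.0 / accelerated_time;
}

int main(void)
{
    /* e.g. 60% of time acceleratable, 10x accelerator, 5% offload cost */
    printf("estimated speedup: %.2fx\n", offload_speedup(0.60, 10.0, 0.05));
    return 0;
}
```

    Even this toy version captures the central trade-off: a large accelerator speedup is worth little if the offloadable fraction is small or the dispatch overhead dominates.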

    Execution Environments for Running Legacy Applications in Multi-Party Trust Settings

    Applications often assume that the same party owns all of the application's resources, and that these resources require the same level of privacy. This assumption no longer holds when organizations outsource applications to a third-party cloud, or when the application requires access not only to public content, but also to private configuration, such as authentication and keying material. The result of this broken assumption is that applications either must be re-written to accommodate each new security posture, or used as-is, accepting that one party exposes private data to another. In this dissertation, I argue the following thesis: it is possible to run legacy application binaries with confidentiality and integrity guarantees that reflect a multi-party trust setting. I support this thesis through the design, implementation, and evaluation of two distinct application-level virtualization layers that handle trust concerns on behalf of the application: conclaves and SecureMigration. Conclaves assume the availability of Intel SGX secure hardware enclaves and extend prior work on runtimes that execute legacy applications within an enclave. In contrast, SecureMigration does not use secure hardware; rather, it composes information flow control with process migration to execute a process across multiple physical machines owned and operated by distinct principals, while shielding each principal's sensitive portion of the process from its peers.
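    As a toy illustration of the information-flow-control building block that SecureMigration composes with process migration (a hypothetical sketch, not the system's actual label model), the snippet below encodes secrecy labels as bit sets and permits a flow only when the source's taint is a subset of the destination's:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy secrecy label: each bit names one principal whose secrets the
 * data carries. A flow src -> dst is safe only if every principal
 * tainting src also taints dst (src is a subset of dst). */
typedef uint32_t label_t;

static bool can_flow(label_t src, label_t dst)
{
    return (src & ~dst) == 0;
}

int main(void)
{
    label_t alice = 1u << 0, bob = 1u << 1;
    label_t alice_secret = alice;       /* tainted by Alice only   */
    label_t shared = alice | bob;       /* tainted by both parties */

    printf("alice -> shared: %s\n", can_flow(alice_secret, shared) ? "ok" : "deny");
    printf("shared -> alice: %s\n", can_flow(shared, alice_secret) ? "ok" : "deny");
    return 0;
}
```

    The asymmetry in the output is the point: data may accumulate taint as it moves, but a principal's secrets may never flow to a peer whose label does not include that principal.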

    High Performance Web Servers: A Study In Concurrent Programming Models

    With the advent of commodity large-scale multi-core computers, the performance of software running on these computers has become a challenge to researchers and enterprise developers. While academic research and industrial products have moved towards writing scalable and highly available services using distributed computing, single-machine performance remains an active domain, one that is far from saturated. This thesis selects an archetypal software example and workload in this domain, highly parallel web servers processing a static workload, and describes the software characteristics affecting performance.

    In particular, this work examines concurrent programming models in the context of high-performance web servers across different architectures: threaded (Apache, Go and μKnot), event-driven (Nginx, μServer) and staged (WatPipe), compared under two static workloads in two different domains. The two workloads are a Zipf distribution of file sizes, representing a user session pulling an assortment of many small and a few large files, and a 50KB file, representing chunked streaming of a large audio or video file. Significant effort is made to fairly compare the eight web servers by carefully tuning each via its adjustment parameters; tuning plays a significant role in workload-specific performance. The two domains are no disk I/O (in-memory file set) and medium disk I/O, created by lowering the amount of RAM available to the web server from 4GB to 2GB, forcing files to be evicted from the file-system cache. Both domains are also restricted to 4 CPUs.

    The primary goal of this thesis is to examine fundamental performance differences between threaded and event-driven concurrency models, with particular emphasis on user-level threading models. A secondary goal is to examine high-performance software under restricted hardware environments: over-provisioned hardware can mask architectural and implementation shortcomings in software, and the hypothesis of this work is that restricting resources stresses the application, bringing out important performance characteristics and properties. Experimental results for the given workloads show that memory pressure is one of the most significant factors in the degradation of web-server performance, because it forces both the onset and the amount of disk I/O. With an ever-increasing need to serve more content at faster rates, a web server relies heavily on in-memory caching of files and related content. Indeed, personal and small-business web servers are even run on minimal hardware, like the Raspberry Pi, with only 1GB of RAM and a small SD card for the file system. Therefore, understanding behaviour and performance in restricted contexts should be a normal aspect of testing a web server (and other software systems).
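    To ground the comparison, here is a minimal thread-per-connection echo server in C (a generic sketch of the threaded model, not code from any of the servers studied; the port is arbitrary and error handling is omitted for brevity; compile with -pthread):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Thread-per-connection echo server: one kernel thread per client.
 * Simple to program, but each blocked thread costs a stack and a
 * scheduler entry -- overhead that user-level threading and
 * event-driven designs try to avoid. */
static void *handle_client(void *arg)
{
    int fd = (int)(intptr_t)arg;
    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        write(fd, buf, (size_t)n);      /* echo back */
    close(fd);
    return NULL;
}

int main(void)
{
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port = htons(8080),
                                .sin_addr.s_addr = htonl(INADDR_ANY) };
    bind(srv, (struct sockaddr *)&addr, sizeof addr);
    listen(srv, 128);
    for (;;) {
        int cli = accept(srv, NULL, NULL);
        pthread_t t;
        pthread_create(&t, NULL, handle_client, (void *)(intptr_t)cli);
        pthread_detach(t);              /* reap automatically */
    }
}
```

    Event-driven servers such as Nginx and μServer instead multiplex many connections over a few threads with epoll or kqueue, trading this per-connection simplicity for explicit state management.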

    Event-driven servers using asynchronous, non-blocking network I/O: Performance evaluation of kqueue and epoll

    This research project evaluates the performance of kqueue and epoll in the context of event-driven servers. The evaluation is done through benchmarking and tracing, which are used to measure throughput and execution time, respectively. The experiment is repeated for both a virtualised and a native server environment, and the results are statistically analysed and compared. These results show significant differences between kqueue and epoll, and a profound impact of virtualisation as a variable.
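    For context, the heart of an event-driven server built on epoll looks roughly like the sketch below (a generic illustration, not the project's benchmark code); kqueue is structurally similar, with kevent() playing the roles of both epoll_ctl() and epoll_wait():

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Skeleton of an event-driven server: one thread multiplexes all
 * connections via epoll (Linux). Error handling omitted for brevity. */
int main(void)
{
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in a = { .sin_family = AF_INET, .sin_port = htons(8080),
                             .sin_addr.s_addr = htonl(INADDR_ANY) };
    bind(srv, (struct sockaddr *)&a, sizeof a);
    listen(srv, 128);

    int epfd = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = srv };
    epoll_ctl(epfd, EPOLL_CTL_ADD, srv, &ev);

    struct epoll_event ready[64];
    for (;;) {
        int n = epoll_wait(epfd, ready, 64, -1);    /* block for events */
        for (int i = 0; i < n; i++) {
            int fd = ready[i].data.fd;
            if (fd == srv) {                        /* new connection   */
                int cli = accept(srv, NULL, NULL);
                struct epoll_event cev = { .events = EPOLLIN, .data.fd = cli };
                epoll_ctl(epfd, EPOLL_CTL_ADD, cli, &cev);
            } else {                                /* client is ready  */
                char buf[4096];
                ssize_t r = read(fd, buf, sizeof buf);
                if (r <= 0) { close(fd); continue; }
                write(fd, buf, (size_t)r);          /* echo back        */
            }
        }
    }
}
```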

    Direct Inter-Process Communication (dIPC): Repurposing the CODOMs architecture to accelerate IPC

    In current architectures, page tables are the fundamental mechanism that allows contemporary OSs to isolate user processes, binding each thread to a specific page table. A thread therefore cannot directly call another process's function or access its data; instead, the OS kernel provides data communication primitives and mediates process synchronization through inter-process communication (IPC) channels, which impede system performance. Alternatively, the recently proposed CODOMs architecture provides memory protection across software modules: threads can cross module protection boundaries inside the same process using simple procedure calls, while preserving memory isolation. We present dIPC (for "direct IPC"), an OS extension that repurposes and extends the CODOMs architecture to allow threads to cross process boundaries. It maps processes into a shared address space and eliminates the OS kernel from the critical path of inter-process communication. dIPC is 64.12× faster than local remote procedure calls (RPCs), and 8.87× faster than IPC in the L4 microkernel. We show that applying dIPC to a multi-tier OLTP web server improves performance by up to 5.12× (2.13× on average), and reaches over 94% of the ideal system efficiency.

    We thank Diego Marrón for helping with MariaDB, the anonymous reviewers for their feedback and, especially, Andrew Baumann for helping us improve the paper. This research was partially funded by HiPEAC through a collaboration grant for Lluís Vilanova (agreement number 687698 for the EU's Horizon 2020 research and innovation programme), the Israel Science Foundation (ISF grant 769/12) and the Israeli Ministry of Science, Technology and Space.

    Peer reviewed. Postprint (author's final draft).
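    To make the baseline concrete, the microbenchmark below (a hand-rolled illustration, not from the paper) times the kind of kernel-mediated round trip that dIPC removes from the critical path:

```c
#include <stdio.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Round-trip time of kernel-mediated IPC: parent and child ping-pong
 * one byte over a UNIX socketpair. Every hop enters the kernel -- the
 * cost dIPC avoids by letting threads call across process boundaries
 * directly. */
int main(void)
{
    int sv[2];
    socketpair(AF_UNIX, SOCK_STREAM, 0, sv);

    if (fork() == 0) {                      /* child: echo forever */
        char c;
        while (read(sv[1], &c, 1) == 1)
            write(sv[1], &c, 1);
        _exit(0);
    }

    enum { ROUNDS = 100000 };
    struct timespec t0, t1;
    char c = 'x';
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ROUNDS; i++) {
        write(sv[0], &c, 1);                /* request  */
        read(sv[0], &c, 1);                 /* response */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("avg round trip: %.0f ns\n", ns / ROUNDS);
    return 0;
}
```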

    Universal Skeptic Binder-Droid - Towards Arresting Malicious Communication of Colluding Apps in Android

    Since its first release, Android has been increasingly adopted by people and companies worldwide; it is currently estimated that around 1.1 billion Android devices are in use. Even though Android was built with security principles and comes with a sound security model, it is a favorite target for malware authors. McAfee observed a 76% year-on-year growth in Android malware during 2014 alone; malware is thus a predominant threat in the Android ecosystem. A common attack vector for Android malware is the use of colluding apps. Colluding apps involve two or more applications and operate in two phases: in Phase 1, one application steals private, sensitive data of the user; in Phase 2, the same application sends the data to another application via covert communication channels, of which the Android framework offers several. Until now, solutions in the literature have focused on preventing the extraction of sensitive data from the phone. We are, to the best of our knowledge, the first to stop the flow of sensitive information via these covert channels. We propose Universal Skeptic Binder-Droid, an enhanced Binder module that enforces policies on the use of communication channels and prevents apps from colluding. Our proposed system has the added advantage of dynamically configuring policies at run time. Our initial implementation and results on our test bed demonstrate the effectiveness and ease of use of such a system.
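    As a toy sketch of the kind of policy check a Binder-level reference monitor might perform (the rule format and app names here are hypothetical, not Universal Skeptic Binder-Droid's actual policy language; real Binder mediation keys on UIDs and transaction targets):

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical deny-list for a reference monitor mediating inter-app
 * communication: each rule blocks one (sender, receiver, channel)
 * triple. App names keep the sketch readable. */
struct rule {
    const char *sender;
    const char *receiver;
    const char *channel;        /* e.g. "intent", "shared-prefs" */
};

static const struct rule deny_rules[] = {
    { "com.example.stealer", "com.example.exfil", "intent" },
};

static bool allowed(const char *s, const char *r, const char *ch)
{
    for (size_t i = 0; i < sizeof deny_rules / sizeof *deny_rules; i++)
        if (!strcmp(deny_rules[i].sender, s) &&
            !strcmp(deny_rules[i].receiver, r) &&
            !strcmp(deny_rules[i].channel, ch))
            return false;       /* matched a deny rule: block the IPC */
    return true;
}

int main(void)
{
    printf("%s\n", allowed("com.example.stealer",
                           "com.example.exfil", "intent") ? "allow" : "deny");
    return 0;
}
```

    Placing such a check inside Binder, rather than in each app, is what lets the policy cover every mediated channel and be reconfigured at run time.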

    Automating Seccomp Filter Generation for Linux Applications

    Software vulnerabilities undermine the security of applications. By blocking unused functionality, the impact of potential exploits can be reduced. While seccomp provides a mechanism for filtering syscalls, it requires manual implementation of filter rules for each individual application. Recent work has investigated automated approaches for detecting and installing the necessary filter rules; however, as we show, these approaches make assumptions that are not necessary or require overly time-consuming analysis. In this paper, we propose Chestnut, an automated approach for generating strict syscall filters for Linux userspace applications with fewer requirements and limitations. Chestnut comprises two phases, the first consisting of two static components, a compiler and a binary analyzer, that extract the syscalls used either during compilation or in an analysis of the binary. The compiler-based approach of Chestnut is up to 73× faster than previous approaches without adversely affecting accuracy. On the binary analysis level, we demonstrate that related work's requirement of position-independent binaries is unnecessary, enlarging the set of applications for which Chestnut is usable. In an optional second phase, Chestnut provides a dynamic refinement tool that further restricts the set of allowed syscalls. We demonstrate that Chestnut on average blocks 302 syscalls (86.5%) via the compiler and 288 (82.5%) via the binary-level analysis on a set of 18 widely used applications. We found that Chestnut blocks the dangerous exec syscall in 50% and 77.7% of the tested applications using the compiler- and binary-based approaches, respectively. For the tested applications, Chestnut prevents exploitation of more than 62% of the 175 CVEs that target the kernel via syscalls. Finally, we perform a 6-month long-term study of a sandboxed Nginx server.
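    For reference, installing an allowlist filter of the kind such generators emit looks like this with libseccomp (a minimal hand-written example compiled with -lseccomp, not Chestnut's actual output):

```c
#include <seccomp.h>
#include <unistd.h>

/* Install a minimal allowlist filter: any syscall outside the list
 * kills the process. Automated generators like Chestnut emit the
 * equivalent of this list from analysis of the program. */
int main(void)
{
    scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL_PROCESS);
    if (!ctx)
        return 1;

    /* the handful of syscalls this toy program still needs */
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0);

    if (seccomp_load(ctx) < 0)
        return 1;
    seccomp_release(ctx);

    write(STDOUT_FILENO, "sandboxed\n", 10);
    /* an execve() from here on would be killed by the filter */
    return 0;
}
```

    The hard part, and Chestnut's contribution, is not writing such a filter but deriving the allowlist automatically and soundly for an arbitrary application.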

    Analyse de maliciels sur Android par l'analyse de la mémoire vive (Android Malware Analysis via Live Memory Analysis)

    Mobile devices are at the core of modern society. Their versatility has allowed third-party developers to deliver a rich experience through mobile apps of all kinds: productivity, games, messaging, and more. As mobile platforms have become connected devices that aggregate nearly all of our personal and professional information, they are seen as a lucrative market by malware developers. Android, Google's open-source operating system for the mobile market, is a prime target of these attacks, in part because of its widespread adoption. Since Android malware threatens many consumers, it is essential that malware-analysis research address this mobile platform specifically.

    The work conducted during this Master's focuses on the analysis of malware on the Android platform, and specifically on live memory analysis. It began with a review of current Android malware trends and of the static and dynamic analysis approaches in the literature, and then proposed live memory forensics applied to malware analysis as a complement to these existing methods. To demonstrate the relevance of the approach to the Android platform, a case study was conducted in which an experimental malware was designed to exhibit malicious behaviours that are problematic for most approaches found in the literature. An approach called differential live memory analysis was introduced to ease the analysis: it diffs the elements present in live memory after the deployment of the malware against those present before, reducing the number of elements to analyze. The results of the study show that the approach is promising as a complement to current techniques; follow-up studies are recommended to better detect Android malware and to automate its application.
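    The differential step itself is simple to illustrate (a toy sketch over sorted artifact lists; real live-memory analysis diffs structured dumps extracted with forensic tooling):

```c
#include <stdio.h>
#include <string.h>

/* Differential analysis in miniature: report artifacts present in the
 * "after" snapshot but absent from "before", so only malware-induced
 * changes need manual review. Snapshots here are sorted string lists
 * standing in for extracted memory artifacts (processes, sockets...). */
static void diff(const char **before, int nb, const char **after, int na)
{
    int i = 0, j = 0;
    while (j < na) {
        int cmp = (i < nb) ? strcmp(before[i], after[j]) : 1;
        if (cmp < 0)
            i++;                              /* only in before: ignore */
        else if (cmp == 0) {
            i++; j++;                         /* unchanged: ignore      */
        } else
            printf("new: %s\n", after[j++]);  /* appeared after deploy  */
    }
}

int main(void)
{
    const char *before[] = { "com.android.phone", "system_server" };
    const char *after[]  = { "com.android.phone", "com.evil.payload",
                             "system_server" };
    diff(before, 2, after, 3);
    return 0;
}
```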