304 research outputs found

    Understanding and Improving the Performance of Read Operations Across the Storage Stack

    We live in a data-driven era: large amounts of data are generated and collected every day. Storage systems are the backbone of this era, as they store and retrieve data. To cope with increasing data demands (e.g., diversity, scalability), storage systems are experiencing changes across the stack. Like other computer systems, storage systems rely on layering and modularity to allow rapid development. Unfortunately, this can hinder performance clarity and introduce degradations (e.g., tail latency) due to unexpected interactions between components of the stack. In this thesis, we first perform a study to understand the behavior across different layers of the storage stack. We focus on sequential read workloads, a common I/O pattern in distributed file systems (e.g., HDFS, GFS). We analyze the interaction between read workloads, local file systems (i.e., ext4), and storage media (i.e., SSDs). We perform the same experiment over different periods of time (e.g., file lifetime). We uncover three slowdowns, all of which occur in the lower layers. When combined, these slowdowns can degrade throughput by 30%. We find that increased parallelism on the local file system mitigates these slowdowns, showing the need for adaptability in storage stacks. Given that performance instabilities can occur at any layer of the stack, it is important that upper-layer systems are able to react. We propose smart hedging, a novel technique to manage high-percentile (tail) latency variations in read operations. Smart hedging considers production challenges, such as massive scalability, heterogeneity, and ease of deployment and maintainability. Our technique establishes a dynamic threshold by tracking latencies on the client side. If a read operation exceeds the threshold, a new hedged request is issued in an exponential back-off manner. We implement our technique in HDFS and evaluate it on 70k servers in 3 datacenters. Our technique reduces average tail latency without generating excessive system load.
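
    The hedging policy described above lends itself to a compact sketch. The following is a minimal, hypothetical illustration rather than the HDFS implementation: the client keeps a sliding window of observed read latencies, derives its threshold from a high percentile of that window, and issues hedged reads to other replicas with exponential back-off while the primary read is still outstanding. The `read_fn` hook, the window size, and the bootstrap threshold are assumptions.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

class SmartHedgingClient:
    """Minimal sketch of client-side read hedging with a dynamic threshold."""

    def __init__(self, read_fn, percentile=0.95, max_hedges=2, window=1000):
        self.read_fn = read_fn        # read_fn(replica) -> data (assumed signature)
        self.percentile = percentile  # percentile used for the dynamic threshold
        self.max_hedges = max_hedges  # cap on extra (hedged) requests per read
        self.window = window          # number of recent latencies to keep
        self.latencies = []           # sliding window of observed read latencies
        self.pool = ThreadPoolExecutor(max_workers=8)

    def _threshold(self):
        """Dynamic threshold: a high percentile of recently observed latencies."""
        recent = self.latencies[-self.window:]
        if len(recent) < 50:
            return 0.050              # 50 ms bootstrap value (assumption)
        recent = sorted(recent)
        return recent[int(self.percentile * (len(recent) - 1))]

    def _timed_read(self, replica):
        start = time.monotonic()
        data = self.read_fn(replica)
        self.latencies.append(time.monotonic() - start)
        return data

    def read(self, replicas):
        """Read from replicas[0]; hedge to the others if the threshold is exceeded."""
        futures = [self.pool.submit(self._timed_read, replicas[0])]
        delay, hedges = self._threshold(), 0
        while True:
            done, _ = wait(futures, timeout=delay, return_when=FIRST_COMPLETED)
            if done:
                # First response wins; slower requests are simply ignored in this sketch.
                return next(iter(done)).result()
            if hedges < self.max_hedges and hedges + 1 < len(replicas):
                hedges += 1
                futures.append(self.pool.submit(self._timed_read, replicas[hedges]))
            delay *= 2                # exponential back-off between hedged requests
```

    Because the threshold is tracked entirely on the client side, the mechanism needs no central coordination, which aligns with the deployability and scalability goals stated in the abstract.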

    Scalability Benchmarking of Cloud-Native Applications Applied to Event-Driven Microservices

    Cloud-native applications constitute a recent trend in designing large-scale software systems. This thesis introduces the Theodolite benchmarking method, allowing researchers and practitioners to conduct empirical scalability evaluations of cloud-native applications, their frameworks, configurations, and deployments. The benchmarking method is applied to event-driven microservices, a specific type of cloud-native application that employs distributed stream processing frameworks to scale with massive data volumes. Extensive experimental evaluations benchmark and compare the scalability of various stream processing frameworks under different configurations and deployments, including different public and private cloud environments. These experiments show that the presented benchmarking method provides statistically sound results in an adequate amount of time. In addition, three case studies demonstrate that the Theodolite benchmarking method can be applied to a wide range of applications beyond stream processing.
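
    As a rough illustration of the kind of experiment such a benchmarking method automates, the sketch below simulates a load/resource search: for each load intensity it looks for the smallest number of instances that still satisfies a lag-based SLO, yielding a resource-demand curve that characterizes scalability. This is a toy simulation under assumed names and constants, not the Theodolite API.

```python
import random

PER_INSTANCE_CAPACITY = 10_000        # events/s one instance can absorb (assumed)

def run_benchmark(load, instances):
    """Stand-in for a real experiment: returns simulated consumer-lag samples
    over a 30-second run (a real benchmark would deploy and measure the system)."""
    backlog_rate = max(0, load - instances * PER_INSTANCE_CAPACITY)
    return [backlog_rate * t + random.uniform(0, 50) for t in range(1, 31)]

def slo_met(lag_samples, max_lag=1_000):
    """Example SLO: the observed lag never exceeds a fixed bound during the run."""
    return max(lag_samples) <= max_lag

def resource_demand(loads, max_instances=16):
    """For each load intensity, find the minimal number of instances meeting the SLO."""
    demand = {}
    for load in loads:
        demand[load] = next(
            (n for n in range(1, max_instances + 1)
             if slo_met(run_benchmark(load, n))),
            None,                     # None: SLO not attainable within the budget
        )
    return demand

print(resource_demand([20_000, 50_000, 120_000]))   # e.g. {20000: 2, 50000: 5, 120000: 12}
```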

    Service Provisioning in Edge-Cloud Continuum Emerging Applications for Mobile Devices

    Disruptive applications for mobile devices can be enhanced by Edge computing facilities. In this context, Edge Computing (EC) is an architecture proposed to meet the mobility requirements imposed by these applications in a wide range of domains, such as the Internet of Things, Immersive Media, and Connected and Autonomous Vehicles. The EC architecture aims to introduce computing capabilities in the path between the user and the Cloud, executing tasks closer to where they are consumed and thus mitigating issues related to latency, context awareness, and mobility support. In this survey, we describe the leading technologies that support the deployment of EC infrastructure. Thereafter, we discuss the applications that can take advantage of EC and how they have been proposed in the literature. Finally, after examining enabling technologies and related applications, we identify open challenges to fully achieving the potential of EC, as well as research opportunities in upcoming paradigms for service provisioning. This survey is a guide to understanding recent advances in the provisioning of mobile applications, as well as to foreseeing the expected next stages of evolution for these applications.

    Trade-Offs Under Pressure: Heuristics and Observations Of Teams Resolving Internet Service Outages

    The increasing complexity of software applications and architectures in Internet services challenges the reasoning of operators tasked with diagnosing and resolving outages and degradations as they arise. Although a growing body of literature focuses on how failures can be prevented through more robust and fault-tolerant design of these systems, a dearth of research explores the cognitive challenges engineers face when those preventative designs fail and they are left to think and react to scenarios that hadn’t been imagined. This study explores what heuristics or rules of thumb engineers employ when faced with an outage or degradation scenario in a business-critical Internet service. A case study approach was used, focusing on an actual outage of functionality during a period of high buying activity on a popular online marketplace. Heuristics and other tacit knowledge were identified, and they provide a promising avenue for both training and future interface design opportunities. Three diagnostic heuristics were identified as being in use: a) initially look for correlation between the behaviour and any recent changes made in the software; b) upon finding no correlation with a software change, widen the search to any potential contributors imagined; and c) when choosing a diagnostic direction, reduce it by focusing on the one that most easily comes to mind, either because symptoms match those of a difficult-to-diagnose event in the past, or those of any recent events. A fourth heuristic is coordinative in nature: when making changes to software in an effort to mitigate the untoward effects or to resolve the issue completely, rely on peer review of the changes more than automated testing (if at all).

    A framework for the dynamic management of Peer-to-Peer overlays

    Peer-to-Peer (P2P) applications have been associated with inefficient operation, interference with other network services, and large operational costs for network providers. This thesis presents a framework which can help ISPs address these issues by means of intelligent management of peer behaviour. The proposed approach involves limited control of P2P overlays without interfering with the fundamental characteristics of peer autonomy and decentralised operation. At the core of the management framework lies the Active Virtual Peer (AVP). Essentially intelligent peers operated by the network providers, the AVPs interact with the overlay from within, minimising redundant or inefficient traffic, enhancing overlay stability and facilitating the efficient and balanced use of available peer and network resources. They offer an “insider's” view of the overlay and permit the management of P2P functions in a compatible and non-intrusive manner. AVPs can support multiple P2P protocols and coordinate to perform functions collectively. To account for the multi-faceted nature of P2P applications and allow the incorporation of modern techniques and protocols as they appear, the framework is based on a modular architecture. Core modules for overlay control and transit traffic minimisation are presented. Towards the latter, a number of suitable P2P content caching strategies are proposed. Using a purpose-built P2P network simulator and small-scale experiments, it is demonstrated that the introduction of AVPs inside the network can significantly reduce inter-AS traffic, minimise costly multi-hop flows, increase overlay stability and load balancing, and offer improved peer transfer performance.
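
    To make the transit-traffic-minimisation idea concrete, here is a minimal, hypothetical sketch; the class name echoes the abstract, but the LRU cache and the lookup hooks are assumptions rather than the thesis's actual design. The AVP answers a content request from its own cache first, then from a peer inside the same AS, and only crosses the AS border as a last resort.

```python
from collections import OrderedDict

class ActiveVirtualPeer:
    """Prefers intra-AS sources over inter-AS transfers and keeps a small LRU cache."""

    def __init__(self, capacity, locate_local_peer, fetch_remote):
        self.cache = OrderedDict()                  # content_id -> payload, in LRU order
        self.capacity = capacity
        self.locate_local_peer = locate_local_peer  # content_id -> peer address or None
        self.fetch_remote = fetch_remote            # content_id -> payload (inter-AS)

    def resolve(self, content_id):
        """Return (result, source), trying the cheapest source first."""
        if content_id in self.cache:                # 1. served from the AVP cache
            self.cache.move_to_end(content_id)
            return self.cache[content_id], "avp-cache"
        peer = self.locate_local_peer(content_id)
        if peer is not None:                        # 2. redirect to a peer in the same AS
            return peer, "intra-as-peer"
        payload = self.fetch_remote(content_id)     # 3. last resort: inter-AS transfer
        self.cache[content_id] = payload            # cache it to avoid repeat transfers
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)          # evict the least-recently-used item
        return payload, "inter-as"

# Example wiring with trivial stand-ins for the lookup and remote-fetch hooks.
avp = ActiveVirtualPeer(capacity=2,
                        locate_local_peer=lambda cid: None,
                        fetch_remote=lambda cid: f"payload-of-{cid}")
print(avp.resolve("chunk-42"))   # ('payload-of-chunk-42', 'inter-as')
print(avp.resolve("chunk-42"))   # ('payload-of-chunk-42', 'avp-cache')
```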

    Datacenter Architectures for the Microservices Era

    Modern internet services are shifting away from single-binary, monolithic services into numerous loosely-coupled microservices that interact via Remote Procedure Calls (RPCs), to improve the programmability, reliability, manageability, and scalability of cloud services. Computer system designers face many new challenges with microservice-based architectures, as individual RPCs/tasks take only a few microseconds in most microservices. In this dissertation, I seek to address the most notable challenges that arise from the dissimilarities between modern microservice-based and classic monolithic cloud services, and to design novel server architectures and runtime systems that enable efficient execution of µs-scale microservices on modern hardware. In the first part of my dissertation, I seek to address the problem of Killer Microseconds, which refers to µs-scale “holes” in CPU schedules caused by stalls to access fast I/O devices or brief idle times between requests in high-throughput µs-scale microservices. Whereas modern computing platforms can efficiently hide ns-scale and ms-scale stalls through micro-architectural techniques and OS context switching, they lack efficient support to hide the latency of µs-scale stalls. In chapter II, I propose Duplexity, a heterogeneous server architecture that employs aggressive multithreading to hide the latency of killer microseconds, without sacrificing the Quality-of-Service (QoS) of latency-sensitive microservices. Duplexity achieves 1.9× higher core utilization and 2.7× lower iso-throughput 99th-percentile tail latency over an SMT-based server design, on average. In chapters III-IV, I comprehensively investigate the problem of tail latency in the context of microservices and address multiple aspects of it. First, in chapter III, I characterize the tail latency behavior of microservices and provide general guidelines for optimizing computer systems from a queuing perspective to minimize tail latency. Queuing is a major contributor to end-to-end tail latency, wherein nominal tasks are enqueued behind rare, long ones, due to Head-of-Line (HoL) blocking. Next, in chapter IV, I introduce Q-Zilla, a scheduling framework to tackle tail latency from a queuing perspective, and CoreZilla, a microarchitectural instantiation of the framework. Q-Zilla is composed of the ServerQueue Decoupled Size-Interval Task Assignment (SQD-SITA) scheduling algorithm and the Express-lane Simultaneous Multithreading (ESMT) microarchitecture, which together seek to address HoL blocking by providing an “express-lane” for short tasks, protecting them from queuing behind rare, long ones. By combining the ESMT microarchitecture and the SQD-SITA scheduling algorithm, CoreZilla improves tail latency over a conventional SMT core with 2, 4, and 8 contexts by 2.25×, 3.23×, and 4.38× on average, respectively, and outperforms a theoretical 32-core scale-up organization by 12%, on average, with 8 contexts. Finally, in chapters V-VI, I investigate the tail latency problem of microservices from a cluster, rather than server-level, perspective. Whereas Service Level Objectives (SLOs) define end-to-end latency targets for the entire service to ensure user satisfaction, with microservice-based applications it is unclear how to scale individual microservices when end-to-end SLOs are violated or the service is underutilized. I introduce Parslo, an analytical framework for partial SLO allocation in virtualized cloud microservices. Parslo takes a microservice graph as input and employs a Gradient Descent-based approach to allocate “partial SLOs” to different microservice nodes, enabling independent auto-scaling of individual microservices. Parslo achieves the optimal solution, minimizing the total cost for the entire service deployment, and is applicable to general microservice graphs.
    PhD thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/167978/1/miramir_1.pd
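
    The partial-SLO allocation step can be illustrated with a small projected-gradient sketch for a chain of stages. The reciprocal cost model and all constants below are assumptions, not Parslo's actual cost model or graph handling; the point is only that the end-to-end budget is split so that the modeled deployment cost shrinks while the per-stage budgets always sum to the end-to-end SLO.

```python
import numpy as np

def allocate_partial_slos(base_costs, slo_total, steps=5000, lr_frac=1e-3):
    """Split an end-to-end latency budget across stages by projected gradient descent.

    Toy model: stage i costs base_costs[i] / s_i (a looser per-stage budget s_i
    needs fewer replicas, hence lower cost).  Minimize the total cost subject to
    sum(s) == slo_total and s > 0.
    """
    c = np.asarray(base_costs, dtype=float)
    n = len(c)
    s = np.full(n, slo_total / n)                   # start from an even split
    step = lr_frac * slo_total
    for _ in range(steps):
        g = -c / s**2                               # gradient of the total cost
        g -= g.mean()                               # project: keep sum(s) constant
        norm = np.abs(g).max()
        if norm < 1e-12:
            break                                   # projected gradient vanished
        s -= step * g / norm                        # small, scale-free step
        s = np.clip(s, 1e-6 * slo_total, None)      # keep budgets strictly positive
        s += (slo_total - s.sum()) / n              # restore the budget exactly
    return s

# Three-stage chain; heavier stages receive a larger share of the 300 ms budget.
print(allocate_partial_slos([4.0, 1.0, 9.0], slo_total=300.0))
```

    With this toy cost model the allocation converges towards per-stage budgets proportional to the square root of each stage's cost coefficient, so the heavier stages receive the larger shares of the 300 ms budget.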

    Taming the Big Green Elephant

    This open access publication shows that sustainable low-carbon development is a transformative process that involves shifting from the initially chosen pathway to another pathway as goals are revisited and revised to enable the system to adapt to change. However, shifting entails transition costs that accrue through the effects of lock-ins that have framed decisions and collective actions. The uncertainty about these costs can be overwhelming or even disruptive. This book aims to provide a comprehensive and integrated analytical framework that promotes the understanding of transformation towards sustainability. The analysis in this book is built upon negotiative perspectives to help define, design, and facilitate collective actions in order to execute the principles of sustainability.

    Real-time probabilistic reasoning system using Lambda architecture

    Thesis (MTech (Information Technology))--Cape Peninsula University of Technology, 2019.
    The proliferation of data from sources like social media and sensor devices has become overwhelming for traditional data storage and analysis technologies to handle. This has prompted a radical improvement in data management techniques, tools, and technologies to meet the increasing demand for the effective collection, storage, and curation of large datasets. Most of these technologies are open-source. Big data is usually described as very large datasets. However, a major feature of big data is its velocity: data flows in as a continuous stream and must be acted on in real time to yield meaningful, relevant value. Although there is an explosion of technologies to handle big data, they are usually targeted at processing large (historic) datasets and real-time big data independently, hence the need for a unified framework that handles both high-volume historic datasets and real-time big data. This need resulted in the development of models such as the Lambda architecture. Effective decision-making requires processing of historic data as well as real-time data. Some decision-making involves complex processes that depend on the likelihood of events. To handle uncertainty, probabilistic systems were designed. Probabilistic systems use probabilistic models, developed with probability theories such as hidden Markov models, together with inference algorithms to process data and produce probabilistic scores. However, developing these models requires extensive knowledge of statistics and machine learning, making it an uphill task to model real-life circumstances. A new research area called probabilistic programming has been introduced to alleviate this bottleneck. This research proposes the combination of modern open-source big data technologies with probabilistic programming and the Lambda architecture on easy-to-obtain hardware to develop a highly fault-tolerant and scalable processing tool that processes both historic and real-time big data in real time: a common solution. This system will empower decision makers with the capacity to make better-informed decisions, especially in the face of uncertainty. The outcome of this research will be a technology product, built and assessed using experimental evaluation methods. This research will utilize the Design Science Research (DSR) methodology, as it describes guidelines for the effective and rigorous construction and evaluation of an artefact. Probabilistic programming in the big data domain is still in its infancy; however, the developed artefact demonstrates an important potential of probabilistic programming combined with the Lambda architecture in the processing of big data.
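
    The Lambda-architecture idea referred to above can be sketched with in-memory stand-ins; the classes below are hypothetical and take the place of the open-source technologies the thesis combines. A batch layer periodically recomputes a complete view over the historic master dataset, a speed layer keeps an incremental real-time view, and a serving layer answers queries by merging the two.

```python
from collections import defaultdict

class BatchLayer:
    """Recomputes a complete view over the full (historic) master dataset."""
    def __init__(self):
        self.master = []                      # append-only master dataset

    def recompute(self):
        view = defaultdict(int)
        for user, amount in self.master:
            view[user] += amount
        return dict(view)

class SpeedLayer:
    """Keeps an incremental view of events not yet absorbed by a batch run."""
    def __init__(self):
        self.view = defaultdict(int)

    def ingest(self, user, amount):
        self.view[user] += amount

class ServingLayer:
    """Answers queries by merging the latest batch view with the real-time view."""
    def __init__(self, batch, speed):
        self.batch, self.speed = batch, speed
        self.batch_view = {}

    def refresh(self):
        self.batch_view = self.batch.recompute()
        self.speed.view.clear()               # simplification: events now in batch view

    def query(self, user):
        return self.batch_view.get(user, 0) + self.speed.view.get(user, 0)

# Historic data goes through the batch layer; fresh events hit the speed layer.
batch, speed = BatchLayer(), SpeedLayer()
serving = ServingLayer(batch, speed)
batch.master.append(("alice", 3))
serving.refresh()
speed.ingest("alice", 2)
print(serving.query("alice"))                 # 5 = batch view (3) + real-time view (2)
```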