187 research outputs found

    Tuning and Predicting Consistency in Distributed Storage Systems

    Get PDF
    Distributed storage systems are constrained by the finite speed of propagation of information. The CAP (which stands for consistency, availability, and partition tolerance) theorem states that in the presence of network partitions, a choice has to be made in between availability and consistency. However, even in the absence of failures, a trade-off between consistency and latency of operations (reads and writes) exists. Eventually consistent storage systems often sacrifice consistency for high availability and low latencies. One way to achieve fine-tuning in the consistency-latency trade-off space is to inject artificial delays to each storage operation. This thesis describes an adaptive tuning framework that is able to calculate the values of artificial delay to be injected to each storage operation to meet a specific target consistency. The framework is able to adapt nimbly to environmental changes in the storage system to maintain target consistency levels. It consists of a feedback loop which uses a technique called spectral shifting at each iteration to calculate the target value of artificial delay from a history of operations. The tuning framework is able to converge to the target value of artificial delay much faster than the state-of-art solution. This thesis also presents a probabilistic analysis of inconsistencies in eventually consistent distributed storage systems operating under weak (read one, write one) consistency settings. The analysis takes into account symmetrical (same for reads and writes) artificial delays which enable consistency-latency tuning. A mathematical formula for the percentage of inconsistent operations is derived from other environmental parameters pertaining to the storage system. The formula's predictions for the proportion of inconsistent operations match observations of the same from a stochastic simulator of the storage system running 10^6 operations (per experiment), and from a widely used key-value store (Apache Cassandra) closely

    Latency Based Approach for Characterization of Cloud Application Performance.

    Get PDF
    PhDPublic cloud infrastructures provide exible hosting for web application providers, but the rented virtual machines (VMs) often offer unpredictable performance to the deployed applications. Understanding cloud performance is challenging for application providers, as clouds provide limited information that would help them have expectations about their application performance. In this thesis I present a technique to measure the performance of cloud applications, based on observations of the application latency. I treat the cloud application as a black box, making no assumption about the underlying platform. From my measurements, I can observe the varying performance provided by the different VM profles across well-known commercial cloud platforms. I also identify a trade-of between the responsiveness and the load of the measured servers, which can help application providers in their deployment and provisioning

    A Holistic Approach to Lowering Latency in Geo-distributed Web Applications

    Get PDF
    User perceived end-to-end latency of web applications have a huge impact on the revenue for many businesses. The end-to-end latency of web applications is impacted by: (i) User to Application server (front-end) latency which includes downloading and parsing web pages, retrieving further objects requested by javascript executions; and (ii) Application and storage server(back-end) latency which includes retrieving meta-data required for an initial rendering, and subsequent content based on user actions. Improving the user-perceived performance of web applications is challenging, given their complex operating environments involving user-facing web servers, content distribution network (CDN) servers, multi-tiered application servers, and storage servers. Further, the application and storage servers are often deployed on multi-tenant cloud platforms that show high performance variability. While many novel approaches like SPDY and geo-replicated datastores have been developed to improve their performance, many of these solutions are specific to certain layers, and may have different impact on user-perceived performance. The primary goal of this thesis is to address the above challenges in a holistic manner, focusing specifically on improving the end-to-end latency of geo-distributed multi-tiered web applications. This thesis makes the following contributions: (i) First, it reduces user-facing latency by helping CDNs identify and map objects that are more critical for page-load latency to the faster CDN cache layers. Through controlled experiments on real-world web pages, we show the potential of our approach to reduce hundreds of milliseconds in latency without affecting overall CDN miss rates. (ii) Next, it reduces back-end latency by optimally adapting the datastore replication policies (including number and location of replicas) to the heterogeneity in workloads. We show the benefits of our replication models using real-world traces of Twitter, Wikipedia and Gowalla on a 8 datacenter Cassandra cluster deployed on EC2. (iii) Finally, it makes multi-tier applications resilient to the inherent performance variability in the cloud through fine-grained request redirection. We highlight the benefits of our approach by deploying three real-world applications on commercial cloud platforms

    A novel causally consistent replication protocol with partial geo-replication

    Get PDF
    Distributed storage systems are a fundamental component of large-scale Internet services. To keep up with the increasing expectations of users regarding availability and latency, the design of data storage systems has evolved to achieve these properties, by exploiting techniques such as partial replication, geo-replication and weaker consistency models. While systems with these characteristics exist, they usually do not provide all these properties or do so in an inefficient manner, not taking full advantage of them. Additionally, weak consistency models, such as eventual consistency, put an excessively high burden on application programmers for writing correct applications, and hence, multiple systems have moved towards providing additional consistency guarantees such as implementing the causal (and causal+) consistency models. In this thesis we approach the existing challenges in designing a causally consistent replication protocol, with a focus on the use of geo and partial data replication. To this end, we present a novel replication protocol, capable of enriching an existing geo and partially replicated datastore with the causal+ consistency model. In addition, this thesis also presents a concrete implementation of the proposed protocol over the popular Cassandra datastore system. This implementation is complemented with experimental results obtained in a realistic scenario, in which we compare our proposal withmultiple configurations of the Cassandra datastore (without causal consistency guarantees) and with other existing alternatives. The results show that our proposed solution is able to achieve a balanced performance, with low data visibility delays and without significant performance penalties

    Self-management for large-scale distributed systems

    Get PDF
    Autonomic computing aims at making computing systems self-managing by using autonomic managers in order to reduce obstacles caused by management complexity. This thesis presents results of research on self-management for large-scale distributed systems. This research was motivated by the increasing complexity of computing systems and their management. In the first part, we present our platform, called Niche, for programming self-managing component-based distributed applications. In our work on Niche, we have faced and addressed the following four challenges in achieving self-management in a dynamic environment characterized by volatile resources and high churn: resource discovery, robust and efficient sensing and actuation, management bottleneck, and scale. We present results of our research on addressing the above challenges. Niche implements the autonomic computing architecture, proposed by IBM, in a fully decentralized way. Niche supports a network-transparent view of the system architecture simplifying the design of distributed self-management. Niche provides a concise and expressive API for self-management. The implementation of the platform relies on the scalability and robustness of structured overlay networks. We proceed by presenting a methodology for designing the management part of a distributed self-managing application. We define design steps that include partitioning of management functions and orchestration of multiple autonomic managers. In the second part, we discuss robustness of management and data consistency, which are necessary in a distributed system. Dealing with the effect of churn on management increases the complexity of the management logic and thus makes its development time consuming and error prone. We propose the abstraction of Robust Management Elements, which are able to heal themselves under continuous churn. Our approach is based on replicating a management element using finite state machine replication with a reconfigurable replica set. Our algorithm automates the reconfiguration (migration) of the replica set in order to tolerate continuous churn. For data consistency, we propose a majority-based distributed key-value store supporting multiple consistency levels that is based on a peer-to-peer network. The store enables the tradeoff between high availability and data consistency. Using majority allows avoiding potential drawbacks of a master-based consistency control, namely, a single-point of failure and a potential performance bottleneck. In the third part, we investigate self-management for Cloud-based storage systems with the focus on elasticity control using elements of control theory and machine learning. We have conducted research on a number of different designs of an elasticity controller, including a State-Space feedback controller and a controller that combines feedback and feedforward control. We describe our experience in designing an elasticity controller for a Cloud-based key-value store using state-space model that enables to trade-off performance for cost. We describe the steps in designing an elasticity controller. We continue by presenting the design and evaluation of ElastMan, an elasticity controller for Cloud-based elastic key-value stores that combines feedforward and feedback control

    Consistency in the Cloud:When Money Does Matter!

    Get PDF
    With the emergence of cloud computing, many organizations have moved their data to the cloud in order to provide scalable, reliable and highly available services. To meet ever growing user needs, these services mainly rely on geographically-distributed data replication to guarantee good performance and high availability. However, with replication, consistency comes into question. Service providers in the cloud have the freedom to select the level of consistency according to the access patterns exhibited by the applications.Most optimizations efforts then concentrate on how to provide adequate trade-offs between between consistency guarantees and performance. However, as the monetary cost completely relies on the service providers, in this paper we argue that monetary cost should be taken into consideration when evaluating or selecting a consistency level in the cloud. Accordingly, we define a new metric called consistency-cost efficiency. Based on this metric, we present a simple, yet efficient economical consistency model, called Bismar, that adaptively tunes the consistency level at run-time in order to reduce the monetary cost while simultaneously maintaining a low fraction of stale reads. Experimental evaluations with the Cassandra cloud storage on a Grid'5000 testbed show the validity of the metric and demonstrate the effectiveness of the proposed consistency model
    • 

    corecore