
    Reducing the Tail Latency of a Distributed NoSQL Database

    The request latency is an important performance metric of a distributed database such as the popular Apache Cassandra, because of its direct impact on the user experience. Specifically, the latency of a read or write request is defined as the total time interval from the instant when a user issues the request to the instant when the user receives the response; it involves not only the actual read or write time at a specific database node, but also the various kinds of latency introduced by the distributed mechanisms of the database. Most current work focuses only on reducing the average request latency, not on reducing the tail request latency, which has a severe impact on some database users. In this thesis, we investigate the important factors affecting the tail request latency of Apache Cassandra and then propose two novel methods to greatly reduce it. First, we find that background activities may considerably increase the local latency of a replica and thereby the overall request latency of the whole database, so we propose a novel method that selects the optimal replica by taking the impact of background activities into account. Second, we find that the asynchronous read and write architecture handles local and remote requests in the same way, which is simple to implement but comes at the cost of possibly longer latency, so we propose a synchronous method that handles local and remote requests differently to greatly reduce the latency. Finally, our experiments on the Amazon EC2 public cloud platform demonstrate that the proposed methods can greatly reduce the tail latency of read and write requests in Apache Cassandra. Adviser: Dr. Lisong X
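    The replica-selection idea can be illustrated with a small sketch. The following Python example is a hypothetical illustration, not the thesis implementation: each replica exposes a recent latency estimate and a score for ongoing background work (e.g. compaction or garbage collection), and the coordinator picks the replica with the lowest penalized score. The names Replica, background_load, and the penalty weight are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    """Hypothetical coordinator-side view of one replica."""
    name: str
    recent_latency_ms: float   # e.g. an EWMA of recent response times
    background_load: float     # 0.0 (idle) .. 1.0 (heavy compaction/GC/repair)

def score(replica: Replica, penalty_weight_ms: float = 50.0) -> float:
    # Penalize replicas busy with background activities on top of their
    # observed latency, so reads are steered away from them.
    return replica.recent_latency_ms + penalty_weight_ms * replica.background_load

def pick_replica(replicas: list[Replica]) -> Replica:
    return min(replicas, key=score)

if __name__ == "__main__":
    replicas = [
        Replica("node-a", recent_latency_ms=2.1, background_load=0.0),
        Replica("node-b", recent_latency_ms=1.4, background_load=0.8),  # compacting
        Replica("node-c", recent_latency_ms=3.0, background_load=0.1),
    ]
    print(pick_replica(replicas).name)  # node-a: slightly slower, but idle
```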

    Split-Merge Model of Workunit Replication in Distributed Computing

    Dynamic Parameter Allocation in Parameter Servers

    To keep up with increasing dataset sizes and model complexity, distributed training has become a necessity for large machine learning tasks. Parameter servers ease the implementation of distributed parameter management, a key concern in distributed training, but can induce severe communication overhead. To reduce this overhead, distributed machine learning algorithms use techniques that increase parameter access locality (PAL), achieving up to linear speed-ups. However, we found that existing parameter servers provide only limited support for PAL techniques and therefore prevent efficient training. In this paper, we explore whether and to what extent PAL techniques can be supported, and whether such support is beneficial. We propose to integrate dynamic parameter allocation into parameter servers, describe an efficient implementation of such a parameter server called Lapse, and experimentally compare its performance to existing parameter servers across a number of machine learning tasks. We found that Lapse provides near-linear scaling and can be orders of magnitude faster than existing parameter servers.
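    The core idea of dynamic parameter allocation can be sketched in a few lines: each parameter lives on exactly one shard at a time, and ownership can be relocated at runtime so that a worker's accesses become local. The toy Python class below is an illustration of that idea under stated assumptions, not Lapse's actual API or implementation.

```python
class ToyParameterServer:
    """Toy in-process parameter server with dynamic parameter allocation."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]  # shard id -> {key: value}
        self.owner = {}                                    # key -> shard id

    def put(self, key, value, shard: int = 0):
        self.owner[key] = shard
        self.shards[shard][key] = value

    def pull(self, key):
        return self.shards[self.owner[key]][key]

    def push(self, key, delta):
        self.shards[self.owner[key]][key] += delta

    def relocate(self, key, new_shard: int):
        # Dynamic allocation: move the parameter to the shard co-located with
        # the worker that currently accesses it, turning remote pulls/pushes
        # into local ones and thereby increasing parameter access locality.
        old = self.owner[key]
        if old != new_shard:
            self.shards[new_shard][key] = self.shards[old].pop(key)
            self.owner[key] = new_shard

ps = ToyParameterServer(num_shards=2)
ps.put("w_0", 0.0, shard=0)
ps.relocate("w_0", new_shard=1)  # worker on node 1 now trains on the rows using w_0
ps.push("w_0", 0.5)
print(ps.pull("w_0"))  # 0.5
```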

    Efficient Redundancy Techniques in Cloud and Desktop Grid Systems using MAP/G/c-type Queues

    Cloud computing continues to prove its flexibility and versatility in providing needed computing capacity to industry and business as well as academia. As an important alternative to cloud computing, desktop grids make it possible to utilize the idle computing resources of an enterprise/community by means of a distributed computing system, providing a more secure and controllable environment with lower operational expenses. Further, both cloud computing and desktop grids aim to optimize limited resources while decreasing the expected latency for users. The crucial optimization parameter in both cloud computing and desktop grids is the level of redundancy (replication) for service requests/workunits. In this paper, we study optimal replication policies by considering three variations of Fork-Join systems in the context of a multi-server queueing system with a versatile point process for the arrivals. For service times we consider phase-type distributions as well as shifted exponential and Weibull distributions. We use both analytical and simulation approaches in our analysis and report some interesting qualitative results.
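    The basic trade-off behind workunit replication can be illustrated with a short Monte Carlo sketch: each workunit is sent to r servers and completes as soon as the first copy finishes. The Python snippet below uses a shifted exponential service-time distribution, one of the models considered in the paper, but is otherwise a simplified illustration rather than the MAP/G/c analysis itself; the parameter values are assumptions.

```python
import random

def completion_time(replicas: int, scale: float = 1.0, shift: float = 0.2) -> float:
    """Time until the first of `replicas` copies of a workunit finishes.

    Service times follow a shifted exponential distribution; the shift models
    a fixed per-workunit overhead that replication cannot hide.
    """
    return min(shift + random.expovariate(1.0 / scale) for _ in range(replicas))

def mean_latency(replicas: int, trials: int = 100_000) -> float:
    return sum(completion_time(replicas) for _ in range(trials)) / trials

if __name__ == "__main__":
    random.seed(1)
    for r in (1, 2, 3, 4):
        # More replicas cut the expected latency of a single workunit, but each
        # extra copy also consumes server capacity; that is the trade-off the
        # replication policies in the paper optimize.
        print(f"replicas={r}: mean latency ~ {mean_latency(r):.3f}")
```

    Under these assumptions the mean latency falls roughly as shift + scale/r, which shows the diminishing returns from each additional replica.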

    A Framework of Fog Computing: Architecture, Challenges and Optimization

    This is the author accepted manuscript. The final version is available from IEEE via the DOI in this record.
    Fog Computing (FC) is an emerging distributed computing platform that aims to bring computation close to its data sources, which can reduce the latency and cost of delivering data to a remote cloud. This feature and the related advantages are desirable for many Internet-of-Things applications, especially latency-sensitive and mission-intensive services. The definition and architecture of FC are presented in this article, with comparisons to other computing technologies. The framework of resource allocation for latency reduction, combined with reliability, fault tolerance, privacy, and the underlying optimization problems, is also discussed. We then investigate an application scenario and conduct resource optimization by formulating the optimization problem and solving it with a Genetic Algorithm. The resulting analysis generates some important insights into the scalability of FC systems.
    This work was supported by the Engineering and Physical Sciences Research Council [grant number EP/P020224/1] and the EU FP7 QUICK project under Grant Agreement No. PIRSES-GA-2013-612652. Yang Liu was supported by the Chinese Research Council.
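    The Genetic Algorithm step can be sketched concretely. The following Python example is a hypothetical, self-contained illustration of GA-based task placement across fog nodes and a remote cloud; the latency matrix, capacities, and GA parameters are invented for illustration and do not come from the paper.

```python
import random

# Hypothetical latency (ms) of running each of 6 tasks on each of 3 nodes:
# two fog nodes (low latency, limited capacity) and one remote cloud.
LATENCY = [
    [5, 6, 40],
    [4, 7, 40],
    [6, 5, 40],
    [5, 6, 40],
    [7, 4, 40],
    [6, 6, 40],
]
CAPACITY = [2, 2, 6]  # maximum number of tasks each node can host

def fitness(assignment):
    """Total latency, plus a large penalty for exceeding node capacity."""
    total = sum(LATENCY[task][node] for task, node in enumerate(assignment))
    load = [assignment.count(node) for node in range(len(CAPACITY))]
    penalty = sum(max(0, l - c) for l, c in zip(load, CAPACITY)) * 1000
    return total + penalty

def genetic_algorithm(pop_size=40, generations=200, mutation_rate=0.1):
    tasks, nodes = len(LATENCY), len(CAPACITY)
    population = [[random.randrange(nodes) for _ in range(tasks)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        survivors = population[: pop_size // 2]        # selection: keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, tasks)           # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < mutation_rate:        # mutation: reassign one task
                child[random.randrange(tasks)] = random.randrange(nodes)
            children.append(child)
        population = survivors + children
    return min(population, key=fitness)

if __name__ == "__main__":
    random.seed(0)
    best = genetic_algorithm()
    print("assignment:", best, "total latency:", fitness(best))
```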