5,150 research outputs found

    Survey and Analysis of Production Distributed Computing Infrastructures

    Full text link
    This report has two objectives. First, we describe a set of the production distributed infrastructures currently available, so that the reader has a basic understanding of them. This includes explaining why each infrastructure was created and made available and how it has succeeded and failed. The set is not complete, but we believe it is representative. Second, we describe the infrastructures in terms of their use, which is a combination of how they were designed to be used and how users have found ways to use them. Applications are often designed and created with specific infrastructures in mind, with both an appreciation of the existing capabilities provided by those infrastructures and an anticipation of their future capabilities. Here, the infrastructures we discuss were often designed and created with specific applications in mind, or at least specific types of applications. The reader should understand how the interplay between the infrastructure providers and the users leads to such usages, which we call usage modalities. These usage modalities are really abstractions that exist between the infrastructures and the applications; they influence the infrastructures by representing the applications, and they influence the ap- plications by representing the infrastructures

    Survey of End-to-End Mobile Network Measurement Testbeds, Tools, and Services

    Full text link
    Mobile (cellular) networks enable innovation, but can also stifle it and lead to user frustration when network performance falls below expectations. As mobile networks become the predominant method of Internet access, developer, research, network operator, and regulatory communities have taken an increased interest in measuring end-to-end mobile network performance to, among other goals, minimize negative impact on application responsiveness. In this survey we examine current approaches to end-to-end mobile network performance measurement, diagnosis, and application prototyping. We compare available tools and their shortcomings with respect to the needs of researchers, developers, regulators, and the public. We intend for this survey to provide a comprehensive view of currently active efforts and some auspicious directions for future work in mobile network measurement and mobile application performance evaluation.Comment: Submitted to IEEE Communications Surveys and Tutorials. arXiv does not format the URL references correctly. For a correctly formatted version of this paper go to http://www.cs.montana.edu/mwittie/publications/Goel14Survey.pd

    Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation

    Full text link
    TensorFlow has been the most widely adopted Machine/Deep Learning framework. However, little exists in the literature that provides a thorough understanding of the capabilities which TensorFlow offers for the distributed training of large ML/DL models that need computation and communication at scale. Most commonly used distributed training approaches for TF can be categorized as follows: 1) Google Remote Procedure Call (gRPC), 2) gRPC+X: X=(InfiniBand Verbs, Message Passing Interface, and GPUDirect RDMA), and 3) No-gRPC: Baidu Allreduce with MPI, Horovod with MPI, and Horovod with NVIDIA NCCL. In this paper, we provide an in-depth performance characterization and analysis of these distributed training approaches on various GPU clusters including the Piz Daint system (6 on Top500). We perform experiments to gain novel insights along the following vectors: 1) Application-level scalability of DNN training, 2) Effect of Batch Size on scaling efficiency, 3) Impact of the MPI library used for no-gRPC approaches, and 4) Type and size of DNN architectures. Based on these experiments, we present two key insights: 1) Overall, No-gRPC designs achieve better performance compared to gRPC-based approaches for most configurations, and 2) The performance of No-gRPC is heavily influenced by the gradient aggregation using Allreduce. Finally, we propose a truly CUDA-Aware MPI Allreduce design that exploits CUDA kernels and pointer caching to perform large reductions efficiently. Our proposed designs offer 5-17X better performance than NCCL2 for small and medium messages, and reduces latency by 29% for large messages. The proposed optimizations help Horovod-MPI to achieve approximately 90% scaling efficiency for ResNet-50 training on 64 GPUs. Further, Horovod-MPI achieves 1.8X and 3.2X higher throughput than the native gRPC method for ResNet-50 and MobileNet, respectively, on the Piz Daint cluster.Comment: 10 pages, 9 figures, submitted to IEEE IPDPS 2019 for peer-revie

    Crux: Locality-Preserving Distributed Services

    Full text link
    Distributed systems achieve scalability by distributing load across many machines, but wide-area deployments can introduce worst-case response latencies proportional to the network's diameter. Crux is a general framework to build locality-preserving distributed systems, by transforming an existing scalable distributed algorithm A into a new locality-preserving algorithm ALP, which guarantees for any two clients u and v interacting via ALP that their interactions exhibit worst-case response latencies proportional to the network latency between u and v. Crux builds on compact-routing theory, but generalizes these techniques beyond routing applications. Crux provides weak and strong consistency flavors, and shows latency improvements for localized interactions in both cases, specifically up to several orders of magnitude for weakly-consistent Crux (from roughly 900ms to 1ms). We deployed on PlanetLab locality-preserving versions of a Memcached distributed cache, a Bamboo distributed hash table, and a Redis publish/subscribe. Our results indicate that Crux is effective and applicable to a variety of existing distributed algorithms.Comment: 11 figure

    Reducing Internet Latency : A Survey of Techniques and their Merit

    Get PDF
    Bob Briscoe, Anna Brunstrom, Andreas Petlund, David Hayes, David Ros, Ing-Jyh Tsang, Stein Gjessing, Gorry Fairhurst, Carsten Griwodz, Michael WelzlPeer reviewedPreprin

    A Flexible and Scalable Architecture for Real-Time ANT+ Sensor Data Acquisition and NoSQL Storage

    Get PDF
    Wireless Personal or Body Area Networks (WPANs or WBANs) are the main mechanisms to develop healthcare systems for an ageing society. Such systems offer monitoring, security, and caring services by measuring physiological body parameters using wearable devices. Wireless sensor networks allow inexpensive, continuous, and real-time updates of the sensor data, to the data repositories via an Internet. A great deal of research is going on with a focus on technical, managerial, economic, and social health issues. The technical obstacles, which we encounter, in general, are better methodologies, architectures, and context data storage. Sensor communication, data processing and interpretation, data interchange format, data transferal, and context data storage are sensitive phases during the whole process of body parameter acquisition until the storage. ANT+ is a proprietary (but open access) low energy protocol, which supports device interoperability by mutually agreeing upon device profile standards. We have implemented a prototype, based upon ANT+ enabled sensors for a real-time scenario. This paper presents a system architecture, with its software organization, for real-time message interpretation, event-driven based real-time bidirectional communication, and schema flexible storage. A computer user uses it to acquire and to transmit the data using a Windows service to the context server
    corecore