26,040 research outputs found

    Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation

    Full text link
    TensorFlow has been the most widely adopted Machine/Deep Learning framework. However, little exists in the literature that provides a thorough understanding of the capabilities which TensorFlow offers for the distributed training of large ML/DL models that need computation and communication at scale. Most commonly used distributed training approaches for TF can be categorized as follows: 1) Google Remote Procedure Call (gRPC), 2) gRPC+X: X=(InfiniBand Verbs, Message Passing Interface, and GPUDirect RDMA), and 3) No-gRPC: Baidu Allreduce with MPI, Horovod with MPI, and Horovod with NVIDIA NCCL. In this paper, we provide an in-depth performance characterization and analysis of these distributed training approaches on various GPU clusters including the Piz Daint system (6 on Top500). We perform experiments to gain novel insights along the following vectors: 1) Application-level scalability of DNN training, 2) Effect of Batch Size on scaling efficiency, 3) Impact of the MPI library used for no-gRPC approaches, and 4) Type and size of DNN architectures. Based on these experiments, we present two key insights: 1) Overall, No-gRPC designs achieve better performance compared to gRPC-based approaches for most configurations, and 2) The performance of No-gRPC is heavily influenced by the gradient aggregation using Allreduce. Finally, we propose a truly CUDA-Aware MPI Allreduce design that exploits CUDA kernels and pointer caching to perform large reductions efficiently. Our proposed designs offer 5-17X better performance than NCCL2 for small and medium messages, and reduces latency by 29% for large messages. The proposed optimizations help Horovod-MPI to achieve approximately 90% scaling efficiency for ResNet-50 training on 64 GPUs. Further, Horovod-MPI achieves 1.8X and 3.2X higher throughput than the native gRPC method for ResNet-50 and MobileNet, respectively, on the Piz Daint cluster.Comment: 10 pages, 9 figures, submitted to IEEE IPDPS 2019 for peer-revie

    Knowledge Discovery in the SCADA Databases Used for the Municipal Power Supply System

    Full text link
    This scientific paper delves into the problems related to the develop-ment of intellectual data analysis system that could support decision making to manage municipal power supply services. The management problems of mu-nicipal power supply system have been specified taking into consideration modern tendencies shown by new technologies that allow for an increase in the energy efficiency. The analysis findings of the system problems related to the integrated computer-aided control of the power supply for the city have been given. The consideration was given to the hierarchy-level management decom-position model. The objective task targeted at an increase in the energy effi-ciency to minimize expenditures and energy losses during the generation and transportation of energy carriers to the Consumer, the optimization of power consumption at the prescribed level of the reliability of pipelines and networks and the satisfaction of Consumers has been defined. To optimize the support of the decision making a new approach to the monitoring of engineering systems and technological processes related to the energy consumption and transporta-tion using the technologies of geospatial analysis and Knowledge Discovery in databases (KDD) has been proposed. The data acquisition for analytical prob-lems is realized in the wireless heterogeneous medium, which includes soft-touch VPN segments of ZigBee technology realizing the 6LoWPAN standard over the IEEE 802.15.4 standard and also the segments of the networks of cellu-lar communications. JBoss Application Server is used as a server-based plat-form for the operation of the tools used for the retrieval of data collected from sensor nodes, PLC and energy consumption record devices. The KDD tools are developed using Java Enterprise Edition platform and Spring and ORM Hiber-nate technologies

    A Monitoring System for the BaBar INFN Computing Cluster

    Full text link
    Monitoring large clusters is a challenging problem. It is necessary to observe a large quantity of devices with a reasonably short delay between consecutive observations. The set of monitored devices may include PCs, network switches, tape libraries and other equipments. The monitoring activity should not impact the performances of the system. In this paper we present PerfMC, a monitoring system for large clusters. PerfMC is driven by an XML configuration file, and uses the Simple Network Management Protocol (SNMP) for data collection. SNMP is a standard protocol implemented by many networked equipments, so the tool can be used to monitor a wide range of devices. System administrators can display informations on the status of each device by connecting to a WEB server embedded in PerfMC. The WEB server can produce graphs showing the value of different monitored quantities as a function of time; it can also produce arbitrary XML pages by applying XSL Transformations to an internal XML representation of the cluster's status. XSL Transformations may be used to produce HTML pages which can be displayed by ordinary WEB browsers. PerfMC aims at being relatively easy to configure and operate, and highly efficient. It is currently being used to monitor the Italian Reprocessing farm for the BaBar experiment, which is made of about 200 dual-CPU Linux machines.Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics (CHEP03), La Jolla, Ca, USA, March 2003, 10 pages, LaTeX, 4 eps figures. PSN MOET00

    A File System Abstraction for Sense and Respond Systems

    Full text link
    The heterogeneity and resource constraints of sense-and-respond systems pose significant challenges to system and application development. In this paper, we present a flexible, intuitive file system abstraction for organizing and managing sense-and-respond systems based on the Plan 9 design principles. A key feature of this abstraction is the ability to support multiple views of the system via filesystem namespaces. Constructed logical views present an application-specific representation of the network, thus enabling high-level programming of the network. Concurrently, structural views of the network enable resource-efficient planning and execution of tasks. We present and motivate the design using several examples, outline research challenges and our research plan to address them, and describe the current state of implementation.Comment: 6 pages, 3 figures Workshop on End-to-End, Sense-and-Respond Systems, Applications, and Services In conjunction with MobiSys '0

    A Taxonomy for Management and Optimization of Multiple Resources in Edge Computing

    Full text link
    Edge computing is promoted to meet increasing performance needs of data-driven services using computational and storage resources close to the end devices, at the edge of the current network. To achieve higher performance in this new paradigm one has to consider how to combine the efficiency of resource usage at all three layers of architecture: end devices, edge devices, and the cloud. While cloud capacity is elastically extendable, end devices and edge devices are to various degrees resource-constrained. Hence, an efficient resource management is essential to make edge computing a reality. In this work, we first present terminology and architectures to characterize current works within the field of edge computing. Then, we review a wide range of recent articles and categorize relevant aspects in terms of 4 perspectives: resource type, resource management objective, resource location, and resource use. This taxonomy and the ensuing analysis is used to identify some gaps in the existing research. Among several research gaps, we found that research is less prevalent on data, storage, and energy as a resource, and less extensive towards the estimation, discovery and sharing objectives. As for resource types, the most well-studied resources are computation and communication resources. Our analysis shows that resource management at the edge requires a deeper understanding of how methods applied at different levels and geared towards different resource types interact. Specifically, the impact of mobility and collaboration schemes requiring incentives are expected to be different in edge architectures compared to the classic cloud solutions. Finally, we find that fewer works are dedicated to the study of non-functional properties or to quantifying the footprint of resource management techniques, including edge-specific means of migrating data and services.Comment: Accepted in the Special Issue Mobile Edge Computing of the Wireless Communications and Mobile Computing journa

    Clustering Algorithms for Scale-free Networks and Applications to Cloud Resource Management

    Full text link
    In this paper we introduce algorithms for the construction of scale-free networks and for clustering around the nerve centers, nodes with a high connectivity in a scale-free networks. We argue that such overlay networks could support self-organization in a complex system like a cloud computing infrastructure and allow the implementation of optimal resource management policies.Comment: 14 pages, 8 Figurs, Journa
    • …
    corecore