26,044 research outputs found
Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation
TensorFlow has been the most widely adopted Machine/Deep Learning framework.
However, little exists in the literature that provides a thorough understanding
of the capabilities which TensorFlow offers for the distributed training of
large ML/DL models that need computation and communication at scale. Most
commonly used distributed training approaches for TF can be categorized as
follows: 1) Google Remote Procedure Call (gRPC), 2) gRPC+X: X=(InfiniBand
Verbs, Message Passing Interface, and GPUDirect RDMA), and 3) No-gRPC: Baidu
Allreduce with MPI, Horovod with MPI, and Horovod with NVIDIA NCCL. In this
paper, we provide an in-depth performance characterization and analysis of
these distributed training approaches on various GPU clusters including the Piz
Daint system (6 on Top500). We perform experiments to gain novel insights along
the following vectors: 1) Application-level scalability of DNN training, 2)
Effect of Batch Size on scaling efficiency, 3) Impact of the MPI library used
for no-gRPC approaches, and 4) Type and size of DNN architectures. Based on
these experiments, we present two key insights: 1) Overall, No-gRPC designs
achieve better performance compared to gRPC-based approaches for most
configurations, and 2) The performance of No-gRPC is heavily influenced by the
gradient aggregation using Allreduce. Finally, we propose a truly CUDA-Aware
MPI Allreduce design that exploits CUDA kernels and pointer caching to perform
large reductions efficiently. Our proposed designs offer 5-17X better
performance than NCCL2 for small and medium messages, and reduces latency by
29% for large messages. The proposed optimizations help Horovod-MPI to achieve
approximately 90% scaling efficiency for ResNet-50 training on 64 GPUs.
Further, Horovod-MPI achieves 1.8X and 3.2X higher throughput than the native
gRPC method for ResNet-50 and MobileNet, respectively, on the Piz Daint
cluster.Comment: 10 pages, 9 figures, submitted to IEEE IPDPS 2019 for peer-revie
Knowledge Discovery in the SCADA Databases Used for the Municipal Power Supply System
This scientific paper delves into the problems related to the develop-ment of
intellectual data analysis system that could support decision making to manage
municipal power supply services. The management problems of mu-nicipal power
supply system have been specified taking into consideration modern tendencies
shown by new technologies that allow for an increase in the energy efficiency.
The analysis findings of the system problems related to the integrated
computer-aided control of the power supply for the city have been given. The
consideration was given to the hierarchy-level management decom-position model.
The objective task targeted at an increase in the energy effi-ciency to
minimize expenditures and energy losses during the generation and
transportation of energy carriers to the Consumer, the optimization of power
consumption at the prescribed level of the reliability of pipelines and
networks and the satisfaction of Consumers has been defined. To optimize the
support of the decision making a new approach to the monitoring of engineering
systems and technological processes related to the energy consumption and
transporta-tion using the technologies of geospatial analysis and Knowledge
Discovery in databases (KDD) has been proposed. The data acquisition for
analytical prob-lems is realized in the wireless heterogeneous medium, which
includes soft-touch VPN segments of ZigBee technology realizing the 6LoWPAN
standard over the IEEE 802.15.4 standard and also the segments of the networks
of cellu-lar communications. JBoss Application Server is used as a server-based
plat-form for the operation of the tools used for the retrieval of data
collected from sensor nodes, PLC and energy consumption record devices. The KDD
tools are developed using Java Enterprise Edition platform and Spring and ORM
Hiber-nate technologies
A Monitoring System for the BaBar INFN Computing Cluster
Monitoring large clusters is a challenging problem. It is necessary to
observe a large quantity of devices with a reasonably short delay between
consecutive observations. The set of monitored devices may include PCs, network
switches, tape libraries and other equipments. The monitoring activity should
not impact the performances of the system. In this paper we present PerfMC, a
monitoring system for large clusters. PerfMC is driven by an XML configuration
file, and uses the Simple Network Management Protocol (SNMP) for data
collection. SNMP is a standard protocol implemented by many networked
equipments, so the tool can be used to monitor a wide range of devices. System
administrators can display informations on the status of each device by
connecting to a WEB server embedded in PerfMC. The WEB server can produce
graphs showing the value of different monitored quantities as a function of
time; it can also produce arbitrary XML pages by applying XSL Transformations
to an internal XML representation of the cluster's status. XSL Transformations
may be used to produce HTML pages which can be displayed by ordinary WEB
browsers. PerfMC aims at being relatively easy to configure and operate, and
highly efficient. It is currently being used to monitor the Italian
Reprocessing farm for the BaBar experiment, which is made of about 200 dual-CPU
Linux machines.Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics
(CHEP03), La Jolla, Ca, USA, March 2003, 10 pages, LaTeX, 4 eps figures. PSN
MOET00
A File System Abstraction for Sense and Respond Systems
The heterogeneity and resource constraints of sense-and-respond systems pose
significant challenges to system and application development. In this paper, we
present a flexible, intuitive file system abstraction for organizing and
managing sense-and-respond systems based on the Plan 9 design principles. A key
feature of this abstraction is the ability to support multiple views of the
system via filesystem namespaces. Constructed logical views present an
application-specific representation of the network, thus enabling high-level
programming of the network. Concurrently, structural views of the network
enable resource-efficient planning and execution of tasks. We present and
motivate the design using several examples, outline research challenges and our
research plan to address them, and describe the current state of
implementation.Comment: 6 pages, 3 figures Workshop on End-to-End, Sense-and-Respond Systems,
Applications, and Services In conjunction with MobiSys '0
A Taxonomy for Management and Optimization of Multiple Resources in Edge Computing
Edge computing is promoted to meet increasing performance needs of
data-driven services using computational and storage resources close to the end
devices, at the edge of the current network. To achieve higher performance in
this new paradigm one has to consider how to combine the efficiency of resource
usage at all three layers of architecture: end devices, edge devices, and the
cloud. While cloud capacity is elastically extendable, end devices and edge
devices are to various degrees resource-constrained. Hence, an efficient
resource management is essential to make edge computing a reality. In this
work, we first present terminology and architectures to characterize current
works within the field of edge computing. Then, we review a wide range of
recent articles and categorize relevant aspects in terms of 4 perspectives:
resource type, resource management objective, resource location, and resource
use. This taxonomy and the ensuing analysis is used to identify some gaps in
the existing research. Among several research gaps, we found that research is
less prevalent on data, storage, and energy as a resource, and less extensive
towards the estimation, discovery and sharing objectives. As for resource
types, the most well-studied resources are computation and communication
resources. Our analysis shows that resource management at the edge requires a
deeper understanding of how methods applied at different levels and geared
towards different resource types interact. Specifically, the impact of mobility
and collaboration schemes requiring incentives are expected to be different in
edge architectures compared to the classic cloud solutions. Finally, we find
that fewer works are dedicated to the study of non-functional properties or to
quantifying the footprint of resource management techniques, including
edge-specific means of migrating data and services.Comment: Accepted in the Special Issue Mobile Edge Computing of the Wireless
Communications and Mobile Computing journa
Clustering Algorithms for Scale-free Networks and Applications to Cloud Resource Management
In this paper we introduce algorithms for the construction of scale-free
networks and for clustering around the nerve centers, nodes with a high
connectivity in a scale-free networks. We argue that such overlay networks
could support self-organization in a complex system like a cloud computing
infrastructure and allow the implementation of optimal resource management
policies.Comment: 14 pages, 8 Figurs, Journa
- …