Many-Task Computing and Blue Waters
This report discusses many-task computing (MTC) generically and in the
context of the proposed Blue Waters system, which is planned to be the largest
NSF-funded supercomputer when it begins production use in 2012. The aim of this
report is to inform the BW project about MTC, including understanding aspects
of MTC applications that can be used to characterize the domain and
understanding the implications of these aspects for middleware and policies.
Many MTC applications do not neatly fit the stereotypes of high-performance
computing (HPC) or high-throughput computing (HTC) applications. Like HTC
applications, by definition MTC applications are structured as graphs of
discrete tasks, with explicit input and output dependencies forming the graph
edges. However, MTC applications have significant features that distinguish
them from typical HTC applications. In particular, different engineering
constraints for hardware and software must be met in order to support these
applications. HTC applications have traditionally run on platforms such as
grids and clusters, through either workflow systems or parallel programming
systems. MTC applications, in contrast, will often demand a short time to
solution, may be communication intensive or data intensive, and may comprise
very short tasks. Therefore, hardware and software for MTC must be engineered
to support the additional communication and I/O and must minimize task dispatch
overheads. The hardware of large-scale HPC systems, with its high degree of
parallelism and support for intensive communication, is well suited for MTC
applications. However, HPC systems often lack a dynamic resource-provisioning
feature, are not ideal for task communication via the file system, and have an
I/O system that is not optimized for MTC-style applications. Hence, additional
software support is likely to be required to gain full benefit from the HPC
hardware.
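The task-graph structure that defines MTC (and HTC) workloads can be illustrated with a small sketch. The scheduler and task names below are hypothetical, not part of any Blue Waters middleware; the point is only that tasks are discrete, with explicit input/output dependencies forming the graph edges, and that a dispatcher must launch tasks with low overhead as soon as their inputs exist:

```python
# Minimal sketch of an MTC-style workload: discrete tasks whose explicit
# input/output dependencies form the edges of a graph. Task names and the
# dispatcher are illustrative only.
from concurrent.futures import ThreadPoolExecutor

# Each entry maps a task name to (function, list of upstream dependencies).
tasks = {
    "split":  (lambda deps: list(range(4)), []),
    "work_0": (lambda deps: deps["split"][0] ** 2, ["split"]),
    "work_1": (lambda deps: deps["split"][1] ** 2, ["split"]),
    "work_2": (lambda deps: deps["split"][2] ** 2, ["split"]),
    "work_3": (lambda deps: deps["split"][3] ** 2, ["split"]),
    "reduce": (lambda deps: sum(deps[f"work_{i}"] for i in range(4)),
               ["work_0", "work_1", "work_2", "work_3"]),
}

def run(tasks):
    """Dispatch each task as soon as all of its dependencies have results."""
    results = {}
    remaining = dict(tasks)
    with ThreadPoolExecutor(max_workers=4) as pool:
        while remaining:
            ready = [n for n, (_, deps) in remaining.items()
                     if all(d in results for d in deps)]
            futures = {n: pool.submit(remaining[n][0],
                                      {d: results[d] for d in remaining[n][1]})
                       for n in ready}
            for n, f in futures.items():
                results[n] = f.result()
                del remaining[n]
    return results

print(run(tasks)["reduce"])  # 0 + 1 + 4 + 9 = 14
```

For very short tasks, as the report notes, the per-task dispatch cost in a loop like this is exactly what MTC middleware must minimize.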
Efficient Federated Learning with Enhanced Privacy via Lottery Ticket Pruning in Edge Computing
Federated learning (FL) is a collaborative learning paradigm for
decentralized private data from mobile terminals (MTs). However, it suffers
from issues in terms of communication, the limited resources of MTs, and privacy. Existing
privacy-preserving FL methods usually adopt the instance-level differential
privacy (DP), which provides a rigorous privacy guarantee but with several
bottlenecks: severe performance degradation, transmission overhead, and
resource constraints of edge devices such as MTs. To overcome these drawbacks,
we propose Fed-LTP, an efficient and privacy-enhanced FL framework with
\underline{\textbf{L}}ottery \underline{\textbf{T}}icket Hypothesis (LTH)
and zero-concentrated
D\underline{\textbf{P}} (zCDP). It generates a pruned global model on the
server side and conducts sparse-to-sparse training from scratch with zCDP on
the client side. On the server side, two pruning schemes are proposed: (i) the
weight-based pruning (LTH) determines the pruned global model structure; (ii)
the iterative pruning further shrinks the size of the pruned model's
parameters. Meanwhile, the performance of Fed-LTP is also boosted via model
validation based on the Laplace mechanism. On the client side, we use
sparse-to-sparse training to solve the resource-constraints issue and provide
tighter privacy analysis to reduce the privacy budget. We evaluate the
effectiveness of Fed-LTP on several real-world datasets in both independent and
identically distributed (IID) and non-IID settings. The results clearly confirm
the superiority of Fed-LTP over state-of-the-art (SOTA) methods in
communication, computation, and memory efficiencies while realizing a better
utility-privacy trade-off.
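The server-side pruning described above can be sketched, in simplified form, as iterative magnitude pruning: each round keeps only the largest-magnitude fraction of the still-surviving weights, so the model shrinks geometrically. This is an illustrative sketch only; the zCDP noise, the Laplace-mechanism validation, and the client-side sparse-to-sparse training are omitted, and all names and rates are assumptions rather than the paper's code:

```python
# Illustrative iterative magnitude pruning (lottery-ticket style).
import numpy as np

def iterative_magnitude_prune(weights, keep_per_round=0.5, rounds=3):
    """Each round keeps the largest-magnitude half of the surviving weights."""
    mask = np.ones_like(weights)
    for _ in range(rounds):
        k = max(1, int(mask.sum() * keep_per_round))
        scores = np.abs(weights) * mask           # already-pruned weights score 0
        threshold = np.sort(scores, axis=None)[-k]  # k-th largest surviving score
        mask = (scores >= threshold).astype(weights.dtype)
    return mask

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
mask = iterative_magnitude_prune(w)
print(mask.mean())  # density after three 50% rounds: 8/64 = 0.125
```

The resulting sparse mask is what the server would broadcast, letting clients train only the surviving parameters from scratch.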
Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations
Load imbalance pervasively exists in distributed deep learning training
systems, either caused by the inherent imbalance in learned tasks or by the
system itself. Traditional synchronous Stochastic Gradient Descent (SGD)
achieves good accuracy for a wide variety of tasks, but relies on global
synchronization to accumulate the gradients at every training step. In this
paper, we propose eager-SGD, which relaxes the global synchronization for
decentralized accumulation. To implement eager-SGD, we propose to use two
partial collectives: solo and majority. With solo allreduce, the faster
processes contribute their gradients eagerly without waiting for the slower
processes, whereas with majority allreduce, at least half of the participants
must contribute gradients before continuing, all without using a central
parameter server. We theoretically prove the convergence of the algorithms and
describe the partial collectives in detail. Experimental results on
load-imbalanced environments (CIFAR-10, ImageNet, and UCF101 datasets) show
that eager-SGD achieves 1.27x speedup over the state-of-the-art synchronous
SGD, without losing accuracy.
Comment: Published in Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'20), pp. 45-61, 2020.
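The semantics of the two partial collectives can be illustrated with a toy sketch. This is not the authors' MPI implementation; the names and quorum logic below are illustrative only, showing that `solo` fires as soon as any process has contributed while `majority` waits for at least half:

```python
# Toy illustration of partial-collective semantics (not a real MPI collective).
def partial_allreduce(gradients, arrived, mode="majority"):
    """Average only the gradients of processes that have arrived.

    gradients: per-process gradient values.
    arrived:   booleans marking which processes have contributed by the
               time the collective fires.
    """
    n = len(gradients)
    quorum = 1 if mode == "solo" else (n + 1) // 2   # ceil(n/2) for majority
    contributors = [g for g, a in zip(gradients, arrived) if a]
    if len(contributors) < quorum:
        raise RuntimeError("collective cannot fire yet: quorum not reached")
    # Stragglers' gradients are simply absent from this step's average;
    # they contribute at a later step instead of stalling every process.
    return sum(contributors) / len(contributors)

grads = [1.0, 2.0, 3.0, 4.0]
print(partial_allreduce(grads, [True, True, True, False]))  # (1+2+3)/3 = 2.0
```

The convergence argument in the paper rests on the fact that delayed gradients are eventually accumulated, not dropped; this sketch only shows the per-step averaging rule.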
Disaster warning system: Satellite feasibility and comparison with terrestrial systems. Volume 2: Final report
For abstract, see Vol. 1
Building a green connected future: smart (Internet of) Things for smart networks
The vision of the Internet of Things (IoT) promises to reshape society by creating a future where we will be surrounded by a smart environment that is constantly aware of its users and has the ability to adapt to any changes. In the IoT, a huge variety of smart devices is interconnected to form a network of distributed agents that continuously share and process information. This communication paradigm has been recognized as one of the key enablers of the rapidly emerging applications that make up the fabric of the IoT. These networks, often called wireless sensor networks (WSNs), are characterized by the low cost of their components, their pervasive connectivity, and their self-organization features, which allow them to cooperate with other IoT elements to create large-scale heterogeneous information systems. However, a number of considerable challenges arises when considering the design of large-scale WSNs. In particular, these networks are made up of embedded devices that suffer from severe power constraints and limited resources. The advent of low-power sensor nodes coupled with intelligent software and hardware technologies has led to the era of green wireless networks. From the hardware perspective, green sensor nodes are endowed with energy-scavenging capabilities to overcome energy-related limitations, as well as with low-power triggering techniques, i.e., wake-up radios, to eliminate idle-listening-induced communication costs. Green wireless networks are considered a fundamental vehicle for enabling all those critical IoT applications where devices, for different reasons, do not carry batteries, and therefore only harvest energy and store it for future use. These networks are considered to have the potential of infinite lifetime, since they do not depend on batteries or on any other limited power source. Wake-up radios, coupled with energy-provisioning techniques, further assist in overcoming the physical constraints of traditional WSNs. In addition, they are particularly important in green WSN scenarios in which it is difficult to achieve energy neutrality due to limited harvesting rates. In this PhD thesis we set out to investigate how different data-forwarding mechanisms can make the most of these enabling technologies of green wireless networks, namely energy harvesting and wake-up radios. Specifically, we present a number of cross-layer routing approaches with different forwarding design choices and study their consequences on network performance. Among the most promising protocol design techniques, the past decade has shown the increasingly intensive adoption of various forms of machine learning to increase and optimize the performance of WSNs. However, learning techniques can suffer from high computational costs, as nodes drain a considerable percentage of their energy budget to run sophisticated software procedures, predict accurate information, and determine optimal decisions. This thesis also addresses the problem of the local computational requirements of learning-based data-forwarding strategies by investigating their impact on the performance of the network. Results indicate that local computation can be a major source of energy consumption; its impact on network performance should not be neglected.
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
Deep Neural Networks (DNNs) are becoming an important tool in modern
computing applications. Accelerating their training is a major challenge and
techniques range from distributed algorithms to low-level circuit design. In
this survey, we describe the problem from a theoretical perspective, followed
by approaches for its parallelization. We present trends in DNN architectures
and the resulting implications on parallelization strategies. We then review
and model the different types of concurrency in DNNs: from the single operator,
through parallelism in network inference and training, to distributed deep
learning. We discuss asynchronous stochastic optimization, distributed system
architectures, communication schemes, and neural architecture search. Based on
those approaches, we extrapolate potential directions for parallelism in deep
learning.
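As one concrete instance of the concurrency types the survey covers, data-parallel training splits each mini-batch across replicas, lets each replica compute a local gradient, and then averages across replicas, which is equivalent to one large-batch gradient step. The toy 1-D quadratic loss below is illustrative only; the names are assumptions, not the survey's notation:

```python
# Toy data-parallel SGD step on the loss sum_x (w - x)^2.
def grad(w, x):
    return 2 * (w - x)          # d/dw (w - x)^2

def data_parallel_step(w, batch, replicas=2, lr=0.1):
    """One synchronous data-parallel SGD step."""
    shards = [batch[i::replicas] for i in range(replicas)]
    # Each replica averages gradients over its own shard of the batch...
    local = [sum(grad(w, x) for x in s) / len(s) for s in shards]
    # ...then an allreduce-style average combines the replicas' gradients.
    g = sum(local) / replicas
    return w - lr * g

w = 0.0
batch = [1.0, 2.0, 3.0, 4.0]
print(data_parallel_step(w, batch))  # 0.5: one step toward the batch mean 2.5
```

The allreduce line is the synchronization point that asynchronous variants (discussed in the survey) relax.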
Proceedings of Abstracts Engineering and Computer Science Research Conference 2019
© 2019 The Author(s). This is an open-access work distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. For further details please see https://creativecommons.org/licenses/by/4.0/. Note: the keynote "Fluorescence visualisation to evaluate effectiveness of personal protective equipment for infection control" is © 2019 Crown copyright and is therefore licensed under the Open Government Licence v3.0. Under this licence users are permitted to copy, publish, distribute and transmit the Information; adapt the Information; and exploit the Information commercially and non-commercially, for example by combining it with other Information or by including it in their own product or application. Where you do any of the above, you must acknowledge the source of the Information in your product or application by including or linking to any attribution statement specified by the Information Provider(s) and, where possible, provide a link to this licence: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/

This book is the record of abstracts submitted and accepted for presentation at the Inaugural Engineering and Computer Science Research Conference held on 17th April 2019 at the University of Hertfordshire, Hatfield, UK. The conference is a local event aiming to bring together research students, staff and eminent external guests to celebrate Engineering and Computer Science research at the University of Hertfordshire. The ECS Research Conference aims to showcase the broad landscape of research taking place in the School of Engineering and Computer Science. The 2019 conference was articulated around three topical cross-disciplinary themes: Make and Preserve the Future; Connect the People and Cities; and Protect and Care.