Search CORE

4,891 research outputs found

Many-Task Computing and Blue Waters

Author: Armstrong Timothy G.
Katz Daniel S.
Wilde Michael
Wozniak Justin M.
Zhang Zhao
Publication venue
Publication date: 01/01/2012
Field of study

This report discusses many-task computing (MTC) generically and in the context of the proposed Blue Waters systems, which is planned to be the largest NSF-funded supercomputer when it begins production use in 2012. The aim of this report is to inform the BW project about MTC, including understanding aspects of MTC applications that can be used to characterize the domain and understanding the implications of these aspects to middleware and policies. Many MTC applications do not neatly fit the stereotypes of high-performance computing (HPC) or high-throughput computing (HTC) applications. Like HTC applications, by definition MTC applications are structured as graphs of discrete tasks, with explicit input and output dependencies forming the graph edges. However, MTC applications have significant features that distinguish them from typical HTC applications. In particular, different engineering constraints for hardware and software must be met in order to support these applications. HTC applications have traditionally run on platforms such as grids and clusters, through either workflow systems or parallel programming systems. MTC applications, in contrast, will often demand a short time to solution, may be communication intensive or data intensive, and may comprise very short tasks. Therefore, hardware and software for MTC must be engineered to support the additional communication and I/O and must minimize task dispatch overheads. The hardware of large-scale HPC systems, with its high degree of parallelism and support for intensive communication, is well suited for MTC applications. However, HPC systems often lack a dynamic resource-provisioning feature, are not ideal for task communication via the file system, and have an I/O system that is not optimized for MTC-style applications. Hence, additional software support is likely to be required to gain full benefit from the HPC hardware

arXiv.org e-Print Archive

CiteSeerX

Efficient Federated Learning with Enhanced Privacy via Lottery Ticket Pruning in Edge Computing

Author: Guo Song
Li Jun
Shen Li
Shi Yifan
Wang Xueqian
Wei Kang
Yuan Bo
Publication venue
Publication date: 02/05/2023
Field of study

Federated learning (FL) is a collaborative learning paradigm for decentralized private data from mobile terminals (MTs). However, it suffers from issues in terms of communication, resource of MTs, and privacy. Existing privacy-preserving FL methods usually adopt the instance-level differential privacy (DP), which provides a rigorous privacy guarantee but with several bottlenecks: severe performance degradation, transmission overhead, and resource constraints of edge devices such as MTs. To overcome these drawbacks, we propose Fed-LTP, an efficient and privacy-enhanced FL framework with \underline{\textbf{L}}ottery \underline{\textbf{T}}icket \underline{\textbf{H}}ypothesis (LTH) and zero-concentrated D\underline{\textbf{P}} (zCDP). It generates a pruned global model on the server side and conducts sparse-to-sparse training from scratch with zCDP on the client side. On the server side, two pruning schemes are proposed: (i) the weight-based pruning (LTH) determines the pruned global model structure; (ii) the iterative pruning further shrinks the size of the pruned model's parameters. Meanwhile, the performance of Fed-LTP is also boosted via model validation based on the Laplace mechanism. On the client side, we use sparse-to-sparse training to solve the resource-constraints issue and provide tighter privacy analysis to reduce the privacy budget. We evaluate the effectiveness of Fed-LTP on several real-world datasets in both independent and identically distributed (IID) and non-IID settings. The results clearly confirm the superiority of Fed-LTP over state-of-the-art (SOTA) methods in communication, computation, and memory efficiencies while realizing a better utility-privacy trade-off.Comment: 13 page

arXiv.org e-Print Archive

Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations

Author: Alistarh Dan
Chilimbi Trishul
Devlin Jacob
Ho Qirong
Hoefler T.
Hoefler T.
Hoefler T.
Hsieh Kevin
Interface Forum Message Passing
Jayarajan Anand
Lian Xiangru
Recht B.
Seide Frank
Strom Nikko
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020
Field of study

Load imbalance pervasively exists in distributed deep learning training systems, either caused by the inherent imbalance in learned tasks or by the system itself. Traditional synchronous Stochastic Gradient Descent (SGD) achieves good accuracy for a wide variety of tasks, but relies on global synchronization to accumulate the gradients at every training step. In this paper, we propose eager-SGD, which relaxes the global synchronization for decentralized accumulation. To implement eager-SGD, we propose to use two partial collectives: solo and majority. With solo allreduce, the faster processes contribute their gradients eagerly without waiting for the slower processes, whereas with majority allreduce, at least half of the participants must contribute gradients before continuing, all without using a central parameter server. We theoretically prove the convergence of the algorithms and describe the partial collectives in detail. Experimental results on load-imbalanced environments (CIFAR-10, ImageNet, and UCF101 datasets) show that eager-SGD achieves 1.27x speedup over the state-of-the-art synchronous SGD, without losing accuracy.Comment: Published in Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'20), pp. 45-61. 202

arXiv.org e-Print Archive

Crossref

IST Austria: PubRep (Institute of Science and Technology)

Disaster warning system: Satellite feasibility and comparison with terrestrial systems. Volume 2: Final report

Author: Bamford T. F.
Fluk M. J.
Hodge W. H.
Spoor J. H.
Publication venue
Publication date
Field of study

For abstract, see Vol. 1

NASA Technical Reports Server

Building a green connected future: smart (Internet of) Things for smart networks

Author: Koutsandria Georgia
Publication venue
Publication date: 05/10/2018
Field of study

The vision of Internet of Things (IoT) promises to reshape society by creating a future where we will be surrounded by a smart environment that is constantly aware of the users and has the ability to adapt to any changes. In the IoT, a huge variety of smart devices is interconnected to form a network of distributed agents that continuously share and process information. This communication paradigm has been recognized as one of the key enablers of the rapidly emerging applications that make up the fabric of the IoT. These networks, often called wireless sensor networks (WSNs), are characterized by the low cost of their components, their pervasive connectivity, and their self-organization features, which allow them to cooperate with other IoT elements to create large-scale heterogeneous information systems. However, a number of considerable challenges is arising when considering the design of large-scale WSNs. In particular, these networks are made up by embedded devices that suffer from severe power constraints and limited resources. The advent of low-power sensor nodes coupled with intelligent software and hardware technologies has led to the era of green wireless networks. From the hardware perspective, green sensor nodes are endowed with energy scavenging capabilities to overcome energy-related limitations. They are also endowed with low-power triggering techniques, i.e., wake-up radios, to eliminate idle listening-induced communication costs. Green wireless networks are considered a fundamental vehicle for enabling all those critical IoT applications where devices, for different reasons, do not carry batteries, and that therefore only harvest energy and store it for future use. These networks are considered to have the potential of infinite lifetime since they do not depend on batteries, or on any other limited power sources. Wake-up radios, coupled with energy provisioning techniques, further assist on overcoming the physical constraints of traditional WSNs. In addition, they are particularly important in green WSNs scenarios in which it is difficult to achieve energy neutrality due to limited harvesting rates. In this PhD thesis we set to investigate how different data forwarding mechanisms can make the most of these green wireless networks-enabling technologies, namely, energy harvesting and wake-up radios. Specifically, we present a number of cross-layer routing approaches with different forwarding design choices and study their consequences on network performance. Among the most promising protocol design techniques, the past decade has shown the increasingly intensive adoption of techniques based on various forms of machine learning to increase and optimize the performance of WSNs. However, learning techniques can suffer from high computational costs as nodes drain a considerable percentage of their energy budget to run sophisticated software procedures, predict accurate information and determine optimal decision. This thesis addresses also the problem of local computational requirements of learning-based data forwarding strategies by investigating their impact on the performance of the network. Results indicate that local computation can be a major source of energy consumption; it’s impact on network performance should not be neglected

Archivio della ricerca- Università di Roma La Sapienza

Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis

Author: Ben-Nun Tal
Hoefler Torsten
Publication venue
Publication date: 15/09/2018
Field of study

Deep Neural Networks (DNNs) are becoming an important tool in modern computing applications. Accelerating their training is a major challenge and techniques range from distributed algorithms to low-level circuit design. In this survey, we describe the problem from a theoretical perspective, followed by approaches for its parallelization. We present trends in DNN architectures and the resulting implications on parallelization strategies. We then review and model the different types of concurrency in DNNs: from the single operator, through parallelism in network inference and training, to distributed deep learning. We discuss asynchronous stochastic optimization, distributed system architectures, communication schemes, and neural architecture search. Based on those approaches, we extrapolate potential directions for parallelism in deep learning

arXiv.org e-Print Archive

Repository for Publications and Research Data

Proceedings of Abstracts Engineering and Computer Science Research Conference 2019

Author: Adams Roderick
Amafabia Daerefa-a
Barker Trevor
Beka Nathan
Bhavsar Ronakben
Bonivart Agnes
Canoville Paul
Cañamero Lola
CHEN Yong Kang
CHEN Yong Kang
Chrysanthou Andreas
Counsell Nathan
Crook Brian
Davey Neil
David-West Opukuro
Denai Mouloud
Dhakal Hom
Drix Damien
Goncharenko Julia
Grasso Marzio
Hafner Verena Vanessa
Hall Samantha
Haritos George
Hassan Eheda
Helian Na
Herfatmanesh Mohammad Reza
Ismail Sikiru O.
Johnston Ian
Johnston Ian
Kadir Shabnam
Kaye Richard
Khan Imran
Kirner Raimund
Kirner Raimund
Klaholz Ingo
Klusak Jan
Kourtessis Pandelis
Lane Peter
Lekkala Himayasri Rao
Lilley Mariana
Mayor David
McCluskey Daniel
McCluskey Daniel
Menon Catherine
Metzner Christoph
Miko Rebecca
Montalvão Diogo
Mporas Iosif
Munro Ian
Nehaniv Chrystopher
Newman James
Nwawe Richard
Panday Deepak
Partou Helen
Pissanidis Georgios
Polani Daniel
Ren Guogang
Robinson Matthew
Rosiello Vincenzo
Sayers Paul
Schilstra Maria
Schirmer Pascal
Schmuker Michael
Siadati Rana
Sinha Ankur
Skaltsas Grigorios
Steffert Tony
Steuber Volker
Steuber Volker
Suckow Bjorn
Sun Yi
Sun Yichuang
Sunmola Funlade
Sutton Samuel
te Boekhorst Rene
Toffe Gilles
Tracey Mark
Tveretina Olga
Veneziano Vito
Verma Alok
Wang Yuan
Wernick Paul
Publication venue: University of Hertfordshire
Publication date: 01/09/2019
Field of study

© 2019 The Author(s). This is an open-access work distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. For further details please see https://creativecommons.org/licenses/by/4.0/. Note: Keynote: Fluorescence visualisation to evaluate effectiveness of personal protective equipment for infection control is © 2019 Crown copyright and so is licensed under the Open Government Licence v3.0. Under this licence users are permitted to copy, publish, distribute and transmit the Information; adapt the Information; exploit the Information commercially and non-commercially for example, by combining it with other Information, or by including it in your own product or application. Where you do any of the above you must acknowledge the source of the Information in your product or application by including or linking to any attribution statement specified by the Information Provider(s) and, where possible, provide a link to this licence: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/This book is the record of abstracts submitted and accepted for presentation at the Inaugural Engineering and Computer Science Research Conference held 17th April 2019 at the University of Hertfordshire, Hatfield, UK. This conference is a local event aiming at bringing together the research students, staff and eminent external guests to celebrate Engineering and Computer Science Research at the University of Hertfordshire. The ECS Research Conference aims to showcase the broad landscape of research taking place in the School of Engineering and Computer Science. The 2019 conference was articulated around three topical cross-disciplinary themes: Make and Preserve the Future; Connect the People and Cities; and Protect and Care

University of Hertfordshire Research Archive