Networked Federated Learning
We develop the theory and algorithmic toolbox for networked federated
learning in decentralized collections of local datasets with an intrinsic
network structure. This network structure arises from domain-specific notions
of similarity between local datasets. Different notions of similarity are
induced by spatio-temporal proximity, statistical dependencies or functional
relations. Our main conceptual contribution is to formulate networked federated
learning using a generalized total variation minimization. This formulation
unifies and considerably extends existing federated multi-task learning
methods. It is highly flexible and can be combined with a broad range of
parametric models, including Lasso and deep neural networks. Our main algorithmic
contribution is a novel networked federated learning algorithm which is well
suited for distributed computing environments such as edge computing over
wireless networks. This algorithm is robust against inexact computations
arising from limited computational resources including processing time or
bandwidth. For local models resulting in convex problems, we derive precise
conditions on the local models and their network structure such that our
algorithm learns nearly optimal local models. Our analysis reveals an
interesting interplay between the convex geometry of local models and the
(cluster-) geometry of their network structure.
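The generalized total variation (GTV) formulation above can be illustrated with a toy sketch: each local dataset keeps its own parameter, and an edge penalty couples the parameters of similar datasets. The names, toy data, and scalar models here are illustrative assumptions, not the paper's implementation.

```python
# Illustrative GTV minimization sketch (assumed, not the paper's code):
# minimize  sum_i (w_i - y_i)^2  +  lam * sum_{(i,j) in E} (w_i - w_j)^2
# where y_i is a toy "local optimum" for dataset i and E is the dataset network.

def gtv_gradient_step(w, targets, edges, lam, lr):
    """One gradient-descent step on the coupled objective above."""
    grad = [2.0 * (w_i - y_i) for w_i, y_i in zip(w, targets)]
    for i, j in edges:
        d = 2.0 * lam * (w[i] - w[j])  # edge penalty pulls neighbours together
        grad[i] += d
        grad[j] -= d
    return [w_i - lr * g for w_i, g in zip(w, grad)]

# Two clusters of local datasets joined by one weak bridge edge (1, 2).
targets = [1.0, 1.1, 5.0, 5.2]
edges = [(0, 1), (2, 3), (1, 2)]
w = [0.0, 0.0, 0.0, 0.0]
for _ in range(500):
    w = gtv_gradient_step(w, targets, edges, lam=0.1, lr=0.05)
```

With a small coupling weight, parameters within a cluster end up close to each other while the two clusters stay apart, mirroring the interplay between local models and the cluster geometry of the network described above.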
Autonomy and Intelligence in the Computing Continuum: Challenges, Enablers, and Future Directions for Orchestration
Future AI applications require performance, reliability and privacy that the
existing, cloud-dependent system architectures cannot provide. In this article,
we study orchestration in the device-edge-cloud continuum, and focus on AI for
edge, that is, the AI methods used in resource orchestration. We claim that to
support the constantly growing requirements of intelligent applications in the
device-edge-cloud computing continuum, resource orchestration needs to embrace
edge AI and emphasize local autonomy and intelligence. To justify the claim, we
provide a general definition for continuum orchestration, and look at how
current and emerging orchestration paradigms are suitable for the computing
continuum. We describe certain major emerging research themes that may affect
future orchestration, and provide an early vision of an orchestration paradigm
that embraces those research themes. Finally, we survey current key edge AI
methods and look at how they may contribute to fulfilling the vision of
future continuum orchestration.
Federated Learning of Artificial Neural Networks
Training today's most widely applicable machine learning (ML) models, and artificial neural networks in particular, requires an extremely large amount of data and substantial computational capacity. Federated Learning (FL) research focuses on the collaborative training of ML models on today's heterogeneous, geographically highly distributed information infrastructure. The aim of FL is thus to spread the computational load of training across the participants (nodes), processing data where it is generated, while learning itself proceeds by periodically collecting the updates computed on the nodes, aggregating them, and redistributing the refreshed model.
In our view, FL research proceeds in three main directions: (1) The first direction addresses the application of the generally accepted federated training method, Federated Averaging (FedAvg), in realistic settings, i.e. how the required communication and computational capacity can be provided. (2) The second direction focuses on problems arising when the FedAvg algorithm is applied, such as the model's decreasing overall accuracy and the potentially insufficient performance of the shared model for end users. (3) The third, widely studied topic investigates ways to protect the participants' confidential data as strongly as possible.
In the dissertation, I present our work on improving the federated training of artificial neural networks along these directions, which we consider the most important. The presented methods mitigate the individual problems based on the following ideas: (1) a peer-to-peer reformulation of the FedAvg algorithm; (2) the application of optimisation methods based on past states; and (3) the application of gradient-free, nature-inspired optimisation methods.
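The FedAvg scheme around which the dissertation's directions revolve can be sketched minimally: each node runs a few local gradient steps on its own data, and the server averages the resulting models weighted by local dataset size. The toy squared-loss model and all names here are illustrative assumptions, not the dissertation's code.

```python
# Minimal FedAvg sketch (illustrative): scalar model, squared loss per sample.

def local_update(w, data, lr=0.1, steps=5):
    """A few local gradient steps on the loss sum_x (w - x)^2 over this node's data."""
    for _ in range(steps):
        grad = sum(2.0 * (w - x) for x in data) / len(data)
        w -= lr * grad
    return w

def fedavg_round(w_global, node_datasets):
    """One round: broadcast the global model, train locally, average by dataset size."""
    updates = [local_update(w_global, d) for d in node_datasets]
    total = sum(len(d) for d in node_datasets)
    return sum(len(d) * u for u, d in zip(updates, node_datasets)) / total

datasets = [[1.0, 2.0], [3.0], [2.0, 3.0, 4.0]]
w = 0.0
for _ in range(30):
    w = fedavg_round(w, datasets)
# w converges to the weighted mean of all data, 2.5
```

A peer-to-peer variant, as in direction (1), would replace the central averaging step with exchanges among neighbouring nodes.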
Improving the performance of dataflow systems for deep neural network training
Deep neural networks (DNNs) have led to significant advancements in machine learning.
With deep structure and flexible model parameterisation, they exhibit state-of-the-art accuracies for many complex tasks, e.g. image recognition. To achieve this, models are trained iteratively over large datasets. This process involves expensive matrix operations, making it time-consuming to obtain converged models. To accelerate training, dataflow systems parallelise computation. A scalable approach is to use the parameter server framework: it has workers that train model replicas in parallel and parameter servers that synchronise the replicas to ensure convergence.
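The parameter-server pattern described above can be sketched with a toy scalar model: workers compute gradients on their data shards, and the server averages those gradients to update the shared parameters. The names and the squared-loss model are illustrative assumptions, not the thesis code.

```python
# Toy parameter-server synchronisation sketch (illustrative, single server).

def worker_gradient(w, shard):
    """Gradient of the local squared loss sum_x (w - x)^2 over one worker's shard."""
    return sum(2.0 * (w - x) for x in shard) / len(shard)

def server_step(w, shards, lr=0.1):
    """Synchronous round: gather one gradient per worker, average, update."""
    grads = [worker_gradient(w, shard) for shard in shards]  # computed in parallel
    return w - lr * sum(grads) / len(grads)

shards = [[1.0, 2.0], [4.0, 5.0]]  # each worker holds one data shard
w = 0.0
for _ in range(200):
    w = server_step(w, shards)
# w converges to 3.0, the mean of the shard means
```

In a real deployment each full-gradient exchange per round is what drives the network utilisation discussed next.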
With distributed DNN systems, there are three challenges that determine the training completion time. In this thesis, we propose practical and effective techniques to address each of these challenges.
Since frequent model synchronisation results in high network utilisation, the parameter server approach can suffer from network bottlenecks, thus requiring decisions on resource allocation. Our idea is to use all available network bandwidth and synchronise subject to the available bandwidth. We present Ako, a DNN system that uses partial gradient exchange for synchronising replicas in a peer-to-peer fashion. We show that our technique exhibits a 25% lower convergence time than hand-tuned parameter-server deployments.
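The partial gradient exchange idea can be illustrated as follows: rather than sending its full gradient to every peer each round, a worker partitions the gradient and sends a different partition per round, so every peer eventually sees all coordinates. This is a simplified sketch under assumed details (round-robin partition choice, additive accumulation), not Ako's actual protocol.

```python
# Illustrative partial gradient exchange among P peer workers (assumed scheme).

def partial_exchange_round(grads, r):
    """grads[p] is worker p's local gradient (a flat list).
    In round r, each worker receives only partition ((r + p) % P) of peer p's
    gradient and accumulates it onto its own copy."""
    P = len(grads)
    n = len(grads[0])
    chunk = (n + P - 1) // P                 # coordinates per partition
    received = [list(g) for g in grads]      # each worker starts from its own gradient
    for w in range(P):
        for p in range(P):
            if p == w:
                continue
            k = (r + p) % P                  # which partition p ships this round
            lo, hi = k * chunk, min((k + 1) * chunk, n)
            for i in range(lo, hi):
                received[w][i] += grads[p][i]
    return received

out = partial_exchange_round([[1.0, 2.0], [10.0, 20.0]], r=0)
```

Each round thus moves only 1/P of the gradient per peer pair, trading staleness on the unsent coordinates for lower bandwidth.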
For long training runs, the compute efficiency of worker nodes is important. We argue that processing hardware should be fully utilised for the best speed-up. The key observation is that it is possible to overlap the execution of several matrix operations with other workloads. We describe Crossbow, a GPU-based system that maximises hardware utilisation. By using a multi-streaming scheduler, multiple models are trained in parallel on a GPU, achieving a 2.3x speed-up compared to a state-of-the-art system.
The choice of model configuration for replicas also directly determines convergence quality. Dataflow systems are used for exploring promising configurations but provide little support for efficient exploratory workflows. We present Meta-dataflow (MDF), a dataflow model that expresses complex workflows. By treating all configurations as a unified workflow, MDF efficiently reduces the time spent on configuration exploration.
Operating system support for warehouse-scale computing
Modern applications are increasingly backed by large-scale data centres. Systems software in these data centre environments, however, faces substantial challenges: the lack of uniform resource abstractions makes sharing and resource management inefficient, infrastructure software lacks end-to-end access control mechanisms, and work placement ignores the effects of hardware heterogeneity and workload interference.
In this dissertation, I argue that uniform, clean-slate operating system (OS) abstractions designed to support distributed systems can make data centres more efficient and secure. I present a novel distributed operating system for data centres, focusing on two OS components: the abstractions for resource naming, management and protection, and the scheduling of work to compute resources.
First, I introduce a reference model for a decentralised, distributed data centre OS, based on pervasive distributed objects and inspired by concepts in classic 1980s distributed OSes. Translucent abstractions free users from having to understand implementation details, but enable introspection for performance optimisation. Fine-grained access control is supported by combining
storable, communicable identifier capabilities, and context-dependent, ephemeral handle capabilities. Finally, multi-phase I/O requests implement optimistically concurrent access to objects
while supporting diverse application-level consistency policies.
Second, I present the DIOS operating system, an implementation of my model as an extension to Linux. The DIOS system call API is centred around distributed objects, globally resolvable names, and translucent references that carry context-sensitive object meta-data. I illustrate how these concepts support distributed applications, and evaluate the performance of DIOS in microbenchmarks and a data-intensive MapReduce application. I find that it offers improved, fine-grained isolation of resources, while permitting flexible sharing.
Third, I present the Firmament cluster scheduler, which generalises prior work on scheduling via minimum-cost flow optimisation. Firmament can flexibly express many scheduling policies using pluggable cost models; it makes high-quality placement decisions based on fine-grained information about tasks and resources; and it scales the flow-based scheduling approach to very large clusters. In two case studies, I show that Firmament supports policies that reduce colocation interference between tasks and that it successfully exploits flexibility in the workload to improve the energy efficiency of a heterogeneous cluster. Moreover, my evaluation shows that Firmament scales the minimum-cost flow optimisation to clusters of tens of thousands of machines while still making sub-second placement decisions.
St John's College Supplementary Emolument Fund
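The pluggable cost-model idea can be illustrated with a toy greedy scheduler; note that Firmament itself solves a global minimum-cost flow problem rather than placing tasks greedily, and all names here are assumptions for illustration.

```python
# Toy scheduler with a pluggable cost model (illustrative only): a cost model
# is any function mapping (task, machine) to a placement cost.

load = {"m1": 0, "m2": 0}   # tasks currently colocated on each machine

def interference_cost(task, machine):
    """Hypothetical cost model: more colocated tasks -> more interference."""
    return load[machine]

def schedule(tasks, machines, cost_model):
    """Greedily place each task on its cheapest machine under cost_model."""
    placement = {}
    for t in tasks:
        best = min(machines, key=lambda m: cost_model(t, m))
        placement[t] = best
        load[best] += 1      # placing a task raises that machine's load
    return placement

placement = schedule(["t1", "t2"], ["m1", "m2"], interference_cost)
# the interference-aware cost model spreads the two tasks across machines
```

Swapping in a different cost function, e.g. one rewarding energy-efficient machines, changes the policy without changing the scheduling mechanism, which is the property the pluggable design provides.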
DARP