Privacy-preserving Federated Primal-dual Learning for Non-convex and Non-smooth Problems with Model Sparsification
Federated learning (FL) has been recognized as a rapidly growing research
area, where the model is trained over massively distributed clients under the
orchestration of a parameter server (PS) without sharing clients' data. This
paper delves into a class of federated problems characterized by non-convex and non-smooth loss functions, which are prevalent in FL applications but challenging to handle due to their intricate non-convex, non-smooth structure and the conflicting requirements of communication efficiency and privacy
protection. In this paper, we propose a novel federated primal-dual algorithm with bidirectional model sparsification tailored to non-convex and non-smooth FL problems, and apply differential privacy for a strong privacy guarantee.
Its unique properties, together with privacy and convergence analyses, are also presented as guidelines for FL algorithm design. Extensive experiments on real-world data demonstrate the effectiveness of the proposed algorithm and its superior performance over several state-of-the-art FL algorithms, and validate the analytical results and properties.
Comment: 30 pages, 8 figures
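As a rough illustration of the ingredients described above, the sketch below shows one hypothetical client round combining a primal-dual-style local update, top-k sparsification of the uploaded model difference, and Gaussian noise for differential privacy. All names and parameters (grad_fn, rho, noise_std, etc.) are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def top_k_sparsify(vec, k):
    """Keep only the k largest-magnitude entries; zero out the rest."""
    out = np.zeros_like(vec)
    idx = np.argsort(np.abs(vec))[-k:]
    out[idx] = vec[idx]
    return out

def client_round(w_global, dual, grad_fn, rho=1.0, lr=0.1, steps=10,
                 k=100, noise_std=0.01, rng=np.random.default_rng(0)):
    """One client round: local primal steps on an augmented-Lagrangian-style
    objective, a dual ascent step, then sparsify and privatize the upload."""
    w = w_global.copy()
    for _ in range(steps):
        # gradient of the local loss plus the primal-dual coupling terms
        g = grad_fn(w) + dual + rho * (w - w_global)
        w -= lr * g
    dual = dual + rho * (w - w_global)                 # dual ascent step
    delta = top_k_sparsify(w - w_global, k)            # uplink model sparsification
    delta += rng.normal(0.0, noise_std, delta.shape)   # Gaussian noise for DP
    return delta, dual
```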
A Survey on Deep Semi-supervised Learning
Deep semi-supervised learning is a fast-growing field with a range of
practical applications. This paper provides a comprehensive survey on both
fundamentals and recent advances in deep semi-supervised learning methods from
model design perspectives and unsupervised loss functions. We first present a
taxonomy for deep semi-supervised learning that categorizes existing methods,
including deep generative methods, consistency regularization methods,
graph-based methods, pseudo-labeling methods, and hybrid methods. Then we offer
a detailed comparison of these methods in terms of the type of losses,
contributions, and architecture differences. In addition to reviewing progress from the past few years, we discuss shortcomings of existing methods and provide tentative heuristic solutions to these open problems.
Comment: 24 pages, 6 figures
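To make one branch of this taxonomy concrete, here is a minimal pseudo-labeling loss in PyTorch: supervised cross-entropy plus a confidence-thresholded term on unlabeled data. It is a generic sketch, not any specific method covered by the survey.

```python
import torch
import torch.nn.functional as F

def semi_supervised_loss(model, x_lab, y_lab, x_unlab, threshold=0.95, lam=1.0):
    """Supervised cross-entropy plus a confidence-thresholded
    pseudo-labeling term on unlabeled data (illustrative only)."""
    sup_loss = F.cross_entropy(model(x_lab), y_lab)

    with torch.no_grad():
        probs = F.softmax(model(x_unlab), dim=1)
        conf, pseudo = probs.max(dim=1)
        mask = (conf >= threshold).float()      # trust only confident predictions

    unsup_loss = (F.cross_entropy(model(x_unlab), pseudo, reduction="none") * mask).mean()
    return sup_loss + lam * unsup_loss
```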
Wireless for Machine Learning
As data generation increasingly takes place on devices without a wired
connection, Machine Learning over wireless networks becomes critical. Many
studies have shown that traditional wireless protocols are highly inefficient or unsustainable for supporting Distributed Machine Learning. This is creating the need for new wireless communication methods. In this survey, we give an exhaustive review of state-of-the-art wireless methods specifically designed to support Machine Learning services, namely over-the-air computation and radio resource allocation optimized for Machine Learning. In the over-the-air approach, multiple devices communicate
simultaneously over the same time slot and frequency band to exploit the
superposition property of wireless channels for gradient averaging
over-the-air. In radio resource allocation optimized for Machine Learning, Active Learning metrics are used to evaluate data and thereby greatly improve the assignment of radio resources. This paper gives a comprehensive introduction to
these methods, reviews the most important works, and highlights crucial open
problems.
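The superposition idea can be illustrated with a toy NumPy simulation: each device pre-scales its gradient by the inverse of its (assumed known) channel gain, the transmitted signals add on the air, and the receiver recovers an estimate of the average gradient. Power constraints and truncated channel inversion, which practical schemes require, are deliberately ignored here.

```python
import numpy as np

rng = np.random.default_rng(1)
num_devices, dim = 8, 1000
grads = rng.normal(size=(num_devices, dim))       # local gradients
h = rng.rayleigh(scale=1.0, size=num_devices)     # channel gains (known at devices)

# Each device pre-scales its gradient by 1/h so the channel's natural
# superposition yields the *sum* of gradients at the receiver.
tx = grads / h[:, None]
noise = rng.normal(scale=0.05, size=dim)          # receiver noise
rx = (h[:, None] * tx).sum(axis=0) + noise        # signals add on the air

avg_estimate = rx / num_devices                   # over-the-air gradient average
print(np.linalg.norm(avg_estimate - grads.mean(axis=0)))
```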
Communication-efficient scheduling policy for federated learning under channel uncertainty
Federated learning (FL) is a promising decentralized training method for on-device machine learning. Yet achieving performance close to that of centralized training via FL is hindered by client-server communication. In this work, a novel joint client scheduling and resource block (RB) allocation policy is proposed to minimize the loss of accuracy in FL over a wireless system with imperfect channel state information (CSI), compared to a centralized training-based solution. First, the accuracy loss minimization problem is cast as a stochastic optimization problem over a predefined training duration. In order to learn and track the wireless channel under imperfect CSI, a Gaussian process regression (GPR)-based channel prediction method is leveraged and incorporated into the scheduling decision. Next, the client scheduling and RB allocation policy is derived by solving the aforementioned stochastic optimization problem using the Lyapunov optimization framework. Then, this solution is extended to scenarios with perfect CSI. Finally, the proposed scheduling policies for both perfect and imperfect CSI are evaluated via numerical simulations. Results show that the proposed method reduces the accuracy loss by up to 25.8% compared to FL client scheduling and RB allocation policies in the existing literature.
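The GPR component can be sketched with scikit-learn: fit a Gaussian process to a client's past channel-gain observations and predict the next slot's gain together with its uncertainty, which a scheduler could then feed into its allocation rule. The toy data and thresholding idea below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Past time slots and noisy channel-gain observations for one client (toy data)
t_obs = np.arange(20, dtype=float).reshape(-1, 1)
g_obs = np.abs(np.sin(0.3 * t_obs.ravel())) + 0.05 * np.random.randn(20)

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=5.0) + WhiteKernel(0.01),
                               normalize_y=True).fit(t_obs, g_obs)

# Predict the gain (and its uncertainty) for the upcoming slot
mean, std = gpr.predict(np.array([[20.0]]), return_std=True)
# A scheduler could, e.g., only allocate an RB if mean - std exceeds a threshold
print(float(mean), float(std))
```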
An Overview of Deep Semi-Supervised Learning
Deep neural networks have demonstrated their ability to provide remarkable performance on a wide range of supervised learning tasks (e.g., image
classification) when trained on extensive collections of labeled data (e.g.,
ImageNet). However, creating such large datasets requires a considerable amount
of resources, time, and effort. Such resources may not be available in many
practical cases, limiting the adoption and the application of many deep
learning methods. In a search for more data-efficient deep learning methods to
overcome the need for large annotated datasets, there is a rising research
interest in semi-supervised learning and its applications to deep neural
networks to reduce the amount of labeled data required, by either developing
novel methods or adopting existing semi-supervised learning frameworks for a
deep learning setting. In this paper, we provide a comprehensive overview of
deep semi-supervised learning, starting with an introduction to the field,
followed by a summarization of the dominant semi-supervised approaches in deep
learning.
Comment: Preprint
Black-Box Parallelization for Machine Learning
The landscape of machine learning applications is changing rapidly: large centralized datasets are being replaced by high-volume, high-velocity data streams generated by a vast number of geographically distributed, loosely connected devices, such as mobile phones, smart sensors, autonomous vehicles, or industrial machines. Current learning approaches centralize the data and process it in parallel in a cluster or computing center. This has three major disadvantages: (i) it does not scale well with the number of data-generating devices, since their growth exceeds that of computing centers, (ii) the communication costs of centralizing the data are prohibitive in many applications, and (iii) it requires sharing potentially privacy-sensitive data. Pushing computation towards the data-generating devices alleviates these problems and makes it possible to employ their otherwise unused computing power. However, current parallel learning approaches are designed for tightly integrated systems with low latency and high bandwidth, not for loosely connected distributed devices.
Therefore, I propose a new paradigm for parallelization that treats the learning algorithm as a black box, training local models on distributed devices and aggregating them into a single strong one. Since this requires exchanging only models instead of actual data, the approach is highly scalable, communication-efficient, and privacy-preserving. Following this paradigm, this thesis develops black-box parallelizations for two broad classes of learning algorithms.
One approach can be applied to incremental learning algorithms, i.e., those that improve a model in iterations. Based on the utility of aggregations, it schedules communication dynamically, adapting it to the hardness of the learning problem. In practice, this leads to a reduction in communication by orders of magnitude. It is analyzed for (i) online learning, in particular in the context of in-stream learning, where it guarantees optimal regret, and (ii) batch learning based on empirical risk minimization, where optimal convergence can be guaranteed.
The other approach is applicable to non-incremental algorithms as well. It uses a novel aggregation method based on the Radon point that achieves provably high model quality with only a single aggregation, in polylogarithmic runtime on quasi-polynomially many processors. This relates parallel machine learning to Nick's class of parallel decision problems and is a step towards answering a fundamental open problem about the abilities and limitations of efficient parallel learning algorithms. An empirical study on real distributed systems confirms the potential of the approaches in realistic application scenarios.
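The Radon-point aggregation can be sketched as follows: given d+2 model parameter vectors in R^d, a null-space vector of the augmented point matrix yields a Radon partition, and the corresponding convex combination is a point lying in the convex hulls of both parts. Below is a minimal NumPy sketch of a single aggregation, not the thesis's full scheme; larger collections of models would typically be aggregated by applying such a construction hierarchically.

```python
import numpy as np

def radon_point(points):
    """Radon point of d+2 vectors in R^d: a point contained in the convex
    hulls of both parts of a Radon partition (illustrative sketch)."""
    n, d = points.shape
    assert n == d + 2, "needs exactly d + 2 points"
    # Find a nontrivial a with sum_i a_i * x_i = 0 and sum_i a_i = 0
    A = np.vstack([points.T, np.ones(n)])        # shape (d + 1, n)
    a = np.linalg.svd(A)[2][-1]                  # null-space vector of A
    pos = a > 0
    # Convex combination of the positively weighted points
    return (a[pos] @ points[pos]) / a[pos].sum()

# Example: aggregate 6 local "models" with 4 parameters each into one
models = np.random.default_rng(0).normal(size=(6, 4))
print(radon_point(models))
```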
Federated Machine Learning in Edge Computing
Machine Learning (ML) is transforming the way that computers are used to solve problems in computer vision, natural language processing, scientific modelling, and much more. The rising number of devices connected to the Internet generate huge quantities of data that can be used for ML purposes.
Traditionally, organisations require user data to be uploaded to a single location (i.e., cloud datacentre) for centralised ML. However, public concerns regarding data-privacy are growing, and in some domains such as healthcare, there exist strict laws governing the access of data. The computational power and connectivity of devices at the network edge is also increasing: edge computing is a paradigm designed to move computation from the cloud to the edge to reduce latency and traffic.
Federated Learning (FL) is a new and swiftly-developing field that has huge potential for privacy-preserving ML. In FL, edge devices collaboratively train a model without users sharing their personal data with any other party. However, there exist multiple challenges for designing useful FL algorithms, including: the heterogeneity of data across participating clients; the low computing power, intermittent connectivity and unreliability of clients at the network edge compared to the datacentre; and the difficulty of limiting information leakage whilst still training high-performance models.
This thesis proposes new methods for improving the process of FL in edge computing and hence making it more practical for real-world deployments. First, a novel approach is designed that accelerates the convergence of the FL model through adaptive optimisation, reducing the time taken to train a model, whilst lowering the total quantity of information uploaded from edge clients to the coordinating server through two new compression strategies. Next, a Multi-Task FL framework is proposed that allows participating clients to train unique models that are tailored to their own heterogeneous datasets whilst still benefiting from FL, improving model convergence speed and generalisation performance across clients. Then, the principle of decreasing the total work that clients perform during the FL process is explored. A theoretical analysis (and subsequent experimental evaluation) suggests that this approach can reduce the time taken to reach a desired training error whilst lowering the total computational cost of FL and improving communication efficiency. Lastly, an algorithm is designed that applies adaptive optimisation to FL in a novel way, through the use of a statistically-biased optimiser whose values are kept fixed on clients. This algorithm can leverage the convergence guarantees of centralised algorithms, with the addition of FL-related error terms. Furthermore, it shows excellent performance on benchmark FL datasets whilst possessing lower computation and upload costs compared to competing adaptive-FL algorithms.
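In the spirit of the adaptive-optimisation ideas above, the sketch below applies an Adam-style update at the server to the averaged client delta. It is a generic illustration of adaptive aggregation in FL, not the thesis's fixed-on-clients biased-optimiser algorithm, and all names are assumptions.

```python
import numpy as np

class ServerAdam:
    """Adam-style server optimizer applied to the averaged client delta
    (an illustrative adaptive-FL aggregation, not the thesis's algorithm)."""
    def __init__(self, dim, lr=0.01, b1=0.9, b2=0.99, eps=1e-6):
        self.m = np.zeros(dim)
        self.v = np.zeros(dim)
        self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps

    def step(self, w_global, client_deltas):
        delta = np.mean(client_deltas, axis=0)                  # FedAvg-style aggregation
        self.m = self.b1 * self.m + (1 - self.b1) * delta       # first moment
        self.v = self.b2 * self.v + (1 - self.b2) * delta ** 2  # second moment
        return w_global + self.lr * self.m / (np.sqrt(self.v) + self.eps)
```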
New formulations for active learning
In this thesis, we provide computationally efficient algorithms with provable statistical guarantees for the problem of active learning, using ideas from sequential analysis. We provide a generic algorithmic framework for active learning in the pool setting, and instantiate this framework using ideas from learning with experts, stochastic optimization, and multi-armed bandits. For the problem of learning a convex combination of a given set of hypotheses, we provide a stochastic mirror descent based active learning algorithm in the stream setting.
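For readers unfamiliar with the pool setting, a generic pool-based loop with uncertainty sampling looks roughly like the sketch below (scikit-learn, binary task assumed); it illustrates the setting only and is not one of the thesis's proposed algorithms.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning_loop(X_pool, y_oracle, n_init=10, n_queries=20,
                         rng=np.random.default_rng(0)):
    """Pool-based active learning with uncertainty sampling (illustrative,
    assumes a binary classification task)."""
    labeled = list(rng.choice(len(X_pool), size=n_init, replace=False))
    for _ in range(n_queries):
        clf = LogisticRegression(max_iter=1000).fit(X_pool[labeled], y_oracle[labeled])
        probs = clf.predict_proba(X_pool)
        margin = np.abs(probs[:, 1] - 0.5)        # distance from the decision boundary
        margin[labeled] = np.inf                  # never re-query a labeled point
        labeled.append(int(np.argmin(margin)))    # query the most uncertain point
    return clf, labeled
```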
Exascale Deep Learning for Climate Analytics
We extract pixel-level masks of extreme weather patterns using variants of
Tiramisu and DeepLabv3+ neural networks. We describe improvements to the
software frameworks, input pipeline, and the network training algorithms
necessary to efficiently scale deep learning on the Piz Daint and Summit
systems. The Tiramisu network scales to 5300 P100 GPUs with a sustained
throughput of 21.0 PF/s and parallel efficiency of 79.0%. DeepLabv3+ scales up
to 27360 V100 GPUs with a sustained throughput of 325.8 PF/s and a parallel
efficiency of 90.7% in single precision. By taking advantage of the FP16 Tensor
Cores, a half-precision version of the DeepLabv3+ network achieves a peak and
sustained throughput of 1.13 EF/s and 999.0 PF/s, respectively.
Comment: 12 pages, 5 tables, 4 figures, Super Computing Conference, November 11-16, 2018, Dallas, TX, US
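Exploiting FP16 Tensor Cores generally amounts to mixed-precision training with dynamic loss scaling. Below is a minimal PyTorch sketch, assuming a CUDA device and a toy model; it is not the paper's Tiramisu/DeepLabv3+ pipeline.

```python
import torch

model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3, padding=1),
                            torch.nn.ReLU(),
                            torch.nn.Conv2d(16, 1, 1)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()          # dynamic loss scaling for FP16

for step in range(10):
    x = torch.randn(4, 3, 64, 64, device="cuda")
    target = torch.randn(4, 1, 64, 64, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():           # run the forward pass in FP16 where safe
        loss = torch.nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()             # scale the loss to avoid FP16 underflow
    scaler.step(optimizer)
    scaler.update()
```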