Search CORE

5,490 research outputs found

Recommended from our members

DISTRIBUTED LEARNING ALGORITHMS: COMMUNICATION EFFICIENCY AND ERROR RESILIENCE

Author: Maity Raj Kumar
Publication venue: ScholarWorks@UMass Amherst
Publication date: 18/03/2022
Field of study

In modern day machine learning applications such as self-driving cars, recommender systems, robotics, genetics etc., the size of the training data has grown to the point that it has become essential to design distributed learning algorithms. A general framework for the distributed learning is \emph{data parallelism} where the data is distributed among the \emph{worker machines} for parallel processing and computation to speed up learning. With billions of devices such as cellphones, computers etc., the data is inherently distributed and stored locally in the users\u27 devices. Learning in this set up is popularly known as \emph{Federated Learning}. The speed-up due to distributed framework gets hindered by some fundamental problems such as straggler workers, communication bottleneck due to high communication overhead between workers and central server, adversarial failure popularly know as \emph{Byzantine failure}. In this thesis, we study and develop distributed algorithms that are error resilient and communication efficient. First, we address the problem of straggler workers where the learning is delayed due to slow workers in the distributed setup. To mitigate the effect of the stragglers, we employ \textbf{LDPC} (low density parity check) code to encode the data and implement gradient descent algorithm in the distributed setup. Second, we present a family of vector quantization schemes \emph{vqSGD} (vector quantized Stochastic Gradient Descent ) that provides an asymptotic reduction in the communication cost with convergence guarantees in the first order distributed optimization. We also showed that \emph{vqSGD} provides strong privacy guarantee. Third, we address the problem of Byzantine failure together with communication-efficiency in the first order gradient descent algorithm. We consider a generic class of

\delta

- approximate compressor for communication efficiency and employ a simple \emph{norm based thresholding} scheme to make the learning algorithm robust to Byzantine failures. We establish statistical error rate for non-convex smooth loss. Moreover, we analyze the compressed gradient descent algorithm with error feedback in a distributed setting and in the presence of Byzantine worker machines. Fourth, we employ the generic class of

\delta

- approximate compressor to develop a communication efficient second order Newton-type algorithm and provide rate of convergence for smooth objective. Fifth, we propose \textbf{COMRADE} (COMmunication-efficient and Robust Approximate Distributed nEwton ), an iterative second order algorithm that is communication efficient as well as robust against Byzantine failures. Sixth, we propose a distributed \emph{cubic-regularized Newton } algorithm that can escape saddle points effectively for non-convex loss function and find a local minima . Furthermore, the proposed algorithm can resist the attack of the Byzantine machines, which may create \emph{fake local minima} near the saddle points of the loss function, also known as saddle-point attack

ScholarWorks@UMass Amherst

Recommended from our members

Towards More Scalable and Robust Machine Learning

Author: Yin Dong
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

For many data-intensive real-world applications, such as recognizing objects from images, detecting spam emails, and recommending items on retail websites, the most successful current approaches involve learning rich prediction rules from large datasets. There are many challenges in these machine learning tasks. For example, as the size of the datasets and the complexity of these prediction rules increase, there is a significant challenge in designing scalable methods that can effectively exploit the availability of distributed computing units. As another example, in many machine learning applications, there can be data corruptions, communication errors, and even adversarial attacks during training and test. Therefore, to build reliable machine learning models, we also have to tackle the challenge of robustness in machine learning.In this dissertation, we study several topics on the scalability and robustness in large-scale learning, with a focus of establishing solid theoretical foundations for these problems, and demonstrate recent progress towards the ambitious goal of building more scalable and robust machine learning models. We start with the speedup saturation problem in distributed stochastic gradient descent (SGD) algorithms with large mini-batches. We introduce the notion of gradient diversity, a metric of the dissimilarity between concurrent gradient updates, and show its key role in the convergence and generalization performance of mini-batch SGD. We then move forward to Byzantine distributed learning, a topic that involves both scalability and robustness in distributed learning. In the Byzantine setting that we consider, a fraction of distributed worker machines can have arbitrary or even adversarial behavior. We design statistically and computationally efficient algorithms to defend against Byzantine failures in distributed optimization with convex and non-convex objectives. Lastly, we discuss the adversarial example phenomenon. We provide theoretical analysis of the adversarially robust generalization properties of machine learning models through the lens of Radamacher complexity

eScholarship - University of California

Byzantine Stochastic Gradient Descent

Author: Alistarh Dan-Adrian
Allen-Zhu Zeyuan
Bengio S.
Cesa-Bianchi N.
Garnett R.
Grauman K.
Larochelle H.
Li Jerry
Wallach H.
Publication venue
Publication date: 01/01/2018
Field of study

This paper studies the problem of distributed stochastic optimization in an adversarial setting where, out of the

m

machines which allegedly compute stochastic gradients every iteration, an

\alpha

-fraction are Byzantine, and can behave arbitrarily and adversarially. Our main result is a variant of stochastic gradient descent (SGD) which finds

\varepsilon

-approximate minimizers of convex functions in

T = \tilde{O}\big( \frac{1}{\varepsilon^2 m} + \frac{\alpha^2}{\varepsilon^2} \big)

iterations. In contrast, traditional mini-batch SGD needs

T = O\big( \frac{1}{\varepsilon^2 m} \big)

iterations, but cannot tolerate Byzantine failures. Further, we provide a lower bound showing that, up to logarithmic factors, our algorithm is information-theoretically optimal both in terms of sampling complexity and time complexity

arXiv.org e-Print Archive

IST Austria: PubRep (Institute of Science and Technology)

SPIRT: A Fault-Tolerant and Reliable Peer-to-Peer Serverless ML Training Architecture

Author: Barrak Amine
Jaafar Fehmi
Jaziri Mayssa
Petrillo Fabio
Trabelsi Ranim
Publication venue
Publication date: 25/09/2023
Field of study

The advent of serverless computing has ushered in notable advancements in distributed machine learning, particularly within parameter server-based architectures. Yet, the integration of serverless features within peer-to-peer (P2P) distributed networks remains largely uncharted. In this paper, we introduce SPIRT, a fault-tolerant, reliable, and secure serverless P2P ML training architecture. designed to bridge this existing gap. Capitalizing on the inherent robustness and reliability innate to P2P systems, SPIRT employs RedisAI for in-database operations, leading to an 82\% reduction in the time required for model updates and gradient averaging across a variety of models and batch sizes. This architecture showcases resilience against peer failures and adeptly manages the integration of new peers, thereby highlighting its fault-tolerant characteristics and scalability. Furthermore, SPIRT ensures secure communication between peers, enhancing the reliability of distributed machine learning tasks. Even in the face of Byzantine attacks, the system's robust aggregation algorithms maintain high levels of accuracy. These findings illuminate the promising potential of serverless architectures in P2P distributed machine learning, offering a significant stride towards the development of more efficient, scalable, and resilient applications

arXiv.org e-Print Archive