5,490 research outputs found
Recommended from our members
DISTRIBUTED LEARNING ALGORITHMS: COMMUNICATION EFFICIENCY AND ERROR RESILIENCE
In modern day machine learning applications such as self-driving cars, recommender systems, robotics, genetics etc., the size of the training data has grown to the point that it has become essential to design distributed learning algorithms. A general framework for the distributed learning is \emph{data parallelism} where the data is distributed among the \emph{worker machines} for parallel processing and computation to speed up learning. With billions of devices such as cellphones, computers etc., the data is inherently distributed and stored locally in the users\u27 devices. Learning in this set up is popularly known as \emph{Federated Learning}. The speed-up due to distributed framework gets hindered by some fundamental problems such as straggler workers, communication bottleneck due to high communication overhead between workers and central server, adversarial failure popularly know as \emph{Byzantine failure}. In this thesis, we study and develop distributed algorithms that are error resilient and communication efficient.
First, we address the problem of straggler workers where the learning is delayed due to slow workers in the distributed setup. To mitigate the effect of the stragglers, we employ \textbf{LDPC} (low density parity check) code to encode the data and implement gradient descent algorithm in the distributed setup. Second, we present a family of vector quantization schemes \emph{vqSGD} (vector quantized Stochastic Gradient Descent ) that provides an asymptotic reduction in the communication cost with convergence guarantees in the first order distributed optimization. We also showed that \emph{vqSGD} provides strong privacy guarantee. Third, we address the problem of Byzantine failure together with communication-efficiency in the first order gradient descent algorithm. We consider a generic class of - approximate compressor for communication efficiency and employ a simple \emph{norm based thresholding} scheme to make the learning algorithm robust to Byzantine failures. We establish statistical error rate for non-convex smooth loss. Moreover, we analyze the compressed gradient descent algorithm with error feedback in a distributed setting and in the presence of Byzantine worker machines. Fourth, we employ the generic class of - approximate compressor to develop a communication efficient second order Newton-type algorithm and provide rate of convergence for smooth objective. Fifth, we propose \textbf{COMRADE} (COMmunication-efficient and Robust Approximate Distributed nEwton ), an iterative second order algorithm that is communication efficient as well as robust against Byzantine failures. Sixth, we propose a distributed \emph{cubic-regularized Newton } algorithm that can escape saddle points effectively for non-convex loss function and find a local minima . Furthermore, the proposed algorithm can resist the attack of the Byzantine machines, which may create \emph{fake local minima} near the saddle points of the loss function, also known as saddle-point attack
Recommended from our members
Towards More Scalable and Robust Machine Learning
For many data-intensive real-world applications, such as recognizing objects from images, detecting spam emails, and recommending items on retail websites, the most successful current approaches involve learning rich prediction rules from large datasets. There are many challenges in these machine learning tasks. For example, as the size of the datasets and the complexity of these prediction rules increase, there is a significant challenge in designing scalable methods that can effectively exploit the availability of distributed computing units. As another example, in many machine learning applications, there can be data corruptions, communication errors, and even adversarial attacks during training and test. Therefore, to build reliable machine learning models, we also have to tackle the challenge of robustness in machine learning.In this dissertation, we study several topics on the scalability and robustness in large-scale learning, with a focus of establishing solid theoretical foundations for these problems, and demonstrate recent progress towards the ambitious goal of building more scalable and robust machine learning models. We start with the speedup saturation problem in distributed stochastic gradient descent (SGD) algorithms with large mini-batches. We introduce the notion of gradient diversity, a metric of the dissimilarity between concurrent gradient updates, and show its key role in the convergence and generalization performance of mini-batch SGD. We then move forward to Byzantine distributed learning, a topic that involves both scalability and robustness in distributed learning. In the Byzantine setting that we consider, a fraction of distributed worker machines can have arbitrary or even adversarial behavior. We design statistically and computationally efficient algorithms to defend against Byzantine failures in distributed optimization with convex and non-convex objectives. Lastly, we discuss the adversarial example phenomenon. We provide theoretical analysis of the adversarially robust generalization properties of machine learning models through the lens of Radamacher complexity
Byzantine Stochastic Gradient Descent
This paper studies the problem of distributed stochastic optimization in an
adversarial setting where, out of the machines which allegedly compute
stochastic gradients every iteration, an -fraction are Byzantine, and
can behave arbitrarily and adversarially. Our main result is a variant of
stochastic gradient descent (SGD) which finds -approximate
minimizers of convex functions in iterations. In contrast, traditional
mini-batch SGD needs iterations,
but cannot tolerate Byzantine failures. Further, we provide a lower bound
showing that, up to logarithmic factors, our algorithm is
information-theoretically optimal both in terms of sampling complexity and time
complexity
SPIRT: A Fault-Tolerant and Reliable Peer-to-Peer Serverless ML Training Architecture
The advent of serverless computing has ushered in notable advancements in
distributed machine learning, particularly within parameter server-based
architectures. Yet, the integration of serverless features within peer-to-peer
(P2P) distributed networks remains largely uncharted. In this paper, we
introduce SPIRT, a fault-tolerant, reliable, and secure serverless P2P ML
training architecture. designed to bridge this existing gap.
Capitalizing on the inherent robustness and reliability innate to P2P
systems, SPIRT employs RedisAI for in-database operations, leading to an 82\%
reduction in the time required for model updates and gradient averaging across
a variety of models and batch sizes. This architecture showcases resilience
against peer failures and adeptly manages the integration of new peers, thereby
highlighting its fault-tolerant characteristics and scalability. Furthermore,
SPIRT ensures secure communication between peers, enhancing the reliability of
distributed machine learning tasks. Even in the face of Byzantine attacks, the
system's robust aggregation algorithms maintain high levels of accuracy. These
findings illuminate the promising potential of serverless architectures in P2P
distributed machine learning, offering a significant stride towards the
development of more efficient, scalable, and resilient applications
- …