Byzantine-Resilient Federated PCA and Low Rank Matrix Recovery
In this work, we consider the problem of estimating the principal subspace (the span of the top r singular vectors) of a symmetric matrix in a federated setting, where each node has access to an estimate of this matrix, and we study how to make this problem Byzantine resilient. We introduce a novel provably Byzantine-resilient, communication-efficient, and private algorithm, called Subspace-Median, to solve it. We also study the most natural solution to this problem, a geometric-median-based modification of the federated power method, and explain why it is not useful. We consider two special cases of the resilient subspace estimation meta-problem: federated principal component analysis (PCA) and the spectral initialization step of horizontally federated low rank column-wise compressive sensing (LRCCS). For both problems we show how Subspace-Median provides a solution that is both resilient and communication-efficient. Median-of-means extensions are developed for both problems. Extensive simulation experiments corroborate our theoretical guarantees. Our second contribution is a complete AltGDmin-based algorithm for Byzantine-resilient horizontally federated LRCCS, along with guarantees for it. We obtain this by developing a geometric median of means estimator for aggregating the partial gradients computed at each node, and by using Subspace-Median for initialization.
DISTRIBUTED LEARNING ALGORITHMS: COMMUNICATION EFFICIENCY AND ERROR RESILIENCE
In modern machine learning applications such as self-driving cars, recommender systems, robotics, and genetics, the training data has grown to the point that it has become essential to design distributed learning algorithms. A general framework for distributed learning is data parallelism, where the data is distributed among worker machines for parallel processing and computation to speed up learning. With billions of devices such as cellphones and computers, data is inherently distributed and stored locally on users' devices; learning in this setup is popularly known as federated learning. The speed-up due to the distributed framework is hindered by some fundamental problems: straggler workers, a communication bottleneck due to the high communication overhead between workers and the central server, and adversarial failures, popularly known as Byzantine failures. In this thesis, we study and develop distributed algorithms that are error resilient and communication efficient.
First, we address the problem of straggler workers, where learning is delayed by slow workers in the distributed setup. To mitigate the effect of stragglers, we employ LDPC (low-density parity-check) codes to encode the data and implement gradient descent in the distributed setup. Second, we present a family of vector quantization schemes, vqSGD (vector-quantized stochastic gradient descent), that provides an asymptotic reduction in communication cost with convergence guarantees for first-order distributed optimization; we also show that vqSGD provides strong privacy guarantees. Third, we address the problem of Byzantine failure together with communication efficiency in first-order gradient descent. We consider a generic class of δ-approximate compressors for communication efficiency and employ a simple norm-based thresholding scheme to make the learning algorithm robust to Byzantine failures; we establish statistical error rates for non-convex smooth losses. Moreover, we analyze compressed gradient descent with error feedback in a distributed setting in the presence of Byzantine worker machines. Fourth, we employ the same generic class of δ-approximate compressors to develop a communication-efficient second-order Newton-type algorithm and provide its rate of convergence for smooth objectives. Fifth, we propose COMRADE (COMmunication-efficient and Robust Approximate Distributed nEwton), an iterative second-order algorithm that is communication efficient as well as robust against Byzantine failures. Sixth, we propose a distributed cubic-regularized Newton algorithm that can escape saddle points effectively for non-convex loss functions and find a local minimum. Furthermore, the proposed algorithm can resist attacks by Byzantine machines that may create fake local minima near the saddle points of the loss function, also known as a saddle-point attack.
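As a concrete illustration of the norm-based thresholding scheme from the third contribution, here is a minimal, hypothetical NumPy sketch; the trimming fraction beta, the function name, and the exact rule for choosing how many gradients to drop are our assumptions rather than the thesis's scheme, and compression is left abstract.

# Hypothetical sketch of norm-based thresholding for Byzantine-robust
# aggregation: drop the beta-fraction of gradients with the largest norms,
# then average the rest. Compression is abstracted as any map g -> C(g).
import numpy as np

def norm_thresholded_mean(gradients, beta=0.2):
    """Average worker gradients after discarding the largest-norm ones.

    gradients: array of shape (num_workers, dim)
    beta:      fraction of workers assumed (at most) Byzantine
    """
    m = gradients.shape[0]
    keep = m - int(np.ceil(beta * m))     # number of gradients to keep
    norms = np.linalg.norm(gradients, axis=1)
    kept_idx = np.argsort(norms)[:keep]   # smallest norms survive
    return gradients[kept_idx].mean(axis=0)

# Toy usage: 8 honest workers plus 2 Byzantine workers sending huge updates.
rng = np.random.default_rng(0)
honest = rng.normal(0.0, 1.0, size=(8, 5))
byzantine = 1e3 * rng.normal(size=(2, 5))
g = norm_thresholded_mean(np.vstack([honest, byzantine]), beta=0.2)
print(g)  # close to the honest mean; the outliers were trimmed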
Towards More Scalable and Robust Machine Learning
For many data-intensive real-world applications, such as recognizing objects in images, detecting spam emails, and recommending items on retail websites, the most successful current approaches involve learning rich prediction rules from large datasets. These machine learning tasks pose many challenges. For example, as the size of the datasets and the complexity of the prediction rules increase, there is a significant challenge in designing scalable methods that can effectively exploit the availability of distributed computing units. As another example, in many machine learning applications there can be data corruption, communication errors, and even adversarial attacks during training and testing. Therefore, to build reliable machine learning models, we also have to tackle the challenge of robustness in machine learning.
In this dissertation, we study several topics on scalability and robustness in large-scale learning, with a focus on establishing solid theoretical foundations for these problems, and we demonstrate recent progress towards the ambitious goal of building more scalable and robust machine learning models. We start with the speedup-saturation problem in distributed stochastic gradient descent (SGD) with large mini-batches. We introduce the notion of gradient diversity, a metric of the dissimilarity between concurrent gradient updates, and show its key role in the convergence and generalization performance of mini-batch SGD. We then move on to Byzantine distributed learning, a topic that involves both scalability and robustness in distributed learning. In the Byzantine setting that we consider, a fraction of the distributed worker machines can have arbitrary or even adversarial behavior. We design statistically and computationally efficient algorithms to defend against Byzantine failures in distributed optimization with convex and non-convex objectives. Lastly, we discuss the adversarial example phenomenon. We provide a theoretical analysis of the adversarially robust generalization properties of machine learning models through the lens of Rademacher complexity.
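Since gradient diversity is central to the mini-batch speedup result, a short sketch may help; this is a minimal NumPy illustration of the metric as we understand it (the normalization convention is our assumption, not necessarily the dissertation's).

# Sketch of the gradient diversity metric: the ratio of the summed squared
# norms of per-example gradients to the squared norm of their sum. Values
# near 1/n mean the gradients are aligned (low diversity); values near 1
# mean they are nearly orthogonal (high diversity).
import numpy as np

def gradient_diversity(per_example_grads):
    """per_example_grads: array of shape (n, dim), one gradient per example."""
    sum_sq_norms = np.sum(np.linalg.norm(per_example_grads, axis=1) ** 2)
    sq_norm_of_sum = np.linalg.norm(per_example_grads.sum(axis=0)) ** 2
    return sum_sq_norms / sq_norm_of_sum

rng = np.random.default_rng(1)
identical = np.tile(rng.normal(size=(1, 4)), (10, 1))
print(gradient_diversity(identical))                  # 1/10: lowest diversity
print(gradient_diversity(rng.normal(size=(10, 4))))   # close to 1: high diversity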
Flag Aggregator: Scalable Distributed Training under Failures and Augmented Losses using Convex Optimization
Modern ML applications increasingly rely on complex deep learning models and large datasets. There has been exponential growth in the amount of computation needed to train the largest models. Therefore, to scale computation and data, these models are inevitably trained in a distributed manner on clusters of nodes, and their updates are aggregated before being applied to the model. However, a distributed setup is prone to Byzantine failures of individual nodes, components, and software. With data augmentation added to these settings, there is a critical need for robust and efficient aggregation systems. We define the quality of workers as reconstruction ratios, and formulate aggregation as a maximum likelihood estimation procedure using Beta densities. We show that the regularized form of the log-likelihood with respect to the subspace can be approximately solved using an iterative least-squares solver, and we provide convergence guarantees using recent convex optimization landscape results. Our empirical findings demonstrate that our approach significantly enhances the robustness of state-of-the-art Byzantine-resilient aggregators. We evaluate our method in a distributed setup with a parameter server, and show simultaneous improvements in communication efficiency and accuracy across various tasks. The code is publicly available at
https://github.com/hamidralmasi/FlagAggregator
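To make the reconstruction-ratio idea concrete, here is a hypothetical NumPy sketch of a subspace-based aggregator: stack worker updates, fit a low-dimensional subspace, score each worker by how well that subspace reconstructs its update, and take a quality-weighted average. This illustrates the general idea only; it is not the paper's Beta-density MLE or its iterative least-squares solver, and the rank k and weighting rule are our own choices.

# Hypothetical subspace-quality aggregation sketch. Workers whose updates are
# poorly explained by the dominant subspace (e.g., Byzantine outliers) get
# small weights.
import numpy as np

def subspace_weighted_aggregate(updates, k=2):
    """updates: array of shape (num_workers, dim); returns one aggregated update."""
    norms = np.maximum(np.linalg.norm(updates, axis=1), 1e-12)
    directions = updates / norms[:, None]   # unit-norm rows, so magnitude
                                            # alone cannot dominate the fit
    _, _, Vt = np.linalg.svd(directions, full_matrices=False)
    P = Vt[:k].T @ Vt[:k]                   # projector onto the top-k subspace
    # Reconstruction ratio in [0, 1]: energy of each (unit) update captured
    # by the shared subspace; used as a worker-quality weight.
    quality = np.linalg.norm(directions @ P, axis=1)
    weights = quality / quality.sum()
    return weights @ updates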
Federated Over-Air Subspace Tracking from Incomplete and Corrupted Data
Subspace tracking (ST) with missing data (ST-miss), with outliers (robust ST), or with both (robust ST-miss) has been extensively studied for many years. This work provides a new, simple algorithm and guarantee for both ST-miss and robust ST-miss. Unlike past work on this topic, the algorithm is much simpler (it uses fewer parameters), and the guarantee does not make the artificial assumption of piecewise-constant subspace change, although it still handles that setting. Second, we extend our approach and its analysis to provably solve these problems when the raw data is federated and the over-air data communication modality is used for information exchange between the peer nodes and the center.
Comment: New model and algorithm for the centralized case; added algorithms to deal with sparse outliers; modified organization significantly.
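For intuition, one step of a generic ST-miss style tracker can be sketched as a least-squares projection onto the current subspace estimate using only the observed entries, followed by a periodic subspace refresh. This is a hypothetical sketch of the general approach, not this paper's specific algorithm, parameter choices, or guarantee.

# Generic ST-miss style sketch: at each time t we see a few coordinates of
# y_t = U a_t + noise. Estimate a_t by least squares on the observed rows of
# the current basis U_hat, impute the column, and periodically re-estimate
# the subspace from recent imputed columns via SVD. All details (window
# size, rank r, refresh schedule) are illustrative assumptions.
import numpy as np

def st_miss_step(U_hat, y_obs, obs_idx):
    """One tracking step. U_hat: (n, r) orthonormal basis estimate.
    y_obs: observed values; obs_idx: their row indices."""
    U_omega = U_hat[obs_idx, :]                      # observed rows of basis
    a_hat, *_ = np.linalg.lstsq(U_omega, y_obs, rcond=None)
    return U_hat @ a_hat                             # imputed full column

def refresh_subspace(imputed_cols, r):
    """Re-estimate the rank-r subspace from a window of imputed columns."""
    U, _, _ = np.linalg.svd(np.column_stack(imputed_cols), full_matrices=False)
    return U[:, :r]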