9 research outputs found
MixTailor: Mixed Gradient Aggregation for Robust Learning Against Tailored Attacks
Implementations of SGD on distributed systems create new vulnerabilities,
which can be identified and misused by one or more adversarial agents.
Recently, it has been shown that well-known Byzantine-resilient gradient
aggregation schemes are indeed vulnerable to informed attackers that can tailor
the attacks (Fang et al., 2020; Xie et al., 2020b). We introduce MixTailor, a
scheme based on randomization of the aggregation strategies that makes it
impossible for the attacker to be fully informed. Deterministic schemes can be
integrated into MixTailor on the fly without introducing any additional
hyperparameters. Randomization decreases the capability of a powerful adversary
to tailor its attacks, while the resulting randomized aggregation scheme is
still competitive in terms of performance. For both iid and non-iid settings,
we establish almost sure convergence guarantees that are both stronger and more
general than those available in the literature. Our empirical studies across
various datasets, attacks, and settings, validate our hypothesis and show that
MixTailor successfully defends when well-known Byzantine-tolerant schemes fail.
Comment: To appear in Transactions on Machine Learning Research (TMLR).
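The core idea, drawing an aggregation rule at random at each step, can be sketched as follows. The pool of rules (mean, coordinate-wise median, trimmed mean) and the uniform sampling are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def mean_agg(grads):
    return np.mean(grads, axis=0)

def median_agg(grads):
    return np.median(grads, axis=0)

def trimmed_mean_agg(grads, trim=1):
    srt = np.sort(grads, axis=0)          # sort each coordinate across workers
    return np.mean(srt[trim:len(grads) - trim], axis=0)

def mixtailor_step(grads, pool, rng):
    # Draw the aggregation rule at random each iteration, so an informed
    # attacker cannot tailor its gradients to one fixed, known rule.
    rule = pool[rng.integers(len(pool))]
    return rule(grads)

pool = [mean_agg, median_agg, trimmed_mean_agg]
rng = np.random.default_rng(0)
grads = np.array([[1.0, 2.0],
                  [1.1, 1.9],
                  [10.0, -10.0]])          # last worker is Byzantine
g = mixtailor_step(grads, pool, rng)
```

Because the attacker cannot predict which rule will be applied, an update crafted to defeat, say, the trimmed mean may be neutralized whenever the median is drawn instead.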
Distributed Extra-gradient with Optimal Complexity and Communication Guarantees
We consider monotone variational inequality (VI) problems in multi-GPU
settings where multiple processors/workers/clients have access to local
stochastic dual vectors. This setting covers a broad range of important
problems, from distributed convex minimization to min-max optimization and games.
Extra-gradient, the de facto algorithm for monotone VI problems, was not
designed with communication efficiency in mind. To address this, we propose
the quantized generalized extra-gradient (Q-GenX) method, an unbiased and
adaptive compression scheme tailored to solving VIs. We provide an adaptive
step-size rule, which adapts to the respective noise profiles at hand and
achieves a fast rate of O(1/T) under relative noise and an
order-optimal rate of O(1/sqrt(T)) under absolute noise, and we show that
distributed training accelerates convergence. Finally, we validate our
theoretical results by providing real-world experiments and training generative
adversarial networks on multiple GPUs.
Comment: International Conference on Learning Representations (ICLR 2023).
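A minimal sketch of the unbiased stochastic quantization that underlies such compressors is shown below; the uniform level grid and the function name are illustrative assumptions, and Q-GenX's actual adaptive scheme is more involved:

```python
import numpy as np

def unbiased_quantize(v, levels, rng):
    # Stochastically round each |v_i| / ||v|| onto a uniform grid with
    # `levels` bins so that E[Q(v)] = v (an unbiased compressor).
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    scaled = np.abs(v) / norm * levels      # in [0, levels]
    lower = np.floor(scaled)
    prob = scaled - lower                   # P(round up) preserves the mean
    rounded = lower + (rng.random(v.shape) < prob)
    return np.sign(v) * norm * rounded / levels

rng = np.random.default_rng(0)
q = unbiased_quantize(np.array([0.3, -0.4, 0.5]), 4, rng)
```

Unbiasedness is what lets the quantized updates slot into the extra-gradient analysis: the compression error shows up only as extra noise, not as bias in the dual vector.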
Federated Learning under Covariate Shifts with Generalization Guarantees
This paper addresses intra-client and inter-client covariate shifts in
federated learning (FL) with a focus on the overall generalization performance.
To handle covariate shifts, we formulate a new global model training paradigm
and propose Federated Importance-Weighted Empirical Risk Minimization (FTW-ERM)
together with improved density ratio matching methods that do not require
perfect knowledge of the supremum of the true ratios. We also propose the
communication-efficient variant FITW-ERM with the same level of privacy
guarantees as those of classical ERM in FL. We theoretically show that FTW-ERM
achieves smaller generalization error than classical ERM under certain
settings. Experimental results demonstrate the superiority of FTW-ERM over
existing FL baselines in challenging imbalanced federated settings in terms of
data distribution shifts across clients.
Comment: Published in Transactions on Machine Learning Research (TMLR).
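The importance-weighting idea at the heart of FTW-ERM can be illustrated in a few lines; the function name and the example density ratios below are hypothetical:

```python
import numpy as np

def iw_erm_loss(losses, density_ratios):
    # Importance-weighted empirical risk: each sample's loss is reweighted by
    # an estimated density ratio w(x) = p_target(x) / p_client(x), correcting
    # the covariate shift between a client's data and the target distribution.
    return np.mean(density_ratios * losses)

losses = np.array([0.9, 0.2, 0.4])     # per-sample losses on one client
ratios = np.array([2.0, 0.5, 0.5])     # hypothetical estimated density ratios
risk = iw_erm_loss(losses, ratios)
```

Samples that are rare on the client but common under the target distribution (ratio > 1) are up-weighted, so minimizing this reweighted risk targets the global generalization error rather than the client's local one.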
NUQSGD: Provably communication-efficient data-parallel SGD via nonuniform quantization
As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed for parallel model training. One popular communication-compression method for data-parallel SGD is QSGD (Alistarh et al., 2017), which quantizes and encodes gradients to reduce communication costs. The baseline variant of QSGD provides strong theoretical guarantees; however, for practical purposes, the authors proposed a heuristic variant, which we call QSGDinf, that demonstrated impressive empirical gains for distributed training of large neural networks. In this paper, we build on this work to propose a new gradient quantization scheme and show that it has stronger theoretical guarantees than QSGD while matching and exceeding the empirical performance of the QSGDinf heuristic and of other compression methods.
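A hedged sketch of nonuniform quantization in the spirit of NUQSGD: levels spaced exponentially (dense near zero, where normalized gradient coordinates concentrate), with stochastic rounding between bracketing levels to keep the compressor unbiased. The exact level set and encoding in the paper differ, and the names here are illustrative:

```python
import numpy as np

def nuq_levels(s):
    # Exponentially spaced levels 0, 2^-s, ..., 1/2, 1: dense near zero,
    # where normalized gradient coordinates tend to concentrate.
    return np.concatenate(([0.0], 2.0 ** np.arange(-s, 1.0)))

def nuq_quantize(v, s, rng):
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    levels = nuq_levels(s)
    r = np.clip(np.abs(v) / norm, 0.0, 1.0)
    hi = np.clip(np.searchsorted(levels, r, side="right"), 1, len(levels) - 1)
    lo = hi - 1
    # Stochastic rounding between the two bracketing levels keeps E[Q(v)] = v.
    p_up = (r - levels[lo]) / (levels[hi] - levels[lo])
    q = np.where(rng.random(r.shape) < p_up, levels[hi], levels[lo])
    return np.sign(v) * norm * q

rng = np.random.default_rng(0)
q = nuq_quantize(np.array([0.3, -0.4, 0.5]), 2, rng)
```

Compared with QSGD's uniform grid, the exponential spacing spends its quantization budget where the mass of the normalized coordinates actually lies, which is the source of the tighter variance bounds.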
Resource Management and Interference Control in Distributed Multi-Tier and D2D Systems
In order to improve the capacity and spectrum efficiency of next generation wireless networks, multi-tier wireless networking and device-to-device (D2D) communication are widely considered as strong candidates for 5G. In this thesis, I have developed new theories and design guidelines to improve the performance of large-scale multi-tier and D2D networks by studying their resource optimization and interference management.
In the first part of this thesis, we study optimal power allocation for distributed relays in a multi-channel system with multiple source-destination pairs and an individual power budget for each relay. We focus on designing the optimal relay beamformers, aiming to minimize per-relay power usage while meeting minimum signal-to-noise ratio (SNR) guarantees. Showing that strong Lagrange duality holds even for this non-convex problem, we solve it in the dual domain. Further, we investigate the effect of imperfect channel information by quantifying the performance loss due to either quantization error under limited feedback or channel estimation error.
In the second part of this thesis, we study optimal inter-cell interference control for distributed relays in a multi-channel system. We design optimal relay beamformers to minimize the maximum interference caused at the neighboring cells while satisfying minimum SNR requirements and per-relay power constraints. Even though the problem is non-convex, we propose an iterative algorithm that provides a semi-closed-form solution. We extend this algorithm to the problem of maximizing the minimum SNR subject to pre-determined maximum interference constraints at the neighboring cells. To gain insight into designing this system in practice, we further study the worst-case received signal-to-interference-and-noise ratio (SINR) versus the maximum interference target.
In the third part of this thesis, we consider D2D communication underlaid in a cellular system for uplink resource sharing. Under optimal cellular user (CU) receive beamforming, we jointly optimize the powers of CUs and D2D pairs for their sum rate maximization, while satisfying minimum quality-of-service (QoS) requirements and worst-case inter-cell interference limit in multiple neighboring cells. The formulated joint optimization problem is non-convex. We propose an approximate power control algorithm to maximize the sum rate and provide an upper bound on the performance loss by the proposed algorithm and conditions for its optimality.
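The flavor of the joint power optimization in this part can be illustrated with a toy two-link example: maximize the sum rate over the CU and D2D transmit powers subject to minimum-SINR (QoS) constraints. The channel gains are made-up numbers, and the exhaustive grid search merely stands in for the thesis's approximate power control algorithm:

```python
import numpy as np

# Toy two-link uplink: link 0 is the cellular user (CU), link 1 the D2D pair.
g = np.array([1.0, 0.8])          # direct channel gains (made-up values)
h = np.array([[0.0, 0.1],         # h[i, j]: cross gain from transmitter j
              [0.2, 0.0]])        # into receiver i
noise = 0.1
p_max, sinr_min = 1.0, 0.5        # per-link power budget and QoS target

def sinr(p):
    return g * p / (noise + h @ p)

def sum_rate(p):
    return np.sum(np.log2(1.0 + sinr(p)))

# Exhaustive search over feasible power pairs; the thesis develops an
# efficient approximate algorithm, this brute force is only for illustration.
best_rate, best_p = -1.0, None
for p0 in np.linspace(0.0, p_max, 51):
    for p1 in np.linspace(0.0, p_max, 51):
        p = np.array([p0, p1])
        if np.all(sinr(p) >= sinr_min):
            r = sum_rate(p)
            if r > best_rate:
                best_rate, best_p = r, p
```

The non-convexity referred to in the text is visible even here: each link's rate decreases in the other link's power, so the feasible sum-rate surface has no convex structure to exploit directly.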
We further extend the results of the third part in the fourth part of this thesis, where we jointly optimize the beam vector and the transmit powers of the CU and the D2D transmitter under practical system settings. We consider a multi-cell scenario where perfect channel information is available only for the direct channels from the CU and the D2D pair to the base station; for the other channels, only partial channel information is available. The uncertain channel information, the non-convex expected sum rate, and the various power, interference, and QoS constraints lead to a challenging optimization problem. We propose an efficient robust power control algorithm based on a ratio-of-expectation approximation to maximize the expected sum rate, which is shown to give near-optimal performance by comparison with an upper bound on the sum rate.
Comment: Ph.D. thesis