
    MixTailor: Mixed Gradient Aggregation for Robust Learning Against Tailored Attacks

    Implementations of SGD on distributed systems create new vulnerabilities, which can be identified and misused by one or more adversarial agents. Recently, it has been shown that well-known Byzantine-resilient gradient aggregation schemes are indeed vulnerable to informed attackers that can tailor the attacks (Fang et al., 2020; Xie et al., 2020b). We introduce MixTailor, a scheme based on randomization of the aggregation strategies that makes it impossible for the attacker to be fully informed. Deterministic schemes can be integrated into MixTailor on the fly without introducing any additional hyperparameters. Randomization decreases the capability of a powerful adversary to tailor its attacks, while the resulting randomized aggregation scheme remains competitive in terms of performance. For both iid and non-iid settings, we establish almost sure convergence guarantees that are both stronger and more general than those available in the literature. Our empirical studies across various datasets, attacks, and settings validate our hypothesis and show that MixTailor successfully defends when well-known Byzantine-tolerant schemes fail. Comment: To appear in Transactions on Machine Learning Research (TMLR).
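
    The core idea lends itself to a short sketch: the server keeps a pool of aggregation rules and samples one at random at every step, so an informed attacker has no fixed rule to tailor its updates against. The minimal Python sketch below assumes a pool of mean, coordinate-wise median, and trimmed mean with uniform sampling; the pool, the sampling scheme, and the helper names are illustrative assumptions, not the paper's exact configuration.

```python
# A minimal sketch of randomized aggregation in the spirit of MixTailor.
# The aggregator pool and uniform sampling are illustrative assumptions.
import numpy as np

def mean(grads):
    # grads: (num_workers, dim) array of reported worker gradients
    return grads.mean(axis=0)

def coordinate_median(grads):
    # Coordinate-wise median, a classic Byzantine-resilient rule
    return np.median(grads, axis=0)

def trimmed_mean(grads, trim=1):
    # Drop the `trim` smallest and largest values per coordinate, then average
    sorted_g = np.sort(grads, axis=0)
    return sorted_g[trim:-trim].mean(axis=0)

AGGREGATORS = [mean, coordinate_median, trimmed_mean]

def mixtailor_step(params, grads, lr=0.1, rng=None):
    # Sampling the aggregation rule afresh at each step denies an informed
    # attacker a fixed target to tailor its Byzantine updates against.
    if rng is None:
        rng = np.random.default_rng()
    agg = AGGREGATORS[rng.integers(len(AGGREGATORS))]
    return params - lr * agg(grads)
```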

    Distributed Extra-gradient with Optimal Complexity and Communication Guarantees

    We consider monotone variational inequality (VI) problems in multi-GPU settings where multiple processors/workers/clients have access to local stochastic dual vectors. This setting covers a broad range of important problems, from distributed convex minimization to min-max problems and games. Extra-gradient, the de facto algorithm for monotone VI problems, was not designed to be communication-efficient. To this end, we propose quantized generalized extra-gradient (Q-GenX), an unbiased and adaptive compression method tailored to solving VIs. We provide an adaptive step-size rule that adapts to the noise profile at hand, achieving a fast rate of $\mathcal{O}(1/T)$ under relative noise and an order-optimal $\mathcal{O}(1/\sqrt{T})$ under absolute noise, and we show that distributed training accelerates convergence. Finally, we validate our theoretical results with real-world experiments, training generative adversarial networks on multiple GPUs. Comment: International Conference on Learning Representations (ICLR 2023).
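
    As a rough illustration of the two ingredients, the sketch below combines a plain extra-gradient step with an unbiased, randomly dithered quantizer standing in for the Q-GenX compressor; the quantizer design, step size, and function names are assumptions for illustration, not the paper's exact scheme.

```python
# One quantized extra-gradient step for a monotone VI with operator F.
# The dithered quantizer is an illustrative stand-in for Q-GenX.
import numpy as np

def dithered_quantize(v, levels=16, rng=None):
    # Unbiased stochastic quantization onto a uniform grid scaled by
    # ||v||_inf, so that E[dithered_quantize(v)] = v.
    if rng is None:
        rng = np.random.default_rng()
    scale = np.abs(v).max()
    if scale == 0.0:
        return v
    u = np.abs(v) / scale * levels
    low = np.floor(u)
    # Round up with probability equal to the fractional part.
    q = low + (rng.random(v.shape) < (u - low))
    return np.sign(v) * q * scale / levels

def extragradient_step(z, F, eta=0.1, rng=None):
    # Extrapolate, then update, each time with a compressed operator
    # evaluation, as workers would send quantized dual vectors.
    z_half = z - eta * dithered_quantize(F(z), rng=rng)
    return z - eta * dithered_quantize(F(z_half), rng=rng)

# Example: the bilinear saddle point min_x max_y x*y has the monotone
# operator F(x, y) = (y, -x); plain gradient steps spiral outward, while
# extra-gradient converges (here, to a noise ball due to quantization).
F = lambda z: np.array([z[1], -z[0]])
z = np.array([1.0, 1.0])
for _ in range(200):
    z = extragradient_step(z, F, eta=0.1)
```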

    Federated Learning under Covariate Shifts with Generalization Guarantees

    This paper addresses intra-client and inter-client covariate shifts in federated learning (FL) with a focus on overall generalization performance. To handle covariate shifts, we formulate a new global model training paradigm and propose Federated Importance-Weighted Empirical Risk Minimization (FTW-ERM), along with improved density-ratio matching methods that do not require perfect knowledge of the supremum of the true ratios. We also propose a communication-efficient variant, FITW-ERM, with the same level of privacy guarantees as classical ERM in FL. We theoretically show that FTW-ERM achieves smaller generalization error than classical ERM under certain settings. Experimental results demonstrate the superiority of FTW-ERM over existing FL baselines in challenging federated settings with imbalanced data and distribution shifts across clients. Comment: Published in Transactions on Machine Learning Research (TMLR).
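
    The weighting idea can be conveyed in a few lines: each client's empirical risk is re-weighted by estimated density ratios before server-side aggregation. The sketch below assumes the ratio estimates are given (the paper obtains them via density-ratio matching); the helper names and the FedAvg-style aggregation are illustrative, not the authors' implementation.

```python
# A minimal sketch of importance-weighted ERM for covariate shift in FL,
# in the spirit of FTW-ERM. Density-ratio estimates are assumed given.
import numpy as np

def weighted_client_risk(losses, w):
    # losses: per-example losses on one client's local data
    # w: estimated density ratios p_target(x) / p_client(x) per example
    return np.mean(w * losses)

def global_objective(client_losses, client_ratios, client_sizes):
    # Aggregate importance-weighted local risks proportionally to client
    # dataset sizes, as a FedAvg-style server would.
    total = sum(client_sizes)
    return sum(
        (n / total) * weighted_client_risk(l, w)
        for l, w, n in zip(client_losses, client_ratios, client_sizes)
    )
```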

    NUQSGD: Provably communication-efficient data-parallel SGD via nonuniform quantization

    As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training. One popular communication-compression method for data-parallel SGD is QSGD (Alistarh et al., 2017), which quantizes and encodes gradients to reduce communication costs. The baseline variant of QSGD provides strong theoretical guarantees; however, for practical purposes, the authors proposed a heuristic variant, which we call QSGDinf, that demonstrated impressive empirical gains for distributed training of large neural networks. In this paper, we build on this work to propose a new gradient quantization scheme and show that it has both stronger theoretical guarantees than QSGD and empirical performance that matches or exceeds that of the QSGDinf heuristic and of other compression methods.
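
    The key difference from QSGD's uniform grid is where the quantization levels sit: placing them nonuniformly resolves the small normalized magnitudes that dominate real gradients more finely. The sketch below uses exponentially spaced levels {0} ∪ {2^-j} with unbiased randomized rounding; it follows the spirit of the paper, but the exact grid, parameters, and names are assumptions here.

```python
# A sketch of nonuniform stochastic gradient quantization in the spirit
# of NUQSGD; the exponentially spaced grid is an illustrative assumption.
import numpy as np

def nuq_quantize(v, s=3, rng=None):
    # Quantize normalized magnitudes |v|/||v|| onto the nonuniform grid
    # 0 < 2^-s < ... < 2^-1 < 1 with unbiased randomized rounding.
    if rng is None:
        rng = np.random.default_rng()
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return v
    levels = np.concatenate(([0.0], 2.0 ** -np.arange(s, -1, -1.0)))
    r = np.abs(v) / norm                         # in [0, 1]
    idx = np.searchsorted(levels, r)             # first level >= r
    lo = levels[np.maximum(idx - 1, 0)]
    hi = levels[np.minimum(idx, len(levels) - 1)]
    gap = np.where(hi > lo, hi - lo, 1.0)
    p = np.where(hi > lo, (r - lo) / gap, 0.0)   # P(round up); keeps E[q] = r
    q = np.where(rng.random(v.shape) < p, hi, lo)
    return np.sign(v) * q * norm
```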

    Resource Management and Interference Control in Distributed Multi-Tier and D2D Systems

    In order to improve the capacity and spectrum efficiency of next-generation wireless networks, multi-tier wireless networking and device-to-device (D2D) communication are widely considered strong candidates for 5G. This thesis develops new theories and design guidelines that improve the performance of large-scale multi-tier and D2D networks through resource optimization and interference management.

    In the first part of this thesis, we study optimal power allocation for distributed relays in a multi-channel system with multiple source-destination pairs and an individual power budget for each relay. We focus on designing the optimal relay beamformers, aiming to minimize per-relay power usage while meeting minimum signal-to-noise ratio (SNR) guarantees. Showing that strong Lagrange duality holds even for this non-convex problem, we solve it in the dual domain. Further, we investigate the effect of imperfect channel information by quantifying the performance loss due to either quantization error under limited feedback or channel estimation error.

    In the second part, we study optimal inter-cell interference control for distributed relays in a multi-channel system. We design optimal relay beamforming to minimize the maximum interference caused at neighboring cells while satisfying minimum SNR requirements and per-relay power constraints. Even though the problem is non-convex, we propose an iterative algorithm that provides a semi-closed-form solution. We extend this algorithm to the problem of maximizing the minimum SNR subject to pre-determined maximum interference constraints at neighboring cells. To gain insight into designing such a system in practice, we further study the worst-case received signal-to-interference-and-noise ratio (SINR) as a function of the maximum interference target.

    In the third part, we consider D2D communication underlaid in a cellular system for uplink resource sharing. Under optimal cellular user (CU) receive beamforming, we jointly optimize the powers of CUs and D2D pairs to maximize their sum rate while satisfying minimum quality-of-service (QoS) requirements and worst-case inter-cell interference limits in multiple neighboring cells. The resulting joint optimization problem is non-convex. We propose an approximate power control algorithm to maximize the sum rate, and we provide an upper bound on the performance loss of the proposed algorithm along with conditions for its optimality.

    The fourth part extends the results of the third: we jointly optimize the beam vector and the transmit powers of the CU and the D2D transmitter under practical system settings. We consider a multi-cell scenario in which perfect channel information is available only for the direct channels from the CU and the D2D pair to the base station; for the other channels, only partial channel information is available. The uncertain channel information, the non-convex expected sum rate, and the various power, interference, and QoS constraints lead to a challenging optimization problem. We propose an efficient robust power control algorithm based on a ratio-of-expectation approximation to maximize the expected sum rate, which is shown to give near-optimal performance when compared with an upper bound on the sum rate. (Ph.D. thesis)
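
    As a toy illustration of the third part's joint power optimization, the snippet below grid-searches CU and D2D transmit powers to maximize a two-link sum rate under minimum-SINR (QoS) and neighboring-cell interference constraints. All channel gains, thresholds, and the brute-force search itself are made-up stand-ins for the thesis's approximate power-control algorithm.

```python
# Toy joint power control: maximize a two-link sum rate under QoS and
# neighboring-cell interference caps. All numbers are illustrative.
import numpy as np

def sinr_cu(p_c, p_d, g_cc=1.0, g_dc=0.3, noise=0.1):
    # SINR of the cellular uplink at the base station; D2D interferes
    return g_cc * p_c / (g_dc * p_d + noise)

def sinr_d2d(p_c, p_d, g_dd=0.8, g_cd=0.2, noise=0.1):
    # SINR at the D2D receiver; the cellular uplink interferes
    return g_dd * p_d / (g_cd * p_c + noise)

def best_powers(p_max=1.0, qos=1.0, i_max=0.5, g_int=0.4, steps=101):
    best, best_rate = None, -np.inf
    for p_c in np.linspace(0.0, p_max, steps):
        for p_d in np.linspace(0.0, p_max, steps):
            if sinr_cu(p_c, p_d) < qos or sinr_d2d(p_c, p_d) < qos:
                continue  # minimum-SINR (QoS) constraint violated
            if g_int * (p_c + p_d) > i_max:
                continue  # interference cap at the neighboring cell
            rate = (np.log2(1.0 + sinr_cu(p_c, p_d))
                    + np.log2(1.0 + sinr_d2d(p_c, p_d)))
            if rate > best_rate:
                best, best_rate = (p_c, p_d), rate
    return best, best_rate
```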