Communication-Efficient Network-Distributed Optimization with Differential-Coded Compressors
Network-distributed optimization has attracted significant attention in recent years due to its ever-increasing applications. However, the classic decentralized gradient descent (DGD) algorithm is communication-inefficient for large-scale and high-dimensional network-distributed optimization problems. To address this challenge, many compressed DGD-based algorithms have been proposed. However, most of the existing works have high complexity and assume compressors with bounded noise power. To overcome these limitations, in this paper, we propose a new differential-coded compressed DGD (DC-DGD) algorithm. The key features of DC-DGD include: i) DC-DGD works with general SNR-constrained compressors, relaxing the bounded noise power assumption; ii) the differential-coded design achieves the same convergence rate as the original DGD algorithm; and iii) DC-DGD has the same low-complexity structure as the original DGD due to a self-noise-reduction effect. Moreover, the above features inspire us to develop a hybrid compression scheme that offers a systematic mechanism to minimize the communication cost. Finally, we conduct extensive experiments to verify the efficacy of the proposed DC-DGD algorithm and hybrid compression scheme.
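To make the differential-coding idea concrete, here is a minimal Python sketch, assuming the generic compressed-consensus template with a top-k compressor: each node transmits only the compressed difference between its current iterate and the copy its neighbours last reconstructed, so the communicated residual shrinks as the network converges. This is an illustration of the idea rather than the paper's exact DC-DGD update; all names (topk, compressed_dgd, W, gamma) are placeholders, and the actual algorithm, step sizes, and SNR-constrained compressor class may differ.

    import numpy as np

    def topk(v, k):
        # Top-k magnitude sparsifier, standing in for a generic compressor.
        out = np.zeros_like(v)
        idx = np.argsort(np.abs(v))[-k:]
        out[idx] = v[idx]
        return out

    def compressed_dgd(grad_fns, W, x0, steps=200, lr=0.05, gamma=0.5, k=2):
        n, d = x0.shape
        x = x0.copy()              # private iterates, one row per node
        x_hat = np.zeros((n, d))   # publicly reconstructed copies held by all nodes
        for _ in range(steps):
            # Each node transmits only a compressed difference x_i - x_hat_i.
            q = np.stack([topk(x[i] - x_hat[i], k) for i in range(n)])
            x_hat += q             # every node updates the shared copies identically
            grads = np.stack([grad_fns[i](x[i]) for i in range(n)])
            # Consensus step on the shared copies plus a local gradient step.
            x = x + gamma * (W @ x_hat - x_hat) - lr * grads
        return x.mean(axis=0)

    if __name__ == "__main__":
        # Toy usage: 4 nodes on a ring, node i holds f_i(x) = ||x - c_i||^2 / 2.
        rng = np.random.default_rng(0)
        centers = rng.normal(size=(4, 5))
        grad_fns = [lambda x, c=c: x - c for c in centers]
        W = np.array([[0.50, 0.25, 0.00, 0.25],
                      [0.25, 0.50, 0.25, 0.00],
                      [0.00, 0.25, 0.50, 0.25],
                      [0.25, 0.00, 0.25, 0.50]])
        print(compressed_dgd(grad_fns, W, np.zeros((4, 5))))  # close to centers.mean(axis=0)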
Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent
Recently, a considerable amount of work has been devoted to the study of the algorithmic stability and generalization of stochastic gradient descent (SGD). However, existing stability analyses impose restrictive assumptions on the boundedness of gradients and the strong smoothness and convexity of loss functions. In this paper, we provide a fine-grained analysis of stability and generalization for SGD by substantially relaxing these assumptions. Firstly, we establish stability and generalization for SGD without the existing bounded-gradient assumptions. The key idea is the introduction of a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of the SGD iterates. This yields generalization bounds that depend on the behavior of the best model, and leads to the first known fast bounds in the low-noise setting obtained via a stability approach. Secondly, the smoothness assumption is relaxed by considering loss functions with Hölder continuous (sub)gradients, for which we show that optimal bounds are still achieved by balancing computation and stability. To the best of our knowledge, this gives the first known stability and generalization bounds for SGD with even non-differentiable loss functions. Finally, we study learning problems with (strongly) convex objectives but non-convex loss functions.
Comment: to appear in ICML 2020
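For context, here is a hedged sketch of the classical stability-to-generalization route that the abstract refines; the uniform notion below is standard, and the paper replaces it with on-average model stability so that the bounded-gradient assumption can be dropped (the paraphrased definition here is an assumption, not the paper's exact statement).

    A randomized algorithm $A$ is $\epsilon$-uniformly stable if, for all datasets $S, S'$ of size $n$ differing in a single example,
        \sup_{z} \, \mathbb{E}_A\big[\ell(A(S); z) - \ell(A(S'); z)\big] \le \epsilon,
    and uniform stability controls the expected generalization gap:
        \big|\mathbb{E}_{S,A}\big[R(A(S)) - R_S(A(S))\big]\big| \le \epsilon,
    where $R$ and $R_S$ denote the population and empirical risks. On-average model stability instead bounds the averaged model distance
        \mathbb{E}_{S,S',A}\Big[\tfrac{1}{n}\sum_{i=1}^{n} \|A(S^{(i)}) - A(S)\|_2\Big] \le \epsilon,
    with $S^{(i)}$ obtained from $S$ by replacing the $i$-th example, so the analysis tracks distances between models rather than worst-case loss differences.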
Theoretical Deep Learning
Deep learning has long been criticised as a black-box model lacking a sound theoretical explanation. During my PhD, I explored and established theoretical foundations for deep learning. In this thesis, I present my contributions, positioned upon the existing literature: (1) analysing the generalizability of neural networks with residual connections via complexity- and capacity-based hypothesis complexity measures; (2) modeling stochastic gradient descent (SGD) by stochastic differential equations (SDEs) and their dynamics, and thereby characterizing the generalizability of deep learning; (3) understanding the geometrical structures of the loss landscape that drive the trajectories of these dynamical systems, which sheds light on reconciling the over-representation and the excellent generalizability of deep learning; and (4) discovering the interplay between generalization, privacy preservation, and adversarial robustness, all of which are rising concerns in the deployment of deep learning.
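As a concrete illustration of contribution (2), a commonly used continuous-time model of SGD (whether the thesis adopts exactly this form is an assumption) replaces the minibatch update $\theta_{k+1} = \theta_k - \eta\,(\nabla L(\theta_k) + \xi_k)$, with zero-mean gradient noise $\xi_k$ of covariance $\Sigma(\theta_k)$, by the Itô SDE

    d\theta_t = -\nabla L(\theta_t)\, dt + \sqrt{\eta}\, \Sigma(\theta_t)^{1/2}\, dW_t,

where $\eta$ is the learning rate and $W_t$ is a standard Wiener process; generalization properties are then studied through the transient or stationary dynamics of this diffusion.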