Distributionally Robust Learning with Weakly Convex Losses: Convergence Rates and Finite-Sample Guarantees
We consider a distributionally robust stochastic optimization problem and
formulate it as a stochastic two-level composition optimization problem with
the use of the mean-semideviation risk measure. In this setting, we consider a
single time-scale algorithm, involving two versions of the inner function value
tracking: linearized tracking of a continuously differentiable loss function,
and SPIDER tracking of a weakly convex loss function. We adopt the norm of the
gradient of the Moreau envelope as our measure of stationarity and show that
the same sample complexity is achievable in both cases, with only a larger
constant in the second case. Finally, we demonstrate the performance of our
algorithm on a robust learning example and a weakly convex, non-smooth
regression example.
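For intuition, the mean-semideviation risk measure has the form ρ(Z) = E[Z] + κ·E[(Z − E[Z])₊] with κ ∈ [0, 1]. A minimal empirical sketch (the function name and the default κ are illustrative, not taken from the paper):

```python
import numpy as np

def mean_semideviation(losses, kappa=0.5):
    """Empirical mean-upper-semideviation risk: E[Z] + kappa * E[(Z - E[Z])_+].

    kappa in [0, 1] controls risk aversion: kappa = 0 recovers the plain
    empirical mean, while larger kappa penalizes upper deviations more.
    """
    z = np.asarray(losses, dtype=float)
    m = z.mean()
    return m + kappa * np.maximum(z - m, 0.0).mean()

# Only losses above the mean contribute to the semideviation term.
print(mean_semideviation([1.0, 2.0, 3.0], kappa=1.0))  # ≈ 2.333 (= 2 + 1/3)
```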
Robust Accelerated Primal-Dual Methods for Computing Saddle Points
We consider strongly convex/strongly concave saddle point problems assuming
we have access to unbiased stochastic estimates of the gradients. We propose a
stochastic accelerated primal-dual (SAPD) algorithm and show that the SAPD
iterate sequence, generated using constant primal-dual step sizes, linearly converges
to a neighborhood of the unique saddle point, where the size of the
neighborhood is determined by the asymptotic variance of the iterates.
Interpreting the asymptotic variance as a measure of robustness to gradient
noise, we obtain explicit characterizations of robustness in terms of SAPD
parameters and problem constants. Based on these characterizations, we develop
computationally tractable techniques for optimizing the SAPD parameters, i.e.,
the primal and dual step sizes, and the momentum parameter, to achieve a
desired trade-off between the convergence rate and robustness on the Pareto
curve. This allows SAPD to enjoy fast convergence properties while being robust
to noise as an accelerated method. We also show that SAPD admits convergence
guarantees for the gap metric with a variance term optimal up to a logarithmic
factor, which can be removed by employing a restarting strategy. Furthermore,
to our knowledge, our work is the first to show an iteration complexity
result for the gap function on smooth SCSC problems without a bounded-domain
assumption. Finally, we illustrate the efficiency of our approach on
distributionally robust logistic regression problems.
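As a rough illustration of the accelerated primal-dual template behind SAPD, the sketch below runs momentum-extrapolated dual ascent and primal descent on a small strongly convex/strongly concave quadratic. The step sizes tau, sigma and momentum theta are arbitrary illustrative values (not the tuned parameters derived in the paper), and gradients here are exact rather than stochastic:

```python
# Toy SCSC saddle problem: L(x, y) = 0.5*x**2 + 0.5*x*y - 0.5*y**2,
# whose unique saddle point is (0, 0).
def grad_x(x, y):
    return x + 0.5 * y

def grad_y(x, y):
    return 0.5 * x - y

def sapd_style(x, y, tau=0.1, sigma=0.1, theta=0.9, iters=1000):
    """Momentum-extrapolated primal-dual iteration (illustrative parameters)."""
    g_prev = grad_y(x, y)
    for _ in range(iters):
        g = grad_y(x, y)
        y = y + sigma * (g + theta * (g - g_prev))  # dual ascent with momentum
        x = x - tau * grad_x(x, y)                  # primal descent
        g_prev = g
    return x, y

x, y = sapd_style(1.0, 1.0)
print(abs(x) + abs(y))  # shrinks toward 0 as the iterates approach the saddle point
```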
High Probability and Risk-Averse Guarantees for a Stochastic Accelerated Primal-Dual Method
We consider stochastic strongly-convex-strongly-concave (SCSC) saddle point
(SP) problems which frequently arise in applications ranging from
distributionally robust learning to game theory and fairness in machine
learning. We focus on the recently developed stochastic accelerated primal-dual
algorithm (SAPD), which admits optimal complexity in several settings as an
accelerated algorithm. We provide high probability guarantees for convergence
to a neighborhood of the saddle point that reflects accelerated convergence
behavior. We also provide an analytical formula for the limiting covariance
matrix of the iterates for a class of stochastic SCSC quadratic problems where
the gradient noise is additive and Gaussian. This allows us to develop lower
bounds for this class of quadratic problems which show that our analysis is
tight in terms of the dependence of the high-probability bound on the problem parameters. We
also provide a risk-averse convergence analysis characterizing the
"Conditional Value at Risk", the "Entropic Value at Risk", and the
χ²-divergence of the distance to the saddle point, highlighting the
trade-offs between the bias and the risk associated with an approximate
solution obtained by terminating the algorithm at any iteration.
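For intuition, the Conditional Value at Risk of a loss distribution at level alpha is the expected value over its worst (1 − alpha) tail. A simple empirical estimator (illustrative only, not the paper's analysis):

```python
import numpy as np

def empirical_cvar(samples, alpha=0.95):
    """Empirical CVaR at level alpha: the mean of the worst
    ~(1 - alpha) fraction of the samples (here larger = worse)."""
    z = np.sort(np.asarray(samples, dtype=float))
    k = max(1, int(round((1 - alpha) * z.size)))
    return z[-k:].mean()

# Worst 5% of the values 1..100 are 96..100, so CVaR at level 0.95 is their mean.
print(empirical_cvar(np.arange(1, 101), alpha=0.95))  # 98.0
```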
Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance
Recent studies have provided both empirical and theoretical evidence
illustrating that heavy tails can emerge in stochastic gradient descent (SGD)
in various scenarios. Such heavy tails potentially result in iterates with
diverging variance, which hinders the use of conventional convergence analysis
techniques that rely on the existence of the second-order moments. In this
paper, we provide convergence guarantees for SGD under a state-dependent and
heavy-tailed noise with a potentially infinite variance, for a class of
strongly convex objectives. In the case where the p-th moment of the noise
exists for some p in [1, 2), we first identify a condition on the Hessian,
coined 'p-positive (semi-)definiteness', that leads to an interesting
interpolation between positive semi-definite matrices (p = 2) and diagonally
dominant matrices with non-negative diagonal entries (p = 1). Under this
condition, we then provide a convergence rate for the distance to the global
optimum in L^p. Furthermore, we provide a generalized central limit theorem,
which shows that the properly scaled Polyak-Ruppert averaging converges weakly
to a multivariate α-stable random vector. Our results indicate that even
under heavy-tailed noise with infinite variance, SGD can converge to the global
optimum without requiring any modification to either the loss function or the
algorithm itself, as is typically required in robust statistics. We demonstrate
the implications of our results for applications such as linear regression and
generalized linear models subject to heavy-tailed data.
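As a small illustration of this setting (not the paper's experiments), the sketch below runs SGD on the strongly convex objective f(x) = 0.5·x² with additive symmetrized Pareto noise of tail index 1.5, which has a finite mean but infinite variance, and tracks the Polyak-Ruppert average of the iterates:

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_heavy_tailed(n_iters=20000):
    """SGD on f(x) = 0.5 * x**2 with heavy-tailed gradient noise
    (symmetrized Pareto, tail index 1.5: finite mean, infinite variance)."""
    x, avg = 1.0, 0.0
    for k in range(1, n_iters + 1):
        noise = rng.pareto(1.5) * rng.choice([-1.0, 1.0])
        grad = x + noise       # unbiased stochastic gradient at x
        x -= grad / k          # decaying step size 1/k
        avg += (x - avg) / k   # running Polyak-Ruppert average
    return x, avg

x_last, x_avg = sgd_heavy_tailed()
print(x_last, x_avg)  # typically both small, despite infinite-variance noise
```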