Acceleration and restart for the randomized Bregman-Kaczmarz method
Optimizing strongly convex functions subject to linear constraints is a
fundamental problem with numerous applications. In this work, we propose a
block (accelerated) randomized Bregman-Kaczmarz method that only uses a block
of constraints in each iteration to tackle this problem. We consider a dual
formulation of this problem in order to handle the linear constraints
efficiently. Using tools from convex analysis, we show that the corresponding
dual function satisfies the Polyak-Lojasiewicz (PL) property, provided that
the primal objective function is strongly convex and satisfies some additional
mild assumptions. However, adapting the existing theory on coordinate
descent methods to our dual formulation can only give us sublinear convergence
results in the dual space. In order to obtain convergence results in terms of
a criterion for the primal (original) problem, we transfer our
algorithm to the primal space, which combined with the PL property allows us to
get linear convergence rates. More specifically, we provide a theoretical
analysis of the convergence of our proposed method under different assumptions
on the objective and demonstrate in numerical experiments its superior
efficiency and speedup compared to existing methods for the same problem.
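
To make the setup concrete, here is a minimal sketch of the underlying plain
(non-accelerated) block randomized Bregman-Kaczmarz iteration. It specializes
to the objective f(x) = 0.5*||x||^2 + lam*||x||_1, for which the map from the
dual-type variable back to the primal is soft-thresholding; the uniform block
sampling, the exact block projection, and this choice of objective are
illustrative assumptions, not the paper's accelerated and restarted scheme.

    import numpy as np

    def soft_threshold(z, lam):
        # Gradient of the convex conjugate of f(x) = 0.5*||x||^2 + lam*||x||_1;
        # maps the dual-type iterate z back to a primal point x.
        return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

    def block_bregman_kaczmarz(A, b, lam=0.1, block_size=10, n_iters=1000, seed=0):
        # Plain block randomized Bregman-Kaczmarz sketch for
        #     min 0.5*||x||^2 + lam*||x||_1   subject to   A x = b.
        rng = np.random.default_rng(seed)
        m, n = A.shape
        z = np.zeros(n)                       # dual-type variable
        x = soft_threshold(z, lam)
        for _ in range(n_iters):
            idx = rng.choice(m, size=min(block_size, m), replace=False)
            A_blk, b_blk = A[idx], b[idx]
            r = A_blk @ x - b_blk             # residual on the sampled block
            # Minimum-norm correction A_blk^+ r, i.e. the projection
            # direction onto the sampled block of constraints.
            step = np.linalg.lstsq(A_blk, r, rcond=None)[0]
            z -= step
            x = soft_threshold(z, lam)        # back to the primal space
        return x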
Let's Make Block Coordinate Descent Go Fast: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence
Block coordinate descent (BCD) methods are widely used for large-scale
numerical optimization because of their cheap iteration costs, low memory
requirements, amenability to parallelization, and ability to exploit problem
structure. Three main algorithmic choices influence the performance of BCD
methods: the block partitioning strategy, the block selection rule, and the
block update rule. In this paper we explore all three of these building blocks
and propose variations for each that can lead to significantly faster BCD
methods. We (i) propose new greedy block-selection strategies that guarantee
more progress per iteration than the Gauss-Southwell rule; (ii) explore
practical issues like how to implement the new rules when using "variable"
blocks; (iii) explore the use of message-passing to compute matrix or Newton
updates efficiently on huge blocks for problems with a sparse dependency
between variables; and (iv) consider optimal active manifold identification,
which leads to bounds on the "active-set complexity" of BCD methods and to
superlinear convergence for certain problems with sparse solutions (and in
some cases finite termination at an optimal solution). We support all of our
findings with numerical results for the classic machine learning problems of
least squares, logistic regression, multi-class logistic regression, label
propagation, and L1-regularization.
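
As a concrete reference point, the following is a minimal sketch of greedy
BCD with a Gauss-Southwell-style block selection rule and an exact
block-Newton update on a convex quadratic. The fixed block partition and the
quadratic objective are illustrative assumptions; they do not reproduce the
paper's variable blocks or message-passing updates.

    import numpy as np

    def bcd_greedy(Q, c, blocks, n_iters=200):
        # Greedy BCD on the convex quadratic f(x) = 0.5*x'Qx + c'x,
        # with Q symmetric positive definite and a fixed block partition.
        x = np.zeros(Q.shape[0])
        for _ in range(n_iters):
            g = Q @ x + c                     # full gradient
            # Gauss-Southwell-style rule: pick the block whose gradient
            # subvector has the largest norm.
            k = max(range(len(blocks)), key=lambda j: np.linalg.norm(g[blocks[j]]))
            blk = blocks[k]
            # Exact block-Newton update: solve Q[blk, blk] d = g[blk].
            d = np.linalg.solve(Q[np.ix_(blk, blk)], g[blk])
            x[blk] -= d
        return x

    # Example partition: 100 variables in 20 contiguous blocks of 5.
    # blocks = [list(range(i, i + 5)) for i in range(0, 100, 5)]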
Proximal Gradient methods with Adaptive Subspace Sampling
Many applications in machine learning or signal processing involve nonsmooth
optimization problems. This nonsmoothness induces a low-dimensional structure
in the optimal solutions. In this paper, we propose a randomized proximal
gradient
method harnessing this underlying structure. We introduce two key components:
i) a random subspace proximal gradient algorithm; ii) an identification-based
sampling of the subspaces. Their interplay brings a significant performance
improvement on typical learning problems in terms of dimensions explored.
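
The first component can be sketched in a few lines: a proximal gradient step
restricted to a random coordinate subspace for an l1-regularized problem. The
uniform i.i.d. coordinate sampling below is an illustrative stand-in for the
paper's identification-based sampling of the subspaces.

    import numpy as np

    def subspace_prox_grad(grad_f, x0, lam, step, n_iters=500, p=0.3, seed=0):
        # Random-subspace proximal gradient sketch for min f(x) + lam*||x||_1:
        # take a gradient step and a soft-thresholding (prox) step only on a
        # randomly sampled subset of coordinates.
        rng = np.random.default_rng(seed)
        x = np.asarray(x0, dtype=float).copy()
        for _ in range(n_iters):
            mask = rng.random(x.size) < p             # sampled coordinates
            z = x[mask] - step * grad_f(x)[mask]      # forward step
            # Backward (prox) step of lam*||.||_1 on the sampled coordinates.
            x[mask] = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
        return x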
- …