On adaptive stochastic heavy ball momentum for solving linear systems
The stochastic heavy ball momentum (SHBM) method has gained considerable
popularity as a scalable approach for solving large-scale optimization
problems. However, one limitation of this method is its reliance on prior
knowledge of certain problem parameters, such as singular values of a matrix.
In this paper, we propose an adaptive variant of the SHBM method for solving
stochastic problems that are reformulated from linear systems using
user-defined distributions. Our adaptive SHBM (ASHBM) method utilizes iterative
information to update the parameters, addressing an open problem in the
literature regarding the adaptive learning of momentum parameters. We prove
that our method converges linearly in expectation, with a better convergence
rate compared to the basic method. Notably, we demonstrate that the
deterministic version of our ASHBM algorithm can be reformulated as a variant
of the conjugate gradient (CG) method, inheriting many of its appealing
properties, such as finite-time convergence. Consequently, the ASHBM method can
be further generalized to develop a brand-new framework of the stochastic CG
(SCG) method for solving linear systems. Our theoretical results are supported
by numerical experiments.
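The abstract above does not spell out the adaptive parameter updates, so the following is only a minimal sketch of the basic (non-adaptive) heavy ball momentum idea applied to randomized Kaczmarz. It assumes a consistent system, a unit step size, and a fixed, hand-picked momentum parameter `beta`; the function name and all defaults are illustrative and not taken from the paper.

```python
import numpy as np

def kaczmarz_heavy_ball(A, b, beta=0.3, iters=5000, seed=0):
    """Randomized Kaczmarz with a fixed heavy ball momentum term (illustrative)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_sq = np.sum(A * A, axis=1)      # squared row norms
    probs = row_sq / row_sq.sum()       # sample rows proportionally to their norms
    x_prev = np.zeros(n)
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        # Kaczmarz step: move x toward the hyperplane a_i^T x = b_i
        step = (A[i] @ x - b[i]) / row_sq[i] * A[i]
        # heavy ball: add a multiple of the previous displacement
        x_next = x - step + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x

# Usage on a small consistent system
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 10))
x_true = rng.standard_normal(10)
b = A @ x_true
x_hat = kaczmarz_heavy_ball(A, b)
print(np.linalg.norm(x_hat - x_true))
```

Here `beta` has to be tuned by hand, which is the limitation an adaptive method is meant to remove.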
A semi-randomized and augmented Kaczmarz method with simple random sampling for large-scale inconsistent linear systems
A greedy randomized augmented Kaczmarz (GRAK) method was proposed in [Z.-Z.
Bai and W.-T. Wu, SIAM J. Sci. Comput., 43 (2021), pp. A3892-A3911] for large
and sparse inconsistent linear systems. However, one has to construct two new
index sets by computing the residual vector with respect to the augmented linear
system in each iteration. Thus, the computational overhead of this method is
large for extremely large-scale problems. Moreover, there is no reliable
stopping criterion for this method. In this work, we are interested in solving
large-scale sparse or dense inconsistent linear systems, and try to enhance the
numerical performance of the GRAK method. First, we propose an accelerated
greedy randomized augmented Kaczmarz method. Theoretical analysis indicates
that it converges faster than the GRAK method under very weak assumptions.
Second, in order to further reduce the overhead, we propose a semi-randomized
augmented Kaczmarz method with simple random sampling. Third, to the best of
our knowledge, there has so far been no practical stopping criterion for
randomized Kaczmarz-type methods. To fill this gap, we introduce a practical
stopping criterion for Kaczmarz-type methods, and justify it from a
theoretical point of view. Numerical experiments are performed on both
real-world and synthetic data sets, which demonstrate the efficiency of the
proposed methods and the effectiveness of our stopping criterion.
Check Yourself Before You WREK Yourself: Unpacking and Generalizing Randomized Extended Kaczmarz
Linear systems are fundamental in many areas of science and engineering. With the advent of computers, we are now interested in extremely large linear systems, which lend themselves to iterative methods. One such family of algorithms is the Randomized Kaczmarz methods. Within this family, the Randomized Extended Kaczmarz variant solves for least squares solutions of inconsistent linear systems. Among Kaczmarz variants, Randomized Extended Kaczmarz is unique in that it modifies the input system in a special way to solve for the least squares solution. In this work we unpack the geometry underlying Randomized Extended Kaczmarz (REK) by uniting proofs by Zouzias and Freris (2013) and Du (2018), leading to more insight into why REK works. We also provide novel proofs showing that REK converges with an alternative sequence of z updates, and giving a closed form for REK’s original z updates. Lastly, we generalize the ideas behind REK and QuantileRK (Haddock et al., 2020) to lay foundations for a new Randomized Kaczmarz variant called Weighted Randomized Extended Kaczmarz (WREK), which aims to solve weighted least squares problems with dynamic reweightings.
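For orientation, the standard REK iteration in the form commonly attributed to Zouzias and Freris can be sketched as follows. This is a hedged illustration only: the function name, iteration count and sampling code are assumptions, and it implements neither the alternative z updates nor WREK from the work above. The vector z is driven toward the part of b orthogonal to the column space of A, and x takes Kaczmarz steps on the corrected right-hand side b - z.

```python
import numpy as np

def rek(A, b, iters=20000, seed=0):
    """Randomized Extended Kaczmarz sketch for min ||Ax - b||_2 (illustrative)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_sq = np.sum(A * A, axis=1)   # squared row norms
    col_sq = np.sum(A * A, axis=0)   # squared column norms
    fro_sq = row_sq.sum()            # squared Frobenius norm
    x = np.zeros(n)
    z = np.array(b, dtype=float)     # z_0 = b
    for _ in range(iters):
        # column step: remove from z its component along a random column
        j = rng.choice(n, p=col_sq / fro_sq)
        z -= (A[:, j] @ z) / col_sq[j] * A[:, j]
        # row step: Kaczmarz projection using the corrected entry b_i - z_i
        i = rng.choice(m, p=row_sq / fro_sq)
        x += (b[i] - z[i] - A[i] @ x) / row_sq[i] * A[i]
    return x

# Usage on a small inconsistent system
rng = np.random.default_rng(2)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)          # generic b, not in the range of A
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(rek(A, b) - x_ls))
```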
Sketch and project: randomized iterative methods for linear systems and inverting matrices
Probabilistic ideas and tools have recently begun to permeate into several fields where they
had traditionally not played a major role, including fields such as numerical linear algebra
and optimization. One of the key ways in which these ideas influence these fields is via the
development and analysis of randomized algorithms for solving standard and new problems of
these fields. Such methods are typically easier to analyze, and often lead to faster and/or more
scalable and versatile methods in practice.
This thesis explores the design and analysis of new randomized iterative methods for
solving linear systems and inverting matrices. The methods are based on a novel sketch-and-project
framework. By sketching we mean starting with a difficult problem and then randomly
generating a simple problem that contains all the solutions of the original problem. After sketching
the problem, we calculate the next iterate by projecting our current iterate onto the solution
space of the sketched problem.
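A minimal sketch of one way to instantiate the sketch-and-project step described above, under two assumptions that are not in the abstract: the geometry is Euclidean (the positive definite matrix is the identity), and the sketch selects a random block of rows, which gives a block Kaczmarz method. The function name and defaults are illustrative.

```python
import numpy as np

def sketch_and_project(A, b, block=3, iters=2000, seed=0):
    """Sketch-and-project with row-block sketches and Euclidean projection."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(iters):
        # sketch: keep a random subset of equations, i.e. S^T A x = S^T b
        rows = rng.choice(m, size=block, replace=False)
        A_s = A[rows]
        r_s = A_s @ x - b[rows]
        # project: closest point to x satisfying the sketched system;
        # lstsq returns the minimum-norm correction d with A_s d = r_s
        d = np.linalg.lstsq(A_s, r_s, rcond=None)[0]
        x = x - d
    return x

# Usage on a small consistent system
rng = np.random.default_rng(3)
A = rng.standard_normal((50, 10))
x_true = rng.standard_normal(10)
b = A @ x_true
print(np.linalg.norm(sketch_and_project(A, b) - x_true))
```

With `block=1` this reduces to a Kaczmarz step with uniform row sampling.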
The starting point for this thesis is the development of an archetype randomized method
for solving linear systems. Our method has six different but equivalent interpretations: sketch-and-project, constrain-and-approximate, random intersect, random linear solve, random update
and random fixed point. By varying its two parameters – a positive definite matrix (defining
geometry), and a random matrix (sampled in an i.i.d. fashion in each iteration) – we recover
a comprehensive array of well known algorithms as special cases, including the randomized
Kaczmarz method, randomized Newton method, randomized coordinate descent method and
random Gaussian pursuit. We also naturally obtain variants of all these methods using blocks
and importance sampling. However, our method allows for a much wider selection of these two
parameters, which leads to a number of new specific methods. We prove exponential convergence
of the expected norm of the error in a single theorem, from which existing complexity results
for known variants can be obtained. However, we also give an exact formula for the evolution
of the expected iterates, which allows us to give lower bounds on the convergence rate.
We then extend our problem to that of finding the projection of a given vector onto the
solution space of a linear system. For this we develop a new randomized iterative algorithm:
stochastic dual ascent (SDA). The method is dual in nature, and iteratively solves the dual of
the projection problem. The dual problem is a non-strongly concave quadratic maximization
problem without constraints. In each iteration of SDA, a dual variable is updated by a carefully
chosen point in a subspace spanned by the columns of a random matrix drawn independently
from a fixed distribution. The distribution plays the role of a parameter of the method. Our
complexity results hold for a wide family of distributions of random matrices, which opens the
possibility to fine-tune the stochasticity of the method to particular applications. We prove
that primal iterates associated with the dual process converge to the projection exponentially
fast in expectation, and give a formula and an insightful lower bound for the convergence rate.
We also prove that the same rate applies to dual function values, primal function values and the
duality gap. Unlike traditional iterative methods, SDA converges under virtually no additional
assumptions on the system (e.g., rank, diagonal dominance) beyond consistency. In fact, our
lower bound improves as the rank of the system matrix drops. By mapping our dual algorithm to
a primal process, we uncover that the SDA method is the dual method with respect to the sketch-and-project method from the previous chapter. Thus our new more general convergence results
for SDA carry over to the sketch-and-project method and all its specializations (randomized
Kaczmarz, randomized coordinate descent, etc.). When our method specializes to a known
algorithm, we either recover the best known rates, or improve upon them. Finally, we show
that the framework can be applied to the distributed average consensus problem to obtain an
array of new algorithms. The randomized gossip algorithm arises as a special case.
In the final chapter, we extend our method for solving linear systems to inverting matrices,
and develop a family of methods with specialized variants that maintain symmetry or positive
definiteness of the iterates. All the methods in the family converge globally and exponentially,
with explicit rates. In special cases, we obtain stochastic block variants of several quasi-Newton
updates, including bad Broyden (BB), good Broyden (GB), Powell-symmetric-Broyden (PSB),
Davidon-Fletcher-Powell (DFP) and Broyden-Fletcher-Goldfarb-Shanno (BFGS). Ours are the
first stochastic versions of these updates shown to converge to an inverse of a fixed matrix.
Through a dual viewpoint we uncover a fundamental link between quasi-Newton updates and
approximate inverse preconditioning. Further, we develop an adaptive variant of the randomized
block BFGS (AdaRBFGS), where we modify the distribution underlying the stochasticity of
the method throughout the iterative process to achieve faster convergence. By inverting several
matrices from varied applications, we demonstrate that AdaRBFGS is highly competitive when
compared to the well established Newton-Schulz and approximate preconditioning methods. In
particular, on large-scale problems our method outperforms the standard methods by orders
of magnitude. Efficient methods for estimating the inverse of very large
matrices are a much-needed tool for preconditioning and variable metric methods in the big data
era.
Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions
Low-rank matrix approximations, such as the truncated singular value
decomposition and the rank-revealing QR decomposition, play a central role in
data analysis and scientific computing. This work surveys and extends recent
research which demonstrates that randomization offers a powerful tool for
performing low-rank matrix approximation. These techniques exploit modern
computational architectures more fully than classical methods and open the
possibility of dealing with truly massive data sets.
This paper presents a modular framework for constructing randomized
algorithms that compute partial matrix decompositions. These methods use random
sampling to identify a subspace that captures most of the action of a matrix.
The input matrix is then compressed---either explicitly or implicitly---to this
subspace, and the reduced matrix is manipulated deterministically to obtain the
desired low-rank factorization. In many cases, this approach beats its
classical competitors in terms of accuracy, speed, and robustness. These claims
are supported by extensive numerical experiments and a detailed error analysis.
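The two-stage scheme described above (random sampling to find a subspace, then deterministic work on the compressed matrix) can be sketched as follows. This is a basic illustration assuming a Gaussian test matrix and a small oversampling parameter; the function name and defaults are illustrative, and refinements such as power iterations are omitted.

```python
import numpy as np

def randomized_svd(A, rank, oversample=10, seed=0):
    """Approximate truncated SVD via a randomized range finder (illustrative)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Stage A: sample the range of A with a Gaussian test matrix
    Omega = rng.standard_normal((n, rank + oversample))
    Y = A @ Omega
    Q, _ = np.linalg.qr(Y)           # orthonormal basis for the sampled range
    # Stage B: compress A to the subspace and factor the small matrix
    B = Q.T @ A
    U_b, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_b
    return U[:, :rank], s[:rank], Vt[:rank]

# Usage on an exactly rank-5 matrix
rng = np.random.default_rng(4)
A = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 80))
U, s, Vt = randomized_svd(A, rank=5)
print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))
```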