32 research outputs found

    On adaptive stochastic heavy ball momentum for solving linear systems

    The stochastic heavy ball momentum (SHBM) method has gained considerable popularity as a scalable approach for solving large-scale optimization problems. However, one limitation of this method is its reliance on prior knowledge of certain problem parameters, such as the singular values of a matrix. In this paper, we propose an adaptive variant of the SHBM method for solving stochastic problems that are reformulated from linear systems using user-defined distributions. Our adaptive SHBM (ASHBM) method uses iterative information to update the parameters, addressing an open problem in the literature regarding the adaptive learning of momentum parameters. We prove that our method converges linearly in expectation, with a better convergence rate than the basic method. Notably, we show that the deterministic version of our ASHBM algorithm can be reformulated as a variant of the conjugate gradient (CG) method, inheriting many of its appealing properties, such as finite-time convergence. Consequently, the ASHBM method can be further generalized into a new framework of stochastic CG (SCG) methods for solving linear systems. Our theoretical results are supported by numerical experiments.
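    For concreteness, here is a minimal Python sketch of the non-adaptive baseline the paper builds on: a randomized Kaczmarz step combined with a fixed heavy ball momentum term. The step size alpha and momentum beta below are illustrative hand-picked constants, not the adaptive parameters that the ASHBM method learns from iterate information.

```python
import numpy as np

def kaczmarz_heavy_ball(A, b, alpha=1.0, beta=0.3, iters=5000, seed=0):
    """Randomized Kaczmarz with a fixed heavy ball momentum term.

    Illustrative baseline only: alpha and beta are hand-picked here,
    whereas the adaptive method updates them from iterate information.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms2 = np.einsum("ij,ij->i", A, A)
    probs = row_norms2 / row_norms2.sum()   # sample rows prop. to ||a_i||^2
    x_prev = np.zeros(n)
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        step = (A[i] @ x - b[i]) / row_norms2[i] * A[i]  # Kaczmarz direction
        x, x_prev = x - alpha * step + beta * (x - x_prev), x
    return x
```

    With beta = 0 this reduces to plain randomized Kaczmarz; the momentum term reuses the previous displacement x - x_prev in the spirit of the heavy ball method.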

    A semi-randomized and augmented Kaczmarz method with simple random sampling for large-scale inconsistent linear systems

    A greedy randomized augmented Kaczmarz (GRAK) method was proposed in [Z.-Z. Bai and W.-T. Wu, SIAM J. Sci. Comput., 43 (2021), pp. A3892-A3911] for large and sparse inconsistent linear systems. However, in each iteration one has to construct two new index sets by computing the residual vector of the augmented linear system, so the computational overhead of this method is large for extremely large-scale problems. Moreover, there is no reliable stopping criterion for this method. In this work, we are interested in solving large-scale sparse or dense inconsistent linear systems, and we try to enhance the numerical performance of the GRAK method. First, we propose an accelerated greedy randomized augmented Kaczmarz method. Theoretical analysis indicates that it converges faster than the GRAK method under very weak assumptions. Second, to further reduce the overhead, we propose a semi-randomized augmented Kaczmarz method with simple random sampling. Third, to the best of our knowledge, there have been no practical stopping criteria for randomized Kaczmarz-type methods to date. To fill this gap, we introduce a practical stopping criterion for Kaczmarz-type methods and justify it from a theoretical point of view. Numerical experiments are performed on both real-world and synthetic data sets, demonstrating the efficiency of the proposed methods and the effectiveness of our stopping criterion.
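    To make the role of the augmented system concrete, the following hedged Python sketch applies plain (non-greedy) randomized Kaczmarz to the augmented system that underlies these methods; it is a baseline illustration, not the GRAK method or the accelerated variants proposed above. The augmented system is consistent even when the original one is not, and its x-block solves the least-squares problem, which is why residuals of the augmented system drive the greedy index selection.

```python
import numpy as np

def augmented_kaczmarz(A, b, iters=20000, seed=0):
    """Plain randomized Kaczmarz on the augmented system

        [ I    A ] [z]   [b]
        [ A^T  0 ] [x] = [0],

    which is consistent even when A x = b is not; its x-block solves
    min ||A x - b||_2. Baseline sketch only, not the GRAK method.
    """
    m, n = A.shape
    K = np.block([[np.eye(m), A], [A.T, np.zeros((n, n))]])
    rhs = np.concatenate([b, np.zeros(n)])
    rng = np.random.default_rng(seed)
    row_norms2 = np.einsum("ij,ij->i", K, K)
    probs = row_norms2 / row_norms2.sum()
    y = np.zeros(m + n)
    for _ in range(iters):
        i = rng.choice(m + n, p=probs)
        y -= (K[i] @ y - rhs[i]) / row_norms2[i] * K[i]
    return y[m:]   # x-block: approximate least-squares solution
```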

    Check Yourself Before You WREK Yourself: Unpacking and Generalizing Randomized Extended Kaczmarz

    Linear systems are fundamental in many areas of science and engineering. With the advent of computers, there now exist extremely large linear systems of interest, and such systems lend themselves to iterative methods. One such method is the family of algorithms called randomized Kaczmarz methods. Within this family there exists a variant called Randomized Extended Kaczmarz, which solves for least-squares solutions of inconsistent linear systems. Among Kaczmarz variants, Randomized Extended Kaczmarz is unique in that it modifies the input system in a special way to solve for the least-squares solution. In this work we unpack the geometry underlying Randomized Extended Kaczmarz (REK) by uniting proofs by Zouzias and Freris (2013) and Du (2018), leading to more insight into why REK works. We also provide novel proofs showing that REK converges with an alternative sequence of z updates, and we give a closed form for REK's original z updates. Lastly, we generalize the ideas behind REK and QuantileRK (Haddock et al., 2020) to lay the foundations for a new randomized Kaczmarz variant called Weighted Randomized Extended Kaczmarz (WREK), which aims to solve weighted least-squares problems with dynamic reweightings.
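    For readers unfamiliar with the algorithm, here is a compact Python sketch of the standard REK iteration of Zouzias and Freris: a column step drives z toward the component of b orthogonal to the range of A, and a row step applies a Kaczmarz update to the consistent system A x = b - z. Sampling probabilities follow the usual squared-norm convention; stopping tests are omitted.

```python
import numpy as np

def rek(A, b, iters=20000, seed=0):
    """Randomized Extended Kaczmarz (Zouzias & Freris, 2013), simplified.

    z tracks the component of b outside the range of A, so that
    A x = b - z becomes consistent and x converges to the
    least-squares solution.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_n2 = np.einsum("ij,ij->i", A, A)
    col_n2 = np.einsum("ij,ij->j", A, A)
    p_row = row_n2 / row_n2.sum()
    p_col = col_n2 / col_n2.sum()
    x, z = np.zeros(n), b.astype(float).copy()
    for _ in range(iters):
        j = rng.choice(n, p=p_col)                  # column step: project z
        z -= (A[:, j] @ z) / col_n2[j] * A[:, j]
        i = rng.choice(m, p=p_row)                  # row step on A x = b - z
        x -= (A[i] @ x - (b[i] - z[i])) / row_n2[i] * A[i]
    return x
```

    The alternative z updates and the closed form studied in this work concern the column step above; the sketch shows only the original formulation.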

    Sketch and project: randomized iterative methods for linear systems and inverting matrices

    Probabilistic ideas and tools have recently begun to permeate several fields where they traditionally did not play a major role, including numerical linear algebra and optimization. One of the key ways in which these ideas influence these fields is via the development and analysis of randomized algorithms for solving standard and new problems in these fields. Such methods are typically easier to analyze, and often lead to faster and/or more scalable and versatile methods in practice. This thesis explores the design and analysis of new randomized iterative methods for solving linear systems and inverting matrices. The methods are based on a novel sketch-and-project framework. By sketching we mean to start with a difficult problem and then randomly generate a simple problem that contains all the solutions of the original problem. After sketching the problem, we calculate the next iterate by projecting the current iterate onto the solution space of the sketched problem.

    The starting point for this thesis is the development of an archetype randomized method for solving linear systems. Our method has six different but equivalent interpretations: sketch-and-project, constrain-and-approximate, random intersect, random linear solve, random update and random fixed point. By varying its two parameters, a positive definite matrix (defining geometry) and a random matrix (sampled in an i.i.d. fashion in each iteration), we recover a comprehensive array of well-known algorithms as special cases, including the randomized Kaczmarz method, randomized Newton method, randomized coordinate descent method and random Gaussian pursuit. We also naturally obtain variants of all these methods using blocks and importance sampling. However, our method allows for a much wider selection of these two parameters, which leads to a number of new specific methods. We prove exponential convergence of the expected norm of the error in a single theorem, from which existing complexity results for known variants can be obtained. We also give an exact formula for the evolution of the expected iterates, which allows us to give lower bounds on the convergence rate.

    We then extend our problem to that of finding the projection of a given vector onto the solution space of a linear system. For this we develop a new randomized iterative algorithm: stochastic dual ascent (SDA). The method is dual in nature, and iteratively solves the dual of the projection problem. The dual problem is a non-strongly concave quadratic maximization problem without constraints. In each iteration of SDA, a dual variable is updated by a carefully chosen point in a subspace spanned by the columns of a random matrix drawn independently from a fixed distribution. The distribution plays the role of a parameter of the method. Our complexity results hold for a wide family of distributions of random matrices, which opens the possibility to fine-tune the stochasticity of the method to particular applications. We prove that the primal iterates associated with the dual process converge to the projection exponentially fast in expectation, and give a formula and an insightful lower bound for the convergence rate. We also prove that the same rate applies to dual function values, primal function values and the duality gap. Unlike traditional iterative methods, SDA converges under virtually no additional assumptions on the system (e.g., rank, diagonal dominance) beyond consistency; in fact, our lower bound improves as the rank of the system matrix drops. By mapping our dual algorithm to a primal process, we uncover that SDA is the dual of the sketch-and-project method from the previous chapter. Thus our new, more general convergence results for SDA carry over to the sketch-and-project method and all its specializations (randomized Kaczmarz, randomized coordinate descent, etc.). When our method specializes to a known algorithm, we either recover the best known rates or improve upon them. Finally, we show that the framework can be applied to the distributed average consensus problem to obtain an array of new algorithms; the randomized gossip algorithm arises as a special case.

    In the final chapter, we extend our method for solving linear systems to inverting matrices, and develop a family of methods with specialized variants that maintain symmetry or positive definiteness of the iterates. All the methods in the family converge globally and exponentially, with explicit rates. In special cases, we obtain stochastic block variants of several quasi-Newton updates, including bad Broyden (BB), good Broyden (GB), Powell-symmetric-Broyden (PSB), Davidon-Fletcher-Powell (DFP) and Broyden-Fletcher-Goldfarb-Shanno (BFGS). Ours are the first stochastic versions of these updates shown to converge to an inverse of a fixed matrix. Through a dual viewpoint we uncover a fundamental link between quasi-Newton updates and approximate inverse preconditioning. Further, we develop an adaptive variant of randomized block BFGS (AdaRBFGS), in which we modify the distribution underlying the stochasticity of the method throughout the iterative process to achieve faster convergence. By inverting several matrices from varied applications, we demonstrate that AdaRBFGS is highly competitive with the well-established Newton-Schulz and approximate preconditioning methods. In particular, on large-scale problems our method outperforms the standard methods by orders of magnitude. The development of efficient methods for estimating the inverse of very large matrices is a much-needed tool for preconditioning and variable metric methods in the big data era.
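    A minimal Python sketch of the archetype iteration, with the geometry-defining matrix B fixed to the identity and S drawn as a small Gaussian matrix (one instance of the framework among many): each step projects the current iterate onto the solution set of the sketched system S^T A x = S^T b.

```python
import numpy as np

def sketch_and_project(A, b, sketch_size=5, iters=2000, seed=0):
    """Sketch-and-project with B = I and Gaussian sketches.

    Each iteration projects x onto {y : S^T A y = S^T b}; with S a
    random coordinate vector this reduces to randomized Kaczmarz.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(iters):
        S = rng.standard_normal((m, sketch_size))
        M = S.T @ A                         # sketched matrix
        r = M @ x - S.T @ b                 # sketched residual
        # projection step: x <- x - M^T (M M^T)^+ r
        x -= M.T @ np.linalg.lstsq(M @ M.T, r, rcond=None)[0]
    return x
```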

    Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions

    Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed, either explicitly or implicitly, to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis.
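    The paper's basic proto-algorithm is short enough to sketch directly: sample the range of A with a Gaussian test matrix, orthonormalize the sample, then compute a small deterministic SVD of the compressed matrix. The oversampling parameter p below follows the paper's standard recommendation; refinements such as power iterations are omitted.

```python
import numpy as np

def randomized_svd(A, k, p=10, seed=0):
    """Rank-k SVD via the basic randomized range finder.

    Stage A: find Q with k+p orthonormal columns capturing most of
    the action of A. Stage B: factor the small matrix B = Q^T A.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, k + p))   # Gaussian test matrix
    Q, _ = np.linalg.qr(A @ Omega)            # orthonormal basis for the sample
    B = Q.T @ A                               # compress A to the subspace
    Uh, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Q @ Uh[:, :k], s[:k], Vt[:k]       # lift U back to m dimensions
```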