    Interpolating the Trace of the Inverse of Matrix $\mathbf{A} + t\mathbf{B}$

    We develop heuristic interpolation methods for the function $t \mapsto \operatorname{trace}\left( (\mathbf{A} + t \mathbf{B})^{-1} \right)$, where the matrices $\mathbf{A}$ and $\mathbf{B}$ are symmetric and positive definite and $t$ is a real variable. This function is featured in many applications in statistics, machine learning, and computational physics. The presented interpolation functions are based on the modification of a sharp upper bound that we derive for this function, which is a new trace inequality for matrices. We demonstrate the accuracy and performance of the proposed method with numerical examples, namely, the marginal maximum likelihood estimation for linear Gaussian process regression and the estimation of the regularization parameter of ridge regression with the generalized cross-validation method.
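
    For concreteness, here is a minimal sketch (not the paper's interpolation scheme) of the quantity being interpolated: it evaluates $t \mapsto \operatorname{trace}\left((\mathbf{A} + t\mathbf{B})^{-1}\right)$ exactly via a Cholesky factorization, which is the expensive computation the heuristic interpolants are meant to replace. The test matrices are illustrative.

```python
# Exact evaluation of t -> trace((A + t*B)^{-1}) for SPD A, B.
# O(n^3) per value of t, which is what motivates cheap interpolation.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def trace_inv(A, B, t):
    """Exact trace((A + t*B)^{-1}) via Cholesky of the SPD matrix A + t*B."""
    n = A.shape[0]
    C, lower = cho_factor(A + t * B)
    return np.trace(cho_solve((C, lower), np.eye(n)))

rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)   # SPD test matrices
B = np.eye(50)
print([trace_inv(A, B, t) for t in (0.0, 0.1, 1.0)])
```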

    Analysis of stochastic probing methods for estimating the trace of functions of sparse symmetric matrices

    We consider the problem of estimating the trace of a matrix function $f(A)$. In certain situations, in particular if $f(A)$ cannot be well approximated by a low-rank matrix, combining probing methods based on graph colorings with stochastic trace estimation techniques can yield accurate approximations at moderate cost. So far, however, such methods have not been thoroughly analyzed; they have rather been used as efficient heuristics by practitioners. In this manuscript, we perform a detailed analysis of stochastic probing methods and, in particular, expose conditions under which the expected approximation error in the stochastic probing method scales more favorably with the dimension of the matrix than the error in non-stochastic probing. Extending results from [E. Aune, D. P. Simpson, J. Eidsvik, Parameter estimation in high dimensional Gaussian distributions, Stat. Comput., 24, pp. 247--263, 2014], we also characterize situations in which using just one stochastic vector is always -- not only in expectation -- better than the deterministic probing method. Several numerical experiments illustrate our theory and compare with existing methods.
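
    A minimal sketch of one stochastic probing sample, assuming a valid distance coloring is already available (here index mod 4, which works for a tridiagonal test matrix); for the demo, $f(A)$ is formed densely rather than accessed through matvecs. This illustrates the estimator being analyzed, not the paper's analysis.

```python
# One stochastic probing sample: Rademacher entries restricted to each
# color class, summed over classes. Colors come from a graph coloring of A.
import numpy as np

def stochastic_probing_trace(fA, colors, rng):
    """Single-sample stochastic probing estimate of trace(fA)."""
    n = fA.shape[0]
    est = 0.0
    for c in np.unique(colors):
        v = np.zeros(n)
        idx = np.flatnonzero(colors == c)
        v[idx] = rng.choice([-1.0, 1.0], size=idx.size)  # signs on one color class
        est += v @ fA @ v
    return est

rng = np.random.default_rng(1)
n = 200
A = np.diag(2 * np.ones(n)) + np.diag(-np.ones(n - 1), 1) + np.diag(-np.ones(n - 1), -1)
colors = np.arange(n) % 4                     # simple coloring for a tridiagonal matrix
evals, Q = np.linalg.eigh(A)
fA = Q @ np.diag(np.exp(evals)) @ Q.T         # f(A) = exp(A), dense for the demo
print(stochastic_probing_trace(fA, colors, rng), np.sum(np.exp(evals)))
```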

    Randomized low-rank approximation of monotone matrix functions

    This work is concerned with computing low-rank approximations of a matrix function $f(A)$ for a large symmetric positive semi-definite matrix $A$, a task that arises in, e.g., statistical learning and inverse problems. The application of popular randomized methods, such as the randomized singular value decomposition or the Nyström approximation, to $f(A)$ requires multiplying $f(A)$ with a few random vectors. A significant disadvantage of such an approach is that matrix-vector products with $f(A)$ are considerably more expensive than matrix-vector products with $A$, even when carried out only approximately via, e.g., the Lanczos method. In this work, we present and analyze funNyström, a simple and inexpensive method that constructs a low-rank approximation of $f(A)$ directly from a Nyström approximation of $A$, completely bypassing the need for matrix-vector products with $f(A)$. It is sensible to use funNyström whenever $f$ is monotone and satisfies $f(0) = 0$. Under the stronger assumption that $f$ is operator monotone, which includes the matrix square root $A^{1/2}$ and the matrix logarithm $\log(I+A)$, we derive probabilistic bounds for the error in the Frobenius, nuclear, and operator norms. These bounds confirm the numerical observation that funNyström tends to return an approximation that compares well with the best low-rank approximation of $f(A)$. Our method is also of interest when estimating quantities associated with $f(A)$, such as the trace or the diagonal entries of $f(A)$. In particular, we propose and analyze funNyström++, a combination of funNyström with the recently developed Hutch++ method for trace estimation.
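
    The following sketch illustrates the funNyström idea as the abstract describes it: form a Nyström approximation of $A$ from a few matvecs, eigendecompose it, and push the eigenvalues through $f$. The pseudoinverse and the PSD clipping are simplifications for illustration; the actual method's stabilization details are omitted.

```python
# funNystrom-style low-rank approximation of f(A) built only from
# matvecs with A (never with f(A)).
import numpy as np

def fun_nystrom(A, k, f, rng):
    n = A.shape[0]
    Omega = rng.standard_normal((n, k))
    Y = A @ Omega                              # the only access to A: k matvecs
    C = Omega.T @ Y
    A_nys = Y @ np.linalg.pinv(C) @ Y.T        # Nystrom approximation of A
    w, U = np.linalg.eigh(A_nys)
    w = np.clip(w, 0.0, None)                  # PSD safeguard against round-off
    return U, f(w)                             # eigenvectors and f(eigenvalues)

rng = np.random.default_rng(2)
G = rng.standard_normal((100, 100))
A = G @ G.T / 100                              # PSD test matrix
U, fw = fun_nystrom(A, k=20, f=np.sqrt, rng=rng)   # low-rank approx of A^{1/2}
print("trace estimate of A^{1/2}:", fw.sum())
```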

    XTrace: Making the most of every sample in stochastic trace estimation

    The implicit trace estimation problem asks for an approximation of the trace of a square matrix, accessed via matrix-vector products (matvecs). This paper designs new randomized algorithms, XTrace and XNysTrace, for the trace estimation problem by exploiting both variance reduction and the exchangeability principle. For a fixed budget of matvecs, numerical experiments show that the new methods can achieve errors that are orders of magnitude smaller than existing algorithms, such as the Girard-Hutchinson estimator or the Hutch++ estimator. A theoretical analysis confirms the benefits by offering a precise description of the performance of these algorithms as a function of the spectrum of the input matrix. The paper also develops an exchangeable estimator, XDiag, for approximating the diagonal of a square matrix using matvecs.
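
    For context on the comparison, here is a minimal sketch of the Hutch++ baseline mentioned in the abstract (deflation by a randomized range sketch plus Hutchinson on the remainder); it is not the XTrace or XNysTrace algorithm, whose exchangeability-based construction is more involved.

```python
# Hutch++: deflate the top of the spectrum with a range sketch Q, then
# run Hutchinson on the projected remainder (I - QQ^T) A (I - QQ^T).
import numpy as np

def hutchpp(matvec, n, m, rng):
    """Hutch++ trace estimate using roughly m matvecs with A."""
    k = m // 3
    S = rng.choice([-1.0, 1.0], size=(n, k))
    Q, _ = np.linalg.qr(matvec(S))             # range sketch for deflation
    G = rng.choice([-1.0, 1.0], size=(n, k))
    G = G - Q @ (Q.T @ G)                      # project test vectors off range(Q)
    return np.trace(Q.T @ matvec(Q)) + np.trace(G.T @ matvec(G)) / k

rng = np.random.default_rng(3)
n = 300
M = rng.standard_normal((n, n))
A = M @ M.T
print(hutchpp(lambda X: A @ X, n, m=90, rng=rng), np.trace(A))
```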

    Optimal Query Complexities for Dynamic Trace Estimation

    We consider the problem of minimizing the number of matrix-vector queries needed for accurate trace estimation in the dynamic setting where our underlying matrix is changing slowly, such as during an optimization process. Specifically, for any $m$ matrices $A_1, \ldots, A_m$ with consecutive differences bounded in Schatten-$1$ norm by $\alpha$, we provide a novel binary tree summation procedure that simultaneously estimates all $m$ traces up to $\epsilon$ error with $\delta$ failure probability with an optimal query complexity of $\widetilde{O}\left(m \alpha\sqrt{\log(1/\delta)}/\epsilon + m\log(1/\delta)\right)$, improving the dependence on both $\alpha$ and $\delta$ from Dharangutte and Musco (NeurIPS, 2021). Our procedure works without additional norm bounds on $A_i$ and can be generalized to a bound for the $p$-th Schatten norm for $p \in [1,2]$, giving a complexity of $\widetilde{O}\left(m \alpha\left(\sqrt{\log(1/\delta)}/\epsilon\right)^p + m \log(1/\delta)\right)$. By using novel reductions to communication complexity and information-theoretic analyses of Gaussian matrices, we provide matching lower bounds for static and dynamic trace estimation in all relevant parameters, including the failure probability. Our lower bounds (1) give the first tight bounds for Hutchinson's estimator in the matrix-vector product model with Frobenius norm error even in the static setting, and (2) are the first unconditional lower bounds for dynamic trace estimation, resolving open questions of prior work.
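
    For intuition about why slow drift helps, here is a simplified difference-based estimator: estimate $\operatorname{tr}(A_1)$ once, then add cheap Hutchinson estimates of $\operatorname{tr}(A_i - A_{i-1})$, whose variance is small when consecutive differences are small. The paper's binary tree summation procedure is a more refined version of this idea, not what is shown here.

```python
# Naive dynamic trace estimation: one expensive estimate up front,
# then low-variance corrections on the small differences.
import numpy as np

def hutchinson(A, q, rng):
    n = A.shape[0]
    V = rng.choice([-1.0, 1.0], size=(n, q))
    return np.einsum('ij,ij->', V, A @ V) / q   # mean of v^T A v over q vectors

def dynamic_traces(mats, q0, qdiff, rng):
    """Trace estimates for a slowly varying sequence of matrices."""
    ests = [hutchinson(mats[0], q0, rng)]
    for prev, cur in zip(mats, mats[1:]):
        ests.append(ests[-1] + hutchinson(cur - prev, qdiff, rng))
    return ests

rng = np.random.default_rng(4)
n = 200
A = rng.standard_normal((n, n)); A = A @ A.T
mats = [A + 0.01 * i * np.eye(n) for i in range(5)]   # slowly drifting sequence
print(dynamic_traces(mats, q0=200, qdiff=20, rng=rng))
print([np.trace(M) for M in mats])
```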

    Krylov-aware stochastic trace estimation

    We introduce an algorithm for estimating the trace of a matrix function $f(\mathbf{A})$ using implicit products with a symmetric matrix $\mathbf{A}$. Existing methods for implicit trace estimation of a matrix function tend to treat matrix-vector products with $f(\mathbf{A})$ as a black box to be computed by a Krylov subspace method. Like other recent algorithms for implicit trace estimation, our approach is based on a combination of deflation and stochastic trace estimation. However, we take a closer look at how products with $f(\mathbf{A})$ are integrated into these approaches, which enables several efficiencies not present in previously studied methods. In particular, we describe a Krylov subspace method for computing a low-rank approximation of a matrix function by a computationally efficient projection onto a Krylov subspace.
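
    As a point of reference, here is a minimal sketch of the plain stochastic Lanczos quadrature baseline that treats products with $f(\mathbf{A})$ as a black box, i.e., the kind of approach the Krylov-aware method reorganizes; each sample uses the standard quadrature rule $v^\top f(\mathbf{A}) v \approx \|v\|^2 \, e_1^\top f(T) e_1$.

```python
# Stochastic Lanczos quadrature (SLQ) for trace(f(A)) with symmetric A.
import numpy as np

def lanczos(A, v, k):
    """k-step Lanczos: tridiagonal coefficients (diagonal, off-diagonal)."""
    alphas, betas = [], []
    q_prev, q = np.zeros_like(v), v / np.linalg.norm(v)
    beta = 0.0
    for _ in range(k):
        w = A @ q - beta * q_prev
        alpha = q @ w
        w -= alpha * q
        beta = np.linalg.norm(w)
        alphas.append(alpha); betas.append(beta)
        q_prev, q = q, w / beta
    return np.array(alphas), np.array(betas[:-1])

def slq_trace(A, f, k, num_vectors, rng):
    n = A.shape[0]
    total = 0.0
    for _ in range(num_vectors):
        v = rng.choice([-1.0, 1.0], size=n)
        a, b = lanczos(A, v, k)
        theta, S = np.linalg.eigh(np.diag(a) + np.diag(b, 1) + np.diag(b, -1))
        total += n * (S[0] ** 2) @ f(theta)    # ||v||^2 = n for sign vectors
    return total / num_vectors

rng = np.random.default_rng(5)
M = rng.standard_normal((150, 150))
A = M @ M.T / 150 + np.eye(150)
print(slq_trace(A, np.log, k=30, num_vectors=10, rng=rng))  # trace(log(A)) = log det A
print(np.log(np.linalg.eigvalsh(A)).sum())
```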

    Fast and Memory Efficient Algorithms for Structured Matrix Spectrum Approximation

    Approximating the singular values or eigenvalues of a matrix, i.e., spectrum approximation, is a fundamental task in data science and machine learning applications. While approximation of the top singular values has received considerable attention in numerical linear algebra, provably efficient algorithms for other spectrum approximation tasks such as spectral-sum estimation and spectrum density estimation have only recently started to emerge. Two crucial components that have enabled efficient algorithms for spectrum approximation are access to randomness and structure in the underlying matrix. In this thesis, we study how randomization and the underlying structure of the matrix can be exploited to design fast and memory efficient algorithms for spectral-sum estimation and spectrum density estimation. In particular, we look at two classes of structure: sparsity and graph structure. In the first part of this thesis, we show that sparsity can be exploited to give low-memory algorithms for spectral summarization tasks such as approximating some Schatten norms, the Estrada index, and the logarithm of the determinant (log-det) of a sparse matrix. Surprisingly, we show that the space complexity of our algorithms is independent of the underlying dimension of the matrix. Complementing our result for sparse matrices, we show that matrices that satisfy a certain smooth definition of sparsity, but are potentially dense in the conventional sense, can be approximated in spectral-norm error by a truly sparse matrix. Our method is based on a simple sampling scheme that can be implemented in linear time. In the second part, we give the first truly sublinear time algorithm to approximate the spectral density of the (normalized) adjacency matrix of an undirected, unweighted graph in earth-mover distance. In addition to our sublinear time result, we give theoretical guarantees for a variant of the widely used Kernel Polynomial Method and propose a new moment-matching based method for spectrum density estimation of Hermitian matrices.
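
    A minimal sketch of the basic stochastic Kernel Polynomial Method referenced in the abstract: Chebyshev moments $\operatorname{trace}(T_k(A))/n$ are estimated with Rademacher vectors and combined, with Jackson damping, into a density estimate. It assumes the spectrum of $A$ already lies in $[-1, 1]$, as for a normalized adjacency matrix; the thesis's sublinear-time variant is not reproduced here.

```python
# KPM spectral density estimate via stochastic Chebyshev moments.
import numpy as np

def kpm_density(A, num_moments, num_vectors, grid, rng):
    n = A.shape[0]
    mu = np.zeros(num_moments)
    for _ in range(num_vectors):
        v = rng.choice([-1.0, 1.0], size=n)
        t_prev, t = v, A @ v                   # three-term Chebyshev recurrence
        mu[0] += v @ v
        mu[1] += v @ t
        for k in range(2, num_moments):
            t_prev, t = t, 2 * A @ t - t_prev
            mu[k] += v @ t
    mu /= num_vectors * n                      # estimates of trace(T_k(A)) / n
    # Jackson damping suppresses Gibbs oscillations in the truncated expansion.
    k = np.arange(num_moments)
    g = ((num_moments - k + 1) * np.cos(np.pi * k / (num_moments + 1))
         + np.sin(np.pi * k / (num_moments + 1)) / np.tan(np.pi / (num_moments + 1)))
    g /= num_moments + 1
    theta = np.arccos(grid)
    series = g[0] * mu[0] + 2 * np.sum(
        (g[1:] * mu[1:])[:, None] * np.cos(np.outer(k[1:], theta)), axis=0)
    return series / (np.pi * np.sqrt(1 - grid ** 2))

rng = np.random.default_rng(6)
n = 400
A = np.diag(np.linspace(-0.9, 0.9, n))         # toy matrix with known spectrum
grid = np.linspace(-0.95, 0.95, 50)
print(kpm_density(A, num_moments=40, num_vectors=10, grid=grid, rng=rng)[:5])
```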