14 research outputs found

    Polynomial approximations for the matrix logarithm with computation graphs

    Full text link
    The most popular method for computing the matrix logarithm is the inverse scaling and squaring method combined with a Padé approximation, sometimes accompanied by the Schur decomposition. The main computational effort lies in matrix-matrix multiplications and left matrix division. In this work we show that the number of such operations can be substantially reduced by using a graph-based representation of an efficient polynomial evaluation scheme. A technique to analyze the rounding error is proposed, and a backward error analysis is adapted. We provide extensive simulations illustrating that the approach is competitive in both computation time and rounding error.
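
    For orientation, here is a minimal NumPy/SciPy sketch of the inverse scaling and squaring framework referred to above: take repeated square roots until the argument is close to the identity, evaluate a truncated series for log(I + X), and rescale. The helper name logm_iss, the degree, the tolerance, and the Horner evaluation are illustrative choices of mine, not the paper's graph-based evaluation scheme.

```python
import numpy as np
from scipy.linalg import logm, sqrtm

def logm_iss(A, degree=16, tol=0.25, max_sqrts=40):
    """Matrix logarithm by inverse scaling and squaring with a truncated
    Mercator series for log(I + X).  Illustrative only: production codes use
    Pade approximants or optimized polynomial schemes and a Schur form."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    I = np.eye(n)
    k = 0
    # Inverse scaling: repeated square roots bring A close to the identity.
    while np.linalg.norm(A - I, 1) > tol and k < max_sqrts:
        A = sqrtm(A)
        k += 1
    X = A - I
    # log(I + X) ~ sum_{j=1}^{degree} (-1)^(j+1) X^j / j, evaluated Horner-style.
    coeffs = [(-1) ** (j + 1) / j for j in range(1, degree + 1)]
    P = coeffs[-1] * I
    for c in reversed(coeffs[:-1]):
        P = c * I + X @ P
    # Undo the scaling: log(A) = 2^k * log(A^(1/2^k)).
    return (2 ** k) * (X @ P)

# Quick check against SciPy's reference logm on a small SPD matrix.
B = np.array([[4.0, 1.0], [1.0, 3.0]])
print(np.allclose(logm_iss(B), logm(B), atol=1e-8))
```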

    An Improved Taylor Algorithm for Computing the Matrix Logarithm

    Full text link
    The most popular method for computing the matrix logarithm is the inverse scaling and squaring method combined with a Padé approximation, sometimes accompanied by the Schur decomposition. In this work, we present a Taylor series algorithm, based on the free-transformation approach of the inverse scaling and squaring technique, that uses recent matrix polynomial formulas to evaluate the Taylor approximation of the matrix logarithm more efficiently than the Paterson-Stockmeyer method. Two MATLAB implementations of this algorithm, based on relative forward and backward error analysis respectively, were developed and compared with several state-of-the-art MATLAB functions. Numerical tests showed that the new implementations are generally more accurate than the previously available codes, with an execution time intermediate among all the compared codes. This research was funded by the European Regional Development Fund (ERDF) and the Spanish Ministerio de Economía y Competitividad, grant TIN2017-89314-P.
    Ibáñez González, JJ.; Sastre, J.; Ruíz Martínez, PA.; Alonso Abalos, JM.; Defez Candel, E. (2021). An Improved Taylor Algorithm for Computing the Matrix Logarithm. Mathematics 9(17):1-19. https://doi.org/10.3390/math9172018
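
    The Paterson-Stockmeyer baseline that the paper improves upon is easy to state. The sketch below is an illustrative toy of mine, not the paper's new formulas or its MATLAB codes: it evaluates a degree-16 Taylor polynomial of log(I + X) with roughly 2*sqrt(degree) matrix products instead of degree - 1; the function name, degree, and test matrix are assumptions.

```python
import numpy as np
from scipy.linalg import logm

def paterson_stockmeyer(coeffs, X):
    """Evaluate p(X) = sum_j coeffs[j] X^j by splitting the coefficients into
    blocks of length s ~ sqrt(deg) and applying Horner's rule in X^s, so only
    the powers X^2, ..., X^s are formed explicitly."""
    n = X.shape[0]
    m = len(coeffs) - 1                      # polynomial degree
    s = max(1, int(np.ceil(np.sqrt(m + 1))))
    powers = [np.eye(n), X.copy()]           # powers[i] = X^i for i = 0..s
    for _ in range(2, s + 1):
        powers.append(powers[-1] @ X)
    Xs, r = powers[s], m // s
    P = np.zeros((n, n))
    for k in range(r, -1, -1):               # Horner in X^s over coefficient blocks
        block = sum(coeffs[k * s + i] * powers[i]
                    for i in range(s) if k * s + i <= m)
        P = block if k == r else block + Xs @ P
    return P

# Degree-16 Taylor polynomial of log(I + X): c_0 = 0, c_j = (-1)^(j+1)/j.
m = 16
coeffs = [0.0] + [(-1) ** (j + 1) / j for j in range(1, m + 1)]
X = 0.1 * np.array([[0.3, 0.1], [0.2, 0.4]])
print(np.allclose(paterson_stockmeyer(coeffs, X), logm(np.eye(2) + X), atol=1e-12))
```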

    Optimal Query Complexities for Dynamic Trace Estimation

    Full text link
    We consider the problem of minimizing the number of matrix-vector queries needed for accurate trace estimation in the dynamic setting where our underlying matrix is changing slowly, such as during an optimization process. Specifically, for any $m$ matrices $A_1,\dots,A_m$ with consecutive differences bounded in Schatten-$1$ norm by $\alpha$, we provide a novel binary tree summation procedure that simultaneously estimates all $m$ traces up to $\epsilon$ error with $\delta$ failure probability with an optimal query complexity of $\widetilde{O}\left(m \alpha\sqrt{\log(1/\delta)}/\epsilon + m\log(1/\delta)\right)$, improving the dependence on both $\alpha$ and $\delta$ from Dharangutte and Musco (NeurIPS, 2021). Our procedure works without additional norm bounds on the $A_i$ and can be generalized to a bound for the $p$-th Schatten norm for $p \in [1,2]$, giving a complexity of $\widetilde{O}\left(m \alpha\left(\sqrt{\log(1/\delta)}/\epsilon\right)^p + m \log(1/\delta)\right)$. By using novel reductions to communication complexity and information-theoretic analyses of Gaussian matrices, we provide matching lower bounds for static and dynamic trace estimation in all relevant parameters, including the failure probability. Our lower bounds (1) give the first tight bounds for Hutchinson's estimator in the matrix-vector product model with Frobenius norm error, even in the static setting, and (2) are the first unconditional lower bounds for dynamic trace estimation, resolving open questions of prior work. Comment: 30 pages.
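
    The binary-tree summation procedure and the matching lower bounds are beyond a short snippet, but the primitive being refined is easy to sketch: Hutchinson-style matrix-vector queries applied to the consecutive differences $A_i - A_{i-1}$, so that a slowly changing sequence needs only a few fresh queries per step. The function names, query budgets, and test matrices below are illustrative assumptions, not the paper's optimal allocation.

```python
import numpy as np

def hutchinson(matvec, n, num_queries, rng):
    """Hutchinson's estimator: E[g^T A g] = tr(A) for Rademacher vectors g."""
    total = 0.0
    for _ in range(num_queries):
        g = rng.choice([-1.0, 1.0], size=n)
        total += g @ matvec(g)
    return total / num_queries

def dynamic_trace_estimates(matrices, q_first=200, q_diff=20, seed=0):
    """Estimate tr(A_1), ..., tr(A_m) for a slowly changing sequence: spend a
    large query budget on A_1 and a small one on each difference A_i - A_{i-1},
    then accumulate.  A simplified variance-reduction idea, not the paper's
    binary-tree procedure with optimal query complexity."""
    rng = np.random.default_rng(seed)
    n = matrices[0].shape[0]
    estimates = [hutchinson(lambda v: matrices[0] @ v, n, q_first, rng)]
    for A_prev, A_cur in zip(matrices, matrices[1:]):
        delta = hutchinson(lambda v: A_cur @ v - A_prev @ v, n, q_diff, rng)
        estimates.append(estimates[-1] + delta)
    return estimates

# Toy example: a symmetric matrix drifting slowly along the identity direction.
rng = np.random.default_rng(1)
G = rng.standard_normal((100, 100))
mats = [(G + G.T) / 2 + 0.05 * t * np.eye(100) for t in range(8)]
print([round(e, 1) for e in dynamic_trace_estimates(mats)])
print([round(np.trace(M), 1) for M in mats])
```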

    Geometric variational inference

    Full text link
    Efficiently accessing the information contained in non-linear, high-dimensional probability distributions remains a core challenge in modern statistics. Traditionally, estimators that go beyond point estimates are categorized as either Variational Inference (VI) or Markov chain Monte Carlo (MCMC) techniques. While MCMC methods that utilize the geometric properties of continuous probability distributions to increase their efficiency have been proposed, VI methods rarely use the geometry. This work aims to fill this gap and proposes geometric Variational Inference (geoVI), a method based on Riemannian geometry and the Fisher information metric. It is used to construct a coordinate transformation that relates the Riemannian manifold associated with the metric to Euclidean space. The distribution, expressed in the coordinate system induced by the transformation, takes a particularly simple form that allows for an accurate variational approximation by a normal distribution. Furthermore, the algorithmic structure allows for an efficient implementation of geoVI, which is demonstrated on multiple examples, ranging from low-dimensional illustrative ones to non-linear, hierarchical Bayesian inverse problems in thousands of dimensions. Comment: 42 pages, 18 figures, accepted by Entropy.
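
    As a minimal one-dimensional illustration of the underlying idea (not the geoVI algorithm itself): for a Poisson likelihood with rate $\lambda$ the Fisher metric is $M(\lambda) = 1/\lambda$, so the coordinate $y(\lambda) = \int \sqrt{M}\, d\lambda = 2\sqrt{\lambda}$ makes the metric Euclidean, and a skewed posterior over $\lambda$ becomes much closer to Gaussian when expressed in $y$. The observed count and the grid below are arbitrary assumptions.

```python
import numpy as np

k = 5                                       # observed Poisson count (assumption)
lam = np.linspace(1e-4, 40.0, 200_000)      # uniform grid over the rate
w = lam ** k * np.exp(-lam)                 # unnormalized posterior (flat prior)
w /= w.sum()                                # discrete probability weights

def skewness(x, w):
    """Skewness of a discrete distribution with atoms x and weights w."""
    m = np.sum(w * x)
    s = np.sqrt(np.sum(w * (x - m) ** 2))
    return np.sum(w * ((x - m) / s) ** 3)

# The Gamma(6, 1) posterior over lam is visibly skewed (skewness 2/sqrt(6) ~ 0.82);
# in the Fisher-metric coordinate y = 2*sqrt(lam) the skewness drops to roughly 0.2.
print("skewness of lam:        ", round(skewness(lam, w), 3))
print("skewness of 2*sqrt(lam):", round(skewness(2.0 * np.sqrt(lam), w), 3))
```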

    Compression, inversion, and approximate PCA of dense kernel matrices at near-linear computational complexity

    Get PDF
    Dense kernel matrices $\Theta \in \mathbb{R}^{N \times N}$ obtained from point evaluations of a covariance function $G$ at locations $\{x_i\}_{1 \leq i \leq N} \subset \mathbb{R}^d$ arise in statistics, machine learning, and numerical analysis. For covariance functions that are Green's functions of elliptic boundary value problems and homogeneously distributed sampling points, we show how to identify a subset $S \subset \{1, \dots, N\}^2$, with $\#S = \mathcal{O}(N \log(N) \log^{d}(N/\epsilon))$, such that the zero fill-in incomplete Cholesky factorization of the sparse matrix $\Theta_{ij} \mathbf{1}_{(i,j) \in S}$ is an $\epsilon$-approximation of $\Theta$. This factorization can provably be obtained in complexity $\mathcal{O}(N \log(N) \log^{d}(N/\epsilon))$ in space and $\mathcal{O}(N \log^{2}(N) \log^{2d}(N/\epsilon))$ in time, improving upon the state of the art for general elliptic operators; we further present numerical evidence that $d$ can be taken to be the intrinsic dimension of the data set rather than that of the ambient space. The algorithm only needs to know the spatial configuration of the $x_i$ and does not require an analytic representation of $G$. Furthermore, this factorization straightforwardly provides an approximate sparse PCA with optimal rate of convergence in the operator norm. Hence, by using only subsampling and the incomplete Cholesky factorization, we obtain, at nearly linear complexity, the compression, inversion, and approximate PCA of a large class of covariance matrices. By inverting the order of the Cholesky factorization we also obtain a solver for elliptic PDEs with complexity $\mathcal{O}(N \log^{d}(N/\epsilon))$ in space and $\mathcal{O}(N \log^{2d}(N/\epsilon))$ in time, improving upon the state of the art for general elliptic operators.
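
    A toy sketch of the factorization strategy, under simplifying assumptions that are mine rather than the paper's: points on a line, an exponential kernel (the Green's function of a one-dimensional elliptic operator), a coarse-to-fine maximin ordering, and a sparsity set S that keeps the pair (i, j) when the two points lie within rho times the length scale of the coarser one. The zero fill-in incomplete Cholesky then only ever touches entries in S; the paper's precise ordering, pattern, and near-linear-complexity implementation differ.

```python
import numpy as np

def maximin_ordering(x):
    """Coarse-to-fine ordering of 1D points: each point is as far as possible
    from its predecessors; ell[k] is its distance to the closest predecessor."""
    order, ell = [0], [np.inf]
    dist = np.abs(x - x[0])
    for _ in range(len(x) - 1):
        i = int(np.argmax(dist))
        order.append(i)
        ell.append(dist[i])
        dist = np.minimum(dist, np.abs(x - x[i]))
    return np.array(order), np.array(ell)

def incomplete_cholesky(Theta, S):
    """Zero fill-in Cholesky: only lower-triangular entries flagged in S are
    computed and stored; everything else in L stays exactly zero."""
    n = Theta.shape[0]
    L = np.zeros_like(Theta)
    for j in range(n):
        d = Theta[j, j] - L[j, :j] @ L[j, :j]
        L[j, j] = np.sqrt(max(d, 1e-12))   # guard against dropped-entry loss of positivity
        for i in range(j + 1, n):
            if S[i, j]:
                L[i, j] = (Theta[i, j] - L[i, :j] @ L[j, :j]) / L[j, j]
    return L

rng = np.random.default_rng(0)
n, lengthscale, rho = 300, 0.2, 3.0
x = rng.uniform(0.0, 1.0, n)
order, ell = maximin_ordering(x)
x = x[order]                               # reorder points coarse-to-fine
Theta = np.exp(-np.abs(x[:, None] - x[None, :]) / lengthscale)   # exponential kernel

# Keep (i, j) with i >= j when x_i lies within rho * ell_j of the coarser point x_j.
S = np.abs(x[:, None] - x[None, :]) <= rho * ell[None, :]
S &= np.arange(n)[:, None] >= np.arange(n)[None, :]

L = incomplete_cholesky(Theta, S)
rel_err = np.linalg.norm(Theta - L @ L.T) / np.linalg.norm(Theta)
print(f"kept {int(S.sum())} of {n * (n + 1) // 2} lower-triangular entries, "
      f"relative error {rel_err:.2e}")
```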

    Probabilistic Linear Algebra

    Get PDF
    Linear algebra operations are at the core of many computational tasks. For example, evaluating the density of a multivariate normal distribution requires the solution of a linear equation system and the determinant of a square matrix. Frequently, and in particular in machine learning, the involved matrices are too large for exact solutions and necessitate approximation. Building upon recent work (Hennig and Kiefel 2012; Hennig 2015), this thesis considers numerical linear algebra from a probabilistic perspective. Part iii establishes connections between approximate linear solvers and Gaussian inference, with a focus on projection methods. One result is the observation that solution-based inference (Cockayne, Oates, Ipsen, and Girolami 2018) is subsumed in the matrix-based inference perspective (Hennig and Kiefel 2012). Part iv shows how the probabilistic viewpoint leads to a novel algorithm for kernel least-squares problems. A Gaussian model over kernel functions is proposed that uses matrix multiplications computed by conjugate gradients to obtain a low-rank approximation of the kernel. The derived algorithm, kernel machine conjugate gradients, provides empirically better approximations than conjugate gradients and, when used for Gaussian process regression, additionally provides estimates of the posterior variance and log-marginal likelihood without the need to rerun the solver. Part v is concerned with the approximation of kernel matrix determinants. Assuming the inputs to the kernel are independent and identically distributed, a stopping condition for the Cholesky decomposition is presented that provides probably approximately correct (PAC) estimates of the log-determinant with little overhead.
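
    As a point of reference for Parts iv and v, here is a short sketch of the exact Cholesky-based Gaussian process regression baseline that such probabilistic solvers approximate: posterior mean, pointwise variance, and log-marginal likelihood, with the log-determinant (the quantity Part v estimates from an early-stopped factorization) entering only through the diagonal of the Cholesky factor. The RBF kernel, noise level, and toy data are assumptions for illustration.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_regression(X, y, X_test, noise=1e-2):
    """Exact GP regression via one Cholesky factorization of the kernel matrix."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    L = cholesky(K, lower=True)
    alpha = solve_triangular(L.T, solve_triangular(L, y, lower=True), lower=False)
    Ks = rbf_kernel(X, X_test)
    mean = Ks.T @ alpha
    V = solve_triangular(L, Ks, lower=True)
    var = np.diag(rbf_kernel(X_test, X_test)) - np.sum(V ** 2, axis=0)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))          # log det K from the factor
    lml = -0.5 * y @ alpha - 0.5 * logdet - 0.5 * len(X) * np.log(2.0 * np.pi)
    return mean, var, lml

# Toy data: noisy sine observations.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, (40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)
X_test = np.linspace(-3.0, 3.0, 5)[:, None]
mean, var, lml = gp_regression(X, y, X_test)
print(np.round(mean, 2), np.round(var, 3), round(float(lml), 2))
```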