    On the matrix square root via geometric optimization

    This paper is triggered by the preprint "\emph{Computing Matrix Squareroot via Non Convex Local Search}" by Jain et al. (\textit{\textcolor{blue}{arXiv:1507.05854}}), which analyzes gradient descent for computing the square root of a positive definite matrix. Contrary to the claims of~\citet{jain2015}, our experiments reveal that Newton-like methods compute matrix square roots rapidly and reliably, even for highly ill-conditioned matrices and without requiring commutativity. We observe that gradient descent converges very slowly, primarily due to tiny step sizes and ill-conditioning. We derive an alternative first-order method based on geodesic convexity: our method admits a transparent convergence analysis ($<1$ page), attains a linear rate, and displays reliable convergence even for rank-deficient problems. Though superior to gradient descent, our method is ultimately also outperformed by a well-known scaled Newton method. Nevertheless, the primary value of our work is conceptual: it shows that for deriving gradient-based methods for the matrix square root, \emph{the manifold geometric view of positive definite matrices can be much more advantageous than the Euclidean view}. Comment: 8 pages, 12 plots, this version contains several more references and more words about the rank-deficient case
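To make the "Newton-like" family mentioned above concrete, here is a minimal sketch of the Denman-Beavers iteration, a classical Newton-type scheme for the matrix square root. It is not necessarily the scaled Newton method the abstract refers to, and it is not the geodesic first-order method of the paper. The helper names (`matmul2`, `inv2`, `sqrtm_denman_beavers`) are hypothetical, and the code is restricted to 2x2 matrices purely to stay dependency-free.

```python
def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inv2(a):
    """Inverse of a 2x2 matrix via the closed-form adjugate formula."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def sqrtm_denman_beavers(a, iters=30):
    """Denman-Beavers iteration: Y -> A^(1/2), Z -> A^(-1/2).

    Y_{k+1} = (Y_k + Z_k^{-1}) / 2,  Z_{k+1} = (Z_k + Y_k^{-1}) / 2,
    started from Y_0 = A, Z_0 = I. Illustration only (2x2, fixed iteration count).
    """
    y = [row[:] for row in a]
    z = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(iters):
        # Both inverses use the *old* iterates, as the coupled update requires.
        y_inv, z_inv = inv2(y), inv2(z)
        y = [[0.5 * (y[i][j] + z_inv[i][j]) for j in range(2)] for i in range(2)]
        z = [[0.5 * (z[i][j] + y_inv[i][j]) for j in range(2)] for i in range(2)]
    return y


# A small symmetric positive definite test matrix (an arbitrary choice).
A = [[4.0, 1.0], [1.0, 3.0]]
X = sqrtm_denman_beavers(A)
XX = matmul2(X, X)
residual = max(abs(XX[i][j] - A[i][j]) for i in range(2) for j in range(2))
```

The variable `residual` measures how far `X @ X` is from `A` entrywise; for a well-conditioned input like this one it should be near machine precision.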

    An Efficient Parallel Algorithm for Spectral Sparsification of Laplacian and SDDM Matrix Polynomials

    For a "large" class $\mathcal{C}$ of continuous probability density functions (p.d.f.), we demonstrate that for every $w\in\mathcal{C}$ there is a mixture of discrete Binomial distributions (MDBD) with $T\geq N\sqrt{\phi_{w}/\delta}$ distinct Binomial distributions $B(\cdot,N)$ that $\delta$-approximates a discretized p.d.f. $\widehat{w}(i/N)\triangleq w(i/N)/[\sum_{\ell=0}^{N}w(\ell/N)]$ for all $i\in[3:N-3]$, where $\phi_{w}\geq\max_{x\in[0,1]}|w(x)|$. Also, we give two efficient parallel algorithms to find such an MDBD. Moreover, we propose a sequential algorithm that, on input an MDBD with $N=2^k$ for $k\in\mathbb{N}_{+}$ that induces a discretized p.d.f. $\beta$, a matrix $B=D-M$ that is either a Laplacian or an SDDM matrix, and a parameter $\epsilon\in(0,1)$, outputs in $\widehat{O}(\epsilon^{-2}m + \epsilon^{-4}nT)$ time a spectral sparsifier $D-\widehat{M}_{N} \approx_{\epsilon} D-D\sum_{i=0}^{N}\beta_{i}(D^{-1} M)^i$ of a matrix-polynomial, where the $\widehat{O}(\cdot)$ notation hides $\mathrm{poly}(\log n,\log N)$ factors. This improves on the algorithm of Cheng et al. [CCLPT15], whose run time is $\widehat{O}(\epsilon^{-2} m N^2 + NT)$. Furthermore, our algorithm is parallelizable and runs in work $\widehat{O}(\epsilon^{-2}m + \epsilon^{-4}nT)$ and depth $O(\log N\cdot\mathrm{poly}(\log n)+\log T)$. Our main algorithmic contribution is to propose the first efficient parallel algorithm that, on input a continuous p.d.f. $w\in\mathcal{C}$ and a matrix $B=D-M$ as above, outputs a spectral sparsifier of a matrix-polynomial whose coefficients approximate component-wise the discretized p.d.f. $\widehat{w}$.
Our results yield the first efficient and parallel algorithm that runs in nearly linear work and poly-logarithmic depth and analyzes the long-term behaviour of Markov chains in non-trivial settings. In addition, we strengthen Spielman and Peng's [PS14] parallel SDD solver.
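The objects in this abstract can be illustrated with a small numerical sketch: the discretized p.d.f. $\widehat{w}(i/N)$ and the coefficients $\beta_i$ induced by a mixture of $T$ Binomial distributions $B(\cdot,N)$. The mixture below is a naive heuristic (evenly spaced success probabilities, weights proportional to $w$) chosen only for illustration; it is an assumption and not the construction analyzed in the paper. The function names and the test density $w(x)=6x(1-x)$ are likewise hypothetical.

```python
import math


def discretized_pdf(w, n):
    """w_hat(i/n) = w(i/n) / sum_l w(l/n), for i = 0..n."""
    vals = [w(i / n) for i in range(n + 1)]
    total = sum(vals)
    return [v / total for v in vals]


def binomial_pmf(i, n, p):
    """Probability of i successes in n trials with success probability p."""
    return math.comb(n, i) * p**i * (1.0 - p) ** (n - i)


def binomial_mixture(w, n, t):
    """Coefficients beta_i of a t-component Binomial mixture.

    Heuristic (assumed for illustration): component j has success
    probability (j + 0.5) / t and weight proportional to w at that point.
    """
    probs = [(j + 0.5) / t for j in range(t)]
    weights = [w(p) for p in probs]
    total = sum(weights)
    alphas = [a / total for a in weights]
    return [sum(a * binomial_pmf(i, n, p) for a, p in zip(alphas, probs))
            for i in range(n + 1)]


def w(x):
    """Example continuous p.d.f. on [0, 1] (a Beta(2, 2) density)."""
    return 6.0 * x * (1.0 - x)


N, T = 64, 16
w_hat = discretized_pdf(w, N)
beta = binomial_mixture(w, N, T)
# Largest pointwise gap between the mixture coefficients and w_hat.
max_err = max(abs(b - v) for b, v in zip(beta, w_hat))
```

Both `w_hat` and `beta` are probability vectors of length `N + 1`, so `beta` can serve as the coefficient vector $\beta_0,\dots,\beta_N$ of the matrix-polynomial $D-D\sum_{i=0}^{N}\beta_i(D^{-1}M)^i$; `max_err` gives a rough sense of how closely this naive mixture tracks $\widehat{w}$.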

    NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization

    We study the problem of large-scale network embedding, which aims to learn latent representations for network mining applications. Previous research shows that 1) popular network embedding benchmarks, such as DeepWalk, are in essence implicitly factorizing a matrix with a closed form, and 2) the explicit factorization of such a matrix generates more powerful embeddings than existing methods. However, directly constructing and factorizing this matrix, which is dense, is prohibitively expensive in terms of both time and space, so the approach does not scale to large networks. In this work, we present the algorithm of large-scale network embedding as sparse matrix factorization (NetSMF). NetSMF leverages theories from spectral sparsification to efficiently sparsify the aforementioned dense matrix, enabling significantly improved efficiency in embedding learning. The sparsified matrix is spectrally close to the original dense one with a theoretically bounded approximation error, which helps maintain the representation power of the learned embeddings. We conduct experiments on networks of various scales and types. Results show that among both popular benchmarks and factorization-based methods, NetSMF is the only method that achieves both high efficiency and effectiveness. We show that NetSMF requires only 24 hours to generate effective embeddings for a large-scale academic collaboration network with tens of millions of nodes, while it would cost DeepWalk months and is computationally infeasible for the dense matrix factorization solution. The source code of NetSMF is publicly available (https://github.com/xptree/NetSMF). Comment: 11 pages, in Proceedings of the Web Conference 2019 (WWW 19)
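The density problem described above can be seen on a toy graph. The sketch below builds the closed-form matrix usually associated with DeepWalk in the matrix-factorization literature, $\frac{\mathrm{vol}(G)}{bT}\bigl(\sum_{r=1}^{T}(D^{-1}A)^r\bigr)D^{-1}$, before the element-wise logarithm is applied. The formula is quoted from memory of that earlier work and is not stated in this abstract, so treat it as an assumption. The function names, the path graph, and the parameter values are hypothetical, and no sparsification is performed here.

```python
def matmul(a, b):
    """Dense matrix product for nested-list matrices."""
    inner = len(b)
    return [[sum(a[i][x] * b[x][j] for x in range(inner))
             for j in range(len(b[0]))] for i in range(len(a))]


def deepwalk_matrix(adj, window, neg=1):
    """vol(G) / (neg * window) * (sum_{r=1..window} P^r) D^{-1}, with P = D^{-1} A.

    'window' plays the role of T and 'neg' the role of b (assumed notation).
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    vol = sum(deg)
    p = [[adj[i][j] / deg[i] for j in range(n)] for i in range(n)]
    power = [row[:] for row in p]   # current power P^r
    acc = [row[:] for row in p]     # running sum of P^1 .. P^r
    for _ in range(window - 1):
        power = matmul(power, p)
        acc = [[acc[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    scale = vol / (neg * window)
    # Right-multiplying by D^{-1} divides column j by deg[j].
    return [[scale * acc[i][j] / deg[j] for j in range(n)] for i in range(n)]


# Path graph on 6 nodes: a very sparse adjacency matrix.
n = 6
adj = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
dense = deepwalk_matrix(adj, window=3)

nnz_adj = sum(1 for row in adj for v in row if v != 0.0)
nnz_dense = sum(1 for row in dense for v in row if v != 0.0)
```

Comparing `nnz_adj` with `nnz_dense` shows how quickly the matrix fills in as the window grows: every pair of nodes within `window` hops gets a nonzero entry, which is why forming it explicitly is impractical for large networks.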

    Inference via low-dimensional couplings

    We investigate the low-dimensional structure of deterministic transformations between random variables, i.e., transport maps between probability measures. In the context of statistics and machine learning, these transformations can be used to couple a tractable "reference" measure (e.g., a standard Gaussian) with a target measure of interest. Direct simulation from the desired measure can then be achieved by pushing forward reference samples through the map. Yet characterizing such a map (e.g., representing and evaluating it) grows challenging in high dimensions. The central contribution of this paper is to establish a link between the Markov properties of the target measure and the existence of low-dimensional couplings, induced by transport maps that are sparse and/or decomposable. Our analysis not only facilitates the construction of transformations in high-dimensional settings, but also suggests new inference methodologies for continuous non-Gaussian graphical models. For instance, in the context of nonlinear state-space models, we describe new variational algorithms for filtering, smoothing, and sequential parameter inference. These algorithms can be understood as the natural generalization, to the non-Gaussian case, of the square-root Rauch-Tung-Striebel Gaussian smoother. Comment: 78 pages, 25 figures
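As a toy illustration of "pushing forward reference samples through the map", the sketch below samples a nonlinear state-space prior by composing one-step maps: each step reads only the previous state and one fresh standard Gaussian coordinate. The model ($x_t=\sin(x_{t-1})+\sigma z_t$), the function names, and all parameter values are hypothetical choices for illustration; this is not one of the variational algorithms described in the paper.

```python
import math
import random


def step(x_prev, z, sigma=0.5):
    """One low-dimensional map: depends only on the previous state and one z."""
    return math.sin(x_prev) + sigma * z


def push_forward(z):
    """Map a reference sample z (i.i.d. standard normals) to a state trajectory.

    The overall map is lower triangular (x_t depends on z_0..z_t), but it is
    a composition of steps that each act on just two scalar inputs.
    """
    x = [z[0]]
    for t in range(1, len(z)):
        x.append(step(x[-1], z[t]))
    return x


rng = random.Random(0)  # fixed seed so the run is reproducible
horizon, n_samples = 5, 20000
samples = [push_forward([rng.gauss(0.0, 1.0) for _ in range(horizon)])
           for _ in range(n_samples)]

# By symmetry of the toy model, the final state has mean zero.
mean_last = sum(s[-1] for s in samples) / n_samples
```

Each trajectory in `samples` is an exact draw from the toy model's joint distribution, obtained without any rejection or Markov chain step; in a more realistic setting the one-step maps would have to be learned rather than written down.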