12 research outputs found

    On the matrix square root via geometric optimization

    This paper is triggered by the preprint "Computing Matrix Squareroot via Non Convex Local Search" by Jain et al. (arXiv:1507.05854), which analyzes gradient descent for computing the square root of a positive definite matrix. Contrary to the claims of Jain et al. (2015), our experiments reveal that Newton-like methods compute matrix square roots rapidly and reliably, even for highly ill-conditioned matrices and without requiring commutativity. We observe that gradient descent converges very slowly, primarily due to tiny step-sizes and ill-conditioning. We derive an alternative first-order method based on geodesic convexity: our method admits a transparent convergence analysis (under one page), attains a linear rate, and displays reliable convergence even for rank-deficient problems. Though superior to gradient descent, our method is ultimately also outperformed by a well-known scaled Newton method. Nevertheless, the primary value of our work is conceptual: it shows that for deriving gradient-based methods for the matrix square root, the manifold geometric view of positive definite matrices can be much more advantageous than the Euclidean view. Comment: 8 pages, 12 plots; this version contains several more references and more words about the rank-deficient case.
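    As a concrete illustration of the Newton-like methods discussed above, here is a minimal NumPy sketch of the (unscaled) Denman-Beavers iteration for the matrix square root; the paper's strongest competitor is a scaled Newton method, so treat this as an assumed illustrative baseline rather than the authors' exact algorithm.

```python
import numpy as np

def denman_beavers_sqrt(A, max_iter=50, tol=1e-12):
    # Newton-type (Denman-Beavers) iteration for the square root of an SPD matrix A.
    # Illustrative sketch only: the scaled Newton method referenced in the abstract
    # additionally rescales the iterates to handle ill-conditioned inputs faster.
    Y = A.copy()
    Z = np.eye(A.shape[0])
    for _ in range(max_iter):
        Y_next = 0.5 * (Y + np.linalg.inv(Z))
        Z_next = 0.5 * (Z + np.linalg.inv(Y))
        converged = np.linalg.norm(Y_next - Y, "fro") <= tol * np.linalg.norm(Y, "fro")
        Y, Z = Y_next, Z_next
        if converged:
            break
    return Y  # Y converges to A^{1/2} (and Z to A^{-1/2})

# Example on a random symmetric positive definite matrix.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B @ B.T + 5 * np.eye(5)
X = denman_beavers_sqrt(A)
print(np.linalg.norm(X @ X - A))  # close to machine precision
```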

    An Efficient Parallel Algorithm for Spectral Sparsification of Laplacian and SDDM Matrix Polynomials

    For "large" class C\mathcal{C} of continuous probability density functions (p.d.f.), we demonstrate that for every w∈Cw\in\mathcal{C} there is mixture of discrete Binomial distributions (MDBD) with Tβ‰₯NΟ•w/Ξ΄T\geq N\sqrt{\phi_{w}/\delta} distinct Binomial distributions B(β‹…,N)B(\cdot,N) that Ξ΄\delta-approximates a discretized p.d.f. w^(i/N)β‰œw(i/N)/[βˆ‘β„“=0Nw(β„“/N)]\widehat{w}(i/N)\triangleq w(i/N)/[\sum_{\ell=0}^{N}w(\ell/N)] for all i∈[3:Nβˆ’3]i\in[3:N-3], where Ο•wβ‰₯max⁑x∈[0,1]∣w(x)∣\phi_{w}\geq\max_{x\in[0,1]}|w(x)|. Also, we give two efficient parallel algorithms to find such MDBD. Moreover, we propose a sequential algorithm that on input MDBD with N=2kN=2^k for k∈N+k\in\mathbb{N}_{+} that induces a discretized p.d.f. Ξ²\beta, B=Dβˆ’MB=D-M that is either Laplacian or SDDM matrix and parameter ϡ∈(0,1)\epsilon\in(0,1), outputs in O^(Ο΅βˆ’2m+Ο΅βˆ’4nT)\widehat{O}(\epsilon^{-2}m + \epsilon^{-4}nT) time a spectral sparsifier Dβˆ’M^Nβ‰ˆΟ΅Dβˆ’Dβˆ‘i=0NΞ²i(Dβˆ’1M)iD-\widehat{M}_{N} \approx_{\epsilon} D-D\sum_{i=0}^{N}\beta_{i}(D^{-1} M)^i of a matrix-polynomial, where O^(β‹…)\widehat{O}(\cdot) notation hides poly(log⁑n,log⁑N)\mathrm{poly}(\log n,\log N) factors. This improves the Cheng et al.'s [CCLPT15] algorithm whose run time is O^(Ο΅βˆ’2mN2+NT)\widehat{O}(\epsilon^{-2} m N^2 + NT). Furthermore, our algorithm is parallelizable and runs in work O^(Ο΅βˆ’2m+Ο΅βˆ’4nT)\widehat{O}(\epsilon^{-2}m + \epsilon^{-4}nT) and depth O(log⁑Nβ‹…poly(log⁑n)+log⁑T)O(\log N\cdot\mathrm{poly}(\log n)+\log T). Our main algorithmic contribution is to propose the first efficient parallel algorithm that on input continuous p.d.f. w∈Cw\in\mathcal{C}, matrix B=Dβˆ’MB=D-M as above, outputs a spectral sparsifier of matrix-polynomial whose coefficients approximate component-wise the discretized p.d.f. w^\widehat{w}. Our results yield the first efficient and parallel algorithm that runs in nearly linear work and poly-logarithmic depth and analyzes the long term behaviour of Markov chains in non-trivial settings. In addition, we strengthen the Spielman and Peng's [PS14] parallel SDD solver

    NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization

    We study the problem of large-scale network embedding, which aims to learn latent representations for network mining applications. Previous research shows that 1) popular network embedding benchmarks, such as DeepWalk, are in essence implicitly factorizing a matrix with a closed form, and 2) the explicit factorization of such a matrix generates more powerful embeddings than existing methods. However, directly constructing and factorizing this matrix, which is dense, is prohibitively expensive in terms of both time and space, making it not scalable for large networks. In this work, we present NetSMF, an algorithm for large-scale network embedding as sparse matrix factorization. NetSMF leverages theories from spectral sparsification to efficiently sparsify the aforementioned dense matrix, enabling significantly improved efficiency in embedding learning. The sparsified matrix is spectrally close to the original dense one with a theoretically bounded approximation error, which helps maintain the representation power of the learned embeddings. We conduct experiments on networks of various scales and types. Results show that among both popular benchmarks and factorization-based methods, NetSMF is the only method that achieves both high efficiency and effectiveness. We show that NetSMF requires only 24 hours to generate effective embeddings for a large-scale academic collaboration network with tens of millions of nodes, whereas DeepWalk would cost months and the dense matrix factorization solution is computationally infeasible. The source code of NetSMF is publicly available (https://github.com/xptree/NetSMF). Comment: 11 pages, in Proceedings of the Web Conference 2019 (WWW '19).
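    To make the "embedding as matrix factorization" idea concrete, here is a minimal sketch (my own illustration, not the NetSMF implementation) that factorizes a sparse matrix with truncated SVD and takes the scaled left singular vectors as node embeddings; NetSMF additionally constructs the sparsified DeepWalk-style matrix before this factorization step.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

def embed_sparse_matrix(M, dim=32):
    # Rank-`dim` truncated SVD of a sparse matrix M; embeddings are U * sqrt(S).
    # Illustrative only: in NetSMF, M would be the spectrally sparsified
    # (and element-wise transformed) DeepWalk matrix.
    U, S, _ = svds(M.asfptype(), k=dim)
    return U * np.sqrt(S)

# Toy usage with a random sparse matrix standing in for a sparsifier.
M = sp.random(1000, 1000, density=0.01, random_state=0)
emb = embed_sparse_matrix(M, dim=32)
print(emb.shape)  # (1000, 32)
```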

    Inference via low-dimensional couplings

    We investigate the low-dimensional structure of deterministic transformations between random variables, i.e., transport maps between probability measures. In the context of statistics and machine learning, these transformations can be used to couple a tractable "reference" measure (e.g., a standard Gaussian) with a target measure of interest. Direct simulation from the desired measure can then be achieved by pushing forward reference samples through the map. Yet characterizing such a map (e.g., representing and evaluating it) grows challenging in high dimensions. The central contribution of this paper is to establish a link between the Markov properties of the target measure and the existence of low-dimensional couplings, induced by transport maps that are sparse and/or decomposable. Our analysis not only facilitates the construction of transformations in high-dimensional settings, but also suggests new inference methodologies for continuous non-Gaussian graphical models. For instance, in the context of nonlinear state-space models, we describe new variational algorithms for filtering, smoothing, and sequential parameter inference. These algorithms can be understood as the natural generalization, to the non-Gaussian case, of the square-root Rauch-Tung-Striebel Gaussian smoother. Comment: 78 pages, 25 figures.
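    For a concrete sense of "pushing forward reference samples through the map", the toy sketch below (my own example, not the paper's algorithm) draws standard Gaussian reference samples and pushes them through a simple lower-triangular map, the kind of sparse/decomposable structure the paper studies, to obtain samples from a non-Gaussian target.

```python
import numpy as np

def triangular_map(z):
    # A toy lower-triangular transport map T: R^2 -> R^2 (arbitrary example):
    # component i depends only on z_1..z_i, mimicking a Knothe-Rosenblatt-style
    # decomposable structure.
    x1 = z[:, 0]
    x2 = 0.5 * z[:, 0] ** 2 + np.exp(0.3 * z[:, 0]) * z[:, 1]  # banana-shaped target
    return np.column_stack([x1, x2])

# Push forward reference (standard Gaussian) samples to target samples.
rng = np.random.default_rng(0)
z = rng.standard_normal((10_000, 2))
x = triangular_map(z)
print(x.mean(axis=0), x.std(axis=0))
```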