Estimating the inverse trace using random forests on graphs
Some data analysis problems require the computation of (regularised) inverse
traces, i.e. quantities of the form $\operatorname{Tr}\,(q\mathbf{I} + \mathbf{L})^{-1}$. For large
matrices, direct methods are infeasible and one must resort to approximations,
for example using a conjugate gradient solver combined with Girard's trace
estimator (also known as Hutchinson's trace estimator). Here we describe an
unbiased estimator of the regularized inverse trace, based on Wilson's
algorithm, an algorithm that was initially designed to draw uniform spanning
trees in graphs. Our method is fast, easy to implement, and scales to very
large matrices. Its main drawback is that it is limited to diagonally dominant
matrices $\mathbf{L}$.
Comment: Submitted to the GRETSI conference.
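For context, here is a minimal sketch of the baseline approach mentioned in the abstract: Girard's (Hutchinson's) trace estimator combined with a conjugate gradient solver to estimate $\operatorname{Tr}\,(q\mathbf{I} + \mathbf{L})^{-1}$. This is not the Wilson-algorithm estimator proposed in the paper; the test matrix, sample count, and function name are illustrative assumptions.

```python
# Girard/Hutchinson probing + conjugate gradient for Tr[(q*I + L)^{-1}].
# Illustrative sketch only; not the paper's Wilson-based estimator.
import numpy as np
from scipy.sparse import diags, random as sparse_random
from scipy.sparse.linalg import cg, LinearOperator

def hutchinson_inverse_trace(L, q, n_samples=100, seed=None):
    """Unbiased estimate of Tr[(q*I + L)^{-1}] from Rademacher probe vectors."""
    rng = np.random.default_rng(seed)
    n = L.shape[0]
    A = LinearOperator((n, n), matvec=lambda v: q * v + L @ v)
    total = 0.0
    for _ in range(n_samples):
        z = rng.choice([-1.0, 1.0], size=n)   # Rademacher probe vector
        x, _ = cg(A, z)                       # solve (q*I + L) x = z iteratively
        total += z @ x                        # E[z^T (q*I+L)^{-1} z] = trace
    return total / n_samples

if __name__ == "__main__":
    # Hypothetical test case: Laplacian of a small random weighted graph
    n = 200
    W = sparse_random(n, n, density=0.05, random_state=0)
    W = (W + W.T) * 0.5                                   # symmetric weights
    L = diags(np.asarray(W.sum(axis=1)).ravel()) - W      # graph Laplacian
    q = 1.0
    est = hutchinson_inverse_trace(L, q, n_samples=200, seed=0)
    exact = np.trace(np.linalg.inv(q * np.eye(n) + L.toarray()))
    print(f"estimate = {est:.3f}, exact = {exact:.3f}")
```

The Laplacian built this way is diagonally dominant, so $q\mathbf{I} + \mathbf{L}$ is symmetric positive definite and conjugate gradient converges; the estimator's variance shrinks as the number of probe vectors grows.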
Lazy stochastic principal component analysis
Stochastic principal component analysis (SPCA) has become a popular
dimensionality reduction strategy for large, high-dimensional datasets. We
derive a simplified algorithm, called Lazy SPCA, which has reduced
computational complexity and is better suited for large-scale distributed
computation. We prove that SPCA and Lazy SPCA find the same approximations to
the principal subspace, and that the pairwise distances between samples in the
lower-dimensional space are invariant to whether SPCA is executed lazily or not.
Empirical studies find downstream predictive performance to be identical for
both methods, and superior to random projections, across a range of predictive
models (linear regression, logistic lasso, and random forests). In our largest
experiment with 4.6 million samples, Lazy SPCA reduced 43.7 hours of
computation to 9.9 hours. Overall, Lazy SPCA relies exclusively on matrix
multiplications, besides an operation on a small square matrix whose size
depends only on the target dimensionality.Comment: To be published in: 2017 IEEE International Conference on Data Mining
Workshops (ICDMW
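For illustration, below is a generic randomized-projection PCA sketch built almost entirely from matrix multiplications plus a decomposition of a small matrix, in the spirit of stochastic PCA. It is not the paper's Lazy SPCA algorithm; the function name, oversampling parameter, and test data are assumptions.

```python
# Generic randomized PCA sketch: large matrix multiplies plus a small SVD.
# Illustrative only; not the Lazy SPCA algorithm described in the paper.
import numpy as np

def randomized_pca(X, k, n_oversample=10, seed=None):
    """Approximate the top-k principal subspace of the data matrix X (n x d)."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                              # center the data
    Omega = rng.standard_normal((X.shape[1], k + n_oversample))
    Y = Xc @ Omega                                       # tall-skinny sketch (one multiply)
    Q, _ = np.linalg.qr(Y)                               # orthonormal basis for the range
    B = Q.T @ Xc                                         # small (k+p) x d matrix
    _, _, Vt = np.linalg.svd(B, full_matrices=False)     # cheap SVD of the small matrix
    components = Vt[:k]                                  # approximate top-k directions
    scores = Xc @ components.T                           # samples in the reduced space
    return components, scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 50)) @ rng.standard_normal((50, 50))
    comps, Z = randomized_pca(X, k=5, seed=0)
    print(comps.shape, Z.shape)                          # (5, 50) (1000, 5)
```

The dominant costs are the products with the data matrix, which distribute naturally across machines, while the small decomposition depends only on the target dimensionality, echoing the structure the abstract describes.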
- …