Estimating the inverse trace using random forests on graphs
Some data analysis problems require the computation of (regularised) inverse
traces, i.e. quantities of the form $\operatorname{Tr}\,(q\mathbf{I} + \mathbf{L})^{-1}$. For large
matrices, direct methods are infeasible and one must resort to approximations,
for example using a conjugate gradient solver combined with Girard's trace
estimator (also known as Hutchinson's trace estimator). Here we describe an
unbiased estimator of the regularized inverse trace, based on Wilson's
algorithm, an algorithm that was initially designed to draw uniform spanning
trees in graphs. Our method is fast, easy to implement, and scales to very
large matrices. Its main drawback is that it is limited to diagonally dominant
matrices $\mathbf{L}$.
Comment: Submitted to the GRETSI conference.
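For context, here is a minimal sketch of the baseline approach mentioned in the abstract: Girard's (Hutchinson's) trace estimator combined with a conjugate gradient solver to estimate $\operatorname{Tr}\,(q\mathbf{I} + \mathbf{L})^{-1}$. This is not the Wilson-algorithm estimator proposed in the paper; the test matrix, sample count, and function name are illustrative assumptions.

```python
# Girard/Hutchinson probing + conjugate gradient for Tr[(q*I + L)^{-1}].
# Illustrative sketch only; not the paper's Wilson-based estimator.
import numpy as np
from scipy.sparse import diags, random as sparse_random
from scipy.sparse.linalg import cg, LinearOperator

def hutchinson_inverse_trace(L, q, n_samples=100, seed=None):
    """Unbiased estimate of Tr[(q*I + L)^{-1}] from Rademacher probe vectors."""
    rng = np.random.default_rng(seed)
    n = L.shape[0]
    A = LinearOperator((n, n), matvec=lambda v: q * v + L @ v)
    total = 0.0
    for _ in range(n_samples):
        z = rng.choice([-1.0, 1.0], size=n)   # Rademacher probe vector
        x, _ = cg(A, z)                       # solve (q*I + L) x = z iteratively
        total += z @ x                        # E[z^T (q*I+L)^{-1} z] = trace
    return total / n_samples

if __name__ == "__main__":
    # Hypothetical test case: Laplacian of a small random weighted graph
    n = 200
    W = sparse_random(n, n, density=0.05, random_state=0)
    W = (W + W.T) * 0.5                                   # symmetric weights
    L = diags(np.asarray(W.sum(axis=1)).ravel()) - W      # graph Laplacian
    q = 1.0
    est = hutchinson_inverse_trace(L, q, n_samples=200, seed=0)
    exact = np.trace(np.linalg.inv(q * np.eye(n) + L.toarray()))
    print(f"estimate = {est:.3f}, exact = {exact:.3f}")
```

The Laplacian built this way is diagonally dominant, so $q\mathbf{I} + \mathbf{L}$ is symmetric positive definite and conjugate gradient converges; the estimator's variance shrinks as the number of probe vectors grows.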
Lazy stochastic principal component analysis
Stochastic principal component analysis (SPCA) has become a popular
dimensionality reduction strategy for large, high-dimensional datasets. We
derive a simplified algorithm, called Lazy SPCA, which has reduced
computational complexity and is better suited for large-scale distributed
computation. We prove that SPCA and Lazy SPCA find the same approximations to
the principal subspace, and that the pairwise distances between samples in the
lower-dimensional space are invariant to whether SPCA is executed lazily or not.
Empirical studies find downstream predictive performance to be identical for
both methods, and superior to random projections, across a range of predictive
models (linear regression, logistic lasso, and random forests). In our largest
experiment with 4.6 million samples, Lazy SPCA reduced 43.7 hours of
computation to 9.9 hours. Overall, Lazy SPCA relies exclusively on matrix
multiplications, besides an operation on a small square matrix whose size
depends only on the target dimensionality.Comment: To be published in: 2017 IEEE International Conference on Data Mining
Workshops (ICDMW
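For illustration, below is a generic randomized-projection PCA sketch built almost entirely from matrix multiplications plus a decomposition of a small matrix, in the spirit of stochastic PCA. It is not the paper's Lazy SPCA algorithm; the function name, oversampling parameter, and test data are assumptions.

```python
# Generic randomized PCA sketch: large matrix multiplies plus a small SVD.
# Illustrative only; not the Lazy SPCA algorithm described in the paper.
import numpy as np

def randomized_pca(X, k, n_oversample=10, seed=None):
    """Approximate the top-k principal subspace of the data matrix X (n x d)."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                              # center the data
    Omega = rng.standard_normal((X.shape[1], k + n_oversample))
    Y = Xc @ Omega                                       # tall-skinny sketch (one multiply)
    Q, _ = np.linalg.qr(Y)                               # orthonormal basis for the range
    B = Q.T @ Xc                                         # small (k+p) x d matrix
    _, _, Vt = np.linalg.svd(B, full_matrices=False)     # cheap SVD of the small matrix
    components = Vt[:k]                                  # approximate top-k directions
    scores = Xc @ components.T                           # samples in the reduced space
    return components, scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 50)) @ rng.standard_normal((50, 50))
    comps, Z = randomized_pca(X, k=5, seed=0)
    print(comps.shape, Z.shape)                          # (5, 50) (1000, 5)
```

The dominant costs are the products with the data matrix, which distribute naturally across machines, while the small decomposition depends only on the target dimensionality, echoing the structure the abstract describes.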
- …