342 research outputs found

    Asymmetry Helps: Eigenvalue and Eigenvector Analyses of Asymmetrically Perturbed Low-Rank Matrices

    This paper is concerned with the interplay between statistical asymmetry and spectral methods. Suppose we are interested in estimating a rank-1 and symmetric matrix $\mathbf{M}^{\star}\in \mathbb{R}^{n\times n}$, yet only a randomly perturbed version $\mathbf{M}$ is observed. The noise matrix $\mathbf{M}-\mathbf{M}^{\star}$ is composed of zero-mean independent (but not necessarily homoscedastic) entries and is, therefore, not symmetric in general. This might arise, for example, when we have two independent samples for each entry of $\mathbf{M}^{\star}$ and arrange them into an asymmetric data matrix $\mathbf{M}$. The aim is to estimate the leading eigenvalue and eigenvector of $\mathbf{M}^{\star}$. We demonstrate that the leading eigenvalue of the data matrix $\mathbf{M}$ can be $O(\sqrt{n})$ times more accurate, up to some log factor, than its (unadjusted) leading singular value in eigenvalue estimation. Further, the perturbation of any linear form of the leading eigenvector of $\mathbf{M}$ (say, entrywise eigenvector perturbation) is provably well-controlled. This eigen-decomposition approach is fully adaptive to heteroscedasticity of noise without the need of careful bias correction or any prior knowledge about the noise variance. We also provide partial theory for the more general rank-$r$ case. The takeaway message is this: arranging the data samples in an asymmetric manner and performing eigen-decomposition could sometimes be beneficial. Comment: accepted to Annals of Statistics, 2020. 37 pages
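The comparison described in this abstract can be tried numerically. The sketch below is an illustration only, not the authors' code: the function names `leading_estimates` and `simulate`, the Gaussian noise model, and the default signal strength `lam = n` are all assumptions chosen for the demo. It builds a rank-1 symmetric matrix, adds an unsymmetrized noise matrix, and returns both the leading eigenvalue and the leading singular value of the observed matrix so they can be compared with the true eigenvalue.

```python
import numpy as np


def leading_estimates(M):
    """Return (leading eigenvalue, leading singular value) of a square matrix M."""
    eigvals = np.linalg.eigvals(M)  # may be complex because M is not symmetric
    top = eigvals[np.argmax(np.abs(eigvals))]
    sigma = np.linalg.svd(M, compute_uv=False)[0]
    # The real part is kept; in the strong-signal regime assumed here the
    # leading eigenvalue is expected to be real anyway.
    return float(np.real(top)), float(sigma)


def simulate(n=400, lam=None, noise_std=1.0, seed=0):
    """Hypothetical experiment: rank-1 signal plus i.i.d. asymmetric noise."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)                       # unit-norm leading eigenvector
    lam = float(n) if lam is None else lam       # assumed signal strength
    M_star = lam * np.outer(u, u)                # rank-1, symmetric ground truth
    H = noise_std * rng.standard_normal((n, n))  # independent entries, not symmetrized
    return lam, leading_estimates(M_star + H)


if __name__ == "__main__":
    lam, (eig, sv) = simulate()
    print("true eigenvalue:       ", lam)
    print("eigenvalue error:      ", abs(eig - lam))
    print("singular value error:  ", abs(sv - lam))
```

Running the script prints both errors for one random draw; repeating over several seeds and values of `n` gives a rough picture of how the two estimators behave, though it does not substitute for the paper's analysis.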

    New Subset Selection Algorithms for Low Rank Approximation: Offline and Online

    Subset selection for the rank-$k$ approximation of an $n\times d$ matrix $A$ offers improvements in the interpretability of matrices, as well as a variety of computational savings. This problem is well-understood when the error measure is the Frobenius norm, with various tight algorithms known even in challenging models such as the online model, where an algorithm must select the column subset irrevocably when the columns arrive one by one. In contrast, for other matrix losses, optimal trade-offs between the subset size and approximation quality have not been settled, even in the offline setting. We give a number of results towards closing these gaps. In the offline setting, we achieve nearly optimal bicriteria algorithms in two settings. First, we remove a $\sqrt{k}$ factor from a result of [SWZ19] when the loss function is any entrywise loss with an approximate triangle inequality and at least linear growth. Our result is tight for the $\ell_1$ loss. We give a similar improvement for entrywise $\ell_p$ losses for $p>2$, improving a previous distortion of $k^{1-1/p}$ to $k^{1/2-1/p}$. Our results come from a technique which replaces the use of a well-conditioned basis with a slightly larger spanning set for which any vector can be expressed as a linear combination with small Euclidean norm. We show that this technique also gives the first oblivious $\ell_p$ subspace embeddings for $1<p<2$ with $\tilde O(d^{1/p})$ distortion, which is nearly optimal and closes a long line of work. In the online setting, we give the first online subset selection algorithm for $\ell_p$ subspace approximation and entrywise $\ell_p$ low rank approximation by implementing sensitivity sampling online, which is challenging due to the sequential nature of sensitivity sampling. Our main technique is an online algorithm for detecting when an approximately optimal subspace changes substantially. Comment: To appear in STOC 2023; abstract shortened

    Sublinear Time Eigenvalue Approximation via Random Sampling

    We study the problem of approximating the eigenspectrum of a symmetric matrix $\mathbf A \in \mathbb{R}^{n \times n}$ with bounded entries (i.e., $\|\mathbf A\|_{\infty} \leq 1$). We present a simple sublinear time algorithm that approximates all eigenvalues of $\mathbf{A}$ up to additive error $\pm \epsilon n$ using those of a randomly sampled $\tilde{O}\left(\frac{\log^3 n}{\epsilon^3}\right) \times \tilde{O}\left(\frac{\log^3 n}{\epsilon^3}\right)$ principal submatrix. Our result can be viewed as a concentration bound on the complete eigenspectrum of a random submatrix, significantly extending known bounds on just the singular values (the magnitudes of the eigenvalues). We give improved error bounds of $\pm \epsilon \sqrt{\text{nnz}(\mathbf{A})}$ and $\pm \epsilon \|\mathbf A\|_F$ when the rows of $\mathbf A$ can be sampled with probabilities proportional to their sparsities or their squared $\ell_2$ norms respectively. Here $\text{nnz}(\mathbf{A})$ is the number of non-zero entries in $\mathbf{A}$ and $\|\mathbf A\|_F$ is its Frobenius norm. Even for the strictly easier problems of approximating the singular values or testing the existence of large negative eigenvalues (Bakshi, Chepurko, and Jayaram, FOCS '20), our results are the first that take advantage of non-uniform sampling to give improved error bounds. From a technical perspective, our results require several new eigenvalue concentration and perturbation bounds for matrices with bounded entries. Our non-uniform sampling bounds require a new algorithmic approach, which judiciously zeroes out entries of a randomly sampled submatrix to reduce variance, before computing the eigenvalues of that submatrix as estimates for those of $\mathbf A$. We complement our theoretical results with numerical simulations, which demonstrate the effectiveness of our algorithms in practice. Comment: 58 pages, 4 figures
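The uniform-sampling idea in this abstract can be sketched in a few lines. What follows is a simplified illustration under stated assumptions, not the paper's algorithm: the function name `approx_eigenvalues` is hypothetical, the sample is a fixed-size subset drawn without replacement, the submatrix eigenvalues are rescaled by `n / s`, and the remaining eigenvalues are filled in as zero. The non-uniform variants with entry zeroing mentioned in the abstract are not attempted.

```python
import numpy as np


def approx_eigenvalues(A, s, seed=0):
    """Estimate all eigenvalues of symmetric A from a random s x s principal submatrix.

    Only the s * s sampled entries of A are read. Returns n estimates in
    ascending order.
    """
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=s, replace=False)    # uniform sample of row/column indices
    sub = A[np.ix_(idx, idx)]                     # principal submatrix on those indices
    scaled = np.linalg.eigvalsh(sub) * (n / s)    # rescale to the size of A

    est = np.zeros(n)                             # unsampled eigenvalues default to 0
    neg = scaled[scaled < 0]
    pos = scaled[scaled > 0]
    est[:neg.size] = neg                          # negative estimates fill the bottom
    est[n - pos.size:] = pos                      # positive estimates fill the top
    return est


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 500
    B = rng.uniform(-1.0, 1.0, size=(n, n))
    A = (B + B.T) / 2                             # symmetric, entries bounded by 1
    est = approx_eigenvalues(A, s=100)
    exact = np.linalg.eigvalsh(A)
    print("max additive error / n:", np.max(np.abs(est - exact)) / n)
```

The demo compares the estimates against a full eigendecomposition, which is only feasible for small `n`; the point of the sampled version is that it never needs to read the whole matrix.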