Asymmetry Helps: Eigenvalue and Eigenvector Analyses of Asymmetrically Perturbed Low-Rank Matrices
This paper is concerned with the interplay between statistical asymmetry and spectral methods. Suppose we are interested in estimating a rank-1 symmetric matrix $M^{\star} \in \mathbb{R}^{n \times n}$, yet only a randomly perturbed version $M = M^{\star} + H$ is observed. The noise matrix $H$ is composed of zero-mean independent (but not necessarily homoscedastic) entries and is therefore not symmetric in general. This might arise, for example, when we have two independent samples for each entry of $M^{\star}$ and arrange them into an {\em asymmetric} data matrix $M$. The aim is to estimate the leading eigenvalue and eigenvector of $M^{\star}$. We demonstrate that the leading eigenvalue of the data matrix $M$ can be $\sqrt{n}$ times more accurate --- up to some log factor --- than its (unadjusted) leading singular value in eigenvalue estimation. Further, the perturbation of any linear form of the leading eigenvector of $M$ --- say, entrywise eigenvector perturbation --- is provably well controlled. This eigen-decomposition approach is fully adaptive to heteroscedasticity of the noise, without the need for careful bias correction or any prior knowledge of the noise variance. We also provide partial theory for the more general rank-$r$ case. The takeaway message is this: arranging the data samples in an asymmetric manner and performing eigen-decomposition can sometimes be beneficial.
Comment: accepted to Annals of Statistics, 2020. 37 pages.
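For intuition, here is a minimal NumPy sketch (not code from the paper) of the setup described above: a rank-1 symmetric ground truth, two independent heteroscedastic noisy samples per entry arranged into an asymmetric data matrix, and a comparison of the leading eigenvalue against the leading singular value as estimators of the true leading eigenvalue. The dimension, signal strength, and noise scales are arbitrary simulation choices.

    # Illustrative simulation (assumed setup, not the paper's code).
    import numpy as np

    rng = np.random.default_rng(0)
    n = 500
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    lam = 10.0                      # true leading eigenvalue
    M_star = lam * np.outer(u, u)   # rank-1 symmetric ground truth

    # Two independent noisy samples per entry, with heteroscedastic noise.
    sigma = rng.uniform(0.05, 0.2, size=(n, n))
    sample1 = M_star + sigma * rng.standard_normal((n, n))
    sample2 = M_star + sigma * rng.standard_normal((n, n))

    # Asymmetric arrangement: sample1 on and above the diagonal, sample2 strictly below.
    M = np.triu(sample1) + np.tril(sample2, k=-1)

    eigvals = np.linalg.eigvals(M)
    eig_est = eigvals[np.argmax(np.abs(eigvals))].real   # leading eigenvalue of M
    svd_est = np.linalg.svd(M, compute_uv=False)[0]      # leading singular value of M

    print(f"true lambda       : {lam:.3f}")
    print(f"eigenvalue est.   : {eig_est:.3f}  (error {abs(eig_est - lam):.4f})")
    print(f"singular value est: {svd_est:.3f}  (error {abs(svd_est - lam):.4f})")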
Foundations of Node Representation Learning
Low-dimensional node representations, also called node embeddings, are a cornerstone in the modeling and analysis of complex networks. In recent years, advances in deep learning have spurred development of novel neural network-inspired methods for learning node representations which have largely surpassed classical 'spectral' embeddings in performance. Yet little work asks the central questions of this thesis: Why do these novel deep methods outperform their classical predecessors, and what are their limitations?
We pursue several paths to answering these questions. To further our understanding of deep embedding methods, we explore their relationship with spectral methods, which are better understood, and show that some popular deep methods are equivalent to spectral methods in a certain natural limit. We also introduce the problem of inverting node embeddings in order to probe what information they contain. Further, we propose a simple, non-deep method for node representation learning, and find it to often be competitive with modern deep graph networks in downstream performance.
To better understand the limitations of node embeddings, we prove some upper and lower bounds on their capabilities. Most notably, we prove that node embeddings are capable of exact low-dimensional representation of networks with bounded max degree or arboricity, and we further show that a simple algorithm can find such exact embeddings for real-world networks. By contrast, we also prove inherent limits on the ability of random graph models, including those derived from node embeddings, to capture key structural properties of networks without simply memorizing a given graph.
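As background for the classical baseline the thesis refers to, the following sketch computes a generic spectral node embedding from the leading eigenvectors of the normalized adjacency matrix. This is a standard textbook construction rather than any specific method studied in the thesis, and the embedding dimension is an arbitrary choice.

    # Generic spectral node embedding (background illustration only).
    import numpy as np
    import networkx as nx

    def spectral_embedding(G, dim=8):
        A = nx.to_numpy_array(G)
        deg = A.sum(axis=1)
        d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(deg), 0.0)
        A_norm = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]   # D^{-1/2} A D^{-1/2}
        vals, vecs = np.linalg.eigh(A_norm)
        idx = np.argsort(-np.abs(vals))[:dim]                    # top-|eigenvalue| directions
        return vecs[:, idx] * np.abs(vals[idx])                  # scale columns by |eigenvalue|

    G = nx.karate_club_graph()
    X = spectral_embedding(G, dim=4)
    print(X.shape)   # (34, 4): one 4-dimensional vector per node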
New Subset Selection Algorithms for Low Rank Approximation: Offline and Online
Subset selection for the rank-$k$ approximation of an $n \times d$ matrix $A$ offers improvements in the interpretability of matrices, as well as a variety of computational savings. This problem is well understood when the error measure is the Frobenius norm, with various tight algorithms known even in challenging models such as the online model, where an algorithm must select the column subset irrevocably as the columns arrive one by one. In contrast, for other matrix losses, optimal trade-offs between the subset size and the approximation quality have not been settled, even in the offline setting. We give a number of results towards closing these gaps.

In the offline setting, we achieve nearly optimal bicriteria algorithms in two settings. First, we improve a result of [SWZ19] when the loss function is any entrywise loss with an approximate triangle inequality and at least linear growth; our result is tight for the $\ell_1$ loss. We give a similar improvement for entrywise $\ell_p$ losses, improving on the previously known distortion. Our results come from a technique which replaces the use of a well-conditioned basis with a slightly larger spanning set for which any vector can be expressed as a linear combination with small Euclidean norm. We show that this technique also gives the first oblivious $\ell_p$ subspace embeddings with nearly optimal distortion, closing a long line of work.

In the online setting, we give the first online subset selection algorithm for $\ell_p$ subspace approximation and entrywise $\ell_p$ low rank approximation by implementing sensitivity sampling online, which is challenging due to the sequential nature of sensitivity sampling. Our main technique is an online algorithm for detecting when an approximately optimal subspace changes substantially.
Comment: To appear in STOC 2023; abstract shortened.
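To make the sampling idea concrete, here is a simplified offline sketch in the Frobenius-norm setting: columns are sampled with probabilities given by their rank-k leverage scores, the standard $\ell_2$ analogue of sensitivity sampling. The paper's algorithms for entrywise and $\ell_p$ losses, and their online implementation, are substantially more involved; the matrix and sample sizes below are arbitrary.

    # Simplified l_2 column subset selection via leverage-score sampling (illustration only).
    import numpy as np

    def leverage_score_column_sample(A, k, num_samples, seed=0):
        rng = np.random.default_rng(seed)
        # Rank-k leverage scores of the columns, from the top-k right singular vectors.
        _, _, Vt = np.linalg.svd(A, full_matrices=False)
        scores = np.sum(Vt[:k, :] ** 2, axis=0)           # one score per column
        probs = scores / scores.sum()
        cols = rng.choice(A.shape[1], size=num_samples, replace=False, p=probs)
        return np.sort(cols)

    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 10)) @ rng.standard_normal((10, 300))  # roughly rank-10
    A += 0.01 * rng.standard_normal(A.shape)                             # small noise
    cols = leverage_score_column_sample(A, k=10, num_samples=40)

    # Evaluate: project A onto the span of the selected columns.
    C = A[:, cols]
    proj = C @ np.linalg.pinv(C) @ A
    tail = np.linalg.svd(A, compute_uv=False)[10:]        # tail singular values
    print("selected-subset error :", np.linalg.norm(A - proj))
    print("best rank-10 error    :", np.sqrt(np.sum(tail ** 2)))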
Sublinear Time Eigenvalue Approximation via Random Sampling
We study the problem of approximating the eigenspectrum of a symmetric matrix $A \in \mathbb{R}^{n \times n}$ with bounded entries (i.e., $|A_{ij}| \le 1$ for all $i, j$). We present a simple sublinear-time algorithm that approximates all eigenvalues of $A$ up to additive error $\epsilon n$ using the eigenvalues of a randomly sampled principal submatrix. Our result can be viewed as a concentration bound on the complete eigenspectrum of a random submatrix, significantly extending known bounds on just the singular values (the magnitudes of the eigenvalues). We give improved error bounds of $\epsilon \sqrt{\mathrm{nnz}(A)}$ and $\epsilon \|A\|_F$ when the rows of $A$ can be sampled with probabilities proportional to their sparsities or their squared norms, respectively. Here $\mathrm{nnz}(A)$ is the number of non-zero entries in $A$ and $\|A\|_F$ is its Frobenius norm. Even for the strictly easier problems of approximating the singular values or testing the existence of large negative eigenvalues (Bakshi, Chepurko, and Jayaram, FOCS '20), our results are the first that take advantage of non-uniform sampling to give improved error bounds. From a technical perspective, our results require several new eigenvalue concentration and perturbation bounds for matrices with bounded entries. Our non-uniform sampling bounds require a new algorithmic approach, which judiciously zeroes out entries of a randomly sampled submatrix to reduce variance, before computing the eigenvalues of that submatrix as estimates for those of $A$. We complement our theoretical results with numerical simulations, which demonstrate the effectiveness of our algorithms in practice.
Comment: 58 pages, 4 figures.
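The core uniform-sampling estimator is simple enough to sketch: sample a random principal submatrix, compute its eigenvalues, rescale by $n/s$, keep the extreme estimates, and estimate the remaining eigenvalues as zero. The sketch below follows this outline under the entrywise bound $|A_{ij}| \le 1$; the paper's precise estimator (and its variance-reducing zeroing step for non-uniform sampling) differs in details, and the submatrix size here is an arbitrary choice.

    # Simplified uniform-sampling eigenvalue estimator (illustration, not the paper's exact algorithm).
    import numpy as np

    def sampled_eigenvalue_estimates(A, s, seed=0):
        rng = np.random.default_rng(seed)
        n = A.shape[0]
        idx = rng.choice(n, size=s, replace=False)
        sub = A[np.ix_(idx, idx)]                       # random principal submatrix
        sub_eigs = np.linalg.eigvalsh(sub) * (n / s)    # rescaled submatrix spectrum
        # Keep the extreme estimates; estimate the middle of the spectrum as zero.
        est = np.zeros(n)
        pos = np.sort(sub_eigs[sub_eigs > 0])
        neg = np.sort(sub_eigs[sub_eigs < 0])
        est[:len(neg)] = neg                            # most negative estimates first
        est[n - len(pos):] = pos                        # most positive estimates last
        return est                                      # sorted ascending, length n

    rng = np.random.default_rng(0)
    n = 2000
    B = np.sign(rng.standard_normal((n, n)))
    A = np.triu(B) + np.triu(B, 1).T                    # symmetric, entries in {-1, +1}
    true_eigs = np.sort(np.linalg.eigvalsh(A))
    approx = sampled_eigenvalue_estimates(A, s=400, seed=1)
    print("max additive error / n:", np.max(np.abs(true_eigs - approx)) / n)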