Asymmetry Helps: Eigenvalue and Eigenvector Analyses of Asymmetrically Perturbed Low-Rank Matrices
This paper is concerned with the interplay between statistical asymmetry and spectral methods. Suppose we are interested in estimating a rank-1 symmetric matrix $M^{\star} \in \mathbb{R}^{n \times n}$, yet only a randomly perturbed version $M = M^{\star} + H$ is observed. The noise matrix $H$ is composed of zero-mean independent (but not necessarily homoscedastic) entries and is therefore not symmetric in general. This might arise, for example, when we have two independent samples for each entry of $M^{\star}$ and arrange them into an {\em asymmetric} data matrix $M$. The aim is to estimate the leading eigenvalue and eigenvector of $M^{\star}$. We demonstrate that the leading eigenvalue of the data matrix $M$ can be $\sqrt{n}$ times more accurate --- up to some log factor --- than its (unadjusted) leading singular value in eigenvalue estimation. Further, the perturbation of any linear form of the leading eigenvector of $M$ --- say, entrywise eigenvector perturbation --- is provably well controlled. This eigen-decomposition approach is fully adaptive to heteroscedasticity of the noise, without the need for careful bias correction or any prior knowledge of the noise variance. We also provide partial theory for the more general rank-$r$ case. The takeaway message is this: arranging the data samples in an asymmetric manner and performing eigen-decomposition can sometimes be beneficial.
Comment: accepted to Annals of Statistics, 2020. 37 pages.
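For intuition, here is a minimal NumPy sketch (not code from the paper) of the setup described above: a rank-1 symmetric ground truth, two independent heteroscedastic noisy samples per entry arranged into an asymmetric data matrix, and a comparison of the leading eigenvalue against the leading singular value as estimators of the true leading eigenvalue. The dimension, signal strength, and noise scales are arbitrary simulation choices.

    # Illustrative simulation (assumed setup, not the paper's code).
    import numpy as np

    rng = np.random.default_rng(0)
    n = 500
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    lam = 10.0                      # true leading eigenvalue
    M_star = lam * np.outer(u, u)   # rank-1 symmetric ground truth

    # Two independent noisy samples per entry, with heteroscedastic noise.
    sigma = rng.uniform(0.05, 0.2, size=(n, n))
    sample1 = M_star + sigma * rng.standard_normal((n, n))
    sample2 = M_star + sigma * rng.standard_normal((n, n))

    # Asymmetric arrangement: sample1 on and above the diagonal, sample2 strictly below.
    M = np.triu(sample1) + np.tril(sample2, k=-1)

    eigvals = np.linalg.eigvals(M)
    eig_est = eigvals[np.argmax(np.abs(eigvals))].real   # leading eigenvalue of M
    svd_est = np.linalg.svd(M, compute_uv=False)[0]      # leading singular value of M

    print(f"true lambda       : {lam:.3f}")
    print(f"eigenvalue est.   : {eig_est:.3f}  (error {abs(eig_est - lam):.4f})")
    print(f"singular value est: {svd_est:.3f}  (error {abs(svd_est - lam):.4f})")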
Foundations of Node Representation Learning
Low-dimensional node representations, also called node embeddings, are a cornerstone in the modeling and analysis of complex networks. In recent years, advances in deep learning have spurred development of novel neural network-inspired methods for learning node representations which have largely surpassed classical 'spectral' embeddings in performance. Yet little work asks the central questions of this thesis: Why do these novel deep methods outperform their classical predecessors, and what are their limitations?
We pursue several paths to answering these questions. To further our understanding of deep embedding methods, we explore their relationship with spectral methods, which are better understood, and show that some popular deep methods are equivalent to spectral methods in a certain natural limit. We also introduce the problem of inverting node embeddings in order to probe what information they contain. Further, we propose a simple, non-deep method for node representation learning, and find it to often be competitive with modern deep graph networks in downstream performance.
To better understand the limitations of node embeddings, we prove some upper and lower bounds on their capabilities. Most notably, we prove that node embeddings are capable of exact low-dimensional representation of networks with bounded max degree or arboricity, and we further show that a simple algorithm can find such exact embeddings for real-world networks. By contrast, we also prove inherent limits on the ability of random graph models, including those derived from node embeddings, to capture key structural properties of networks without simply memorizing a given graph.
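As background for the classical baseline the thesis refers to, the following sketch computes a generic spectral node embedding from the leading eigenvectors of the normalized adjacency matrix. This is a standard textbook construction rather than any specific method studied in the thesis, and the embedding dimension is an arbitrary choice.

    # Generic spectral node embedding (background illustration only).
    import numpy as np
    import networkx as nx

    def spectral_embedding(G, dim=8):
        A = nx.to_numpy_array(G)
        deg = A.sum(axis=1)
        d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(deg), 0.0)
        A_norm = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]   # D^{-1/2} A D^{-1/2}
        vals, vecs = np.linalg.eigh(A_norm)
        idx = np.argsort(-np.abs(vals))[:dim]                    # top-|eigenvalue| directions
        return vecs[:, idx] * np.abs(vals[idx])                  # scale columns by |eigenvalue|

    G = nx.karate_club_graph()
    X = spectral_embedding(G, dim=4)
    print(X.shape)   # (34, 4): one 4-dimensional vector per node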
New Subset Selection Algorithms for Low Rank Approximation: Offline and Online
Subset selection for the rank-$k$ approximation of an $n \times d$ matrix $A$ offers improvements in the interpretability of matrices, as well as a variety of computational savings. This problem is well understood when the error measure is the Frobenius norm, with various tight algorithms known even in challenging models such as the online model, where an algorithm must select the column subset irrevocably as the columns arrive one by one. In contrast, for other matrix losses, optimal trade-offs between the subset size and the approximation quality have not been settled, even in the offline setting. We give a number of results towards closing these gaps.

In the offline setting, we achieve nearly optimal bicriteria algorithms in two settings. First, we improve a result of [SWZ19] when the loss function is any entrywise loss with an approximate triangle inequality and at least linear growth; our result is tight for the $\ell_1$ loss. We give a similar improvement for entrywise $\ell_p$ losses, improving on the previously known distortion. Our results come from a technique which replaces the use of a well-conditioned basis with a slightly larger spanning set for which any vector can be expressed as a linear combination with small Euclidean norm. We show that this technique also gives the first oblivious $\ell_p$ subspace embeddings with nearly optimal distortion, closing a long line of work.

In the online setting, we give the first online subset selection algorithm for $\ell_p$ subspace approximation and entrywise $\ell_p$ low rank approximation by implementing sensitivity sampling online, which is challenging due to the sequential nature of sensitivity sampling. Our main technique is an online algorithm for detecting when an approximately optimal subspace changes substantially.
Comment: To appear in STOC 2023; abstract shortened.
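To make the sampling idea concrete, here is a simplified offline sketch in the Frobenius-norm setting: columns are sampled with probabilities given by their rank-k leverage scores, the standard $\ell_2$ analogue of sensitivity sampling. The paper's algorithms for entrywise and $\ell_p$ losses, and their online implementation, are substantially more involved; the matrix and sample sizes below are arbitrary.

    # Simplified l_2 column subset selection via leverage-score sampling (illustration only).
    import numpy as np

    def leverage_score_column_sample(A, k, num_samples, seed=0):
        rng = np.random.default_rng(seed)
        # Rank-k leverage scores of the columns, from the top-k right singular vectors.
        _, _, Vt = np.linalg.svd(A, full_matrices=False)
        scores = np.sum(Vt[:k, :] ** 2, axis=0)           # one score per column
        probs = scores / scores.sum()
        cols = rng.choice(A.shape[1], size=num_samples, replace=False, p=probs)
        return np.sort(cols)

    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 10)) @ rng.standard_normal((10, 300))  # roughly rank-10
    A += 0.01 * rng.standard_normal(A.shape)                             # small noise
    cols = leverage_score_column_sample(A, k=10, num_samples=40)

    # Evaluate: project A onto the span of the selected columns.
    C = A[:, cols]
    proj = C @ np.linalg.pinv(C) @ A
    tail = np.linalg.svd(A, compute_uv=False)[10:]        # tail singular values
    print("selected-subset error :", np.linalg.norm(A - proj))
    print("best rank-10 error    :", np.sqrt(np.sum(tail ** 2)))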
Sublinear Time Eigenvalue Approximation via Random Sampling
We study the problem of approximating the eigenspectrum of a symmetric matrix $A \in \mathbb{R}^{n \times n}$ with bounded entries (i.e., $|A_{ij}| \le 1$ for all $i, j$). We present a simple sublinear-time algorithm that approximates all eigenvalues of $A$ up to additive error $\epsilon n$ using the eigenvalues of a randomly sampled principal submatrix. Our result can be viewed as a concentration bound on the complete eigenspectrum of a random submatrix, significantly extending known bounds on just the singular values (the magnitudes of the eigenvalues). We give improved error bounds of $\epsilon \sqrt{\mathrm{nnz}(A)}$ and $\epsilon \|A\|_F$ when the rows of $A$ can be sampled with probabilities proportional to their sparsities or their squared norms, respectively. Here $\mathrm{nnz}(A)$ is the number of non-zero entries in $A$ and $\|A\|_F$ is its Frobenius norm. Even for the strictly easier problems of approximating the singular values or testing the existence of large negative eigenvalues (Bakshi, Chepurko, and Jayaram, FOCS '20), our results are the first that take advantage of non-uniform sampling to give improved error bounds. From a technical perspective, our results require several new eigenvalue concentration and perturbation bounds for matrices with bounded entries. Our non-uniform sampling bounds require a new algorithmic approach, which judiciously zeroes out entries of a randomly sampled submatrix to reduce variance, before computing the eigenvalues of that submatrix as estimates for those of $A$. We complement our theoretical results with numerical simulations, which demonstrate the effectiveness of our algorithms in practice.
Comment: 58 pages, 4 figures.
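The core uniform-sampling estimator is simple enough to sketch: sample a random principal submatrix, compute its eigenvalues, rescale by $n/s$, keep the extreme estimates, and estimate the remaining eigenvalues as zero. The sketch below follows this outline under the entrywise bound $|A_{ij}| \le 1$; the paper's precise estimator (and its variance-reducing zeroing step for non-uniform sampling) differs in details, and the submatrix size here is an arbitrary choice.

    # Simplified uniform-sampling eigenvalue estimator (illustration, not the paper's exact algorithm).
    import numpy as np

    def sampled_eigenvalue_estimates(A, s, seed=0):
        rng = np.random.default_rng(seed)
        n = A.shape[0]
        idx = rng.choice(n, size=s, replace=False)
        sub = A[np.ix_(idx, idx)]                       # random principal submatrix
        sub_eigs = np.linalg.eigvalsh(sub) * (n / s)    # rescaled submatrix spectrum
        # Keep the extreme estimates; estimate the middle of the spectrum as zero.
        est = np.zeros(n)
        pos = np.sort(sub_eigs[sub_eigs > 0])
        neg = np.sort(sub_eigs[sub_eigs < 0])
        est[:len(neg)] = neg                            # most negative estimates first
        est[n - len(pos):] = pos                        # most positive estimates last
        return est                                      # sorted ascending, length n

    rng = np.random.default_rng(0)
    n = 2000
    B = np.sign(rng.standard_normal((n, n)))
    A = np.triu(B) + np.triu(B, 1).T                    # symmetric, entries in {-1, +1}
    true_eigs = np.sort(np.linalg.eigvalsh(A))
    approx = sampled_eigenvalue_estimates(A, s=400, seed=1)
    print("max additive error / n:", np.max(np.abs(true_eigs - approx)) / n)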