A Spectral Gap Precludes Low-Dimensional Embeddings
We prove that if an n-vertex O(1)-expander embeds with average distortion D into a finite dimensional normed space X, then necessarily the dimension of X is at least n^{c/D} for some universal constant c>0. This is sharp up to the value of the constant c, and it improves over the previously best-known estimate dim(X) > c(log n)^2/D^2 of Linial, London and Rabinovich, strengthens a theorem of Matoušek, and answers a question of Andoni, Nikolov, Razenshteyn and Waingarten.
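In display form, the main bound reads as follows (the definition of average distortion given here is the standard one, included for the reader's convenience rather than quoted from the paper):

    % A map f : V -> X from an n-vertex O(1)-expander G = (V, d_G) embeds
    % with average distortion D if f is 1-Lipschitz and
    \frac{1}{n^2} \sum_{u,v \in V} \|f(u) - f(v)\|_X
        \;\geq\; \frac{1}{D} \cdot \frac{1}{n^2} \sum_{u,v \in V} d_G(u,v) .
    % The theorem asserts that any such embedding forces
    \dim(X) \;\geq\; n^{c/D} ,
    % which is sharp up to the value of the universal constant c > 0.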
Impossibility of dimension reduction in the nuclear norm
Let S_1 (the Schatten--von Neumann trace class) denote the Banach space of all compact linear operators T: ℓ_2 → ℓ_2 whose nuclear norm ||T||_{S_1} = ∑_{j=1}^∞ σ_j(T) is finite, where σ_1(T) ≥ σ_2(T) ≥ ... are the singular values of T. We prove that for arbitrarily large n there exists a subset C ⊆ S_1 with |C| = n that cannot be embedded with bi-Lipschitz distortion O(1) into any n^{o(1)}-dimensional linear subspace of S_1. C is not even an O(1)-Lipschitz quotient of any subset of any n^{o(1)}-dimensional linear subspace of S_1. Thus, S_1 does not admit a dimension reduction result à la Johnson and Lindenstrauss (1984), which complements the work of Harrow, Montanaro and Short (2011) on the limitations of quantum dimension reduction under the assumption that the embedding into low dimensions is a quantum channel. Such a statement was previously known with S_1 replaced by the Banach space ℓ_1 of absolutely summable sequences via the work of Brinkman and Charikar (2003). In fact, the above set C can be taken to be the same set as the one that Brinkman and Charikar considered, viewed as a collection of diagonal matrices in S_1. The challenge is to demonstrate that C cannot be faithfully realized in an arbitrary low-dimensional subspace of S_1, while Brinkman and Charikar obtained such an assertion only for subspaces of S_1 that consist of diagonal operators (i.e., subspaces of ℓ_1). We establish this by proving that the Markov 2-convexity constant of any finite dimensional linear subspace X of S_1 is at most a universal constant multiple of √(log dim(X)).
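For reference, the invariant driving the proof can be recalled as follows (the Lee--Naor--Peres formulation of Markov 2-convexity, supplied here for convenience, not quoted from this abstract):

    % (M, d) is Markov 2-convex with constant \Pi if for every Markov chain
    % \{X_t\}_{t \in \mathbb{Z}} on a state space \Omega and every f : \Omega -> M,
    \sum_{k=0}^{\infty} \sum_{t \in \mathbb{Z}} \frac{1}{4^k}\,
        \mathbb{E}\Big[ d\big(f(X_t), f(\widetilde{X}_t(t - 2^k))\big)^2 \Big]
        \;\leq\; \Pi^2 \sum_{t \in \mathbb{Z}}
        \mathbb{E}\Big[ d\big(f(X_t), f(X_{t-1})\big)^2 \Big] ,
    % where \widetilde{X}_t(s) follows the chain through time s and then
    % evolves independently. The theorem above bounds this constant by
    % O(\sqrt{\log \dim(Y)}) for every finite-dimensional subspace Y of S_1,
    % while, roughly speaking, the Brinkman--Charikar set has a Markov
    % 2-convexity constant that grows with its cardinality, which is what
    % rules out faithful low-dimensional realizations.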
Recovering Structured Probability Matrices
We consider the problem of accurately recovering a matrix B of size M by M,
which represents a probability distribution over M^2 outcomes, given access to
an observed matrix of "counts" generated by taking independent samples from the
distribution B. How can structural properties of the underlying matrix B be
leveraged to yield computationally efficient and information theoretically
optimal reconstruction algorithms? When can accurate reconstruction be
accomplished in the sparse data regime? This basic problem lies at the core of
a number of questions that are currently being considered by different
communities, including building recommendation systems and collaborative
filtering in the sparse data regime, community detection in sparse random
graphs, learning structured models such as topic models or hidden Markov
models, and the efforts from the natural language processing community to
compute "word embeddings".
Our results apply to the setting where B has low-rank structure. For this
setting, we propose an efficient algorithm that accurately recovers the
underlying M by M matrix using Theta(M) samples. This result easily translates
to Theta(M)-sample algorithms for learning topic models and learning hidden
Markov models. These linear sample complexities are optimal, up to constant
factors, in an extremely strong sense: even testing basic properties of the
underlying matrix (such as whether it has rank 1 or 2) requires Omega(M)
samples. We provide an even stronger lower bound: distinguishing whether a
sequence of observations was drawn from the uniform distribution over M
observations, versus generated by an HMM with two hidden states, requires
Omega(M) observations. This precludes sublinear-sample hypothesis tests for
basic properties, such as identity or uniformity, as well as sublinear-sample
estimators for quantities such as the entropy rate of HMMs.
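Continuing the sketch above, the most naive way to exploit low rank is to project the empirical matrix onto the nearest rank-r matrix via truncated SVD. This baseline is given only to make the setting concrete; it is not the paper's algorithm, which requires considerably more care to achieve Theta(M)-sample guarantees in the sparse regime:

    def naive_lowrank_estimate(counts, r):
        """Truncated-SVD projection of the empirical distribution to rank r,
        clipped and renormalized. Illustrative baseline only."""
        P_hat = counts / counts.sum()           # empirical distribution over cells
        U, s, Vt = np.linalg.svd(P_hat, full_matrices=False)
        est = (U[:, :r] * s[:r]) @ Vt[:r, :]    # best rank-r approximation (Frobenius)
        est = np.clip(est, 0.0, None)           # clip negatives introduced by projection
        return est / est.sum()                  # renormalize to a probability matrix

    B_hat = naive_lowrank_estimate(counts, r=1)
    err = np.abs(B_hat - B).sum()               # L1 error between estimate and truth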
Tackling Combinatorial Distribution Shift: A Matrix Completion Perspective
Obtaining rigorous statistical guarantees for generalization under
distribution shift remains an open and active research area. We study a setting
we call combinatorial distribution shift, where (a) under the test- and
training-distributions, the labels z are determined by pairs of features
(x, y), (b) the training distribution has coverage of certain marginal
distributions over x and over y separately, but (c) the test distribution
involves examples from a product distribution over (x, y) that is not
covered by the training distribution. Focusing on the special case where the
labels are given by bilinear embeddings into a Hilbert space H, namely
z = ⟨f⋆(x), g⋆(y)⟩_H, we aim to extrapolate to a test distribution domain that
is not covered in training, i.e., achieving bilinear combinatorial
extrapolation.
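In symbols, the special case studied is the following (the predictor ẑ and the measure notation μ are introduced here for illustration and are not quoted from the abstract):

    % Labels are bilinear in unknown embeddings into a Hilbert space H:
    z \;=\; \langle f_\star(x),\, g_\star(y) \rangle_{\mathcal{H}} ,
    % training covers the x-marginal and the y-marginal separately, while
    % test examples are drawn from a product measure over pairs,
    \mu_{\mathrm{test}} \;=\; \mu_x \otimes \mu_y ,
    % whose pairs (x, y) need not be jointly covered in training; the goal
    % is a predictor \hat{z} with small risk
    \mathbb{E}_{(x,y) \sim \mu_{\mathrm{test}}}
        \Big( \hat{z}(x, y) - \langle f_\star(x), g_\star(y) \rangle_{\mathcal{H}} \Big)^{2} .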
Our setting generalizes a special case of matrix completion from
missing-not-at-random data, for which all existing results require the
ground-truth matrices to be either exactly low-rank, or to exhibit very sharp
spectral cutoffs. In this work, we develop a series of theoretical results that
enable bilinear combinatorial extrapolation under gradual spectral decay as
observed in typical high-dimensional data, including novel algorithms,
generalization guarantees, and linear-algebraic results. A key tool is a novel
perturbation bound for the rank-k singular value decomposition approximations
between two matrices that depends on the relative spectral gap rather than the
absolute spectral gap, a result that may be of broader independent interest.
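To fix terminology (standard definitions, recalled here; the exact statement of the paper's perturbation bound is in the text): writing σ_1(A) ≥ σ_2(A) ≥ … for the singular values of A, the two notions of spectral gap at rank k are

    \delta_k^{\mathrm{abs}}(A) \;=\; \sigma_k(A) - \sigma_{k+1}(A) ,
    \qquad
    \delta_k^{\mathrm{rel}}(A) \;=\; \frac{\sigma_k(A) - \sigma_{k+1}(A)}{\sigma_k(A)} .

Wedin-type bounds for rank-k SVD approximations degrade as the absolute gap shrinks; under gradual spectral decay σ_k is itself small, so the absolute gap can be tiny even when the relative gap is moderate, which is why a bound depending only on the relative gap is better suited to typical high-dimensional data.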
Comment: The 36th Annual Conference on Learning Theory (COLT 2023).

- …