A Spectral Gap Precludes Low-Dimensional Embeddings
We prove that if an n-vertex O(1)-expander embeds with average distortion D into a finite dimensional normed space X, then necessarily the dimension of X is at least n^{c/D} for some universal constant c>0. This is sharp up to the value of the constant c, and it improves over the previously best-known estimate dim(X) > c(log n)^2/D^2 of Linial, London and Rabinovich, strengthens a theorem of Matoušek, and answers a question of Andoni, Nikolov, Razenshteyn and Waingarten.
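In display form, the main bound reads as follows (the definition of average distortion given here is the standard one, included for the reader's convenience rather than quoted from the paper):

    % A map f : V -> X from an n-vertex O(1)-expander G = (V, d_G) embeds
    % with average distortion D if f is 1-Lipschitz and
    \frac{1}{n^2} \sum_{u,v \in V} \|f(u) - f(v)\|_X
        \;\geq\; \frac{1}{D} \cdot \frac{1}{n^2} \sum_{u,v \in V} d_G(u,v) .
    % The theorem asserts that any such embedding forces
    \dim(X) \;\geq\; n^{c/D} ,
    % which is sharp up to the value of the universal constant c > 0.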
Impossibility of dimension reduction in the nuclear norm
Let S_1 (the Schatten--von Neumann trace class) denote the Banach space of all compact linear operators T: ℓ_2 → ℓ_2 whose nuclear norm ||T||_{S_1} = ∑_{j=1}^∞ σ_j(T) is finite, where σ_1(T) ≥ σ_2(T) ≥ ... are the singular values of T. We prove that for arbitrarily large n there exists a subset C ⊆ S_1 with |C| = n that cannot be embedded with bi-Lipschitz distortion O(1) into any n^{o(1)}-dimensional linear subspace of S_1. C is not even an O(1)-Lipschitz quotient of any subset of any n^{o(1)}-dimensional linear subspace of S_1. Thus, S_1 does not admit a dimension reduction result à la Johnson and Lindenstrauss (1984), which complements the work of Harrow, Montanaro and Short (2011) on the limitations of quantum dimension reduction under the assumption that the embedding into low dimensions is a quantum channel. Such a statement was previously known with S_1 replaced by the Banach space ℓ_1 of absolutely summable sequences via the work of Brinkman and Charikar (2003). In fact, the above set C can be taken to be the same set as the one that Brinkman and Charikar considered, viewed as a collection of diagonal matrices in S_1. The challenge is to demonstrate that C cannot be faithfully realized in an arbitrary low-dimensional subspace of S_1, while Brinkman and Charikar obtained such an assertion only for subspaces of S_1 that consist of diagonal operators (i.e., subspaces of ℓ_1). We establish this by proving that the Markov 2-convexity constant of any finite dimensional linear subspace X of S_1 is at most a universal constant multiple of √(log dim(X)).
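For reference, the invariant driving the proof can be recalled as follows (the Lee--Naor--Peres formulation of Markov 2-convexity, supplied here for convenience, not quoted from this abstract):

    % (M, d) is Markov 2-convex with constant \Pi if for every Markov chain
    % \{X_t\}_{t \in \mathbb{Z}} on a state space \Omega and every f : \Omega -> M,
    \sum_{k=0}^{\infty} \sum_{t \in \mathbb{Z}} \frac{1}{4^k}\,
        \mathbb{E}\Big[ d\big(f(X_t), f(\widetilde{X}_t(t - 2^k))\big)^2 \Big]
        \;\leq\; \Pi^2 \sum_{t \in \mathbb{Z}}
        \mathbb{E}\Big[ d\big(f(X_t), f(X_{t-1})\big)^2 \Big] ,
    % where \widetilde{X}_t(s) follows the chain through time s and then
    % evolves independently. The theorem above bounds this constant by
    % O(\sqrt{\log \dim(Y)}) for every finite-dimensional subspace Y of S_1,
    % while, roughly speaking, the Brinkman--Charikar set has a Markov
    % 2-convexity constant that grows with its cardinality, which is what
    % rules out faithful low-dimensional realizations.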
Recovering Structured Probability Matrices
We consider the problem of accurately recovering a matrix B of size M by M,
which represents a probability distribution over M^2 outcomes, given access to
an observed matrix of "counts" generated by taking independent samples from the
distribution B. How can structural properties of the underlying matrix B be
leveraged to yield computationally efficient and information theoretically
optimal reconstruction algorithms? When can accurate reconstruction be
accomplished in the sparse data regime? This basic problem lies at the core of
a number of questions that are currently being considered by different
communities, including building recommendation systems and collaborative
filtering in the sparse data regime, community detection in sparse random
graphs, learning structured models such as topic models or hidden Markov
models, and the efforts from the natural language processing community to
compute "word embeddings".
Our results apply to the setting where B has low-rank structure. For this
setting, we propose an efficient algorithm that accurately recovers the
underlying M by M matrix using Theta(M) samples. This result easily translates
to Theta(M)-sample algorithms for learning topic models and learning hidden
Markov models. These linear sample complexities are optimal, up to constant
factors, in an extremely strong sense: even testing basic properties of the
underlying matrix (such as whether it has rank 1 or 2) requires Omega(M)
samples. We provide an even stronger lower bound: distinguishing whether a
sequence of observations was drawn from the uniform distribution over M
observations, versus generated by an HMM with two hidden states, requires
Omega(M) observations. This precludes sublinear-sample hypothesis tests for
basic properties, such as identity or uniformity, as well as sublinear-sample
estimators for quantities such as the entropy rate of HMMs.
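Continuing the sketch above, the most naive way to exploit low rank is to project the empirical matrix onto the nearest rank-r matrix via truncated SVD. This baseline is given only to make the setting concrete; it is not the paper's algorithm, which requires considerably more care to achieve Theta(M)-sample guarantees in the sparse regime:

    def naive_lowrank_estimate(counts, r):
        """Truncated-SVD projection of the empirical distribution to rank r,
        clipped and renormalized. Illustrative baseline only."""
        P_hat = counts / counts.sum()           # empirical distribution over cells
        U, s, Vt = np.linalg.svd(P_hat, full_matrices=False)
        est = (U[:, :r] * s[:r]) @ Vt[:r, :]    # best rank-r approximation (Frobenius)
        est = np.clip(est, 0.0, None)           # clip negatives introduced by projection
        return est / est.sum()                  # renormalize to a probability matrix

    B_hat = naive_lowrank_estimate(counts, r=1)
    err = np.abs(B_hat - B).sum()               # L1 error between estimate and truth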
Tackling Combinatorial Distribution Shift: A Matrix Completion Perspective
Obtaining rigorous statistical guarantees for generalization under
distribution shift remains an open and active research area. We study a setting
we call combinatorial distribution shift, where (a) under the test- and
training-distributions, the labels z are determined by pairs of features
(x, y), (b) the training distribution has coverage of certain marginal
distributions over x and over y separately, but (c) the test distribution
involves examples from a product distribution over (x, y) that is not
covered by the training distribution. Focusing on the special case where the
labels are given by bilinear embeddings into a Hilbert space H, namely
z = ⟨f⋆(x), g⋆(y)⟩_H, we aim to extrapolate to a test distribution domain that
is not covered in training, i.e., achieving bilinear combinatorial
extrapolation.
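In symbols, the special case studied is the following (the predictor ẑ and the measure notation μ are introduced here for illustration and are not quoted from the abstract):

    % Labels are bilinear in unknown embeddings into a Hilbert space H:
    z \;=\; \langle f_\star(x),\, g_\star(y) \rangle_{\mathcal{H}} ,
    % training covers the x-marginal and the y-marginal separately, while
    % test examples are drawn from a product measure over pairs,
    \mu_{\mathrm{test}} \;=\; \mu_x \otimes \mu_y ,
    % whose pairs (x, y) need not be jointly covered in training; the goal
    % is a predictor \hat{z} with small risk
    \mathbb{E}_{(x,y) \sim \mu_{\mathrm{test}}}
        \Big( \hat{z}(x, y) - \langle f_\star(x), g_\star(y) \rangle_{\mathcal{H}} \Big)^{2} .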
Our setting generalizes a special case of matrix completion from
missing-not-at-random data, for which all existing results require the
ground-truth matrices to be either exactly low-rank, or to exhibit very sharp
spectral cutoffs. In this work, we develop a series of theoretical results that
enable bilinear combinatorial extrapolation under gradual spectral decay as
observed in typical high-dimensional data, including novel algorithms,
generalization guarantees, and linear-algebraic results. A key tool is a novel
perturbation bound for the rank-k singular value decomposition approximations
between two matrices that depends on the relative spectral gap rather than the
absolute spectral gap, a result that may be of broader independent interest.
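To fix terminology (standard definitions, recalled here; the exact statement of the paper's perturbation bound is in the text): writing σ_1(A) ≥ σ_2(A) ≥ … for the singular values of A, the two notions of spectral gap at rank k are

    \delta_k^{\mathrm{abs}}(A) \;=\; \sigma_k(A) - \sigma_{k+1}(A) ,
    \qquad
    \delta_k^{\mathrm{rel}}(A) \;=\; \frac{\sigma_k(A) - \sigma_{k+1}(A)}{\sigma_k(A)} .

Wedin-type bounds for rank-k SVD approximations degrade as the absolute gap shrinks; under gradual spectral decay σ_k is itself small, so the absolute gap can be tiny even when the relative gap is moderate, which is why a bound depending only on the relative gap is better suited to typical high-dimensional data.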
Comment: The 36th Annual Conference on Learning Theory (COLT 2023).

- …