Almost Optimal Unrestricted Fast Johnson-Lindenstrauss Transform
The problems of random projections and sparse reconstruction have much in
common and have individually received much attention. Surprisingly, until now
they have progressed in parallel and remained mostly separate. Here, we employ
new tools from probability in Banach spaces that were successfully used in the
context of sparse reconstruction to advance on an open problem in random
projection. In
particular, we generalize and use an intricate result by Rudelson and Vershynin
for sparse reconstruction which uses Dudley's theorem for bounding Gaussian
processes. Our main result states that any set of N real
vectors in n-dimensional space can be linearly mapped to a space of dimension
k=O(\log N\polylog(n)), while (1) preserving the pairwise distances among the
vectors to within any constant distortion and (2) being able to apply the
transformation in O(n\log n) time on each vector. This improves on the best
previously known bounds achieved by Ailon and Liberty and by Ailon and Chazelle.
The dependence on the distortion constant, however, is believed to be
suboptimal and subject to further investigation. For constant distortion, this
settles the open question posed by these authors up to a \polylog(n) factor
while considerably simplifying their constructions.
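The guarantee discussed above is the Johnson-Lindenstrauss property. As a minimal illustration, a dense Gaussian matrix already achieves the dimension bound k = O(log N / eps^2); the contribution of fast transforms is applying the map in O(n log n) time rather than O(nk). All sizes and the constant 24 below are illustrative choices, not values from the paper.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

# Dense Gaussian random projection: the baseline that fast JL
# transforms accelerate.  Parameters are illustrative.
n, N, eps = 1000, 50, 0.5
k = int(np.ceil(24 * np.log(N) / eps**2))     # k = O(log N / eps^2) rows

X = rng.standard_normal((N, n))               # N arbitrary points in R^n
A = rng.standard_normal((k, n)) / np.sqrt(k)  # entries drawn N(0, 1/k)
Y = X @ A.T                                   # embedded points in R^k

# Every pairwise distance should be preserved up to a 1 +/- eps factor.
ratios = [np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
          for i, j in combinations(range(N), 2)]
print(f"distortion range: [{min(ratios):.3f}, {max(ratios):.3f}]")
```

Note that k depends only on the number of points N, not on the ambient dimension n, which is what makes the reduction useful when n is large.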
Acceleration of Randomized Kaczmarz Method via the Johnson-Lindenstrauss Lemma
The Kaczmarz method is an algorithm for finding the solution to an
overdetermined consistent system of linear equations Ax=b by iteratively
projecting onto the solution spaces. The randomized version put forth by
Strohmer and Vershynin yields provably exponential convergence in expectation,
which for highly overdetermined systems even outperforms the conjugate gradient
method. In this article we present a modified version of the randomized
Kaczmarz method which at each iteration selects the optimal projection from a
randomly chosen set, a choice that in most cases significantly improves the
convergence rate. We utilize a Johnson-Lindenstrauss dimension reduction
technique to keep
the runtime on the same order as the original randomized version, adding only
extra preprocessing time. We present a series of empirical studies which
demonstrate the remarkable acceleration in convergence to the solution using
this modified approach.
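The iteration described above can be sketched as follows. The sample size, the greedy residual rule, and all problem dimensions are illustrative assumptions, not the paper's exact algorithm; in particular, the JL-based acceleration of the selection step is omitted here.

```python
import numpy as np

rng = np.random.default_rng(1)

def kaczmarz(A, b, iters=1000, sample=5):
    # Draw `sample` candidate rows with probability proportional to their
    # squared norms (as in Strohmer-Vershynin), then project x onto the
    # solution space of the candidate with the largest scaled residual.
    # sample=1 recovers the original randomized Kaczmarz method.
    m, n = A.shape
    row_norms2 = np.einsum("ij,ij->i", A, A)
    p = row_norms2 / row_norms2.sum()
    x = np.zeros(n)
    for _ in range(iters):
        idx = rng.choice(m, size=sample, p=p)
        resid = np.abs(A[idx] @ x - b[idx]) / np.sqrt(row_norms2[idx])
        i = idx[np.argmax(resid)]            # best row in the random set
        x += (b[i] - A[i] @ x) / row_norms2[i] * A[i]
    return x

# Overdetermined consistent system b = A x_true.
A = rng.standard_normal((200, 20))
x_true = rng.standard_normal(20)
x_hat = kaczmarz(A, A @ x_true)
print(np.linalg.norm(x_hat - x_true))  # error decays exponentially in iters
```

Without dimension reduction, evaluating all `sample` residuals costs a factor `sample` more per iteration than the original method, which is exactly the overhead the JL technique in the paper removes.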
Coresets - Methods and History: A Theoretician's Design Pattern for Approximation and Streaming Algorithms
We present a technical survey of state-of-the-art approaches to data reduction and the coreset framework. These include geometric decompositions, gradient methods, random sampling, sketching, and random projections. We further outline their importance for the design of streaming algorithms and give a brief overview of lower-bounding techniques.
Sparser Johnson-Lindenstrauss Transforms
We give two different and simple constructions for dimensionality reduction
in \ell_2 via linear mappings that are sparse: only an O(\varepsilon)-fraction
of entries in each column of our embedding matrices are non-zero to achieve
distortion 1+\varepsilon with high probability, while still achieving the
asymptotically optimal number of rows. These are the first
constructions to provide subconstant sparsity for all values of parameters,
improving upon previous works of Achlioptas (JCSS 2003) and Dasgupta, Kumar,
and Sarl\'{o}s (STOC 2010). Such distributions can be used to speed up
applications where dimensionality reduction is used.
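A sparse embedding matrix in the spirit of these constructions can be sketched as follows. The column sparsity s and all dimensions are illustrative, and this is not the paper's exact distribution.

```python
import numpy as np

rng = np.random.default_rng(2)

def sparse_jl(n, k, s, rng):
    # Each column gets exactly s nonzero entries, placed in uniformly
    # random rows with independent random signs, scaled by 1/sqrt(s) so
    # that E||Ax||^2 = ||x||^2.  Multiplying by A costs O(s) per nonzero
    # of x instead of O(k), which is the point of sparse JL transforms.
    A = np.zeros((k, n))
    for j in range(n):
        rows = rng.choice(k, size=s, replace=False)
        A[rows, j] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)
    return A

n, k, s = 10_000, 400, 8
A = sparse_jl(n, k, s, rng)
x = rng.standard_normal(n)
print(np.linalg.norm(A @ x) / np.linalg.norm(x))  # close to 1
```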
On Deterministic Sketching and Streaming for Sparse Recovery and Norm Estimation
We study classic streaming and sparse recovery problems using deterministic
linear sketches, including l1/l1 and linf/l1 sparse recovery problems (the
latter also being known as l1-heavy hitters), norm estimation, and approximate
inner product. We focus on devising a fixed matrix A in R^{m x n} and a
deterministic recovery/estimation procedure which work for all possible input
vectors simultaneously. Our results improve upon existing work, the following
being our main contributions:
* A proof that linf/l1 sparse recovery and inner product estimation are
equivalent, and that incoherent matrices can be used to solve both problems.
Our upper bound for the number of measurements is m=O(eps^{-2}*min{log n, (log
n / log(1/eps))^2}). We can also obtain fast sketching and recovery algorithms
by making use of the Fast Johnson-Lindenstrauss transform. Both our running
times and number of measurements improve upon previous work. We can also obtain
better error guarantees than previous work in terms of a smaller tail of the
input vector.
* A new lower bound for the number of linear measurements required to solve
l1/l1 sparse recovery. We show Omega(k/eps^2 + klog(n/k)/eps) measurements are
required to recover an x' with |x - x'|_1 <= (1+eps)|x_{tail(k)}|_1, where
x_{tail(k)} is x projected onto all but its largest k coordinates in magnitude.
* A tight bound of m = Theta(eps^{-2}log(eps^2 n)) on the number of
measurements required to solve deterministic norm estimation, i.e., to recover
|x|_2 +/- eps|x|_1.
For all the problems we study, tight bounds are already known for the
randomized complexity from previous work, except in the case of l1/l1 sparse
recovery, where a nearly tight bound is known. Our work thus aims to study the
deterministic complexities of these problems.
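The first contribution, that incoherent matrices yield inner-product estimates, can be illustrated directly: if A has unit-norm columns with pairwise inner products at most mu, then |<Ax, Ay> - <x, y>| <= mu*|x|_1*|y|_1. Below, a random-sign matrix (coherence roughly sqrt(log n / m)) stands in for the deterministic constructions the abstract refers to; sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Random-sign matrix with exactly unit-norm columns; its coherence mu is
# the largest off-diagonal entry of the Gram matrix in absolute value.
n, m = 1000, 800
A = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)

gram = A.T @ A
mu = np.abs(gram - np.eye(n)).max()   # coherence of the columns

# Two sparse test vectors; the error bound below holds for all x, y.
x = np.zeros(n); x[rng.choice(n, 10, replace=False)] = rng.standard_normal(10)
y = np.zeros(n); y[rng.choice(n, 10, replace=False)] = rng.standard_normal(10)

err = abs((A @ x) @ (A @ y) - x @ y)
print(err <= mu * np.abs(x).sum() * np.abs(y).sum())
```

The inequality is deterministic once mu is fixed, which is why incoherent matrices give for-all-inputs guarantees rather than per-input randomized ones.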
Improved analysis of the subsampled randomized Hadamard transform
This paper presents an improved analysis of a structured dimension-reduction
map called the subsampled randomized Hadamard transform. This argument
demonstrates that the map preserves the Euclidean geometry of an entire
subspace of vectors. The new proof is much simpler than previous approaches,
and it offers, for the first time, optimal constants in the estimate on the
number of dimensions required for the embedding. To appear in Advances in
Adaptive Data Analysis, special issue "Sparse Representation of Data and
Images."
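A minimal sketch of the subsampled randomized Hadamard transform itself, applied to a single vector with illustrative sizes (the paper's analysis concerns preserving an entire subspace, which this small demo does not attempt to show):

```python
import numpy as np

rng = np.random.default_rng(4)

def fwht(v):
    # Unnormalized fast Walsh-Hadamard transform, O(n log n); requires
    # len(v) to be a power of two.  Satisfies ||fwht(v)||^2 = n*||v||^2.
    v = v.copy()
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v

def srht(x, k, rng):
    # Flip random signs (D), mix with the Hadamard transform (H), then
    # keep k uniformly sampled coordinates, rescaled so E||y||^2 = ||x||^2.
    n = len(x)
    d = rng.choice([-1.0, 1.0], size=n)          # random diagonal signs
    rows = rng.choice(n, size=k, replace=False)  # the subsampling step
    return fwht(d * x)[rows] / np.sqrt(k)

n, k = 1024, 256
x = rng.standard_normal(n)
y = srht(x, k, rng)
print(np.linalg.norm(y) / np.linalg.norm(x))  # close to 1
```

The sign flip spreads the energy of x roughly evenly across coordinates, which is what makes uniform subsampling of the transformed vector safe.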