526 research outputs found
Near-Neighbor Preserving Dimension Reduction for Doubling Subsets of l_1
Randomized dimensionality reduction has been recognized as one of the fundamental techniques in handling high-dimensional data. Starting with the celebrated Johnson-Lindenstrauss Lemma, such reductions have been studied in depth for the Euclidean (l_2) metric, but much less for the Manhattan (l_1) metric. Our primary motivation is the approximate nearest neighbor problem in l_1. We exploit its reduction to the decision-with-witness version, called approximate near neighbor, which incurs a roughly logarithmic overhead. In 2007, Indyk and Naor, in the context of approximate nearest neighbors, introduced the notion of nearest neighbor-preserving embeddings. These are randomized embeddings between two metric spaces with guaranteed bounded distortion only for the distances between a query point and a point set. Such embeddings are known to exist for both the l_2 and l_1 metrics, as well as for doubling subsets of l_2. The case that remained open was that of doubling subsets of l_1. In this paper, we propose a dimension reduction by means of a near neighbor-preserving embedding for doubling subsets of l_1. Our approach is to represent the pointset with a carefully chosen covering set, and then randomly project the latter. We study two types of covering sets: c-approximate r-nets and randomly shifted grids, and we discuss the tradeoff between them in terms of preprocessing time and target dimension. We employ Cauchy variables; the concentration bounds we derive should be of independent interest.
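The random projection step above relies on the 1-stability of Cauchy variables: a Cauchy projection of the difference of two points has scale equal to their l_1 distance, so a median over coordinates recovers that distance. The following is a minimal illustrative sketch of this 1-stable projection idea only, with hypothetical helper names; it omits the paper's covering-set construction (nets or shifted grids) entirely.

```python
import math
import random

def cauchy_matrix(d, k, rng):
    """k x d matrix of i.i.d. standard Cauchy entries (1-stable for l_1).

    A standard Cauchy sample is tan(pi * (U - 1/2)) for U uniform on (0, 1).
    """
    return [[math.tan(math.pi * (rng.random() - 0.5)) for _ in range(d)]
            for _ in range(k)]

def project(A, x):
    """Matrix-vector product: the k-dimensional sketch of x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def l1_estimate(y, z):
    """Median-based estimator of the l_1 distance from two sketches.

    By 1-stability, each coordinate of y - z is Cauchy with scale
    ||x - x'||_1, and the median of |standard Cauchy| is 1, so the
    sample median of |y_i - z_i| concentrates around that scale.
    """
    diffs = sorted(abs(a - b) for a, b in zip(y, z))
    return diffs[len(diffs) // 2]

rng = random.Random(0)
d, k = 200, 401                     # original and target dimensions
x = [rng.uniform(-1, 1) for _ in range(d)]
xp = [rng.uniform(-1, 1) for _ in range(d)]
true = sum(abs(a - b) for a, b in zip(x, xp))

A = cauchy_matrix(d, k, rng)
est = l1_estimate(project(A, x), project(A, xp))
print(true, est)  # est lands within a small constant factor of true
```

Note that, unlike the Gaussian case for l_2, the coordinates here have no finite mean, which is why the estimator must be a median rather than a norm; this is also why l_1 dimension reduction is harder and only distorted-range or near-neighbor-preserving guarantees are available.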
Impossibility of dimension reduction in the nuclear norm
Let S_1 (the Schatten--von Neumann trace class) denote the Banach
space of all compact linear operators T : l_2 -> l_2 whose nuclear norm
||T||_{S_1} = \sum_j \sigma_j(T) is finite, where \sigma_1(T) >= \sigma_2(T) >= ...
are the singular values of T. We prove that
for arbitrarily large n there exists a subset C of S_1
with |C| = n that cannot be
embedded with bi-Lipschitz distortion O(1) into any n^{o(1)}-dimensional
linear subspace of S_1. C is not even an O(1)-Lipschitz
quotient of any subset of any n^{o(1)}-dimensional linear subspace of
S_1. Thus, S_1 does not admit a dimension reduction
result à la Johnson and Lindenstrauss (1984), which complements the work of
Harrow, Montanaro and Short (2011) on the limitations of quantum dimension
reduction under the assumption that the embedding into low dimensions is a
quantum channel. Such a statement was previously known with S_1
replaced by the Banach space l_1 of absolutely summable sequences via the
work of Brinkman and Charikar (2003). In fact, the above set C can
be taken to be the same set as the one that Brinkman and Charikar considered,
viewed as a collection of diagonal matrices in S_1. The challenge is
to demonstrate that C cannot be faithfully realized in an arbitrary
low-dimensional subspace of S_1, while Brinkman and Charikar
obtained such an assertion only for subspaces of S_1 that consist of
diagonal operators (i.e., subspaces of l_1). We establish this by proving
that the Markov 2-convexity constant of any finite dimensional linear subspace
X of S_1 is at most a universal constant multiple of \sqrt{\log \dim(X)}.
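The identification of l_1 with the diagonal operators in S_1, used above to transport the Brinkman--Charikar point set into S_1, is elementary: the singular values of a diagonal matrix are the absolute values of its diagonal entries, so its nuclear norm equals the l_1 norm of the diagonal. A minimal illustration (helper names are ours, not from the paper):

```python
def singular_values_diag(diag):
    """Singular values of a diagonal matrix: |entries| in decreasing order."""
    return sorted((abs(a) for a in diag), reverse=True)

def nuclear_norm_diag(diag):
    """Nuclear (S_1) norm of a diagonal matrix: sum of its singular values."""
    return sum(singular_values_diag(diag))

diag = [3.0, -4.0, 0.5]
# For diagonal operators, the S_1 norm coincides with the l_1 norm:
assert nuclear_norm_diag(diag) == sum(abs(a) for a in diag)
print(nuclear_norm_diag(diag))  # 7.5
```

The content of the theorem is precisely that this diagonal copy of l_1 is not easier to compress inside a general low-dimensional subspace of S_1 than it is inside l_1 itself.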
Dimension Reduction Techniques for l_p (1<p<2), with Applications
For Euclidean space (l_2), there exists the powerful dimension reduction transform of Johnson and Lindenstrauss [Conf. in modern analysis and probability, AMS 1984], with a host of known applications. Here, we consider the problem of dimension reduction for all l_p spaces 1<p<2. Although strong lower bounds are known for dimension reduction in l_1, Ostrovsky and Rabani [JACM 2002] successfully circumvented these by presenting an l_1 embedding that maintains fidelity in only a bounded distance range, with applications to clustering and nearest neighbor search. However, their embedding techniques are specific to l_1 and do not naturally extend to other norms.
In this paper, we apply a range of advanced techniques and produce bounded-range dimension reduction embeddings for all of 1<p<2, thereby demonstrating that the approach initiated by Ostrovsky and Rabani for l_1 can be extended to a much more general framework. We also obtain improved bounds in terms of the intrinsic dimensionality. As a result, we achieve improved bounds for proximity problems, including snowflake embeddings and clustering.
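For contrast with the bounded-range embeddings above, the l_2 baseline they generalize, the Johnson-Lindenstrauss transform, is simply a scaled Gaussian random projection. The sketch below illustrates that baseline only (illustrative parameters, not the paper's l_p construction):

```python
import math
import random

def jl_project(points, k, rng):
    """Project d-dimensional points to k dimensions via a Gaussian matrix.

    Entries are N(0, 1/k); with k = O(log(n) / eps^2), all pairwise l_2
    distances are preserved up to a 1 +/- eps factor with high probability.
    """
    d = len(points[0])
    A = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)]
         for _ in range(k)]
    return [[sum(a * xi for a, xi in zip(row, p)) for a, xi in [(0, 0)]
             or True] for p in points]  # placeholder, replaced below

def dist(u, v):
    """Euclidean (l_2) distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

rng = random.Random(1)
d, k = 500, 128
pts = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(4)]
A = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)] for _ in range(k)]
low = [[sum(a * xi for a, xi in zip(row, p)) for row in A] for p in pts]

# Ratios of projected to original distances cluster around 1.
ratios = [dist(low[i], low[j]) / dist(pts[i], pts[j])
          for i in range(4) for j in range(i + 1, 4)]
print([round(r, 2) for r in ratios])
```

The obstruction discussed in this abstract is that no such distribution over linear maps exists for l_1 (or l_p, 1<p<2) with comparable guarantees, which is what forces the bounded-distance-range relaxation.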
Metric Embedding via Shortest Path Decompositions
We study the problem of embedding shortest-path metrics of weighted graphs
into l_p spaces. We introduce a new embedding technique based on low-depth
decompositions of a graph via shortest paths. The notion of Shortest Path
Decomposition depth is inductively defined: a (weighted) path graph has shortest
path decomposition (SPD) depth 1. A general graph has an SPD of depth k if it
contains a shortest path whose deletion leads to a graph, each of whose
components has SPD depth at most k-1. In this paper we give an
O(k^{min(1/p, 1/2)})-distortion embedding into l_p for graphs of SPD
depth at most k. This result is asymptotically tight for any fixed p > 1,
while for p = 1 it is tight up to second-order terms.
As a corollary of this result, we show that graphs having pathwidth k embed
into l_p with distortion O(k^{min(1/p, 1/2)}). For
p = 1, this improves over the best previous bound of Lee and Sidiropoulos that
was exponential in k; moreover, for other values of p it gives the first
embeddings whose distortion is independent of the graph size n. Furthermore,
we use the fact that planar graphs have SPD depth O(log n) to give a new
proof that any planar graph embeds into l_1 with distortion O(sqrt(log n)). Our approach also gives new results for graphs with bounded treewidth,
and for graphs excluding a fixed minor.
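The inductive SPD definition above can be checked mechanically on small examples. The sketch below (unweighted graphs, plain BFS, helper names are ours) verifies one step of the induction: deleting a single shortest path from a cycle leaves only path components, each of SPD depth 1, so a cycle has SPD depth at most 2.

```python
from collections import deque

def bfs_shortest_path(adj, s, t):
    """One shortest path from s to t in an unweighted graph (BFS tree)."""
    prev = {s: None}
    q = deque([s])
    while q:
        u = q.popleft()
        if u == t:
            break
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                q.append(v)
    path, u = [], t
    while u is not None:
        path.append(u)
        u = prev[u]
    return path[::-1]

def delete_vertices(adj, vs):
    """Induced subgraph after removing the vertex set vs."""
    vs = set(vs)
    return {u: [v for v in nbrs if v not in vs]
            for u, nbrs in adj.items() if u not in vs}

def components(adj):
    """Connected components as sets of vertices."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, q = {s}, deque([s])
        seen.add(s)
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    q.append(v)
        comps.append(comp)
    return comps

def is_path_graph(adj, comp):
    """A connected component is a path iff it has |V|-1 edges, max degree 2."""
    edges = sum(len(adj[u]) for u in comp) // 2
    return edges == len(comp) - 1 and all(len(adj[u]) <= 2 for u in comp)

n = 8
cycle = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}  # the cycle C_8
p = bfs_shortest_path(cycle, 0, 4)   # a shortest path between antipodes
rest = delete_vertices(cycle, p)
print(all(is_path_graph(rest, c) for c in components(rest)))  # True
```

Computing the exact SPD depth of a general graph is harder; the abstract's point is that good upper bounds on it (k for pathwidth-k graphs, O(log n) for planar graphs) already yield low-distortion l_p embeddings.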