Dimensionality Reduction for k-Means Clustering and Low Rank Approximation
We show how to approximate a data matrix A with a much smaller
sketch Ã that can be used to solve a general class of
constrained k-rank approximation problems to within (1+ε) error.
Importantly, this class of problems includes k-means clustering and
unconstrained low rank approximation (i.e. principal component analysis). By
reducing data points to just O(k) dimensions, our methods generically
accelerate any exact, approximate, or heuristic algorithm for these ubiquitous
problems.
For k-means dimensionality reduction, we provide (1+ε) relative
error results for many common sketching techniques, including random row
projection, column selection, and approximate SVD. For approximate principal
component analysis, we give a simple alternative to known algorithms that has
applications in the streaming setting. Additionally, we extend recent work on
column-based matrix reconstruction, giving column subsets that not only `cover'
a good subspace for A, but can be used directly to compute this
subspace.
Finally, for k-means clustering, we show how to achieve a (9+ε)
approximation by Johnson-Lindenstrauss projecting data points to just
O(log k/ε²) dimensions. This gives the first result that leverages the
specific structure of k-means to achieve dimension independent of input size
and sublinear in k.
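As a rough illustration of the Johnson-Lindenstrauss idea mentioned in this abstract, the following Python sketch projects synthetic data with a scaled Gaussian matrix and compares the k-means cost of one fixed partition before and after projection. It is not the authors' algorithm: the helper names `kmeans_cost` and `jl_project`, the synthetic data, and the target dimension of 50 are all assumptions chosen for illustration, and the target dimension is not derived from the bound stated in the abstract.

```python
import numpy as np

def kmeans_cost(points, labels, k):
    """Sum of squared distances from each point to its cluster centroid."""
    cost = 0.0
    for c in range(k):
        members = points[labels == c]
        if len(members) > 0:
            cost += ((members - members.mean(axis=0)) ** 2).sum()
    return cost

def jl_project(points, target_dim, rng):
    """Multiply by a scaled Gaussian matrix (one standard JL construction)."""
    d = points.shape[1]
    proj = rng.standard_normal((d, target_dim)) / np.sqrt(target_dim)
    return points @ proj

# Synthetic data (hypothetical): 300 points in 1000 dimensions around 3 centers.
rng = np.random.default_rng(0)
n, d, k = 300, 1000, 3
centers = rng.standard_normal((k, d)) * 5.0
labels = rng.integers(0, k, size=n)
data = centers[labels] + rng.standard_normal((n, d))

# Project to 50 dimensions (arbitrary choice) and compare the cost of the
# same partition in both spaces.
sketch = jl_project(data, target_dim=50, rng=rng)
ratio = kmeans_cost(sketch, labels, k) / kmeans_cost(data, labels, k)
print(f"projected cost / original cost = {ratio:.3f}")
```

A ratio close to 1 for this partition suggests that its cost is roughly preserved by the projection. The abstract's guarantee is stronger than this check, since it concerns the quality of the clustering obtained from the projected data rather than one fixed partition.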
Infinite-dimensional diffusions as limits of random walks on partitions
The present paper originated from our previous study of the problem of
harmonic analysis on the infinite symmetric group. This problem leads to a
family {P_z} of probability measures, the z-measures, which depend on the
complex parameter z. The z-measures live on the Thoma simplex, an
infinite-dimensional compact space which is a kind of dual object to the
infinite symmetric group. The aim of the paper is to introduce stochastic
dynamics related to the z-measures. Namely, we construct a family of diffusion
processes in the Thoma simplex indexed by the same parameter z. Our diffusions
are obtained from certain Markov chains on partitions of natural numbers n in a
scaling limit as n goes to infinity. These Markov chains arise in a natural
way, due to the approximation of the infinite symmetric group by the increasing
chain of the finite symmetric groups. Each z-measure P_z serves as a unique
invariant distribution for the corresponding diffusion process, and the process
is ergodic with respect to P_z. Moreover, P_z is a symmetrizing measure, so
that the process is reversible. We describe the spectrum of its generator and
compute the associated (pre)Dirichlet form.
Comment: AMSTex, 33 pages. Version 2: minor changes, typos corrected, to
appear in Prob. Theor. Rel. Field
Lattice Gauge Fields Topology Uncovered by Quaternionic sigma-model Embedding
We investigate the topology of SU(2) gauge fields using a new approach, which
exploits the well-known connection between SU(2) gauge theory and quaternionic
projective sigma-models and allows one to formulate the topological charge
density entirely in terms of sigma-model fields. The method is studied in
detail and, for thermalized vacuum configurations, is shown to be compatible
with the overlap-based definition. We confirm that the topological charge is
distributed in localized four-dimensional regions which, however, are not
compatible with instantons. The bulk distribution of the topological density
is investigated at different lattice spacings and is shown to possess some
universal properties.
Comment: revtex4, 19 pages (24 ps figures included); replaced to match the
published version, to appear in PRD; minor changes, references adde