
    Dimensionality Reduction for k-Means Clustering and Low Rank Approximation

    We show how to approximate a data matrix $\mathbf{A}$ with a much smaller sketch $\mathbf{\tilde A}$ that can be used to solve a general class of constrained $k$-rank approximation problems to within $(1+\epsilon)$ error. Importantly, this class of problems includes $k$-means clustering and unconstrained low rank approximation (i.e. principal component analysis). By reducing data points to just $O(k)$ dimensions, our methods generically accelerate any exact, approximate, or heuristic algorithm for these ubiquitous problems. For $k$-means dimensionality reduction, we provide $(1+\epsilon)$ relative error results for many common sketching techniques, including random row projection, column selection, and approximate SVD. For approximate principal component analysis, we give a simple alternative to known algorithms that has applications in the streaming setting. Additionally, we extend recent work on column-based matrix reconstruction, giving column subsets that not only 'cover' a good subspace for $\mathbf{A}$, but can be used directly to compute this subspace. Finally, for $k$-means clustering, we show how to achieve a $(9+\epsilon)$ approximation by Johnson-Lindenstrauss projecting data points to just $O(\log k/\epsilon^2)$ dimensions. This gives the first result that leverages the specific structure of $k$-means to achieve dimension independent of input size and sublinear in $k$.
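
The sketch-then-cluster idea in this abstract can be illustrated with a minimal NumPy example. This is only a sketch under stated assumptions, not the authors' algorithm: it uses a dense Gaussian matrix as the Johnson-Lindenstrauss projection, plain Lloyd's iterations as the clustering heuristic, synthetic data, and an arbitrary target dimension `m = 20` rather than a value derived from the $O(\log k/\epsilon^2)$ bound. The function names `jl_sketch`, `lloyd`, and `kmeans_cost` are hypothetical helpers introduced for illustration.

```python
import numpy as np

def jl_sketch(A, m, rng):
    """Project the rows of A (n x d) to m dimensions with a scaled Gaussian matrix."""
    d = A.shape[1]
    Pi = rng.standard_normal((d, m)) / np.sqrt(m)  # assumed projection, not from the paper
    return A @ Pi

def lloyd(A, k, rng, iters=50):
    """Plain Lloyd's heuristic; returns one cluster label per row of A."""
    centers = A[rng.choice(len(A), size=k, replace=False)]  # fancy indexing copies
    for _ in range(iters):
        # Squared distance from every point to every center, shape (n, k).
        d2 = ((A[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = A[labels == j]
            if len(members) > 0:  # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return labels

def kmeans_cost(A, labels, k):
    """Sum of squared distances from each row of A to its cluster centroid."""
    cost = 0.0
    for j in range(k):
        members = A[labels == j]
        if len(members) > 0:
            cost += ((members - members.mean(axis=0)) ** 2).sum()
    return cost

# Synthetic data (an assumption for the demo): k noisy clusters in d dimensions.
rng = np.random.default_rng(0)
k, n, d, m = 3, 300, 200, 20
true_centers = 10.0 * rng.standard_normal((k, d))
A = true_centers[rng.integers(k, size=n)] + rng.standard_normal((n, d))

A_sketch = jl_sketch(A, m, rng)          # n x m instead of n x d
labels_sketch = lloyd(A_sketch, k, rng)  # cluster the sketch
labels_full = lloyd(A, k, rng)           # cluster the original data

# Both partitions are evaluated on the ORIGINAL data, which is the comparison
# an approximation guarantee is stated for.
cost_sketch = kmeans_cost(A, labels_sketch, k)
cost_full = kmeans_cost(A, labels_full, k)
print("cost ratio (sketch / full):", cost_sketch / cost_full)
```

The printed ratio compares two heuristic runs, so it is not a measurement of the approximation factor itself; Lloyd's algorithm can land in different local optima on either input.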

    Infinite-dimensional diffusions as limits of random walks on partitions

    The present paper originated from our previous study of the problem of harmonic analysis on the infinite symmetric group. This problem leads to a family {P_z} of probability measures, the z-measures, which depend on the complex parameter z. The z-measures live on the Thoma simplex, an infinite-dimensional compact space which is a kind of dual object to the infinite symmetric group. The aim of the paper is to introduce stochastic dynamics related to the z-measures. Namely, we construct a family of diffusion processes in the Thoma simplex indexed by the same parameter z. Our diffusions are obtained from certain Markov chains on partitions of natural numbers n in a scaling limit as n goes to infinity. These Markov chains arise in a natural way, due to the approximation of the infinite symmetric group by the increasing chain of the finite symmetric groups. Each z-measure P_z serves as a unique invariant distribution for the corresponding diffusion process, and the process is ergodic with respect to P_z. Moreover, P_z is a symmetrizing measure, so that the process is reversible. We describe the spectrum of its generator and compute the associated (pre)Dirichlet form. Comment: AMSTex, 33 pages. Version 2: minor changes, typos corrected, to appear in Prob. Theor. Rel. Field

    Lattice Gauge Fields Topology Uncovered by Quaternionic sigma-model Embedding

    We investigate the topology of SU(2) gauge fields using a new approach, which exploits the well-known connection between SU(2) gauge theory and quaternionic projective sigma-models and allows the topological charge density to be formulated entirely in terms of sigma-model fields. The method is studied in detail and, for thermalized vacuum configurations, is shown to be compatible with the overlap-based definition. We confirm that the topological charge is distributed in localized four-dimensional regions which, however, are not compatible with instantons. The bulk distribution of the topological density is investigated at different lattice spacings and is shown to possess some universal properties. Comment: revtex4, 19 pages (24 ps figures included); replaced to match the published version, to appear in PRD; minor changes, references adde