
    Dimensionality Reduction for k-Means Clustering and Low Rank Approximation

    We show how to approximate a data matrix $\mathbf{A}$ with a much smaller sketch $\tilde{\mathbf{A}}$ that can be used to solve a general class of constrained rank-$k$ approximation problems to within $(1+\epsilon)$ error. Importantly, this class of problems includes $k$-means clustering and unconstrained low rank approximation (i.e., principal component analysis). By reducing data points to just $O(k)$ dimensions, our methods generically accelerate any exact, approximate, or heuristic algorithm for these ubiquitous problems. For $k$-means dimensionality reduction, we provide $(1+\epsilon)$ relative error results for many common sketching techniques, including random row projection, column selection, and approximate SVD. For approximate principal component analysis, we give a simple alternative to known algorithms that has applications in the streaming setting. Additionally, we extend recent work on column-based matrix reconstruction, giving column subsets that not only "cover" a good subspace for $\mathbf{A}$ but can also be used directly to compute this subspace. Finally, for $k$-means clustering, we show how to achieve a $(9+\epsilon)$ approximation by Johnson-Lindenstrauss projecting data points to just $O(\log k/\epsilon^2)$ dimensions. This gives the first result that leverages the specific structure of $k$-means to achieve a dimension that is independent of input size and sublinear in $k$.
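    The final result above rests on projecting the data points with a Johnson-Lindenstrauss map before clustering. The following is a minimal pure-Python sketch of that pipeline, assuming a dense Gaussian projection and Lloyd's heuristic as the downstream $k$-means solver. Both are illustrative choices and not necessarily the authors' construction. The constant hidden in $O(\log k/\epsilon^2)$ is set to 1 here, which is an arbitrary assumption, and `make_blobs`, `jl_project`, `lloyd`, and `kmeans_cost` are hypothetical helper names. Labels found in the low-dimensional sketch are scored against the original high-dimensional data.

```python
import math
import random

def make_blobs(n_per, k, d, spread=1.0, sep=10.0, seed=0):
    """Hypothetical toy data: k Gaussian blobs of n_per points each in d dimensions."""
    rng = random.Random(seed)
    centers = [[rng.gauss(0, sep) for _ in range(d)] for _ in range(k)]
    return [[c[i] + rng.gauss(0, spread) for i in range(d)] for c in centers for _ in range(n_per)]

def jl_project(points, target_dim, seed=0):
    """Multiply each point by a d x target_dim matrix with i.i.d. N(0, 1/target_dim) entries."""
    rng = random.Random(seed)
    d = len(points[0])
    scale = 1.0 / math.sqrt(target_dim)
    proj = [[rng.gauss(0.0, 1.0) * scale for _ in range(target_dim)] for _ in range(d)]
    return [[sum(p[i] * proj[i][j] for i in range(d)) for j in range(target_dim)] for p in points]

def kmeans_cost(points, labels, k):
    """Sum of squared distances from each point to the centroid of its assigned cluster."""
    d = len(points[0])
    cost = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        centroid = [sum(q[i] for q in members) / len(members) for i in range(d)]
        cost += sum(sum((q[i] - centroid[i]) ** 2 for i in range(d)) for q in members)
    return cost

def lloyd(points, k, iters=20, seed=0):
    """Plain Lloyd's heuristic with random initial centers; returns one label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center in squared Euclidean distance.
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

k, eps = 3, 0.5
points = make_blobs(20, k, 200)
m = math.ceil(math.log(k) / eps ** 2)  # constant in O(log k / eps^2) assumed to be 1
low = jl_project(points, m)
labels = lloyd(low, k)
print(f"cost in original space, labels from {m}-dim sketch: {kmeans_cost(points, labels, k):.1f}")
print(f"cost in original space, labels from full 200-dim run: {kmeans_cost(points, lloyd(points, k), k):.1f}")
```

    The guarantee is stated for the $k$-means cost rather than for every pairwise distance, so the sketch evaluates the labels in the original space. With a heuristic such as Lloyd's, the printed costs can also vary with the random seeds.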

    A new jet algorithm based on the k-means clustering for the reconstruction of heavy states from jets

    A jet algorithm based on the k-means clustering procedure is proposed that can be used for the invariant-mass reconstruction of heavy states decaying to hadronic jets. The proposed algorithm was tested by reconstructing the $e^+e^- \to t\bar{t} \to 6$ jets and $e^+e^- \to W^+W^- \to 4$ jets processes at $\sqrt{s} = 500$ GeV using a Monte Carlo simulation. The algorithm was shown to have a reconstruction efficiency similar to that of traditional jet-finding algorithms and to reduce the reconstruction width by 25% for top quarks and by 40% for W bosons compared to the $k_T$ (Durham) algorithm. In addition, the peak positions measured with the new algorithm are expected to have a smaller systematic uncertainty.
    Comment: 11 pages, 3 eps figures (Eur. Phys. J. C, in press)
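    Both test processes are reconstructed through the invariant mass of a group of jets. The abstract does not say how the k-means procedure assigns jets to groups, so the following is only a minimal sketch of the standard invariant-mass formula $m^2 = (\sum E)^2 - |\sum \vec{p}|^2$ in natural units ($c = 1$), applied to jets that are assumed to be already grouped. The function name `invariant_mass` and the example four-momenta are hypothetical.

```python
import math

def invariant_mass(four_momenta):
    """Invariant mass (GeV) of a group of jets given as (E, px, py, pz) in GeV, with c = 1."""
    # Sum the four-momenta component by component.
    E, px, py, pz = (sum(p[i] for p in four_momenta) for i in range(4))
    m2 = E ** 2 - (px ** 2 + py ** 2 + pz ** 2)
    # Clamp small negative values that can arise from rounding or jet mismeasurement.
    return math.sqrt(max(m2, 0.0))

# Hypothetical example: two back-to-back massless jets of 40 GeV each.
pair = [(40.0, 40.0, 0.0, 0.0), (40.0, -40.0, 0.0, 0.0)]
print(invariant_mass(pair))  # → 80.0
```

    In the $t\bar{t} \to 6$ jets channel, each top candidate would be a group of three jets (from $t \to bW \to bq\bar{q}'$), and in the $W^+W^- \to 4$ jets channel each W candidate would be a pair of jets.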