14,978 research outputs found

    How to Round Subspaces: A New Spectral Clustering Algorithm

    Full text link
    A basic problem in spectral clustering is the following. If a solution obtained from the spectral relaxation is close to an integral solution, is it possible to find this integral solution even though they might be in completely different basis? In this paper, we propose a new spectral clustering algorithm. It can recover a kk-partition such that the subspace corresponding to the span of its indicator vectors is O(opt)O(\sqrt{opt}) close to the original subspace in spectral norm with optopt being the minimum possible (opt≤1opt \le 1 always). Moreover our algorithm does not impose any restriction on the cluster sizes. Previously, no algorithm was known which could find a kk-partition closer than o(k⋅opt)o(k \cdot opt). We present two applications for our algorithm. First one finds a disjoint union of bounded degree expanders which approximate a given graph in spectral norm. The second one is for approximating the sparsest kk-partition in a graph where each cluster have expansion at most ϕk\phi_k provided ϕk≤O(λk+1)\phi_k \le O(\lambda_{k+1}) where λk+1\lambda_{k+1} is the (k+1)st(k+1)^{st} eigenvalue of Laplacian matrix. This significantly improves upon the previous algorithms, which required ϕk≤O(λk+1/k)\phi_k \le O(\lambda_{k+1}/k).Comment: Appeared in SODA 201

    Macrostate Data Clustering

    Full text link
    We develop an effective nonhierarchical data clustering method using an analogy to the dynamic coarse graining of a stochastic system. Analyzing the eigensystem of an interitem transition matrix identifies fuzzy clusters corresponding to the metastable macroscopic states (macrostates) of a diffusive system. A "minimum uncertainty criterion" determines the linear transformation from eigenvectors to cluster-defining window functions. Eigenspectrum gap and cluster certainty conditions identify the proper number of clusters. The physically motivated fuzzy representation and associated uncertainty analysis distinguishes macrostate clustering from spectral partitioning methods. Macrostate data clustering solves a variety of test cases that challenge other methods.Comment: keywords: cluster analysis, clustering, pattern recognition, spectral graph theory, dynamic eigenvectors, machine learning, macrostates, classificatio

    Poisson noise reduction with non-local PCA

    Full text link
    Photon-limited imaging arises when the number of photons collected by a sensor array is small relative to the number of detector elements. Photon limitations are an important concern for many applications such as spectral imaging, night vision, nuclear medicine, and astronomy. Typically a Poisson distribution is used to model these observations, and the inherent heteroscedasticity of the data combined with standard noise removal methods yields significant artifacts. This paper introduces a novel denoising algorithm for photon-limited images which combines elements of dictionary learning and sparse patch-based representations of images. The method employs both an adaptation of Principal Component Analysis (PCA) for Poisson noise and recently developed sparsity-regularized convex optimization algorithms for photon-limited images. A comprehensive empirical evaluation of the proposed method helps characterize the performance of this approach relative to other state-of-the-art denoising methods. The results reveal that, despite its conceptual simplicity, Poisson PCA-based denoising appears to be highly competitive in very low light regimes.Comment: erratum: Image man is wrongly name pepper in the journal versio

    Algorithmic and Statistical Perspectives on Large-Scale Data Analysis

    Full text link
    In recent years, ideas from statistics and scientific computing have begun to interact in increasingly sophisticated and fruitful ways with ideas from computer science and the theory of algorithms to aid in the development of improved worst-case algorithms that are useful for large-scale scientific and Internet data analysis problems. In this chapter, I will describe two recent examples---one having to do with selecting good columns or features from a (DNA Single Nucleotide Polymorphism) data matrix, and the other having to do with selecting good clusters or communities from a data graph (representing a social or information network)---that drew on ideas from both areas and that may serve as a model for exploiting complementary algorithmic and statistical perspectives in order to solve applied large-scale data analysis problems.Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors, "Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201
    • …
    corecore