How to Round Subspaces: A New Spectral Clustering Algorithm
A basic problem in spectral clustering is the following: if a solution
obtained from the spectral relaxation is close to an integral solution, is it
possible to find this integral solution even though the two might be expressed
in completely different bases? In this paper, we propose a new spectral
clustering algorithm. It can recover a k-partition such that the subspace
corresponding to the span of its indicator vectors is close to the original
subspace in spectral norm, with the error close to the minimum possible.
Moreover, our algorithm does not impose any restriction on the cluster sizes.
Previously, no algorithm was known that could find a k-partition with a
comparable closeness guarantee.
We present two applications of our algorithm. The first finds a disjoint
union of bounded-degree expanders which approximates a given graph in spectral
norm. The second approximates the sparsest k-partition in a graph in which
each cluster has small expansion, under a gap condition on the eigenvalues of
the Laplacian matrix. This significantly improves upon previous algorithms,
which required a stronger condition.
Comment: Appeared in SODA 201
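For context, the rounding question above sits on top of the standard spectral clustering pipeline. A minimal NumPy sketch of that baseline (normalized-Laplacian embedding followed by k-means rounding — not the paper's subspace-rounding algorithm, and with an illustrative farthest-point initialization) might look like:

```python
import numpy as np

def spectral_embed(A, k):
    """Embed nodes via the bottom-k eigenvectors of the normalized Laplacian."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return vecs[:, :k]           # rows are node embeddings

def kmeans_round(X, k, iters=50):
    """Round the continuous embedding to an integral k-partition.

    Farthest-point initialization keeps the sketch deterministic."""
    centers = [X[0]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Two disjoint triangles: the rounded 2-partition should separate
# nodes {0, 1, 2} from nodes {3, 4, 5}.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1.0
labels = kmeans_round(spectral_embed(A, 2), 2)
print(labels)
```

On this toy graph the bottom-two eigenvectors span the component indicators exactly, so the k-means step recovers the two triangles; the paper's contribution is making this rounding step work with provable spectral-norm guarantees.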
Macrostate Data Clustering
We develop an effective nonhierarchical data clustering method using an
analogy to the dynamic coarse graining of a stochastic system. Analyzing the
eigensystem of an interitem transition matrix identifies fuzzy clusters
corresponding to the metastable macroscopic states (macrostates) of a diffusive
system. A "minimum uncertainty criterion" determines the linear transformation
from eigenvectors to cluster-defining window functions. Eigenspectrum gap and
cluster certainty conditions identify the proper number of clusters. The
physically motivated fuzzy representation and the associated uncertainty
analysis distinguish macrostate clustering from spectral partitioning methods.
Macrostate data clustering solves a variety of test cases that challenge other
methods.
Comment: keywords: cluster analysis, clustering, pattern recognition, spectral
graph theory, dynamic eigenvectors, machine learning, macrostates,
classification
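A toy illustration of the eigenspectrum-gap idea (a simplified sketch of the general principle, not the authors' full macrostate method): build a row-stochastic inter-item transition matrix from a Gaussian similarity kernel, then read the number of clusters off the gap in its eigenvalue spectrum, since one eigenvalue per metastable macrostate stays near 1.

```python
import numpy as np

# Toy data: two well-separated 1-D groups of items.
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])

# Inter-item transition matrix: Gaussian similarities, normalized so each
# row sums to 1 (a diffusion process hopping between items).
W = np.exp(-(x[:, None] - x[None, :]) ** 2)
T = W / W.sum(axis=1, keepdims=True)

# Eigenvalues of a row-stochastic matrix lie in [-1, 1]. One eigenvalue per
# metastable macrostate sits near 1; the largest gap in the sorted spectrum
# suggests the number of clusters.
vals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
gaps = vals[:-1] - vals[1:]
n_clusters = int(np.argmax(gaps)) + 1
print("leading eigenvalues:", vals[:3].round(4))
print("estimated clusters:", n_clusters)
```

Here two eigenvalues are essentially 1 (the two near-disconnected groups) and the rest are near 0, so the gap criterion returns two clusters.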
Poisson noise reduction with non-local PCA
Photon-limited imaging arises when the number of photons collected by a
sensor array is small relative to the number of detector elements. Photon
limitations are an important concern for many applications such as spectral
imaging, night vision, nuclear medicine, and astronomy. Typically a Poisson
distribution is used to model these observations, and the inherent
heteroscedasticity of the data combined with standard noise removal methods
yields significant artifacts. This paper introduces a novel denoising algorithm
for photon-limited images which combines elements of dictionary learning and
sparse patch-based representations of images. The method employs both an
adaptation of Principal Component Analysis (PCA) for Poisson noise and recently
developed sparsity-regularized convex optimization algorithms for
photon-limited images. A comprehensive empirical evaluation of the proposed
method helps characterize the performance of this approach relative to other
state-of-the-art denoising methods. The results reveal that, despite its
conceptual simplicity, Poisson PCA-based denoising appears to be highly
competitive in very low light regimes.
Comment: erratum: the image "man" is wrongly named "pepper" in the journal version
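The heteroscedasticity mentioned above — Poisson variance equals the mean, so noise grows with signal intensity — is why standard Gaussian-noise denoisers produce artifacts on photon-limited data. A common baseline (distinct from the paper's Poisson PCA) is the Anscombe variance-stabilizing transform f(x) = 2*sqrt(x + 3/8); the sketch below checks numerically that it flattens the variance across intensity levels:

```python
import numpy as np

rng = np.random.default_rng(0)

def anscombe(x):
    """Anscombe transform: maps Poisson counts to roughly unit-variance data."""
    return 2.0 * np.sqrt(x + 3.0 / 8.0)

# Raw Poisson variance equals the mean (heteroscedastic); after the
# transform the variance is approximately 1 regardless of intensity.
stab_var = {}
for lam in [5.0, 20.0, 80.0]:
    counts = rng.poisson(lam, size=200_000)
    stab_var[lam] = anscombe(counts).var()
    print(f"mean {lam:5.1f}: raw var {counts.var():7.2f}, "
          f"stabilized var {stab_var[lam]:5.2f}")
```

After stabilization, off-the-shelf Gaussian denoisers become applicable; the transform's accuracy degrades at very low intensities, which is precisely the regime the paper's Poisson-adapted PCA targets.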
Algorithmic and Statistical Perspectives on Large-Scale Data Analysis
In recent years, ideas from statistics and scientific computing have begun to
interact in increasingly sophisticated and fruitful ways with ideas from
computer science and the theory of algorithms to aid in the development of
improved worst-case algorithms that are useful for large-scale scientific and
Internet data analysis problems. In this chapter, I will describe two recent
examples---one having to do with selecting good columns or features from a (DNA
Single Nucleotide Polymorphism) data matrix, and the other having to do with
selecting good clusters or communities from a data graph (representing a social
or information network)---that drew on ideas from both areas and that may serve
as a model for exploiting complementary algorithmic and statistical
perspectives in order to solve applied large-scale data analysis problems.
Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors,
"Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201
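Column-selection problems of the kind described here are commonly approached with leverage scores: score each column by the squared row norms of the top-k right singular vectors, then keep the highest-scoring columns. The sketch below illustrates that general technique on a synthetic matrix (an illustration, not the chapter's exact algorithm):

```python
import numpy as np

rng = np.random.default_rng(1)

def leverage_scores(A, k):
    """Rank-k leverage score of each column of A: squared row norms of
    the top-k right singular vectors."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return (Vt[:k] ** 2).sum(axis=0)  # one score per column

# Synthetic matrix: the first two columns carry the signal, the other
# eight are low-amplitude noise.
n = 100
signal = rng.normal(size=(n, 2))
noise = 0.01 * rng.normal(size=(n, 8))
A = np.hstack([signal, noise])

scores = leverage_scores(A, k=2)
top2 = set(np.argsort(scores)[-2:])
print("leverage scores:", scores.round(3))
print("selected columns:", sorted(top2))  # the signal columns should win
```

In the data-analysis setting the chapter discusses, the selected columns (e.g. informative SNPs) retain interpretability, which is a key statistical motivation for column selection over generic low-rank projections.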