Impact of regularization on Spectral Clustering
The performance of spectral clustering can be considerably improved via
regularization, as demonstrated empirically in Amini et al. (2012). Here, we
provide an attempt at quantifying this improvement through theoretical
analysis. Under the stochastic block model (SBM), and its extensions, previous
results on spectral clustering relied on the minimum degree of the graph being
sufficiently large for its good performance. By examining the scenario where
the regularization parameter is large we show that the minimum degree
assumption can potentially be removed. As a special case, for an SBM with two
blocks, the results require the maximum degree to be large (grow faster than
) as opposed to the minimum degree.
More importantly, we show the usefulness of regularization in situations
where not all nodes belong to well-defined clusters. Our results rely on a
`bias-variance'-like trade-off that arises from understanding the concentration
of the sample Laplacian and the eigen gap as a function of the regularization
parameter. As a byproduct of our bounds, we propose a data-driven technique
DKest (standing for estimated Davis-Kahan bounds) for choosing the
regularization parameter. This technique is shown to work well through
simulations and on a real data set.
Comment: 37 pages
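As a rough illustration of the regularization discussed in this abstract, the sketch below adds tau/n to every entry of the adjacency matrix before normalizing by degrees, then splits the nodes by the sign of the second eigenvector. The additive form of the regularizer, the function name `regularized_spectral_bipartition`, the sign-based split (used here in place of k-means on the leading eigenvectors), and the toy graph are all assumptions made for illustration; this is not the paper's DKest procedure.

```python
import numpy as np

def regularized_spectral_bipartition(A, tau):
    """Split a graph into two groups from a regularized, degree-normalized adjacency matrix."""
    n = A.shape[0]
    # Assumed regularization: add tau/n to every entry of the adjacency matrix.
    A_tau = A + (tau / n) * np.ones((n, n))
    d_tau = A_tau.sum(axis=1)                      # regularized degrees (all positive)
    inv_sqrt = 1.0 / np.sqrt(d_tau)
    L_tau = inv_sqrt[:, None] * A_tau * inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order, so column -2 is the
    # eigenvector of the second-largest eigenvalue.
    _, vecs = np.linalg.eigh(L_tau)
    second = vecs[:, -2]
    return (second > 0).astype(int)

# Toy graph (hypothetical): two 5-node cliques joined by a single edge.
block = np.ones((5, 5)) - np.eye(5)
A = np.zeros((10, 10))
A[:5, :5] = block
A[5:, 5:] = block
A[4, 5] = A[5, 4] = 1.0

labels = regularized_spectral_bipartition(A, tau=1.0)
print(labels)
```

Which clique receives label 0 and which receives label 1 depends on the arbitrary sign of the eigenvector, so only the grouping is meaningful.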
Optimal approximate matrix product in terms of stable rank
We prove, using the subspace embedding guarantee in a black box way, that one
can achieve the spectral norm guarantee for approximate matrix multiplication
with a dimensionality-reducing map having
rows. Here is the maximum stable rank, i.e. squared ratio of
Frobenius and operator norms, of the two matrices being multiplied. This is a
quantitative improvement over previous work of [MZ11, KVZ14], and is also
optimal for any oblivious dimensionality-reducing map. Furthermore, due to the
black box reliance on the subspace embedding property in our proofs, our
theorem can be applied to a much more general class of sketching matrices than
what was known before, in addition to achieving better bounds. For example, one
can apply our theorem to efficient subspace embeddings such as the Subsampled
Randomized Hadamard Transform or sparse subspace embeddings, or even with
subspace embedding constructions that may be developed in the future.
Our main theorem, via connections with spectral error matrix multiplication
shown in prior work, implies quantitative improvements for approximate least
squares regression and low rank approximation. Our main result has also already
been applied to improve dimensionality reduction guarantees for k-means
clustering [CEMMP14], and implies new results for nonparametric regression
[YPW15].
We also separately point out that the proof of the "BSS" deterministic
row-sampling result of [BSS12] can be modified to show that for any matrices
of stable rank at most , one can achieve the spectral norm
guarantee for approximate matrix multiplication of by deterministically
sampling rows that can be found in polynomial
time. The original result of [BSS12] was for rank instead of stable rank. Our
observation leads to a stronger version of a main theorem of [KMST10].
Comment: v3: minor edits; v2: fixed one step in proof of Theorem 9 which was
wrong by a constant factor (see the new Lemma 5 and its use; final theorem
unaffected)
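To make the quantities in this abstract concrete, the following sketch computes the stable rank (squared ratio of Frobenius and operator norms) and approximates a matrix product through a sketching matrix. The dense Gaussian sketch, the function names, and the sizes are assumptions chosen for brevity; the abstract's result covers a broader class of subspace embeddings, and the number of rows used here is illustrative rather than the bound from the paper.

```python
import numpy as np

def stable_rank(M):
    """Squared ratio of the Frobenius norm to the operator (spectral) norm."""
    fro = np.linalg.norm(M, "fro")
    op = np.linalg.norm(M, 2)   # largest singular value
    return (fro / op) ** 2

def sketched_product(A, B, m, rng):
    """Approximate A.T @ B by (S A).T @ (S B) with an m-row Gaussian sketch S."""
    n = A.shape[0]
    # Assumed sketch: i.i.d. N(0, 1/m) entries, so that E[S.T @ S] is the identity.
    S = rng.standard_normal((m, n)) / np.sqrt(m)
    return (S @ A).T @ (S @ B)

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 5))
B = rng.standard_normal((1000, 5))

approx = sketched_product(A, B, m=400, rng=rng)
exact = A.T @ B
# Spectral-norm error, measured relative to the product of operator norms.
rel_err = np.linalg.norm(approx - exact, 2) / (
    np.linalg.norm(A, 2) * np.linalg.norm(B, 2)
)
print(stable_rank(A), rel_err)
```

The stable rank never exceeds the rank, which is why a bound stated in terms of stable rank can be much smaller for matrices with fast-decaying singular values.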
Dimensionality Reduction for k-Means Clustering and Low Rank Approximation
We show how to approximate a data matrix with a much smaller
sketch that can be used to solve a general class of
constrained k-rank approximation problems to within error.
Importantly, this class of problems includes k-means clustering and
unconstrained low rank approximation (i.e. principal component analysis). By
reducing data points to just dimensions, our methods generically
accelerate any exact, approximate, or heuristic algorithm for these ubiquitous
problems.
For k-means dimensionality reduction, we provide relative
error results for many common sketching techniques, including random row
projection, column selection, and approximate SVD. For approximate principal
component analysis, we give a simple alternative to known algorithms that has
applications in the streaming setting. Additionally, we extend recent work on
column-based matrix reconstruction, giving column subsets that not only `cover'
a good subspace for A, but can be used directly to compute this
subspace.
Finally, for k-means clustering, we show how to achieve a
approximation by Johnson-Lindenstrauss projecting data points to just
dimensions. This gives the first result that leverages the
specific structure of k-means to achieve dimension independent of input size
and sublinear in
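The idea of projecting data points before clustering can be sketched as follows: apply a random Johnson-Lindenstrauss map, then compare the k-means cost of a fixed partition before and after projection. The Gaussian projection, the target dimension `m=50`, the synthetic data, and the helper names are assumptions for illustration only; the dimension bounds stated in the abstract are not reproduced here.

```python
import numpy as np

def kmeans_cost(X, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    cost = 0.0
    for c in np.unique(labels):
        pts = X[labels == c]
        cost += ((pts - pts.mean(axis=0)) ** 2).sum()
    return cost

def jl_project(X, m, rng):
    """Project the rows of X to m dimensions with a scaled Gaussian matrix."""
    d = X.shape[1]
    P = rng.standard_normal((d, m)) / np.sqrt(m)
    return X @ P

rng = np.random.default_rng(0)
# Synthetic data (hypothetical): 3 clusters of 40 points each in 200 dimensions.
centers = rng.standard_normal((3, 200)) * 5.0
labels = np.repeat(np.arange(3), 40)
X = centers[labels] + rng.standard_normal((120, 200))

X_small = jl_project(X, m=50, rng=rng)
ratio = kmeans_cost(X_small, labels) / kmeans_cost(X, labels)
print(ratio)  # close to 1 when the projection roughly preserves the cost
```

Any k-means solver can then be run on `X_small` instead of `X`; the cost ratio above only checks one fixed partition and is not a guarantee over all partitions.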
Clustering Partially Observed Graphs via Convex Optimization
This paper considers the problem of clustering a partially observed
unweighted graph---i.e., one where for some node pairs we know there is an edge
between them, for some others we know there is no edge, and for the remaining
we do not know whether or not there is an edge. We want to organize the nodes
into disjoint clusters so that there is relatively dense (observed)
connectivity within clusters, and sparse across clusters.
We take a novel yet natural approach to this problem, by focusing on finding
the clustering that minimizes the number of "disagreements"---i.e., the sum of
the number of (observed) missing edges within clusters, and (observed) present
edges across clusters. Our algorithm uses convex optimization; its basis is a
reduction of disagreement minimization to the problem of recovering an
(unknown) low-rank matrix and an (unknown) sparse matrix from their partially
observed sum. We evaluate the performance of our algorithm on the classical
Planted Partition/Stochastic Block Model. Our main theorem provides sufficient
conditions for the success of our algorithm as a function of the minimum
cluster size, edge density and observation probability; in particular, the
results characterize the tradeoff between the observation probability and the
edge density gap. When there are a constant number of clusters of equal size,
our results are optimal up to logarithmic factors.
Comment: This is the final version published in Journal of Machine Learning
Research (JMLR). Partial results appeared in International Conference on
Machine Learning (ICML) 201
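The "disagreements" objective described in this abstract can be written down directly: count observed non-edges that fall inside a cluster plus observed edges that cross clusters, ignoring unobserved pairs. The function name, the edge-list representation, and the toy graph below are assumptions for illustration; the convex low-rank plus sparse relaxation that the paper actually optimizes is not shown.

```python
def count_disagreements(edges, non_edges, cluster_of):
    """Count disagreements of a clustering with a partially observed graph.

    edges:      observed node pairs known to be connected
    non_edges:  observed node pairs known NOT to be connected
    cluster_of: mapping from node to cluster label
    Pairs appearing in neither list are unobserved and do not contribute.
    """
    missing_within = sum(
        1 for u, v in non_edges if cluster_of[u] == cluster_of[v]
    )
    present_across = sum(
        1 for u, v in edges if cluster_of[u] != cluster_of[v]
    )
    return missing_within + present_across

# Toy example (hypothetical): 5 nodes, 7 of the 10 pairs observed.
edges = [(0, 1), (1, 2), (3, 4), (2, 3)]
non_edges = [(0, 2), (0, 3), (1, 4)]
cluster_of = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b"}

print(count_disagreements(edges, non_edges, cluster_of))  # prints 2
```

Here the two disagreements are the observed non-edge (0, 2) inside cluster "a" and the observed edge (2, 3) between clusters "a" and "b".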
Algorithmic and Statistical Perspectives on Large-Scale Data Analysis
In recent years, ideas from statistics and scientific computing have begun to
interact in increasingly sophisticated and fruitful ways with ideas from
computer science and the theory of algorithms to aid in the development of
improved worst-case algorithms that are useful for large-scale scientific and
Internet data analysis problems. In this chapter, I will describe two recent
examples---one having to do with selecting good columns or features from a (DNA
Single Nucleotide Polymorphism) data matrix, and the other having to do with
selecting good clusters or communities from a data graph (representing a social
or information network)---that drew on ideas from both areas and that may serve
as a model for exploiting complementary algorithmic and statistical
perspectives in order to solve applied large-scale data analysis problems.
Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors,
"Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201