Clustering Partially Observed Graphs via Convex Optimization
This paper considers the problem of clustering a partially observed
unweighted graph---i.e., one where for some node pairs we know there is an edge
between them, for some others we know there is no edge, and for the remaining
we do not know whether or not there is an edge. We want to organize the nodes
into disjoint clusters so that there is relatively dense (observed)
connectivity within clusters, and sparse across clusters.
We take a novel yet natural approach to this problem, by focusing on finding
the clustering that minimizes the number of "disagreements"---i.e., the sum of
the number of (observed) missing edges within clusters, and (observed) present
edges across clusters. Our algorithm uses convex optimization; its basis is a
reduction of disagreement minimization to the problem of recovering an
(unknown) low-rank matrix and an (unknown) sparse matrix from their partially
observed sum. We evaluate the performance of our algorithm on the classical
Planted Partition/Stochastic Block Model. Our main theorem provides sufficient
conditions for the success of our algorithm as a function of the minimum
cluster size, edge density and observation probability; in particular, the
results characterize the tradeoff between the observation probability and the
edge density gap. When there are a constant number of clusters of equal size,
our results are optimal up to logarithmic factors.
Comment: This is the final version published in Journal of Machine Learning
Research (JMLR). Partial results appeared in International Conference on
Machine Learning (ICML) 201
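As an illustration of the "disagreements" objective described in the abstract above, here is a minimal sketch that only evaluates the objective for a given clustering; it is not the paper's convex-optimization algorithm. The function name `count_disagreements` and the array layout (`adj`, `observed`, `labels`) are assumptions made for this example.

```python
import numpy as np

def count_disagreements(adj, observed, labels):
    """Count observed disagreements of a clustering (hypothetical helper).

    adj[i, j]      : 1 if an edge is present, 0 if absent (ignored where unobserved)
    observed[i, j] : True if the pair (i, j) was observed
    labels[i]      : cluster id assigned to node i
    """
    adj = np.asarray(adj)
    observed = np.asarray(observed, dtype=bool)
    labels = np.asarray(labels)

    # same[i, j] is True when nodes i and j share a cluster.
    same = labels[:, None] == labels[None, :]
    # Strict upper triangle, so each unordered pair is counted once.
    upper = np.triu(np.ones_like(observed, dtype=bool), k=1)
    mask = observed & upper

    missing_within = np.sum(mask & same & (adj == 0))   # observed non-edges inside clusters
    present_across = np.sum(mask & ~same & (adj == 1))  # observed edges between clusters
    return int(missing_within + present_across)

# Toy graph on 4 nodes; the pair (1, 2) is unobserved.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 0, 0],
                [0, 0, 0, 1],
                [1, 0, 1, 0]])
observed = np.ones((4, 4), dtype=bool)
observed[1, 2] = observed[2, 1] = False

disagreements = count_disagreements(adj, observed, [0, 0, 1, 1])
```

Unobserved pairs contribute nothing to the count, which is what distinguishes this objective from the fully observed case.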
Cosmological Constraints from Galaxy Clustering and the Mass-to-Number Ratio of Galaxy Clusters
We place constraints on the average density (Omega_m) and clustering
amplitude (sigma_8) of matter using a combination of two measurements from the
Sloan Digital Sky Survey: the galaxy two-point correlation function, w_p, and
the mass-to-galaxy-number ratio within galaxy clusters, M/N, analogous to
cluster M/L ratios. Our w_p measurements are obtained from DR7 while the sample
of clusters is the maxBCG sample, with cluster masses derived from weak
gravitational lensing. We construct non-linear galaxy bias models using the
Halo Occupation Distribution (HOD) to fit both w_p and M/N for different
cosmological parameters. HOD models that match the same two-point clustering
predict different numbers of galaxies in massive halos when Omega_m or sigma_8
is varied, thereby breaking the degeneracy between cosmology and bias. We
demonstrate that this technique yields constraints that are consistent and
competitive with current results from cluster abundance studies, even though
this technique does not use abundance information. Using w_p and M/N alone, we
find Omega_m^0.5*sigma_8=0.465+/-0.026, with individual constraints of
Omega_m=0.29+/-0.03 and sigma_8=0.85+/-0.06. Combined with current CMB data,
these constraints are Omega_m=0.290+/-0.016 and sigma_8=0.826+/-0.020. All
errors are 1-sigma. The systematic uncertainties to which the M/N technique is
most sensitive are the amplitude of the bias function of dark matter halos
and the possibility of redshift evolution between the SDSS Main sample and the
maxBCG sample. Our derived constraints are insensitive to the current level of
uncertainties in the halo mass function and in the mass-richness relation of
clusters and its scatter, making the M/N technique complementary to cluster
abundances as a method for constraining cosmology with future galaxy surveys.
Comment: 23 pages, submitted to Ap
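The combined parameter Omega_m^0.5*sigma_8 quoted in the abstract above can be evaluated at the quoted central values as a quick arithmetic check. This is only a sketch: the helper name `combined_parameter` is an assumption, and plugging in marginal central values is not how the joint constraint was derived, so exact agreement with the quoted 0.465 should not be expected.

```python
import math

def combined_parameter(omega_m, sigma_8):
    """Return Omega_m^0.5 * sigma_8 (hypothetical helper for illustration)."""
    return math.sqrt(omega_m) * sigma_8

# Central values quoted in the abstract.
wp_mn_only = combined_parameter(0.29, 0.85)    # w_p and M/N alone
with_cmb = combined_parameter(0.290, 0.826)    # combined with CMB data

# Quoted joint constraint and its 1-sigma error.
quoted, quoted_err = 0.465, 0.026
within_one_sigma = abs(wp_mn_only - quoted) < quoted_err
```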