    Clustering Partially Observed Graphs via Convex Optimization

    This paper considers the problem of clustering a partially observed unweighted graph---i.e., one where for some node pairs we know there is an edge between them, for some others we know there is no edge, and for the remaining we do not know whether or not there is an edge. We want to organize the nodes into disjoint clusters so that there is relatively dense (observed) connectivity within clusters, and sparse across clusters. We take a novel yet natural approach to this problem, by focusing on finding the clustering that minimizes the number of "disagreements"---i.e., the sum of the number of (observed) missing edges within clusters, and (observed) present edges across clusters. Our algorithm uses convex optimization; its basis is a reduction of disagreement minimization to the problem of recovering an (unknown) low-rank matrix and an (unknown) sparse matrix from their partially observed sum. We evaluate the performance of our algorithm on the classical Planted Partition/Stochastic Block Model. Our main theorem provides sufficient conditions for the success of our algorithm as a function of the minimum cluster size, edge density and observation probability; in particular, the results characterize the tradeoff between the observation probability and the edge density gap. When there are a constant number of clusters of equal size, our results are optimal up to logarithmic factors.
    Comment: This is the final version published in Journal of Machine Learning Research (JMLR). Partial results appeared in International Conference on Machine Learning (ICML) 201
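The "disagreements" objective defined in the abstract above can be made concrete with a small sketch. It only counts disagreements for a given clustering; it is not the paper's convex-optimization algorithm. The function name `count_disagreements`, the dictionary layout, and the toy graph are assumptions made for illustration.

```python
def count_disagreements(labels, observed):
    """Count disagreements of a clustering on a partially observed graph.

    labels:   dict mapping node -> cluster id (hypothetical layout)
    observed: dict mapping (u, v) -> True if an edge is observed,
              False if a non-edge is observed; unobserved pairs are
              simply absent and never contribute.
    """
    total = 0
    for (u, v), has_edge in observed.items():
        same_cluster = labels[u] == labels[v]
        if same_cluster and not has_edge:
            total += 1  # observed missing edge within a cluster
        elif not same_cluster and has_edge:
            total += 1  # observed present edge across clusters
    return total


# Toy example: 4 nodes, 2 clusters; pairs (0, 3) and (1, 2) are unobserved.
labels = {0: "a", 1: "a", 2: "b", 3: "b"}
observed = {
    (0, 1): True,   # edge within cluster a: agreement
    (2, 3): False,  # missing edge within cluster b: disagreement
    (0, 2): True,   # edge across clusters: disagreement
    (1, 3): False,  # no edge across clusters: agreement
}
disagreements = count_disagreements(labels, observed)
print(disagreements)
```

Minimizing this count over all clusterings is the combinatorial problem that, according to the abstract, the authors reduce to recovering a low-rank matrix and a sparse matrix from their partially observed sum.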

    Cosmological Constraints from Galaxy Clustering and the Mass-to-Number Ratio of Galaxy Clusters

    We place constraints on the average density (Omega_m) and clustering amplitude (sigma_8) of matter using a combination of two measurements from the Sloan Digital Sky Survey: the galaxy two-point correlation function, w_p, and the mass-to-galaxy-number ratio within galaxy clusters, M/N, analogous to cluster M/L ratios. Our w_p measurements are obtained from DR7 while the sample of clusters is the maxBCG sample, with cluster masses derived from weak gravitational lensing. We construct non-linear galaxy bias models using the Halo Occupation Distribution (HOD) to fit both w_p and M/N for different cosmological parameters. HOD models that match the same two-point clustering predict different numbers of galaxies in massive halos when Omega_m or sigma_8 is varied, thereby breaking the degeneracy between cosmology and bias. We demonstrate that this technique yields constraints that are consistent and competitive with current results from cluster abundance studies, even though this technique does not use abundance information. Using w_p and M/N alone, we find Omega_m^0.5*sigma_8=0.465+/-0.026, with individual constraints of Omega_m=0.29+/-0.03 and sigma_8=0.85+/-0.06. Combined with current CMB data, these constraints are Omega_m=0.290+/-0.016 and sigma_8=0.826+/-0.020. All errors are 1-sigma. The systematic uncertainties to which the M/N technique is most sensitive are the amplitude of the bias function of dark matter halos and the possibility of redshift evolution between the SDSS Main sample and the maxBCG sample. Our derived constraints are insensitive to the current level of uncertainties in the halo mass function and in the mass-richness relation of clusters and its scatter, making the M/N technique complementary to cluster abundances as a method for constraining cosmology with future galaxy surveys.
    Comment: 23 pages, submitted to Ap
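As a rough arithmetic check on the numbers quoted in the abstract above, the combination Omega_m^0.5*sigma_8 can be recomputed from the individual central values. This is only a sketch: it plugs in central values and ignores the correlation between the two parameters, so it is not expected to reproduce the quoted joint constraint exactly. The helper name `s8_combo` is an assumption made for illustration.

```python
import math


def s8_combo(omega_m, sigma_8):
    """Return Omega_m^0.5 * sigma_8 for the given central values."""
    return math.sqrt(omega_m) * sigma_8


# Joint constraint quoted in the abstract (w_p and M/N alone).
quoted, quoted_err = 0.465, 0.026

# Central values of the individual constraints quoted in the abstract.
from_marginals = s8_combo(0.29, 0.85)   # w_p and M/N alone
from_cmb = s8_combo(0.290, 0.826)       # combined with CMB data

for name, value in [("w_p + M/N", from_marginals), ("with CMB", from_cmb)]:
    within = abs(value - quoted) < quoted_err
    print(f"{name}: {value:.3f} (within 1-sigma of {quoted}: {within})")
```

Both recomputed values fall within the quoted 1-sigma interval, which is consistent with the figures in the abstract but does not replace the full likelihood analysis.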