
    Performance Analysis of Spectral Clustering on Compressed, Incomplete and Inaccurate Measurements

    Spectral clustering is one of the most widely used techniques for extracting the underlying global structure of a data set. Compressed sensing and matrix completion have emerged as prevailing methods for efficiently recovering sparse and partially observed signals, respectively. We combine the distance-preserving measurements of compressed sensing and matrix completion with the power of robust spectral clustering. Our analysis provides rigorous bounds on how small errors in the affinity matrix can affect the spectral coordinates and clusterability. This work generalizes the current perturbation results of two-class spectral clustering to incorporate multi-class clustering with k eigenvectors. We thoroughly track how small perturbations from using compressed sensing and matrix completion affect the affinity matrix and, in turn, the spectral coordinates. These perturbation results for multi-class clustering require an eigengap between the kth and (k+1)th eigenvalues of the affinity matrix, which naturally occurs in data with k well-defined clusters. Our theoretical guarantees are complemented with numerical results along with a number of examples of the unsupervised organization and clustering of image data.
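
The eigengap condition in the abstract above can be illustrated with a minimal sketch. It is not the authors' code: the Gaussian kernel, the symmetric degree normalization, the bandwidth `sigma`, and the helper name `spectral_coordinates` are all assumptions chosen for illustration. On data with k well-separated clusters, the gap between the kth and (k+1)th eigenvalues comes out large.

```python
import numpy as np

def spectral_coordinates(points, k, sigma=1.0):
    """Return the top-k eigenvectors and the eigengap of a normalized affinity matrix.

    Assumptions (not taken from the paper): Gaussian kernel affinities and
    the symmetric normalization D^{-1/2} A D^{-1/2}.
    """
    # Pairwise squared Euclidean distances between all points.
    diffs = points[:, None, :] - points[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # Gaussian affinity: nearby points get weights close to 1.
    affinity = np.exp(-sq_dists / (2.0 * sigma ** 2))
    # Symmetric degree normalization.
    inv_sqrt_deg = 1.0 / np.sqrt(affinity.sum(axis=1))
    normalized = affinity * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]
    # eigh returns eigenvalues in ascending order; reverse to descending.
    eigvals, eigvecs = np.linalg.eigh(normalized)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    # Gap between the kth and (k+1)th largest eigenvalues.
    eigengap = eigvals[k - 1] - eigvals[k]
    return eigvecs[:, :k], eigengap

# Synthetic data: three tight, well-separated clusters of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])
coords, gap = spectral_coordinates(points, k=3)
```

Here each row of `coords` holds one point's spectral coordinates, which would typically be passed to a clustering step such as k-means.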

    A Foreground Masking Strategy for [CII] Intensity Mapping Experiments Using Galaxies Selected by Stellar Mass and Redshift

    Intensity mapping provides a unique means to probe the epoch of reionization (EoR), when the neutral intergalactic medium was ionized by the energetic photons emitted from the first galaxies. The [CII] 158 $\mu$m fine-structure line is typically one of the brightest emission lines of star-forming galaxies and thus a promising tracer of the global EoR star-formation activity. However, [CII] intensity maps at $6 \lesssim z \lesssim 8$ are contaminated by interloping CO rotational line emission ($3 \leq J_{\rm upp} \leq 6$) from lower-redshift galaxies. Here we present a strategy to remove the foreground contamination in upcoming [CII] intensity mapping experiments, guided by a model of CO emission from foreground galaxies. The model is based on empirical measurements of the mean and scatter of the total infrared luminosities of galaxies at $z < 3$ and with stellar masses $M_{*} > 10^{8}\,\rm M_{\odot}$ selected in $K$-band from the COSMOS/UltraVISTA survey, which can be converted to CO line strengths. For a mock field of the Tomographic Ionized-carbon Mapping Experiment (TIME), we find that masking out the "voxels" (spectral-spatial elements) containing foreground galaxies identified using an optimized CO flux threshold results in a $z$-dependent criterion $m^{\rm AB}_{\rm K} \lesssim 22$ (or $M_{*} \gtrsim 10^{9}\,\rm M_{\odot}$) at $z < 1$ and makes a [CII]/CO$_{\rm tot}$ power ratio of $\gtrsim 10$ at $k = 0.1$ $h$/Mpc achievable, at the cost of a moderate $\lesssim 8\%$ loss of total survey volume. Comment: 14 figures, 4 tables, re-submitted to ApJ after addressing reviewer's comments. Comments welcome

    Defining and Evaluating Network Communities based on Ground-truth

    Nodes in real-world networks organize into densely linked communities where edges appear with high concentration among the members of the community. Identifying such communities of nodes has proven to be a challenging task, mainly due to a plethora of definitions of a community, intractability of algorithms, issues with evaluation and the lack of a reliable gold-standard ground-truth. In this paper we study a set of 230 large real-world social, collaboration and information networks where nodes explicitly state their group memberships. For example, in social networks nodes explicitly join various interest-based social groups. We use such groups to define a reliable and robust notion of ground-truth communities. We then propose a methodology which allows us to compare and quantitatively evaluate how different structural definitions of network communities correspond to ground-truth communities. We choose 13 commonly used structural definitions of network communities and examine their sensitivity, robustness and performance in identifying the ground-truth. We show that the 13 structural definitions are heavily correlated and naturally group into four classes. We find that two of these definitions, Conductance and Triad-participation-ratio, consistently give the best performance in identifying ground-truth communities. We also investigate the task of detecting communities given a single seed node. We extend the local spectral clustering algorithm into a heuristic parameter-free community detection method that easily scales to networks with more than a hundred million nodes. The proposed method achieves a 30% relative improvement over current local clustering methods. Comment: Proceedings of 2012 IEEE International Conference on Data Mining (ICDM), 201
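
As an illustration of one of the structural definitions named above, the following minimal sketch computes the conductance of a candidate community. It assumes the common cut-over-volume form (edges leaving the set divided by the total degree of its members); the abstract does not spell out the formula, and the adjacency-list layout and the function name `conductance` are hypothetical.

```python
def conductance(adjacency, community):
    """Fraction of the community's edge endpoints that lead outside it.

    Assumes an undirected graph stored as {node: [neighbors]} and the
    cut-over-volume form of conductance; lower values indicate a more
    self-contained community.
    """
    members = set(community)
    cut_edges = 0  # edges with exactly one endpoint inside the community
    volume = 0     # sum of degrees of the community's members
    for node in members:
        for neighbor in adjacency[node]:
            volume += 1
            if neighbor not in members:
                cut_edges += 1
    return cut_edges / volume if volume else 0.0

# Toy graph: two triangles joined by the single bridge edge 2-3.
graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
score = conductance(graph, {0, 1, 2})
```

One triangle has a total degree of 7 and a single outgoing edge, so its conductance is 1/7; adding node 3 to the set raises it to 2/10, reflecting the weaker boundary.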