Performance Analysis of Spectral Clustering on Compressed, Incomplete and Inaccurate Measurements
Spectral clustering is one of the most widely used techniques for extracting
the underlying global structure of a data set. Compressed sensing and matrix
completion have emerged as prevailing methods for efficiently recovering sparse
and partially observed signals respectively. We combine the distance preserving
measurements of compressed sensing and matrix completion with the power of
robust spectral clustering. Our analysis provides rigorous bounds on how small
errors in the affinity matrix can affect the spectral coordinates and
clusterability. This work generalizes the current perturbation results of
two-class spectral clustering to incorporate multi-class clustering with k
eigenvectors. We thoroughly track how small perturbations from compressed
sensing and matrix completion affect the affinity matrix and, in turn, the
spectral coordinates. These perturbation results for multi-class clustering
require an eigengap between the kth and (k+1)th eigenvalues of the affinity
matrix, which naturally occurs in data with k well-defined clusters. Our
theoretical guarantees are complemented with numerical results along with a
number of examples of the unsupervised organization and clustering of image
data.
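For orientation, the role of the eigengap can be illustrated with two standard matrix perturbation results. These are textbook statements (Weyl's inequality and a widely used variant of the Davis-Kahan sin Θ theorem), not the bounds derived in the paper. The symbols are assumptions introduced here: A is the symmetric affinity matrix with eigenvalues λ_1 ≥ ... ≥ λ_n, E is a symmetric perturbation, and V_k and \hat{V}_k hold the top-k eigenvectors of A and A + E.

```latex
% Eigenvalues move by at most the spectral norm of the perturbation (Weyl):
\[
  \bigl|\lambda_i(A + E) - \lambda_i(A)\bigr| \le \|E\|_2 ,
  \qquad i = 1, \dots, n .
\]
% The top-k eigenspace rotates by an amount controlled by the eigengap
% (Davis--Kahan variant; requires \lambda_k > \lambda_{k+1}):
\[
  \bigl\|\sin\Theta(\hat{V}_k, V_k)\bigr\|_F
  \le \frac{2\,\min\!\bigl(\sqrt{k}\,\|E\|_2,\ \|E\|_F\bigr)}
           {\lambda_k - \lambda_{k+1}} .
\]
```

Both statements show why a large gap between the kth and (k+1)th eigenvalues keeps the spectral coordinates stable under small errors in the affinity matrix.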
A Foreground Masking Strategy for [CII] Intensity Mapping Experiments Using Galaxies Selected by Stellar Mass and Redshift
Intensity mapping provides a unique means to probe the epoch of reionization
(EoR), when the neutral intergalactic medium was ionized by the energetic
photons emitted from the first galaxies. The [CII] 158 μm fine-structure
line is typically one of the brightest emission lines of star-forming galaxies
and thus a promising tracer of the global EoR star-formation activity. However,
[CII] intensity maps are contaminated by
interloping CO rotational line emission from
lower-redshift galaxies. Here we present a strategy to remove the foreground
contamination in upcoming [CII] intensity mapping experiments, guided by a
model of CO emission from foreground galaxies. The model is based on empirical
measurements of the mean and scatter of the total infrared luminosities of
galaxies at
selected in -band from the COSMOS/UltraVISTA survey, which can be converted
to CO line strengths. For a mock field of the Tomographic Ionized-carbon
Mapping Experiment (TIME), we find that masking out the "voxels"
(spectral-spatial elements) containing foreground galaxies identified using an
optimized CO flux threshold results in a -dependent criterion (or ) at and makes a [CII]/CO power ratio of at
/Mpc achievable, at the cost of a moderate loss of total
survey volume.
Comment: 14 figures, 4 tables, re-submitted to ApJ after addressing reviewer's
comments. Comments welcome.
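The voxel-masking idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Galaxy` record, the integer voxel indexing, the flux units, and the threshold value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Galaxy:
    voxel: tuple    # hypothetical (x, y, frequency-channel) index
    co_flux: float  # predicted CO line flux, arbitrary units


def build_mask(galaxies, flux_threshold):
    """Return the set of voxels holding a galaxy at or above the threshold."""
    return {g.voxel for g in galaxies if g.co_flux >= flux_threshold}


def masked_fraction(mask, shape):
    """Fraction of the survey volume lost to masking."""
    nx, ny, nf = shape
    return len(mask) / (nx * ny * nf)


galaxies = [
    Galaxy((0, 0, 0), 5.0),
    Galaxy((0, 0, 0), 0.5),  # same voxel as above, so it is counted once
    Galaxy((1, 2, 3), 2.0),
    Galaxy((3, 3, 1), 0.1),  # below threshold, voxel is kept
]
mask = build_mask(galaxies, flux_threshold=1.0)
lost = masked_fraction(mask, shape=(4, 4, 4))
```

Raising the threshold masks fewer voxels and preserves more survey volume, at the price of leaving more CO contamination in the map.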
Defining and Evaluating Network Communities based on Ground-truth
Nodes in real-world networks organize into densely linked communities where
edges appear with high concentration among the members of the community.
Identifying such communities of nodes has proven to be a challenging task
mainly due to a plethora of definitions of a community, intractability of
algorithms, issues with evaluation and the lack of a reliable gold-standard
ground-truth.
In this paper we study a set of 230 large real-world social, collaboration
and information networks where nodes explicitly state their group memberships.
For example, in social networks nodes explicitly join various interest based
social groups. We use such groups to define a reliable and robust notion of
ground-truth communities. We then propose a methodology which allows us to
compare and quantitatively evaluate how different structural definitions of
network communities correspond to ground-truth communities. We choose 13
commonly used structural definitions of network communities and examine their
sensitivity, robustness and performance in identifying the ground-truth. We
show that the 13 structural definitions are heavily correlated and naturally
group into four classes. We find that two of these definitions, Conductance and
Triad-participation-ratio, consistently give the best performance in
identifying ground-truth communities. We also investigate a task of detecting
communities given a single seed node. We extend the local spectral clustering
algorithm into a heuristic parameter-free community detection method that
easily scales to networks with more than a hundred million nodes. The proposed
method achieves a 30% relative improvement over current local clustering methods.
Comment: Proceedings of 2012 IEEE International Conference on Data Mining
(ICDM), 2012
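The two best-performing scores named in the abstract can be sketched as follows. The definitions used are the commonly cited ones and are assumed here rather than quoted from the paper: conductance is the number of boundary edges divided by the total number of edge endpoints in the community, and triad-participation-ratio is the fraction of community members that sit in at least one triangle inside the community. The toy graph and function names are hypothetical.

```python
from itertools import combinations


def conductance(adj, community):
    """Boundary edges divided by total edge endpoints inside the community."""
    s = set(community)
    internal = 0  # each internal edge is counted twice, once per endpoint
    boundary = 0
    for u in s:
        for v in adj[u]:
            if v in s:
                internal += 1
            else:
                boundary += 1
    volume = internal + boundary
    return boundary / volume if volume else 0.0


def triad_participation_ratio(adj, community):
    """Fraction of members that belong to a triangle within the community."""
    s = set(community)
    in_triad = 0
    for u in s:
        nbrs = [v for v in adj[u] if v in s]
        if any(b in adj[a] for a, b in combinations(nbrs, 2)):
            in_triad += 1
    return in_triad / len(s) if s else 0.0


# Undirected toy graph: a triangle 0-1-2 with a tail 2-3-4.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}

tight = {0, 1, 2}  # low conductance, every node in a triangle
loose = {2, 3, 4}  # higher conductance, no internal triangle
```

On this graph the triangle scores a conductance of 1/7 and a triad-participation-ratio of 1, while the tail scores 1/3 and 0, matching the intuition that lower conductance and higher triad participation indicate a better-defined community.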