Visual-hint Boundary to Segment Algorithm for Image Segmentation
Image segmentation has been a very active research topic in the image analysis
area. Currently, most image segmentation algorithms are designed based
on the idea that images are partitioned into a set of regions preserving
homogeneous intra-regions and inhomogeneous inter-regions. However, human
visual intuition does not always follow this pattern. A new image segmentation
method named Visual-Hint Boundary to Segment (VHBS) is introduced, which is
more consistent with human perceptions. VHBS abides by two visual hint rules
based on human perceptions: (i) the global scale boundaries tend to be the real
boundaries of the objects; (ii) two adjacent regions with quite different
colors or textures tend to result in the real boundaries between them. It has
been demonstrated by experiments that, compared with traditional image
segmentation methods, VHBS has better performance and also preserves higher
computational efficiency.
Comment: 45 pages
Multiresolution community detection for megascale networks by information-based replica correlations
We use a Potts model community detection algorithm to accurately and
quantitatively evaluate the hierarchical or multiresolution structure of a
graph. Our multiresolution algorithm calculates correlations among multiple
copies ("replicas") of the same graph over a range of resolutions. Significant
multiresolution structures are identified by strongly correlated replicas. The
average normalized mutual information, the variation of information, and other
measures in principle give a quantitative estimate of the "best" resolutions
and indicate the relative strength of the structures in the graph. Because the
method is based on information comparisons, it can in principle be used with
any community detection model that can examine multiple resolutions. Our
approach may be extended to other optimization problems. As a local measure,
our Potts model avoids the "resolution limit" that affects other popular
models. With this model, our community detection algorithm has an accuracy that
ranks among the best of currently available methods. Using it, we can examine
graphs over 40 million nodes and more than one billion edges. We further report
that the multiresolution variant of our algorithm can solve systems of at least
200000 nodes and 10 million edges on a single processor with exceptionally high
accuracy. For typical cases, we find a super-linear scaling, O(L^{1.3}) for
community detection and O(L^{1.3} log N) for the multiresolution algorithm
where L is the number of edges and N is the number of nodes in the system.
Comment: 19 pages, 14 figures, published version with minor change
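The replica comparison described in this abstract can be illustrated with a minimal sketch. It assumes each replica is given as a list of community labels, one per node, and it uses the common normalization 2I(A;B)/(H(A)+H(B)) for the mutual information; the abstract does not state which normalization the authors use, so that choice and all function names here are assumptions.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a partition given as a label list."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two partitions of the same node set."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (n_ij / n) * log(n_ij * n / (count_a[i] * count_b[j]))
        for (i, j), n_ij in joint.items()
    )


def nmi(a, b):
    """Normalized mutual information, 2 I(A;B) / (H(A) + H(B))."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        return 1.0  # both partitions put every node in a single community
    return 2 * mutual_information(a, b) / (h_a + h_b)


def variation_of_information(a, b):
    """Variation of information, H(A) + H(B) - 2 I(A;B)."""
    return entropy(a) + entropy(b) - 2 * mutual_information(a, b)


def average_pairwise(replicas, measure):
    """Average a pairwise measure over all distinct pairs of replicas."""
    pairs = [
        (r, s)
        for idx, r in enumerate(replicas)
        for s in replicas[idx + 1:]
    ]
    return sum(measure(r, s) for r, s in pairs) / len(pairs)
```

Under these assumptions, a resolution at which `average_pairwise(replicas, nmi)` is close to 1 and `average_pairwise(replicas, variation_of_information)` is close to 0 is one where the replicas agree strongly with each other.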
Evaluating Overfit and Underfit in Models of Network Community Structure
A common data mining task on networks is community detection, which seeks an
unsupervised decomposition of a network into structural groups based on
statistical regularities in the network's connectivity. Although many methods
exist, the No Free Lunch theorem for community detection implies that each
makes some kind of tradeoff, and no algorithm can be optimal on all inputs.
Thus, different algorithms will over or underfit on different inputs, finding
more, fewer, or just different communities than is optimal, and evaluation
methods that use a metadata partition as a ground truth will produce misleading
conclusions about general accuracy. Here, we present a broad evaluation of over
and underfitting in community detection, comparing the behavior of 16
state-of-the-art community detection algorithms on a novel and structurally
diverse corpus of 406 real-world networks. We find that (i) algorithms vary
widely both in the number of communities they find and in their corresponding
composition, given the same input, (ii) algorithms can be clustered into
distinct high-level groups based on similarities of their outputs on real-world
networks, and (iii) these differences induce wide variation in accuracy on link
prediction and link description tasks. We introduce a new diagnostic for
evaluating overfitting and underfitting in practice, and use it to roughly
divide community detection methods into general and specialized learning
algorithms. Across methods and inputs, Bayesian techniques based on the
stochastic block model and a minimum description length approach to
regularization represent the best general learning approach, but can be
outperformed under specific circumstances. These results introduce both a
theoretically principled approach to evaluate over and underfitting in models
of network community structure and a realistic benchmark by which new methods
may be evaluated and compared.
Comment: 22 pages, 13 figures, 3 tables
Link-Prediction Enhanced Consensus Clustering for Complex Networks
Many real networks that are inferred or collected from data are incomplete
due to missing edges. Missing edges can be inherent to the dataset (Facebook
friend links will never be complete) or the result of sampling (one may only
have access to a portion of the data). The consequence is that downstream
analyses that consume the network will often yield less accurate results than
if the edges were complete. Community detection algorithms, in particular,
often suffer when critical intra-community edges are missing. We propose a
novel consensus clustering algorithm to enhance community detection on
incomplete networks. Our framework utilizes existing community detection
algorithms that process networks imputed by our link prediction based
algorithm. The framework then merges their multiple outputs into a final
consensus output. On average our method boosts performance of existing
algorithms by 7% on artificial data and 17% on ego networks collected from
Facebook.
A maximal clique based multiobjective evolutionary algorithm for overlapping community detection
Detecting community structure has become an important technique for studying complex networks. Although many community detection algorithms have been proposed, most of them focus on separated communities, where each node can belong to only one community. However, in many real-world networks, communities often overlap with each other. Developing overlapping community detection algorithms thus becomes necessary. Along this avenue, this paper proposes a maximal clique based multiobjective evolutionary algorithm for overlapping community detection. In this algorithm, a new representation scheme based on the introduced maximal-clique graph is presented. Since the maximal-clique graph is defined by using a set of maximal cliques of the original graph as nodes, and two maximal cliques are allowed to share the same nodes of the original graph, overlap is an intrinsic property of the maximal-clique graph. Owing to this property, the new representation scheme allows multiobjective evolutionary algorithms to handle the overlapping community detection problem in a way similar to that of separated community detection, such that the optimization problems are simplified. As a result, the proposed algorithm can detect overlapping community structure with higher partition accuracy and lower computational cost when compared with the existing ones. The experiments on both synthetic and real-world networks validate the effectiveness and efficiency of the proposed algorithm.
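Why overlap is intrinsic to a maximal-clique graph can be seen in a small sketch. It enumerates maximal cliques with a basic Bron-Kerbosch recursion and then maps a disjoint labeling of cliques back to node communities. The rule used here for linking two cliques (they share at least one original node) is an assumption, since the abstract does not give the edge definition, and the function names are hypothetical.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques of an undirected graph (Bron-Kerbosch,
    no pivoting). `adj` maps each node to the set of its neighbours."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def clique_graph_edges(cliques):
    """Assumed edge rule: link two cliques if they share an original node."""
    return {
        (i, j)
        for i in range(len(cliques))
        for j in range(i + 1, len(cliques))
        if cliques[i] & cliques[j]
    }


def node_communities(cliques, clique_labels):
    """Map a disjoint labeling of cliques back to sets of original nodes.
    A node that belongs to cliques with different labels ends up in
    several communities, which is where the overlap comes from."""
    communities = {}
    for clique, label in zip(cliques, clique_labels):
        communities.setdefault(label, set()).update(clique)
    return communities


# Two triangles that share node 2.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4},
    3: {2, 4}, 4: {2, 3},
}
cliques = maximal_cliques(adj)
# Put every clique in its own community (a disjoint partition of cliques).
communities = node_communities(cliques, list(range(len(cliques))))
```

In this toy graph the two cliques form two disjoint groups in the clique graph, yet node 2 appears in both resulting node communities.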
Microbial community pattern detection in human body habitats via ensemble clustering framework
The human habitat is a host where microbial species evolve, function, and
continue to evolve. Elucidating how microbial communities respond to human
habitats is a fundamental and critical task, as establishing baselines of the
human microbiome is essential to understanding its role in human disease and
health. However, current studies usually overlook the complex and interconnected
landscape of the human microbiome and are limited to particular body habitats
and to learning models with a specific criterion. Therefore, these methods cannot
capture the real-world underlying microbial patterns effectively. To obtain a
comprehensive view, we propose a novel ensemble clustering framework to mine
the structure of microbial community pattern on large-scale metagenomic data.
Particularly, we first build a microbial similarity network by integrating
1920 metagenomic samples from three body habitats of healthy adults. Then a
novel symmetric Nonnegative Matrix Factorization (NMF) based ensemble model is
proposed and applied to the network to detect clustering patterns. Extensive
experiments are conducted to evaluate the effectiveness of our model on
deriving microbial communities with respect to body habitat and host gender. From
the clustering results, we observed that body habitats exhibit strongly bound but
non-unique microbial structural patterns. Meanwhile, the human microbiome reveals
different degrees of structural variation across body habitats and host gender. In
summary, our ensemble clustering framework could efficiently explore integrated
clustering results to accurately identify microbial communities, and provide a
comprehensive view of a set of microbial communities. Such trends depict an
integrated biography of microbial communities, which offers new insight
towards uncovering pathogenic models of the human microbiome.
Comment: BMC Systems Biology 201
Detecting hierarchical and overlapping network communities using locally optimal modularity changes
Agglomerative clustering is a well established strategy for identifying
communities in networks. Communities are successively merged into larger
communities, coarsening a network of actors into a more manageable network of
communities. The order in which merges should occur is not in general clear,
necessitating heuristics for selecting pairs of communities to merge. We
describe a hierarchical clustering algorithm based on a local optimality
property. For each edge in the network, we associate the modularity change for
merging the communities it links. For each community vertex, we call the
preferred edge that edge for which the modularity change is maximal. When an
edge is preferred by both vertices that it links, it appears to be the optimal
choice from the local viewpoint. We use the locally optimal edges to define the
algorithm: simultaneously merge all pairs of communities that are connected by
locally optimal edges that would increase the modularity, redetermining the
locally optimal edges after each step and continuing so long as the modularity
can be further increased. We apply the algorithm to model and empirical
networks, demonstrating that it can efficiently produce high-quality community
solutions. We relate the performance and implementation details to the
structure of the resulting community hierarchies. We additionally consider a
complementary local clustering algorithm, describing how to identify
overlapping communities based on the local optimality condition.
Comment: 10 pages; 4 tables, 3 figures
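The merge rule described in this abstract can be sketched as follows. The sketch assumes an unweighted, undirected graph without self-loops given as an edge list, uses the standard modularity change for merging two communities, ΔQ = w_ij/m - d_i d_j/(2m^2), and breaks ties by keeping the first edge encountered. The tie-breaking rule, the data layout, and the function names are assumptions, not details taken from the paper.

```python
def delta_q(w_ij, d_i, d_j, m):
    """Modularity change from merging two communities joined by w_ij edges,
    with total degrees d_i and d_j, in a graph with m edges."""
    return w_ij / m - d_i * d_j / (2 * m * m)


def locally_optimal_edges(weights, degree, m):
    """Return edges that are preferred by both communities they link
    and whose merge would increase modularity."""
    preferred = {}  # community -> (best gain, edge achieving it)
    for edge, w in weights.items():
        gain = delta_q(w, degree[edge[0]], degree[edge[1]], m)
        for c in edge:
            if c not in preferred or gain > preferred[c][0]:
                preferred[c] = (gain, edge)
    return [
        edge for edge in weights
        if preferred[edge[0]][1] == edge
        and preferred[edge[1]][1] == edge
        and preferred[edge[0]][0] > 0
    ]


def cluster(edges):
    """Repeatedly merge all locally optimal pairs until none remain."""
    weights, degree = {}, {}
    for u, v in edges:
        key = (min(u, v), max(u, v))
        weights[key] = weights.get(key, 0) + 1
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    m = len(edges)
    members = {c: {c} for c in degree}
    while True:
        pairs = locally_optimal_edges(weights, degree, m)
        if not pairs:
            return sorted(sorted(s) for s in members.values())
        # Each community has one preferred edge, so the pairs are disjoint
        # and can be merged simultaneously.
        rep = {c: c for c in degree}
        for i, j in pairs:
            rep[j] = i
            members[i] |= members.pop(j)
            degree[i] += degree.pop(j)
        # Rebuild inter-community edge weights; internal edges are dropped.
        merged = {}
        for (i, j), w in weights.items():
            a, b = rep[i], rep[j]
            if a != b:
                key = (min(a, b), max(a, b))
                merged[key] = merged.get(key, 0) + w
        weights = merged


# Two triangles joined by a single bridge edge (2, 3).
result = cluster([(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)])
```

On this toy graph the sketch stops with the two triangles as communities, `[[0, 1, 2], [3, 4, 5]]`, because merging them across the bridge would lower the modularity.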