Finding missing edges in networks based on their community structure
Many edge prediction methods have been proposed, based on various local or
global properties of the structure of an incomplete network. Community
structure is another significant feature of networks: Vertices in a community
are more densely connected than average. It is often true that vertices in the
same community have "similar" properties, which suggests that missing edges are
more likely to be found within communities than elsewhere. We use this insight
to propose a strategy for edge prediction that combines existing edge
prediction methods with community detection. We show that this method gives
better prediction accuracy than existing edge prediction methods alone.
Comment: 7 pages, 6 figures
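The idea in this abstract, that missing edges are more likely inside communities, can be illustrated with a minimal sketch. This is not the authors' method: it assumes a plain common-neighbors score as the "existing" predictor, a precomputed community label for every vertex, and a hypothetical additive `bonus` for pairs that share a community. The function name and parameters are invented for illustration.

```python
from itertools import combinations


def community_aware_scores(edges, community, bonus=1.0):
    """Rank non-adjacent vertex pairs as candidate missing edges.

    Base score: number of common neighbors (an assumed stand-in for any
    existing edge prediction method). Pairs whose endpoints share a
    community label receive an extra `bonus` (a hypothetical weighting).
    """
    # Build an undirected adjacency map from the edge list.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already an observed edge, nothing to predict
        score = len(adj[u] & adj[v])
        if community[u] == community[v]:
            score += bonus
        scores[(u, v)] = score

    # Highest score first; ties are broken by vertex pair for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy graph: a 4-vertex group missing the edge (0, 3), a triangle,
# and one bridge edge (3, 4) between the two groups.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3),
         (4, 5), (5, 6), (4, 6), (3, 4)]
community = {0: "A", 1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
ranked = community_aware_scores(edges, community)
```

On this toy graph the top-ranked candidate is the within-community pair (0, 3), while cross-community pairs score no higher than their common-neighbor count.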
Link-Prediction Enhanced Consensus Clustering for Complex Networks
Many real networks that are inferred or collected from data are incomplete
due to missing edges. Missing edges can be inherent to the dataset (Facebook
friend links will never be complete) or the result of sampling (one may only
have access to a portion of the data). The consequence is that downstream
analyses that consume the network will often yield less accurate results than
if the edges were complete. Community detection algorithms, in particular,
often suffer when critical intra-community edges are missing. We propose a
novel consensus clustering algorithm to enhance community detection on
incomplete networks. Our framework utilizes existing community detection
algorithms that process networks imputed by our link prediction based
algorithm. The framework then merges their multiple outputs into a final
consensus output. On average our method boosts performance of existing
algorithms by 7% on artificial data and 17% on ego networks collected from
Facebook
Evaluating Overfit and Underfit in Models of Network Community Structure
A common data mining task on networks is community detection, which seeks an
unsupervised decomposition of a network into structural groups based on
statistical regularities in the network's connectivity. Although many methods
exist, the No Free Lunch theorem for community detection implies that each
makes some kind of tradeoff, and no algorithm can be optimal on all inputs.
Thus, different algorithms will over or underfit on different inputs, finding
more, fewer, or just different communities than is optimal, and evaluation
methods that use a metadata partition as a ground truth will produce misleading
conclusions about general accuracy. Here, we present a broad evaluation of over
and underfitting in community detection, comparing the behavior of 16
state-of-the-art community detection algorithms on a novel and structurally
diverse corpus of 406 real-world networks. We find that (i) algorithms vary
widely both in the number of communities they find and in their corresponding
composition, given the same input, (ii) algorithms can be clustered into
distinct high-level groups based on similarities of their outputs on real-world
networks, and (iii) these differences induce wide variation in accuracy on link
prediction and link description tasks. We introduce a new diagnostic for
evaluating overfitting and underfitting in practice, and use it to roughly
divide community detection methods into general and specialized learning
algorithms. Across methods and inputs, Bayesian techniques based on the
stochastic block model and a minimum description length approach to
regularization represent the best general learning approach, but can be
outperformed under specific circumstances. These results introduce both a
theoretically principled approach to evaluate over and underfitting in models
of network community structure and a realistic benchmark by which new methods
may be evaluated and compared.
Comment: 22 pages, 13 figures, 3 tables
The impact of partially missing communities on the reliability of centrality measures
Network data is usually not error-free, and the absence of some nodes is a
very common type of measurement error. Studies have shown that the reliability
of centrality measures is severely affected by missing nodes. This paper
investigates the reliability of centrality measures when missing nodes are
likely to belong to the same community. We study the behavior of five commonly
used centrality measures in uniform and scale-free networks in various error
scenarios. We find that centrality measures are generally more reliable when
missing nodes are likely to belong to the same community than in cases in which
nodes are missing uniformly at random. In scale-free networks, the betweenness
centrality becomes, however, less reliable when missing nodes are more likely
to belong to the same community. Moreover, centrality measures in scale-free
networks are more reliable in networks with stronger community structure. In
contrast, we do not observe this effect for uniform networks. Our observations
suggest that the impact of missing nodes on the reliability of centrality
measures might not be as severe as the literature suggests
Community Detection in Networks with Node Attributes
Community detection algorithms are fundamental tools that allow us to uncover
organizational principles in networks. When detecting communities, there are
two possible sources of information one can use: the network structure, and the
features and attributes of nodes. Even though communities form around nodes
that have common edges and common attributes, typically, algorithms have only
focused on one of these two data modalities: community detection algorithms
traditionally focus only on the network structure, while clustering algorithms
mostly consider only node attributes. In this paper, we develop Communities
from Edge Structure and Node Attributes (CESNA), an accurate and scalable
algorithm for detecting overlapping communities in networks with node
attributes. CESNA statistically models the interaction between the network
structure and the node attributes, which leads to more accurate community
detection as well as improved robustness in the presence of noise in the
network structure. CESNA has a linear runtime in the network size and is able
to process networks an order of magnitude larger than comparable approaches.
Last, CESNA also helps with the interpretation of detected communities by
finding relevant node attributes for each community.
Comment: Published in the proceedings of IEEE ICDM '1
Communities in Networks
We survey some of the concepts, methods, and applications of community
detection, which has become an increasingly important area of network science.
To help ease newcomers into the field, we provide a guide to available
methodology and open problems, and discuss why scientists from diverse
backgrounds are interested in these problems. As a running theme, we emphasize
the connections of community detection to problems in statistical physics and
computational optimization.
Comment: survey/review article on community structure in networks; published
version is available at
http://people.maths.ox.ac.uk/~porterm/papers/comnotices.pd
A General Framework for Complex Network Applications
Complex network theory has been applied to solving practical problems from
different domains. In this paper, we present a general framework for complex
network applications. The keys to a successful application are a thorough
understanding of the real system and a correct mapping of complex network
theory to practical problems in the system. Despite certain limitations
discussed in this paper, complex network theory provides a foundation on which
to develop powerful tools in analyzing and optimizing large interconnected
systems.
Comment: 8 pages
Finding and evaluating community structure in networks
We propose and study a set of algorithms for discovering community structure
in networks -- natural divisions of network nodes into densely connected
subgroups. Our algorithms all share two definitive features: first, they
involve iterative removal of edges from the network to split it into
communities, the edges removed being identified using one of a number of
possible "betweenness" measures, and second, these measures are, crucially,
recalculated after each removal. We also propose a measure for the strength of
the community structure found by our algorithms, which gives us an objective
metric for choosing the number of communities into which a network should be
divided. We demonstrate that our algorithms are highly effective at discovering
community structure in both computer-generated and real-world network data, and
show how they can be used to shed light on the sometimes dauntingly complex
structure of networked systems.
Comment: 16 pages, 13 figures
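The strength measure mentioned in this abstract is widely known as modularity, Q: for each community, the fraction of edges that fall inside it minus the fraction expected if edges were placed at random with the same vertex degrees. A minimal sketch follows, assuming an undirected, unweighted graph with at least one edge and no self-loops, and a given community label per vertex. The function name and input layout are illustrative choices, not taken from the paper.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  e_c / m - (d_c / (2 m))**2
    where m is the number of edges, e_c the number of edges with both
    endpoints in c, and d_c the total degree of the vertices in c.
    Assumes at least one edge and no self-loops.
    """
    m = len(edges)
    intra = {}   # e_c: edges inside each community
    degree = {}  # d_c: summed degree of each community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {v: "A" for v in range(6)}
q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
```

Splitting the toy graph into its two triangles gives a positive Q, while placing every vertex in one community gives Q = 0. Comparing Q across candidate divisions is how such a measure can guide the choice of the number of communities.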