Towards realistic artificial benchmark for community detection algorithms evaluation
Assessing the partitioning performance of community detection algorithms is
one of the most important issues in complex network analysis. Artificially
generated networks are often used as benchmarks for this purpose. However,
previous studies showed that their level of realism has a significant effect on
algorithm performance. In this study, we adopt a thorough experimental
approach to tackle this problem and investigate this effect. To assess the
level of realism, we use consensual network topological properties. Based on
the LFR method, the most realistic generative method to date, we propose two
alternative random models to replace the Configuration Model originally used in
this algorithm, in order to increase its realism. Experimental results show that
both modifications make it possible to generate collections of community-structured
artificial networks whose topological properties are closer to those
encountered in real-world networks. Moreover, the results obtained with eleven
popular community identification algorithms on these benchmarks show that their
performance decreases on more realistic networks.
Evaluating Overfit and Underfit in Models of Network Community Structure
A common data mining task on networks is community detection, which seeks an
unsupervised decomposition of a network into structural groups based on
statistical regularities in the network's connectivity. Although many methods
exist, the No Free Lunch theorem for community detection implies that each
makes some kind of tradeoff, and no algorithm can be optimal on all inputs.
Thus, different algorithms will over- or underfit on different inputs, finding
more, fewer, or just different communities than is optimal, and evaluation
methods that use a metadata partition as a ground truth will produce misleading
conclusions about general accuracy. Here, we present a broad evaluation of
over- and underfitting in community detection, comparing the behavior of 16
state-of-the-art community detection algorithms on a novel and structurally
diverse corpus of 406 real-world networks. We find that (i) algorithms vary
widely both in the number of communities they find and in their corresponding
composition, given the same input, (ii) algorithms can be clustered into
distinct high-level groups based on similarities of their outputs on real-world
networks, and (iii) these differences induce wide variation in accuracy on link
prediction and link description tasks. We introduce a new diagnostic for
evaluating overfitting and underfitting in practice, and use it to roughly
divide community detection methods into general and specialized learning
algorithms. Across methods and inputs, Bayesian techniques based on the
stochastic block model and a minimum description length approach to
regularization represent the best general learning approach, but can be
outperformed under specific circumstances. These results introduce both a
theoretically principled approach to evaluating over- and underfitting in models
of network community structure and a realistic benchmark by which new methods
may be evaluated and compared. Comment: 22 pages, 13 figures, 3 tables
Spectral Graph Forge: Graph Generation Targeting Modularity
Community structure is an important property that captures inhomogeneities
common in large networks, and modularity is one of the most widely used metrics
for such community structure. In this paper, we introduce a principled
methodology, the Spectral Graph Forge, for generating random graphs that
preserve community structure from a real network of interest, in terms of
modularity. Our approach leverages the fact that the spectral structure of
matrix representations of a graph encodes global information about community
structure. The Spectral Graph Forge uses a low-rank approximation of the
modularity matrix to generate synthetic graphs that match a target modularity
within a user-selectable degree of accuracy, while allowing other aspects of
structure to vary. We show that the Spectral Graph Forge outperforms
state-of-the-art techniques in terms of accuracy in targeting the modularity
and randomness of the realizations, while also preserving other local
structural properties and node attributes. We discuss extensions of the
Spectral Graph Forge to target other properties beyond modularity, and its
applications to anonymization.
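The abstract above refers to a low-rank approximation of the modularity matrix. As a minimal sketch of that single ingredient (not the Spectral Graph Forge itself, whose graph-generation step is not described in this listing), the following assumes an undirected, unweighted graph and the standard Newman modularity matrix B = A - k k^T / (2m). The function names and the toy graph are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def modularity_matrix(A):
    """Standard modularity matrix B = A - k k^T / (2m) for an undirected graph."""
    k = A.sum(axis=1)          # node degrees
    two_m = k.sum()            # twice the number of edges
    return A - np.outer(k, k) / two_m

def low_rank_approx(B, rank):
    """Keep the `rank` eigenpairs of symmetric B with largest |eigenvalue|."""
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(np.abs(vals))[::-1][:rank]
    return (vecs[:, idx] * vals[idx]) @ vecs[:, idx].T

def modularity(A, labels):
    """Q = (1/2m) * sum_ij B_ij * [labels_i == labels_j]."""
    B = modularity_matrix(A)
    same = labels[:, None] == labels[None, :]
    return (B * same).sum() / A.sum()

# Hypothetical toy graph: two triangles joined by a single edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
labels = np.array([0, 0, 0, 1, 1, 1])

B = modularity_matrix(A)
B_low = low_rank_approx(B, rank=2)   # rank-2 surrogate of B
Q = modularity(A, labels)            # modularity of the two-triangle split
```

Ranking eigenpairs by absolute eigenvalue is one possible choice for the truncation; the paper may use a different criterion.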
Scalable Approach to Uncertainty Quantification and Robust Design of Interconnected Dynamical Systems
Development of robust dynamical systems and networks such as autonomous
aircraft systems capable of accomplishing complex missions faces challenges due
to the dynamically evolving uncertainties coming from model uncertainties,
the necessity to operate in a hostile, cluttered urban environment, and the
distributed and dynamic nature of the communication and computation resources.
Model-based robust design is difficult because of the complexity of the hybrid
dynamic models including continuous vehicle dynamics, the discrete models of
computations and communications, and the size of the problem. We will give an
overview of recent advances in methodology and tools to model, analyze, and design
robust autonomous aerospace systems operating in uncertain environments, with
emphasis on efficient uncertainty quantification and robust design, using case
studies of missions including model-based target tracking and search, and
trajectory planning in uncertain urban environments. To show that the methodology is
generally applicable to uncertain dynamical systems, we will also show examples
of application of the new methods to efficient uncertainty quantification of
energy usage in buildings, and stability assessment of interconnected power
networks.
Changepoint Detection over Graphs with the Spectral Scan Statistic
We consider the change-point detection problem of deciding, based on noisy
measurements, whether an unknown signal over a given graph is constant or is
instead piecewise constant over two connected induced subgraphs of relatively
low cut size. We analyze the corresponding generalized likelihood ratio (GLR)
statistic and relate it to the problem of finding a sparsest cut in a graph.
We develop a tractable relaxation of the GLR statistic based on the
combinatorial Laplacian of the graph, which we call the spectral scan
statistic, and analyze its properties. We show how its performance as a testing
procedure depends directly on the spectrum of the graph, and use this result to
explicitly derive its asymptotic properties on a few significant graph
topologies. Finally, we demonstrate both theoretically and by simulations that
the spectral scan statistic can outperform naive testing procedures based on
edge thresholding and testing.
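The abstract above ties the test's performance to the spectrum of the graph's combinatorial Laplacian. As a minimal sketch of how that spectrum can be computed (this is not the spectral scan statistic itself, which is not defined in this listing), the following assumes an undirected, unweighted graph with L = D - A. The path graph and the variable names are illustrative assumptions.

```python
import numpy as np

def combinatorial_laplacian(A):
    """L = D - A for an undirected graph with adjacency matrix A."""
    return np.diag(A.sum(axis=1)) - A

# Hypothetical example: path graph on 4 nodes, 0 - 1 - 2 - 3.
A = np.zeros((4, 4))
for i in range(3):
    A[i, i + 1] = A[i + 1, i] = 1.0

L = combinatorial_laplacian(A)
eigenvalues = np.linalg.eigvalsh(L)       # returned in ascending order
algebraic_connectivity = eigenvalues[1]   # second-smallest eigenvalue
```

For a connected graph the smallest eigenvalue is zero, and a small second eigenvalue indicates that the graph can be split by a low-cut partition.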
Practical Attacks Against Graph-based Clustering
Graph modeling allows numerous security problems to be tackled in a general
way; however, little work has been done to understand the ability of graph-based
techniques to withstand adversarial attacks. We design and evaluate two novel graph attacks
against a state-of-the-art network-level, graph-based detection system. Our
work highlights areas in adversarial machine learning that have not yet been
addressed, specifically: graph-based clustering techniques, and a global
feature space where realistic attackers without perfect knowledge must be
accounted for (by the defenders) in order to be practical. Even though less
informed attackers can evade graph clustering with low cost, we show that some
practical defenses are possible. Comment: ACM CCS 201
Community detection in network analysis: a survey
Community structures are common in networks across domains such as sociology, biology, and business. A community structure is characterized by high similarity among nodes of the same community and low similarity between nodes of different communities.
In academia, research on community detection in network analysis has surged, especially on developing statistically sound methodologies for exploring, modeling, and interpreting these kinds of structures and relationships.
This survey paper aims to provide a brief comparative review of current applicable
statistical methodologies and approaches, along with metrics for evaluating graph clustering results and an application using R. At the
end, we provide promising future research directions. Statistic