21,831 research outputs found

    Towards realistic artificial benchmark for community detection algorithms evaluation

    Full text link
    Assessing the partitioning performance of community detection algorithms is one of the most important issues in complex network analysis. Artificially generated networks are often used as benchmarks for this purpose. However, previous studies showed their level of realism have a significant effect on the algorithms performance. In this study, we adopt a thorough experimental approach to tackle this problem and investigate this effect. To assess the level of realism, we use consensual network topological properties. Based on the LFR method, the most realistic generative method to date, we propose two alternative random models to replace the Configuration Model originally used in this algorithm, in order to increase its realism. Experimental results show both modifications allow generating collections of community-structured artificial networks whose topological properties are closer to those encountered in real-world networks. Moreover, the results obtained with eleven popular community identification algorithms on these benchmarks show their performance decrease on more realistic networks

    Evaluating Overfit and Underfit in Models of Network Community Structure

    Full text link
    A common data mining task on networks is community detection, which seeks an unsupervised decomposition of a network into structural groups based on statistical regularities in the network's connectivity. Although many methods exist, the No Free Lunch theorem for community detection implies that each makes some kind of tradeoff, and no algorithm can be optimal on all inputs. Thus, different algorithms will over or underfit on different inputs, finding more, fewer, or just different communities than is optimal, and evaluation methods that use a metadata partition as a ground truth will produce misleading conclusions about general accuracy. Here, we present a broad evaluation of over and underfitting in community detection, comparing the behavior of 16 state-of-the-art community detection algorithms on a novel and structurally diverse corpus of 406 real-world networks. We find that (i) algorithms vary widely both in the number of communities they find and in their corresponding composition, given the same input, (ii) algorithms can be clustered into distinct high-level groups based on similarities of their outputs on real-world networks, and (iii) these differences induce wide variation in accuracy on link prediction and link description tasks. We introduce a new diagnostic for evaluating overfitting and underfitting in practice, and use it to roughly divide community detection methods into general and specialized learning algorithms. Across methods and inputs, Bayesian techniques based on the stochastic block model and a minimum description length approach to regularization represent the best general learning approach, but can be outperformed under specific circumstances. These results introduce both a theoretically principled approach to evaluate over and underfitting in models of network community structure and a realistic benchmark by which new methods may be evaluated and compared.Comment: 22 pages, 13 figures, 3 table

    Spectral Graph Forge: Graph Generation Targeting Modularity

    Full text link
    Community structure is an important property that captures inhomogeneities common in large networks, and modularity is one of the most widely used metrics for such community structure. In this paper, we introduce a principled methodology, the Spectral Graph Forge, for generating random graphs that preserves community structure from a real network of interest, in terms of modularity. Our approach leverages the fact that the spectral structure of matrix representations of a graph encodes global information about community structure. The Spectral Graph Forge uses a low-rank approximation of the modularity matrix to generate synthetic graphs that match a target modularity within user-selectable degree of accuracy, while allowing other aspects of structure to vary. We show that the Spectral Graph Forge outperforms state-of-the-art techniques in terms of accuracy in targeting the modularity and randomness of the realizations, while also preserving other local structural properties and node attributes. We discuss extensions of the Spectral Graph Forge to target other properties beyond modularity, and its applications to anonymization

    Scalable Approach to Uncertainty Quantification and Robust Design of Interconnected Dynamical Systems

    Full text link
    Development of robust dynamical systems and networks such as autonomous aircraft systems capable of accomplishing complex missions faces challenges due to the dynamically evolving uncertainties coming from model uncertainties, necessity to operate in a hostile cluttered urban environment, and the distributed and dynamic nature of the communication and computation resources. Model-based robust design is difficult because of the complexity of the hybrid dynamic models including continuous vehicle dynamics, the discrete models of computations and communications, and the size of the problem. We will overview recent advances in methodology and tools to model, analyze, and design robust autonomous aerospace systems operating in uncertain environment, with stress on efficient uncertainty quantification and robust design using the case studies of the mission including model-based target tracking and search, and trajectory planning in uncertain urban environment. To show that the methodology is generally applicable to uncertain dynamical systems, we will also show examples of application of the new methods to efficient uncertainty quantification of energy usage in buildings, and stability assessment of interconnected power networks

    Changepoint Detection over Graphs with the Spectral Scan Statistic

    Full text link
    We consider the change-point detection problem of deciding, based on noisy measurements, whether an unknown signal over a given graph is constant or is instead piecewise constant over two connected induced subgraphs of relatively low cut size. We analyze the corresponding generalized likelihood ratio (GLR) statistics and relate it to the problem of finding a sparsest cut in a graph. We develop a tractable relaxation of the GLR statistic based on the combinatorial Laplacian of the graph, which we call the spectral scan statistic, and analyze its properties. We show how its performance as a testing procedure depends directly on the spectrum of the graph, and use this result to explicitly derive its asymptotic properties on few significant graph topologies. Finally, we demonstrate both theoretically and by simulations that the spectral scan statistic can outperform naive testing procedures based on edge thresholding and χ2\chi^2 testing

    Practical Attacks Against Graph-based Clustering

    Full text link
    Graph modeling allows numerous security problems to be tackled in a general way, however, little work has been done to understand their ability to withstand adversarial attacks. We design and evaluate two novel graph attacks against a state-of-the-art network-level, graph-based detection system. Our work highlights areas in adversarial machine learning that have not yet been addressed, specifically: graph-based clustering techniques, and a global feature space where realistic attackers without perfect knowledge must be accounted for (by the defenders) in order to be practical. Even though less informed attackers can evade graph clustering with low cost, we show that some practical defenses are possible.Comment: ACM CCS 201
    corecore