
    Variational Bayesian Inference and Complexity Control for Stochastic Block Models

    It is now widely accepted that knowledge can be acquired from networks by clustering their vertices according to connection profiles. Many methods have been proposed, and in this paper we concentrate on the Stochastic Block Model (SBM). The clustering of vertices and the estimation of SBM parameters have been the subject of previous work, and numerous inference strategies, such as variational Expectation Maximization (EM) and classification EM, have been proposed. However, SBM still lacks criteria for estimating the number of components in the mixture. To our knowledge, only one model-based criterion, ICL, has been derived for SBM in the literature. It relies on an asymptotic approximation of the Integrated Complete-data Likelihood, and recent studies have shown that it tends to be too conservative for small networks. To tackle this issue, we propose a new criterion, which we call ILvb, based on a non-asymptotic approximation of the marginal likelihood. We describe how the criterion can be computed through a variational Bayes EM algorithm.
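
To make the model and the ICL-style trade-off concrete, here is a minimal sketch, assuming an undirected Bernoulli SBM without self-loops. It samples a graph, then scores a *given* hard labelling with plug-in maximum-likelihood estimates minus a penalty of the commonly used BIC-like form (Q(Q+1)/2 connection parameters over n(n-1)/2 pairs, Q-1 proportions over n vertices). It does not run variational EM and does not implement ILvb; the function names and the toy parameters are hypothetical.

```python
import math
import random
from collections import Counter


def sample_sbm(sizes, probs, seed=0):
    """Sample an undirected Bernoulli SBM without self-loops.

    sizes[k] is the number of vertices in block k; probs[k][l] is the edge
    probability between a vertex of block k and a vertex of block l.
    """
    rng = random.Random(seed)
    labels = [k for k, s in enumerate(sizes) for _ in range(s)]
    n = len(labels)
    edges = {(i, j) for i in range(n) for j in range(i + 1, n)
             if rng.random() < probs[labels[i]][labels[j]]}
    return labels, edges


def xlogy(x, y):
    """x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def icl(labels, edges):
    """ICL-style score of a hard labelling (higher is better).

    Complete-data log-likelihood at plug-in estimates of the block
    proportions and connection probabilities, minus a BIC-like penalty.
    """
    n = len(labels)
    sizes = Counter(labels)
    blocks = sorted(sizes)
    q = len(blocks)
    loglik = sum(xlogy(sizes[k], sizes[k] / n) for k in blocks)
    # Observed edge counts per unordered block pair (a <= b).
    edge_counts = Counter(tuple(sorted((labels[i], labels[j]))) for i, j in edges)
    for idx, a in enumerate(blocks):
        for b in blocks[idx:]:
            pairs = sizes[a] * (sizes[a] - 1) // 2 if a == b else sizes[a] * sizes[b]
            if pairs == 0:
                continue
            e = edge_counts[(a, b)]
            p = e / pairs
            loglik += xlogy(e, p) + xlogy(pairs - e, 1 - p)
    penalty = (0.5 * q * (q + 1) / 2 * math.log(n * (n - 1) / 2)
               + 0.5 * (q - 1) * math.log(n))
    return loglik - penalty


# Hypothetical toy network: two dense blocks with sparse links between them.
labels, edges = sample_sbm([20, 20], [[0.8, 0.05], [0.05, 0.8]], seed=0)
two_blocks = icl(labels, edges)
one_block = icl([0] * len(labels), edges)
```

With strongly assortative blocks like these, the two-block labelling should score higher than lumping everything into one block; the abstract's point is that this asymptotic penalty can become too conservative when n is small.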

    The blessing of transitivity in sparse and stochastic networks

    The interaction between transitivity and sparsity, two common features in empirical networks, implies that there are local regions of large sparse networks that are dense. We call this the blessing of transitivity, and it has consequences for both modeling and inference. Extant research suggests that statistical inference for the Stochastic Blockmodel is more difficult when the edges are sparse. However, this conclusion is confounded by the fact that the asymptotic limit in all of the previous studies is not merely sparse but also non-transitive. To retain transitivity, the blocks cannot grow faster than the expected degree. Thus, in sparse models, the blocks must remain asymptotically small.

    Previous algorithmic research demonstrates that small "local" clusters are more amenable to computation, visualization, and interpretation than "global" graph partitions. This paper provides the first statistical results demonstrating that these small transitive clusters are also more amenable to statistical estimation. Theorem 2 shows that a "local" clustering algorithm can, with high probability, detect a transitive stochastic block of a fixed size (e.g., 30 nodes) embedded in a large graph. The only constraint on the ambient graph is that it is large and sparse (it could be generated at random or by an adversary), suggesting a theoretical explanation for the robust empirical performance of local clustering algorithms.
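
The abstract does not spell out the algorithm behind Theorem 2. As a hedged illustration of local clustering in general, the sketch below swaps in a well-known technique: push-based approximate personalized PageRank (Andersen, Chung and Lang) followed by a conductance sweep cut. Its work stays near the seed rather than scaling with the whole graph. The toy graph (a 6-clique hanging off a 30-vertex ring) and all function names are hypothetical.

```python
from collections import deque


def approx_ppr(adj, seed, alpha=0.15, eps=1e-4):
    """Approximate lazy personalized PageRank by local pushes.

    Assumes every vertex has at least one neighbour.
    """
    p, r = {}, {seed: 1.0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        du, ru = len(adj[u]), r.get(u, 0.0)
        if ru < eps * du:  # stale entry: residual already below threshold
            continue
        p[u] = p.get(u, 0.0) + alpha * ru
        r[u] = (1 - alpha) * ru / 2            # lazy half stays at u
        share = (1 - alpha) * ru / (2 * du)    # other half spread to neighbours
        for v in adj[u]:
            old = r.get(v, 0.0)
            r[v] = old + share
            if old < eps * len(adj[v]) <= r[v]:  # v just crossed its threshold
                queue.append(v)
        if r[u] >= eps * du:
            queue.append(u)
    return p


def sweep_cut(adj, p):
    """Lowest-conductance prefix of vertices sorted by p[u] / deg(u)."""
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    order = sorted(p, key=lambda u: p[u] / len(adj[u]), reverse=True)
    members, vol, cut = set(), 0, 0
    best, best_phi = set(), float("inf")
    for u in order:
        members.add(u)
        vol += len(adj[u])
        for v in adj[u]:
            cut += -1 if v in members else 1  # edge turns internal, or is new boundary
        denom = min(vol, total_vol - vol)
        if denom > 0 and cut / denom < best_phi:
            best, best_phi = set(members), cut / denom
    return best, best_phi


def planted_clique_on_ring(k=6, ring=30):
    """Hypothetical toy graph: a k-clique joined by one edge to a ring."""
    adj = {u: set() for u in range(k + ring)}

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for a in range(k):
        for b in range(a + 1, k):
            link(a, b)
    for i in range(ring):
        link(k + i, k + (i + 1) % ring)
    link(k - 1, k)
    return adj


adj = planted_clique_on_ring()
community, phi = sweep_cut(adj, approx_ppr(adj, seed=0))
```

Seeded inside the clique, the sweep should return exactly the six clique vertices, whose single outgoing edge gives conductance 1/31.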

    Structural and Functional Discovery in Dynamic Networks with Non-negative Matrix Factorization

    Time series of graphs are increasingly prevalent in modern data and pose unique challenges to visual exploration and pattern extraction. This paper describes the development and application of matrix factorizations for exploration and time-varying community detection in time-evolving graph sequences. The matrix factorization model allows the user to home in on and display interesting underlying structure and its evolution over time. The methods scale to weighted networks with a large number of time points or nodes and can accommodate sudden changes in graph topology. Our techniques are demonstrated on several dynamic graph series from both synthetic and real-world data, including citation and trade networks. These examples illustrate how users can steer the techniques and combine them with existing methods to discover and display meaningful patterns in sizable graphs over many time points.
    Comment: 16 pages, 17 figures
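
The core operation is non-negative matrix factorization. A minimal sketch, assuming plain Frobenius-loss NMF with the classic Lee-Seung multiplicative updates, applied independently to each adjacency snapshot; the paper's time-varying model is not reproduced here, and the snapshot data and helper names are hypothetical. Each vertex is assigned to the factor with the largest loading in its row of W.

```python
import random


def matmul(A, B):
    """Dense matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def frob_error(X, W, H):
    """Squared Frobenius norm of X - W H."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))


def nmf(X, k, iters=200, seed=0, tiny=1e-12):
    """X ~ W H with W, H >= 0, via multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)  # H <- H * (W^T X) / (W^T W H)
        H = [[h * a / (b + tiny) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))  # W <- W * (X H^T) / (W H H^T)
        W = [[w * a / (b + tiny) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H


# Hypothetical snapshots of a 6-vertex graph: two triangles, then a bridging edge 2-3.
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
bridged = [row[:] for row in two_triangles]
bridged[2][3] = bridged[3][2] = 1
for t, X in enumerate([two_triangles, bridged]):
    W, H = nmf(X, k=2)
    labels = [max(range(2), key=row.__getitem__) for row in W]  # factor per vertex
```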

    Low-distortion Inference of Latent Similarities from a Multiplex Social Network

    Much of social network analysis is, implicitly or explicitly, predicated on the assumption that individuals tend to be more similar to their friends than to strangers. Thus, an observed social network provides a noisy signal about the latent underlying "social space": the way in which individuals are similar or dissimilar. Many research questions frequently addressed via social network analysis are in reality questions about this social space, raising the question of inverting the process: given a social network, how accurately can we reconstruct the social structure of similarities and dissimilarities? We begin to address this problem formally. Observed social networks are usually multiplex, in the sense that they reflect (dis)similarities in several different "categories," such as geographical proximity, kinship, or similarity of professions/hobbies. We assume that each such category is characterized by a latent metric capturing (dis)similarities in this category. Each category gives rise to a separate social network: a random graph parameterized by this metric. For a concrete model, we consider Kleinberg's small-world model and some variations thereof. The observed social network is the unlabeled union of these graphs, i.e., the presence or absence of edges can be observed, but not their origins. Our main result is an algorithm that reconstructs each metric with provably low distortion.
    Comment: 51 pages. Compared to the previous version: many small changes to improve presentation and clarity
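
The generative side of this setup can be sketched briefly. Assuming a one-dimensional ring variant of Kleinberg's model (the paper also considers variations), each category places vertices at hypothetical ring positions, adds local ring links, and draws long-range contacts with probability proportional to d^-r; the observer sees only the union of edges, not which category produced each. The reconstruction algorithm itself is not reproduced, and all names are illustrative.

```python
import random


def ring_distance(a, b, n):
    """Shortest distance between sites a and b on a ring of n sites."""
    d = abs(a - b)
    return min(d, n - d)


def kleinberg_ring(position, r=1.0, q=1, rng=None):
    """One category's graph: a 1-D Kleinberg-style small world.

    position[v] is vertex v's site on a ring of len(position) sites (its
    latent coordinate in this category). Each vertex links to its two ring
    neighbours and draws q long-range contacts u with probability
    proportional to d(u, v) ** -r.
    """
    rng = rng or random.Random(0)
    n = len(position)
    at_site = {site: v for v, site in enumerate(position)}
    edges = set()
    for v in range(n):
        for step in (-1, 1):  # local links in this category's metric
            u = at_site[(position[v] + step) % n]
            edges.add(frozenset((u, v)))
        others = [u for u in range(n) if u != v]
        weights = [ring_distance(position[u], position[v], n) ** -r for u in others]
        for u in rng.choices(others, weights=weights, k=q):
            edges.add(frozenset((u, v)))
    return edges


def observed_union(n, categories, r=1.0, seed=0):
    """Multiplex observation: the unlabeled union of one graph per category."""
    rng = random.Random(seed)
    metrics, union = [], set()
    for _ in range(categories):
        position = list(range(n))
        rng.shuffle(position)  # hypothetical latent metric for this category
        metrics.append(position)
        union |= kleinberg_ring(position, r=r, rng=rng)
    return metrics, union


metrics, observed = observed_union(50, categories=2, seed=1)
```

Only `observed` would be available to a reconstruction algorithm; `metrics` is the hidden ground truth it tries to recover.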

    Overlapping Community Detection via Local Spectral Clustering

    Large graphs arise in a number of contexts, and understanding their structure and extracting information from them is an important research area. Early community-mining algorithms focused on global structure and often run in time that grows with the size of the entire graph. Nowadays, as we often explore networks with billions of vertices and look for communities of only hundreds of vertices, it is crucial to shift our attention from macroscopic to microscopic structure in large networks. A growing body of work has adopted local expansion methods to identify community members from a few exemplary seed members. In this paper, we propose a novel approach for finding overlapping communities called LEMON (Local Expansion via Minimum One Norm). The algorithm finds the community by seeking a sparse vector in the span of the local spectra such that the seeds are in its support. We show that LEMON can achieve the highest detection accuracy among state-of-the-art proposals. The running time depends on the size of the community rather than that of the entire graph. The algorithm is easy to implement and highly parallelizable. We further provide a theoretical analysis of the local spectral properties, bounding a measure of the tightness of the extracted community in terms of the eigenvalues of the graph Laplacian. Moreover, because networks differ in nature, a comprehensive analysis of how well the local expansion approach uncovers communities in different networks is still lacking. We thoroughly evaluate our approach using both synthetic and real-world datasets across different domains and analyze the empirical variations when applying our method to inherently different networks in practice. In addition, we provide heuristics on how the quality and quantity of the seed set affect performance.
    Comment: Extended version of the conference proceeding in WWW'1
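
One way to picture "the span of the local spectra" is a small subspace built only from short random walks started at the seeds. The sketch below is a hypothetical simplification, not LEMON itself: it orthonormalizes the walk distributions p0, p0 P, ..., p0 P^steps with modified Gram-Schmidt, so every basis vector is supported within `steps` hops of the seeds. LEMON's actual step, finding a sparse vector in this span whose support contains the seeds via an l1 minimization (a linear program), is omitted.

```python
import math


def random_walk_step(adj, vec):
    """One step of the simple random walk on a sparse distribution (dict)."""
    out = {}
    for u, mass in vec.items():
        share = mass / len(adj[u])
        for v in adj[u]:
            out[v] = out.get(v, 0.0) + share
    return out


def dot(x, y):
    return sum(val * y.get(k, 0.0) for k, val in x.items())


def local_spectral_basis(adj, seeds, steps=3, tol=1e-10):
    """Orthonormal basis of span{p0, p0 P, ..., p0 P^steps}, p0 uniform on seeds."""
    p = {s: 1.0 / len(seeds) for s in seeds}
    basis = []
    for t in range(steps + 1):
        if t:
            p = random_walk_step(adj, p)
        v = dict(p)
        for b in basis:  # modified Gram-Schmidt: subtract projections one at a time
            c = dot(v, b)
            for k, val in b.items():
                v[k] = v.get(k, 0.0) - c * val
        norm = math.sqrt(dot(v, v))
        if norm > tol:  # skip vectors already in the span
            basis.append({k: val / norm for k, val in v.items()})
    return basis


# Hypothetical example: a 20-vertex path seeded at its middle vertex.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j < 20} for i in range(20)}
basis = local_spectral_basis(path, [10], steps=3)
```

Because the vectors never leave the 3-hop neighbourhood of the seed, the cost depends on that neighbourhood rather than on the whole graph, which is the locality property the abstract emphasizes.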

    Partitioning Networks with Node Attributes by Compressing Information Flow

    Real-world networks are often organized as modules or communities of similar nodes that serve as functional units. These networks are also rich in content, with nodes having distinguishing features or attributes. To discover a network's modular structure, it is necessary to take into account not only its links but also its node attributes. We describe an information-theoretic method that identifies modules by compressing descriptions of information flow on a network. Our formulation introduces node content into the description of information flow, which we then minimize to discover groups of nodes with similar attributes that also tend to trap the flow of information. The method has several advantages: it is conceptually simple and does not require ad hoc parameters to specify the number of modules or to control the relative contribution of links and node attributes to network structure. We apply the proposed method to partition real-world networks with known community structure. We demonstrate that adding node attributes helps recover the underlying community structure in content-rich networks more effectively than using links alone. In addition, we show that our method is faster and more accurate than alternative state-of-the-art algorithms.
    Comment: 10 pages
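
"Compressing descriptions of information flow" is the idea behind the map equation. As an assumption about the link-only baseline (the abstract's attribute-aware extension is not reproduced), the sketch below computes the standard two-level map equation L(M), in bits, for a given partition of an undirected, unweighted graph. For such a graph a random walker visits vertex a at rate deg(a)/2m, and each cut edge carries exit flow 1/2m in each direction. A partition that traps flow yields a shorter description.

```python
import math
from collections import defaultdict


def plogp(p):
    """p * log2(p) with the convention 0 * log(0) = 0."""
    return p * math.log2(p) if p > 0 else 0.0


def map_equation(edges, module):
    """Two-level map equation for an undirected, unweighted edge list.

    module[v] is the module id of vertex v.
    """
    two_m = 2 * len(edges)
    degree = defaultdict(int)
    exit_flow = defaultdict(float)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if module[u] != module[v]:
            # A cut edge lets flow 1/(2m) leave each endpoint's module.
            exit_flow[module[u]] += 1 / two_m
            exit_flow[module[v]] += 1 / two_m
    node_flow = {v: d / two_m for v, d in degree.items()}
    module_flow = defaultdict(float)
    for v, p in node_flow.items():
        module_flow[module[v]] += p
    q = sum(exit_flow.values())
    return (plogp(q)
            - 2 * sum(plogp(qi) for qi in exit_flow.values())
            - sum(plogp(p) for p in node_flow.values())
            + sum(plogp(exit_flow.get(i, 0.0) + pi) for i, pi in module_flow.items()))


# Hypothetical example: two triangles joined by the bridge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
one_module = map_equation(edges, {v: 0 for v in range(6)})
two_modules = map_equation(edges, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1})
```

Here the two-triangle partition compresses the walk better than a single module, so a search that minimizes L would prefer it.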

    Network Analysis of Particles and Grains

    The arrangements of particles and forces in granular materials have a complex organization on multiple spatial scales, ranging from local structures to mesoscale and system-wide ones. This multiscale organization can affect how a material responds or reconfigures when exposed to external perturbations or loading. The theoretical study of particle-level, force-chain, domain, and bulk properties requires the development and application of appropriate physical, mathematical, statistical, and computational frameworks. Traditionally, granular materials have been investigated using particulate or continuum models, each of which tends to be implicitly agnostic to multiscale organization. Recently, tools from network science have emerged as powerful approaches for probing and characterizing heterogeneous architectures across different scales in complex systems, and a diverse set of methods has yielded fascinating insights into granular materials. In this paper, we review work on network-based approaches to studying granular matter and explore the potential of such frameworks to provide a useful description of these systems and to enhance understanding of their underlying physics. We also outline a few open questions and highlight particularly promising future directions in the analysis and design of granular matter and other kinds of material networks.

    Ensemble-Based Discovery of Disjoint, Overlapping and Fuzzy Community Structures in Networks

    Though much work has been done on ensemble clustering in data mining, the application of ensemble methods to community detection in networks is in its infancy. In this paper, we propose two ensemble methods: ENDISCO and MEDOC. ENDISCO performs disjoint community detection. In contrast, MEDOC performs disjoint, overlapping, and fuzzy community detection and represents the first ensemble method for fuzzy and overlapping community detection. We run extensive experiments with both algorithms on synthetic data and on several real-world datasets for which community structures are known. We show that ENDISCO and MEDOC both outperform the best-known existing standalone community detection algorithms (though we emphasize that they leverage them). In the case of disjoint community detection, we show that both ENDISCO and MEDOC outperform an existing ensemble community detection algorithm in terms of both multiple accuracy measures and run-time. We further show that our ensemble algorithms can help explore the core-periphery structure of network communities, identify stable communities in dynamic networks, and help solve the "degeneracy of solutions" problem, generating robust results.
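
The abstract does not describe how ENDISCO or MEDOC combine their base algorithms. For a feel of ensemble community detection in general, here is a minimal sketch of a different, classic technique, co-association (evidence accumulation) consensus: count how often each vertex pair is grouped together across base partitions, then merge pairs that co-occur in more than a threshold fraction. The partitions and threshold are hypothetical.

```python
from itertools import combinations


def co_association(partitions, nodes):
    """Fraction of base partitions that place each vertex pair together."""
    counts = dict.fromkeys(combinations(nodes, 2), 0)
    for part in partitions:
        for u, v in counts:
            if part[u] == part[v]:
                counts[(u, v)] += 1
    return {pair: c / len(partitions) for pair, c in counts.items()}


def consensus(partitions, nodes, threshold=0.5):
    """Disjoint consensus communities: components of the thresholded co-association graph."""
    parent = {v: v for v in nodes}

    def find(v):  # union-find with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for (u, v), w in co_association(partitions, nodes).items():
        if w > threshold:
            parent[find(u)] = find(v)
    groups = {}
    for v in nodes:
        groups.setdefault(find(v), set()).add(v)
    return sorted(groups.values(), key=min)


# Three hypothetical base partitions of six vertices that disagree at the margins.
nodes = list(range(6))
base = [
    {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1},
    {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1},
    {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "c"},
]
communities = consensus(base, nodes)
```

Majority agreement smooths out each base partition's individual mistakes; the same co-association weights could also be read as fuzzy memberships rather than thresholded.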

    A Survey of Community Search Over Big Graphs

    With the rapid development of information technologies, big graphs are prevalent in many real applications (e.g., social media and knowledge bases). An important component of these graphs is the network community. Essentially, a community is a group of vertices that are densely connected internally. Community retrieval can be used in many real applications, such as event organization and friend recommendation. Consequently, how to efficiently find high-quality communities in big graphs is an important research topic in the era of big data. Recently, a large body of research known as community search has been proposed. These works aim to provide efficient solutions for searching high-quality communities in large networks in real time. Nevertheless, they focus on different types of graphs and formulate communities in different ways, so a comprehensive review of these works is desirable. In this survey, we conduct a thorough review of existing community search works. Moreover, we analyze and compare the quality of communities under their models and the performance of different solutions. Furthermore, we point out new research directions. This survey not only helps researchers better understand existing community search solutions but also helps practitioners choose the proper solutions.
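
A common way to formulate community search is query-centric: given a query vertex and a cohesiveness parameter k, return the connected component of the k-core (the maximal subgraph in which every vertex has degree at least k) that contains the query. The sketch below implements that formulation by peeling low-degree vertices and then running a BFS from the query; it is one model among the many the survey covers, and the example graph is hypothetical.

```python
from collections import deque


def k_core(adj, k):
    """Vertices of the k-core, found by repeatedly peeling vertices of degree < k."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    removed = set()
    queue = deque(v for v, d in deg.items() if d < k)
    while queue:
        v = queue.popleft()
        if v in removed:
            continue
        removed.add(v)
        for u in adj[v]:
            if u not in removed:
                deg[u] -= 1
                if deg[u] < k:
                    queue.append(u)
    return set(adj) - removed


def community_search(adj, query, k):
    """Connected component of the k-core containing `query` (empty if none)."""
    core = k_core(adj, k)
    if query not in core:
        return set()
    seen, queue = {query}, deque([query])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u in core and u not in seen:
                seen.add(u)
                queue.append(u)
    return seen


# Hypothetical graph: a 4-clique {0,1,2,3}, a triangle {4,5,6} bridged by 3-4, and a pendant 7.
adj = {
    0: {1, 2, 3, 7}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}, 7: {0},
}
tight = community_search(adj, query=1, k=3)
```

Raising k trades coverage for cohesiveness: with k=3 the query gets only the 4-clique, while k=2 also pulls in the bridged triangle.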

    GrAMME: Semi-Supervised Learning using Multi-layered Graph Attention Models

    Modern data analysis pipelines are becoming increasingly complex due to the presence of multi-view information sources. While graphs are effective in modeling complex relationships, in many scenarios a single graph is rarely sufficient to succinctly represent all interactions, and hence multi-layered graphs have become popular. Though this leads to richer representations, extending solutions from the single-graph case is not straightforward. Consequently, there is a strong need for novel solutions to classical problems, such as node classification, in the multi-layered case. In this paper, we consider the problem of semi-supervised learning with multi-layered graphs. Though deep network embeddings, e.g., DeepWalk, are widely adopted for community discovery, we argue that feature learning with random node attributes, using graph neural networks, can be more effective. To this end, we propose to use attention models for effective feature learning and develop two novel architectures, GrAMME-SG and GrAMME-Fusion, that exploit inter-layer dependencies to build multi-layered graph embeddings. Using empirical studies on several benchmark datasets, we evaluate the proposed approaches and demonstrate significant performance improvements over state-of-the-art network embedding strategies. The results also show that using simple random features is an effective choice, even in cases where explicit node attributes are not available.
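
To illustrate the two ingredients named in the abstract, random node features and attention across layers, here is a hypothetical toy forward pass, not the GrAMME-SG or GrAMME-Fusion architecture. Each layer propagates random features by neighbourhood averaging, and a vertex's per-layer embeddings are fused with softmax attention weights. In a real model the attention vector (and the propagation weights) would be learned; here it is a fixed random stand-in.

```python
import math
import random


def mean_aggregate(adj, feats):
    """One propagation step: average each vertex's own and its neighbours' features."""
    out = {}
    for v, nbrs in adj.items():
        group = [v, *nbrs]
        dim = len(feats[v])
        out[v] = [sum(feats[u][d] for u in group) / len(group) for d in range(dim)]
    return out


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_layers(layer_embeddings, attn_vector):
    """Attention-weighted sum of one vertex's per-layer embeddings."""
    scores = [sum(a * h for a, h in zip(attn_vector, emb)) for emb in layer_embeddings]
    weights = softmax(scores)
    dim = len(layer_embeddings[0])
    fused = [sum(w * emb[d] for w, emb in zip(weights, layer_embeddings)) for d in range(dim)]
    return fused, weights


rng = random.Random(0)
layers = [
    {0: {1, 2}, 1: {0}, 2: {0}},   # hypothetical layer 1 (a star)
    {0: {1}, 1: {0, 2}, 2: {1}},   # hypothetical layer 2 (a path)
]
features = {v: [rng.gauss(0, 1) for _ in range(4)] for v in range(3)}  # random node attributes
per_layer = [mean_aggregate(adj, features) for adj in layers]
attn = [rng.gauss(0, 1) for _ in range(4)]  # stand-in for a learned attention vector
fused0, weights0 = fuse_layers([emb[0] for emb in per_layer], attn)
```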