15 research outputs found

    Semi-supervised stochastic blockmodel for structure analysis of signed networks

    © 2020 Elsevier B.V. Finding hidden structural patterns is a critical problem for all types of networks, including signed networks. Among the methods for structural analysis of complex networks, the stochastic blockmodel (SBM) is an important research tool because it is flexible and can generate networks with many different types of structures. However, most existing SBM learning methods for signed networks are unsupervised, leading to poor performance in finding hidden structural patterns, especially when handling noisy and sparse networks. Learning the SBM in a semi-supervised way is a promising avenue for overcoming this difficulty. In this type of model, a small number of labelled nodes and a large number of unlabelled nodes, coupled with their network structures, are used simultaneously to train the SBM. We propose a novel semi-supervised signed stochastic blockmodel and its learning algorithm based on variational Bayesian inference, with the goal of discovering both assortative (nodes connect more densely within clusters than between clusters) and disassortative (nodes link more sparsely within clusters than between clusters) structures from signed networks. The proposed model is validated through a number of experiments in which it is compared with state-of-the-art methods on both synthetic and real-world data. The carefully designed tests, which account for different scenarios, show that our method outperforms the other approaches in this space. It is especially relevant for noisy and sparse networks, as they constitute the majority of real-world networks.
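
The assortative/disassortative distinction in the abstract above can be illustrated with a minimal sketch. It is not the authors' semi-supervised signed model: it assumes a plain unsigned, undirected SBM, and the function names (`sample_sbm`, `within_between_density`) and all probabilities are hypothetical choices made for illustration.

```python
import numpy as np

def sample_sbm(block_sizes, block_probs, seed=0):
    """Sample an undirected, unsigned SBM adjacency matrix and its node labels."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    # Edge probability for every node pair, looked up from the block matrix.
    probs = np.asarray(block_probs)[labels][:, labels]
    # Draw each unordered pair once (strict upper triangle), then mirror it.
    upper = np.triu(rng.random(probs.shape) < probs, k=1)
    adj = (upper | upper.T).astype(int)
    return adj, labels

def within_between_density(adj, labels):
    """Return edge density inside clusters and edge density across clusters."""
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    within = adj[same & off_diag].mean()
    between = adj[~same].mean()
    return within, between

# Assortative: dense inside clusters, sparse between them (illustrative values).
adj_a, labels_a = sample_sbm([100, 100], [[0.30, 0.05], [0.05, 0.30]])
within_a, between_a = within_between_density(adj_a, labels_a)

# Disassortative: sparse inside clusters, dense between them.
adj_d, labels_d = sample_sbm([100, 100], [[0.05, 0.30], [0.30, 0.05]])
within_d, between_d = within_between_density(adj_d, labels_d)

print(f"assortative:    within={within_a:.3f} between={between_a:.3f}")
print(f"disassortative: within={within_d:.3f} between={between_d:.3f}")
```

Only the block probability matrix changes between the two cases, which is the flexibility the abstract attributes to the SBM. A signed variant would additionally attach a sign to each sampled edge.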

    SSBM: A Signed Stochastic Block Model for Multiple Structure Discovery in Large-Scale Exploratory Signed Networks

    Signed network structure discovery has received extensive attention and has become a research focus in the field of network science. However, most existing studies focus on networks with a single structure, e.g., community or bipartite, while ignoring multiple structures, e.g., the coexistence of community and bipartite structures. Furthermore, existing methods struggle with large-scale signed networks because of their high time complexity, especially when the number of clusters in the observed network must be determined without any prior knowledge. In view of this, we propose a mathematically principled method for signed network multiple structure discovery named the Signed Stochastic Block Model (SSBM). By adopting a probabilistic model, the SSBM can capture the multiple structures contained in signed networks, e.g., community, bipartite, and their coexistence. Moreover, by integrating the minimum message length (MML) criterion and the component-wise EM (CEM) algorithm, we propose a scalable learning algorithm capable of model selection to handle large-scale signed networks. Extensive experiments comparing the SSBM with state-of-the-art methods on synthetic and real-world signed networks demonstrate its effectiveness and efficiency in discovering multiple structures in large-scale exploratory signed networks.

    A Stochastic Block Model Approach for the Analysis of Multilevel Networks: an Application to the Sociology of Organizations

    A multilevel network is defined as the junction of two interaction networks, one level representing the interactions between individuals and the other the interactions between organizations. The levels are linked by an affiliation relationship, each individual belonging to a unique organization. A new Stochastic Block Model is proposed as a unified probabilistic framework tailored for multilevel networks. This model contains latent blocks that account for heterogeneity in the patterns of connection within each level and introduce dependencies between the levels. The sought connection patterns are not specified a priori, which makes this approach flexible. Variational methods are used for model inference, and an Integrated Classified Likelihood criterion is developed for choosing the number of blocks and for deciding whether the two levels are dependent. A comprehensive simulation study exhibits the benefit of this approach, illustrates the robustness of the clustering and highlights the reliability of the criterion used for model selection. The approach is applied to a sociological dataset collected during a television program trade fair, the inter-organizational level being the economic network between companies and the inter-individual level being the informal network between their representatives. It provides a synthetic representation of the two networks, unraveling their intertwined structure, and confirms the coopetition at stake.

    Empirical Bayes estimation for random dot product graph representation of the stochastic blockmodel

    Network models are increasingly used to model datasets that involve interacting units, particularly random graph models where the vertices represent individual entities and the edges represent the presence or absence of a specified interaction between these entities. Finding inherent communities in networks (i.e. partitioning vertices with a more similar interaction pattern into groups) is considered to be a fundamental task in network analysis, which aids in understanding the structural properties of real-world networks. Despite a large amount of research on this task since the emergence of graphical representation of relational data, this still remains a challenge. In particular, within the statistical community, the use of the stochastic blockmodel for this task is currently of immense interest. Recent theoretical developments have shown that adjacency spectral embedding of graphs yields tractable distributional results. Specifically, a random dot product graph formulation of the stochastic blockmodel provides a mixture of multivariate Gaussians for the asymptotic distribution of the latent positions estimated by adjacency spectral embedding. The first part of this thesis seeks to employ this new theory to provide an empirical Bayes model for estimating block memberships of vertices in a stochastic blockmodel graph. Posterior inference is conducted using a Metropolis-within-Gibbs algorithm. Performance of the model is illustrated through Monte Carlo simulation studies and experimental results on a Wikipedia dataset. Results show performance gains over other alternative models that are considered. Instead of a complete classification of vertices via community detection, one may wish to discover whether vertices possess an attribute of interest. Given that this attribute is observed for a few vertices, the goal is to find other vertices that possess that same attribute. 
As an example, if a few employees in a company are known to have committed fraud, how can we identify others who may be complicit? This is a special case of community detection, known as vertex nomination, which has recently grown rapidly as a research topic. The second part of this thesis extends the empirical Bayes model for vertex nomination based on information contained in the graph structure. This yields promising simulation results as well as real-data results from an Enron email dataset. Recent studies have shown that information pertinent to vertex nomination exists not only in the graph structure but also in the edge attributes (Coppersmith and Priebe, 2012; Suwan et al., 2015). This motivates the third part of this thesis, which further extends the model to exploit both graph structure and edge attributes for vertex nomination. Simulation studies confirm the benefit of doing so. However, the same benefit is not observed when the model is applied to the Enron email dataset; further investigations suggest that this is due to the data violating one of the model assumptions.
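
The adjacency spectral embedding mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration, not the thesis's empirical Bayes model: the toy two-block graph, its edge probabilities, and the helper name `adjacency_spectral_embedding` are assumptions, and a simple sign split stands in for the Gaussian mixture and Metropolis-within-Gibbs inference described there.

```python
import numpy as np

def adjacency_spectral_embedding(adj, dim):
    """Embed nodes using the `dim` largest-magnitude eigenpairs of a symmetric adjacency matrix."""
    eigvals, eigvecs = np.linalg.eigh(adj)
    top = np.argsort(np.abs(eigvals))[::-1][:dim]
    # Scale each eigenvector by the square root of its eigenvalue magnitude.
    return eigvecs[:, top] * np.sqrt(np.abs(eigvals[top]))

# Toy two-block stochastic blockmodel graph (illustrative probabilities).
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
probs = np.where(labels[:, None] == labels[None, :], 0.5, 0.1)
upper = np.triu(rng.random(probs.shape) < probs, k=1)
adj = (upper | upper.T).astype(float)

latent = adjacency_spectral_embedding(adj, dim=2)

# Crude stand-in for mixture-based clustering: split on the sign of the
# second embedding coordinate, then score up to label swapping.
guess = (latent[:, 1] > 0).astype(int)
accuracy = max(np.mean(guess == labels), np.mean(guess != labels))
print(f"embedding shape: {latent.shape}, block recovery accuracy: {accuracy:.2f}")
```

In this toy setting the estimated latent positions form two well-separated clouds, which is what makes a mixture-of-Gaussians model on the embedding plausible.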

    AN EDGE-CENTRIC PERSPECTIVE FOR BRAIN NETWORK COMMUNITIES

    Thesis (Ph.D.) - Indiana University, Department of Psychological and Brain Sciences and Program in Neuroscience, 2021. The brain is a complex system organized on multiple scales and operating in both a local and distributed manner. Individual neurons and brain regions participate in specific functions, while at the same time existing in the context of a larger network, supporting a range of different functionalities. Building brain networks composed of distinct neural elements (nodes) and their interrelationships (edges) allows us to model the brain from both local and global perspectives, and to deploy a wide array of computational network tools. A popular network analysis approach is community detection, which aims to subdivide a network’s nodes into clusters that can be used to represent and evaluate network organization. Prevailing community detection approaches applied to brain networks are designed to find densely interconnected sets of nodes, leading to the notion that the brain is organized in an exclusively modular manner. Furthermore, many brain network analyses tend to focus on the nodes, as evidenced by the search for modular groupings of neural elements that might serve a common function. In this thesis, we describe the application of community detection algorithms that are sensitive to alternative cluster configurations, enhancing our understanding of brain network organization. We apply a framework called the stochastic block model, which we use to uncover evidence of non-modular organization in human anatomical brain networks across the life span, and in the informatically-collated rat cerebral cortex. We also propose a framework to cluster functional brain network edges in human data, which naturally results in an overlapping organization at the level of nodes that bridges canonical functional systems. These alternative methods utilize the connection patterns of brain network edges in ways that prevailing approaches do not.
Thus, we motivate an alternative outlook which focuses on the importance of information provided by the brain’s interconnections, or edges. We call this an edge-centric perspective. The edge-centric approaches developed here offer new ways to characterize distributed brain organization and contribute to a fundamental change in perspective in our thinking about the brain.

    A Bayesian Analysis of Weighted Stochastic Block Models With Applications in Brain Functional Connectomics

    The network paradigm has become a popular approach for modeling complex systems, with applications ranging from social sciences to genetics to neuroscience and beyond. Often the individual connections between network nodes are of less interest than network characteristics such as its community structure - the tendency in many real-data networks for nodes to be naturally organized in groups with dense connections between nodes in the same (unobserved) group but sparse connections between nodes in different groups. Characterizing the structure of networks is of particular interest in the study of brain function, especially in the context of diseases and disorders such as Alzheimer’s disease and attention deficit hyperactivity disorder (ADHD), where disruption of functional brain networks has been observed. The stochastic block model (SBM) is a probabilistic formulation of the community detection problem that has been utilized to estimate latent communities in both binary and weighted networks, but as of yet not in brain networks. We build a flexible Bayesian hierarchical framework for the SBM to capture the community structure in weighted graphs, with a focus on the application in functional brain networks. First, we fit a version of the SBM to Gaussian-weighted networks via an efficient Gibbs sampling algorithm. We compare results from simulated networks to several existing estimation methods and then apply our approach to estimate the community structures in the functional resting brain networks of 185 subjects from the ADHD-200 sample. Next, we extend this probabilistic framework and our efficient estimation algorithm to capture the shared latent structure in groups of networks; we perform simulation studies and then apply this extended model to the same sample of brain networks from the ADHD-200 sample.
Finally, we adapt this model to allow for more complex latent structures and incorporate a regression component to test for differences in the latent functional brain structure between study groups. After examining the ability of this approach to capture the latent structures in simulated networks, we apply this method once again to the same set of functional brain networks to assess the differences between ADHD subtypes and healthy control subjects in latent functional brain structure. Doctor of Philosophy

    Community detection in graphs

    The modern science of networks has brought significant advances to our understanding of complex systems. One of the most relevant features of graphs representing real systems is community structure, or clustering, i.e. the organization of vertices in clusters, with many edges joining vertices of the same cluster and comparatively few edges joining vertices of different clusters. Such clusters, or communities, can be considered as fairly independent compartments of a graph, playing a role similar to that of, e.g., the tissues or the organs in the human body. Detecting communities is of great importance in sociology, biology and computer science, disciplines where systems are often represented as graphs. This problem is very hard and not yet satisfactorily solved, despite the huge effort of a large interdisciplinary community of scientists working on it over the past few years. We will attempt a thorough exposition of the topic, from the definition of the main elements of the problem, to the presentation of most methods developed, with a special focus on techniques designed by statistical physicists, from the discussion of crucial issues like the significance of clustering and how methods should be tested and compared against each other, to the description of applications to real networks. Comment: Review article. 103 pages, 42 figures, 2 tables. Two sections expanded + minor modifications. Three figures + one table + references added. Final version published in Physics Reports.

    Neural function approximation on graphs: shape modelling, graph discrimination & compression

    Graphs serve as a versatile mathematical abstraction of real-world phenomena in numerous scientific disciplines. This thesis is part of the Geometric Deep Learning subject area, a family of learning paradigms that capitalise on the increasing volume of non-Euclidean data so as to solve real-world tasks in a data-driven manner. In particular, we focus on the topic of graph function approximation using neural networks, which lies at the heart of many relevant methods. In the first part of the thesis, we contribute to the understanding and design of Graph Neural Networks (GNNs). Initially, we investigate the problem of learning on signals supported on a fixed graph. We show that treating graph signals as general graph spaces is restrictive and that conventional GNNs have limited expressivity. Instead, we expose a more enlightening perspective by drawing parallels between graph signals and signals on Euclidean grids, such as images and audio. Accordingly, we propose a permutation-sensitive GNN based on an operator analogous to shifts in grids and instantiate it on 3D meshes for shape modelling (Spiral Convolutions). Next, we focus on learning on general graph spaces and in particular on functions that are invariant to graph isomorphism. We identify a fundamental trade-off between invariance, expressivity and computational complexity, which we address with a symmetry-breaking mechanism based on substructure encodings (Graph Substructure Networks). Substructures are shown to be a powerful tool that provably improves expressivity while controlling computational complexity, and a useful inductive bias in network science and chemistry. In the second part of the thesis, we discuss the problem of graph compression, where we analyse the information-theoretic principles and the connections with graph generative models. We show that another inevitable trade-off surfaces, now between computational complexity and compression quality, due to graph isomorphism.
We propose a substructure-based dictionary coder - Partition and Code (PnC) - with theoretical guarantees that can be adapted to different graph distributions by estimating its parameters from observations. Additionally, contrary to the majority of neural compressors, PnC is parameter and sample efficient and is therefore of wide practical relevance. Finally, within this framework, substructures are further illustrated as a decisive archetype for learning problems on graph spaces. Open Access