29,766 research outputs found

    How Many Communities Are There?

    Full text link
    Stochastic blockmodels and variants thereof are among the most widely used approaches to community detection for social networks and relational data. A stochastic blockmodel partitions the nodes of a network into disjoint sets, called communities. The approach is inherently related to clustering with mixture models; and raises a similar model selection problem for the number of communities. The Bayesian information criterion (BIC) is a popular solution, however, for stochastic blockmodels, the conditional independence assumption given the communities of the endpoints among different edges is usually violated in practice. In this regard, we propose composite likelihood BIC (CL-BIC) to select the number of communities, and we show it is robust against possible misspecifications in the underlying stochastic blockmodel assumptions. We derive the requisite methodology and illustrate the approach using both simulated and real data. Supplementary materials containing the relevant computer code are available online.Comment: 26 pages, 3 figure

    From Relational Data to Graphs: Inferring Significant Links using Generalized Hypergeometric Ensembles

    Full text link
    The inference of network topologies from relational data is an important problem in data analysis. Exemplary applications include the reconstruction of social ties from data on human interactions, the inference of gene co-expression networks from DNA microarray data, or the learning of semantic relationships based on co-occurrences of words in documents. Solving these problems requires techniques to infer significant links in noisy relational data. In this short paper, we propose a new statistical modeling framework to address this challenge. It builds on generalized hypergeometric ensembles, a class of generative stochastic models that give rise to analytically tractable probability spaces of directed, multi-edge graphs. We show how this framework can be used to assess the significance of links in noisy relational data. We illustrate our method in two data sets capturing spatio-temporal proximity relations between actors in a social system. The results show that our analytical framework provides a new approach to infer significant links from relational data, with interesting perspectives for the mining of data on social systems.Comment: 10 pages, 8 figures, accepted at SocInfo201

    Self-Organizing Innovation Networks: When do Small Worlds Emerge?

    Get PDF
    In this paper, we present a model of 'collective innovation' built upon the network formation formalism. In our model, agents localized on a circle benefit from knowledge flows from other agents with whom they are directly or indirectly connected. They support costs for direct connections which are linearly increasing with geographic distance. The dynamic process of network formation exhibits prefeRential meeting for close agents (in the relational network and in the geographic metrics). We show how the set of stochastically stable networks selected in the long run is affected by the degree of knowledge transferability. We find critical values of this parameter for which stable \"small world\" networks are dynamically selected.Network Formation, Stochastic Stability, Preferential Meeting, Self-Organization,

    The Block Point Process Model for Continuous-Time Event-Based Dynamic Networks

    Full text link
    We consider the problem of analyzing timestamped relational events between a set of entities, such as messages between users of an on-line social network. Such data are often analyzed using static or discrete-time network models, which discard a significant amount of information by aggregating events over time to form network snapshots. In this paper, we introduce a block point process model (BPPM) for continuous-time event-based dynamic networks. The BPPM is inspired by the well-known stochastic block model (SBM) for static networks. We show that networks generated by the BPPM follow an SBM in the limit of a growing number of nodes. We use this property to develop principled and efficient local search and variational inference procedures initialized by regularized spectral clustering. We fit BPPMs with exponential Hawkes processes to analyze several real network data sets, including a Facebook wall post network with over 3,500 nodes and 130,000 events.Comment: To appear at The Web Conference 201
    • 

    corecore