146 research outputs found

    Exact ICL maximization in a non-stationary time extension of the latent block model for dynamic networks

    Get PDF
    The latent block model (LBM) is a flexible probabilistic tool to describe interactions between node sets in bipartite networks, but it does not account for interactions of time varying intensity between nodes in unknown classes. In this paper we propose a non stationary temporal extension of the LBM that clusters simultaneously the two node sets of a bipartite network and constructs classes of time intervals on which interactions are stationary. The number of clusters as well as the membership to classes are obtained by maximizing the exact complete-data integrated likelihood relying on a greedy search approach. Experiments on simulated and real data are carried out in order to assess the proposed methodology.Comment: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), Apr 2015, Bruges, Belgium. pp.225-230, 2015, Proceedings of the 23-th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2015

    Exact ICL maximization in a non-stationary temporal extension of the stochastic block model for dynamic networks

    Full text link
    The stochastic block model (SBM) is a flexible probabilistic tool that can be used to model interactions between clusters of nodes in a network. However, it does not account for interactions of time varying intensity between clusters. The extension of the SBM developed in this paper addresses this shortcoming through a temporal partition: assuming interactions between nodes are recorded on fixed-length time intervals, the inference procedure associated with the model we propose allows to cluster simultaneously the nodes of the network and the time intervals. The number of clusters of nodes and of time intervals, as well as the memberships to clusters, are obtained by maximizing an exact integrated complete-data likelihood, relying on a greedy search approach. Experiments on simulated and real data are carried out in order to assess the proposed methodology

    Block modelling in dynamic networks with non-homogeneous Poisson processes and exact ICL

    Full text link
    We develop a model in which interactions between nodes of a dynamic network are counted by non homogeneous Poisson processes. In a block modelling perspective, nodes belong to hidden clusters (whose number is unknown) and the intensity functions of the counting processes only depend on the clusters of nodes. In order to make inference tractable we move to discrete time by partitioning the entire time horizon in which interactions are observed in fixed-length time sub-intervals. First, we derive an exact integrated classification likelihood criterion and maximize it relying on a greedy search approach. This allows to estimate the memberships to clusters and the number of clusters simultaneously. Then a maximum-likelihood estimator is developed to estimate non parametrically the integrated intensities. We discuss the over-fitting problems of the model and propose a regularized version solving these issues. Experiments on real and simulated data are carried out in order to assess the proposed methodology

    Statistical clustering of temporal networks through a dynamic stochastic block model

    Get PDF
    Statistical node clustering in discrete time dynamic networks is an emerging field that raises many challenges. Here, we explore statistical properties and frequentist inference in a model that combines a stochastic block model (SBM) for its static part with independent Markov chains for the evolution of the nodes groups through time. We model binary data as well as weighted dynamic random graphs (with discrete or continuous edges values). Our approach, motivated by the importance of controlling for label switching issues across the different time steps, focuses on detecting groups characterized by a stable within group connectivity behavior. We study identifiability of the model parameters, propose an inference procedure based on a variational expectation maximization algorithm as well as a model selection criterion to select for the number of groups. We carefully discuss our initialization strategy which plays an important role in the method and compare our procedure with existing ones on synthetic datasets. We also illustrate our approach on dynamic contact networks, one of encounters among high school students and two others on animal interactions. An implementation of the method is available as a R package called dynsbm

    Modelling time evolving interactions in networks through a non stationary extension of stochastic block models

    Get PDF
    National audienceIn this paper, we focus on the stochastic block model (SBM),a probabilistic tool describing interactions between nodes of a network using latent clusters. The SBM assumes that the networkhas a stationary structure, in which connections of time varying intensity are not taken into account. In other words, interactions between two groups are forced to have the same features during the whole observation time. To overcome this limitation,we propose a partition of the whole time horizon, in which interactions are observed, and develop a non stationary extension of the SBM,allowing to simultaneously cluster the nodes in a network along with fixed time intervals in which the interactions take place. The number of clusters (K for nodes, D for time intervals) as well as the class memberships are finallyobtained through maximizing the complete-data integrated likelihood by means of a greedy search approach. After showing that the model works properly with simulated data, we focus on a real data set. We thus consider the three days ACM Hypertext conference held in Turin,June 29th - July 1st 2009. Proximity interactions between attendees during the first day are modelled and an interestingclustering of the daily hours is finally obtained, with times of social gathering (e.g. coffee breaks) recovered by the approach. Applications to large networks are limited due to the computational complexity of the greedy search which is dominated bythe number KmaxK_{max} and DmaxD_{max} of clusters used in the initialization. Therefore,advanced clustering tools are considered to reduce the number of clusters expected in the data, making the greedy search applicable to large networks.Le modèle à blocs stochastiques (SBM) décrit les interactions entre les sommets d'un graphe selon une approche probabiliste, basée sur des classes latentes. SBM fait l'hypothèse implicite que le graphe est stationnaire. Par conséquent, les interactions entre deux classes sont supposées avoir la même intensité pendant toute la période d'activité. Pour relaxer l'hypothèse de stationnarité, nous proposons une partition de l'horizon temporel en sous intervalles disjoints, chacun de même longueur. Ensuite, nous proposons une extension de SBM qui nous permet de classer en même temps les sommets du graphe et les intervalles de temps où les interactions ont lieu. Le nombre de classes latentes (K pour les sommets, D pour les intervalles de temps) est enfin obtenu à travers la maximisation de la vraisemblance intégrée des données complétées (ICL exacte). Après avoir testé le modèle sur des données simulées, nous traitons un cas réel. Pendant une journée, les interactions parmi les participants de la conférence HCM Hypertext (Turin, 29 Juin – 1er Juillet 2009) ont été traitées. Notre méthodologie nous a permis d'obtenir une classifications intéressante des 24 heures: les moments de rencontre tels que les pauses café ou buffets ont bien été détectés. La complexité de l'algorithme de recherche, linéaire en fonction du nombre initial de clusters (KmaxK_{max} et DmaxD_{max} respectivement), nous oriente vers l'utilisation d'instruments avancés de classification, pour réduire le nombre attendu de classes latentes et ainsi pouvoir utiliser le modèle pour des réseaux de grand dimension

    A semiparametric extension of the stochastic block model for longitudinal networks

    Full text link
    To model recurrent interaction events in continuous time, an extension of the stochastic block model is proposed where every individual belongs to a latent group and interactions between two individuals follow a conditional inhomogeneous Poisson process with intensity driven by the individuals' latent groups. The model is shown to be identifiable and its estimation is based on a semiparametric variational expectation-maximization algorithm. Two versions of the method are developed, using either a nonparametric histogram approach (with an adaptive choice of the partition size) or kernel intensity estimators. The number of latent groups can be selected by an integrated classification likelihood criterion. Finally, we demonstrate the performance of our procedure on synthetic experiments, analyse two datasets to illustrate the utility of our approach and comment on competing methods

    The Block Point Process Model for Continuous-Time Event-Based Dynamic Networks

    Full text link
    We consider the problem of analyzing timestamped relational events between a set of entities, such as messages between users of an on-line social network. Such data are often analyzed using static or discrete-time network models, which discard a significant amount of information by aggregating events over time to form network snapshots. In this paper, we introduce a block point process model (BPPM) for continuous-time event-based dynamic networks. The BPPM is inspired by the well-known stochastic block model (SBM) for static networks. We show that networks generated by the BPPM follow an SBM in the limit of a growing number of nodes. We use this property to develop principled and efficient local search and variational inference procedures initialized by regularized spectral clustering. We fit BPPMs with exponential Hawkes processes to analyze several real network data sets, including a Facebook wall post network with over 3,500 nodes and 130,000 events.Comment: To appear at The Web Conference 201

    Fast Bayesian clustering and model selection for longitudinal data mixtures

    Get PDF
    The clustering of longitudinal data from a Bayesian perspective is considered , with particular attention to the selection of the number of components. Instead of using asymptotic criteria (e.g. BIC), we propose to directly maximize an exact quantity based on conjugated prior distributions of the model parameters. The prior parameters are estimated by gradient descent, via automatic differentiation. Using simulated data, we demonstrate that, in terms of accuracy of the obtained clustering, our approach is comparable to two frequentist approaches commonly used in this setting, and it outper-forms them in selecting the actual number of clusters

    Model-based co-clustering for mixed type data

    Get PDF
    International audienceThe importance of clustering for creating groups of observations is well known. The emergence of high-dimensional data sets with a huge number of features leads to co-clustering techniques, and several methods have been developed for simultaneously producing groups of observations and features.By grouping the data set into blocks (the crossing of a row-cluster and a column-cluster), these techniques can sometimes better summarize the data set and its inherent structure. The Latent Block Model (LBM) is a well-known method for performing co-clustering. However, recently, contexts with features of different types (here called mixed type data sets) are becoming more common. The LBM is not directly applicable to this kind of data set. Here a natural extension of the usual LBM to the ``Multiple Latent Block Model" (MLBM) is proposed in order to handle mixed type data sets. Inference is performed using a Stochastic EM-algorithm that embeds a Gibbs sampler, and allows for missing data situations. A model selection criterion is defined to choose the number of row and column clusters. The method is then applied to both simulated and real data sets
    • …
    corecore