Exact ICL maximization in a non-stationary time extension of the latent block model for dynamic networks
The latent block model (LBM) is a flexible probabilistic tool to describe
interactions between node sets in bipartite networks, but it does not account
for interactions of time-varying intensity between nodes in unknown classes. In
this paper we propose a non-stationary temporal extension of the LBM that
clusters simultaneously the two node sets of a bipartite network and constructs
classes of time intervals on which interactions are stationary. The number of
clusters as well as the membership to classes are obtained by maximizing the
exact complete-data integrated likelihood relying on a greedy search approach.
Experiments on simulated and real data are carried out in order to assess the
proposed methodology.
Comment: European Symposium on Artificial Neural Networks, Computational
Intelligence and Machine Learning (ESANN), Apr 2015, Bruges, Belgium.
pp.225-230, 2015, Proceedings of the 23rd European Symposium on Artificial
Neural Networks, Computational Intelligence and Machine Learning (ESANN 2015)
Exact ICL maximization in a non-stationary temporal extension of the stochastic block model for dynamic networks
The stochastic block model (SBM) is a flexible probabilistic tool that can be
used to model interactions between clusters of nodes in a network. However, it
does not account for interactions of time-varying intensity between clusters.
The extension of the SBM developed in this paper addresses this shortcoming
through a temporal partition: assuming interactions between nodes are recorded
on fixed-length time intervals, the inference procedure associated with the
model we propose clusters the nodes of the network and the time intervals
simultaneously. The number of clusters of nodes and of time intervals, as
well as the memberships to clusters, are obtained by maximizing an exact
integrated complete-data likelihood, relying on a greedy search approach.
Experiments on simulated and real data are carried out in order to assess the
proposed methodology.
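To make the structure described in this abstract concrete, here is a minimal simulation sketch, not the authors' implementation. It assumes that counts on each fixed-length interval are Poisson with a rate that depends only on the two node clusters and the interval cluster. The function name `simulate_temporal_sbm`, the uniform label draws, and the Gamma-distributed rates are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate_temporal_sbm(n_nodes, n_intervals, K, D, rng):
    """Draw interaction counts from a toy temporal block model (illustrative only)."""
    # Hypothetical latent labels: c[i] is the cluster of node i, y[t] of interval t.
    c = rng.integers(0, K, size=n_nodes)
    y = rng.integers(0, D, size=n_intervals)
    # Hypothetical rates: one Poisson rate per (node cluster, node cluster, interval cluster).
    lam = rng.gamma(shape=2.0, scale=1.0, size=(K, K, D))
    # Broadcast the labels so that rates[i, j, t] = lam[c[i], c[j], y[t]].
    rates = lam[c[:, None, None], c[None, :, None], y[None, None, :]]
    counts = rng.poisson(rates)
    return c, y, lam, counts

rng = np.random.default_rng(0)
c, y, lam, counts = simulate_temporal_sbm(n_nodes=20, n_intervals=12, K=3, D=2, rng=rng)
print(counts.shape)  # one count per (sender, receiver, interval) triple
```

Within a block of node clusters and an interval cluster the counts are identically distributed, which is the sense in which interactions are stationary inside a class of time intervals.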
Block modelling in dynamic networks with non-homogeneous Poisson processes and exact ICL
We develop a model in which interactions between nodes of a dynamic network
are counted by non-homogeneous Poisson processes. In a block modelling
perspective, nodes belong to hidden clusters (whose number is unknown) and the
intensity functions of the counting processes depend only on the clusters of
nodes. In order to make inference tractable we move to discrete time by
partitioning the entire time horizon in which interactions are observed into
fixed-length time sub-intervals. First, we derive an exact integrated
classification likelihood criterion and maximize it relying on a greedy search
approach. This makes it possible to estimate the memberships to clusters and the
number of clusters simultaneously. Then a maximum-likelihood estimator is developed to
estimate the integrated intensities nonparametrically. We discuss the
over-fitting problems of the model and propose a regularized version solving
these issues. Experiments on real and simulated data are carried out in order
to assess the proposed methodology.
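The abstract does not state which priors make the integrated likelihood exact. As a hedged illustration of why a closed form can exist, the sketch below assumes a conjugate Gamma(a, b) prior (shape a, rate b) on the Poisson rate shared by all counts in one block. The function name and the default hyperparameters are hypothetical, and this is one block's contribution, not the paper's full criterion.

```python
import math

def log_marginal_poisson_gamma(counts, a=1.0, b=1.0):
    """Log marginal likelihood of i.i.d. Poisson counts with the rate integrated out.

    Assumes rate ~ Gamma(shape=a, rate=b); the integral is available in closed form:
    b^a / Gamma(a) * Gamma(a + S) / (b + m)^(a + S) / prod(x_i!),
    where m is the number of counts and S is their sum.
    """
    m = len(counts)
    s = sum(counts)
    log_prior_norm = a * math.log(b) - math.lgamma(a)
    log_posterior_norm = math.lgamma(a + s) - (a + s) * math.log(b + m)
    log_factorials = sum(math.lgamma(x + 1) for x in counts)
    return log_prior_norm + log_posterior_norm - log_factorials

# Compare keeping four counts in one block against splitting them into two blocks.
merged = log_marginal_poisson_gamma([3, 4, 2, 5])
split = log_marginal_poisson_gamma([3, 4]) + log_marginal_poisson_gamma([2, 5])
print(merged, split)
```

A greedy search of the kind the abstract mentions would compare such scores before and after moving a node or merging two clusters, and keep the change only if the score increases.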
Statistical clustering of temporal networks through a dynamic stochastic block model
Statistical node clustering in discrete time dynamic networks is an emerging
field that raises many challenges. Here, we explore statistical properties and
frequentist inference in a model that combines a stochastic block model (SBM)
for its static part with independent Markov chains for the evolution of the
node groups through time. We model binary data as well as weighted dynamic
random graphs (with discrete or continuous edge values). Our approach,
motivated by the importance of controlling for label switching issues across
the different time steps, focuses on detecting groups characterized by a stable
within-group connectivity behavior. We study identifiability of the model
parameters, propose an inference procedure based on a variational expectation
maximization algorithm as well as a model selection criterion to select the
number of groups. We carefully discuss our initialization strategy, which plays
an important role in the method, and compare our procedure with existing ones on
synthetic datasets. We also illustrate our approach on dynamic contact
networks, one of encounters among high school students and two others on animal
interactions. An implementation of the method is available as an R package
called dynsbm.
Modelling time evolving interactions in networks through a non stationary extension of stochastic block models
In this paper, we focus on the stochastic block model (SBM), a probabilistic tool describing interactions between nodes of a network using latent clusters. The SBM assumes that the network has a stationary structure, in which connections of time-varying intensity are not taken into account. In other words, interactions between two groups are forced to have the same features during the whole observation time. To overcome this limitation, we propose a partition of the whole time horizon in which interactions are observed, and develop a non-stationary extension of the SBM that simultaneously clusters the nodes in a network along with the fixed time intervals in which the interactions take place. The number of clusters (K for nodes, D for time intervals) as well as the class memberships are finally obtained by maximizing the complete-data integrated likelihood by means of a greedy search approach. After showing that the model works properly with simulated data, we focus on a real data set. We thus consider the three-day ACM Hypertext conference held in Turin, June 29th - July 1st 2009. Proximity interactions between attendees during the first day are modelled and an interesting clustering of the daily hours is finally obtained, with times of social gathering (e.g. coffee breaks) recovered by the approach. Applications to large networks are limited by the computational complexity of the greedy search, which is dominated by the numbers of node and time-interval clusters used in the initialization. Therefore, advanced clustering tools are considered to reduce the number of clusters expected in the data, making the greedy search applicable to large networks.
The stochastic block model (SBM) describes the interactions between the vertices of a graph through a probabilistic approach based on latent classes. The SBM implicitly assumes that the graph is stationary. Consequently, interactions between two classes are assumed to have the same intensity throughout the whole period of activity. To relax the stationarity assumption, we propose a partition of the time horizon into disjoint sub-intervals of equal length. We then propose an extension of the SBM that clusters the vertices of the graph and the time intervals in which the interactions take place at the same time. The number of latent classes (K for the vertices, D for the time intervals) is finally obtained by maximizing the integrated complete-data likelihood (exact ICL). After testing the model on simulated data, we treat a real case: the interactions, over one day, among participants of the ACM Hypertext conference (Turin, June 29th - July 1st 2009). Our methodology yields an interesting clustering of the 24 hours: gathering times such as coffee breaks or buffets are correctly detected. The complexity of the search algorithm, linear in the initial numbers of node and time-interval clusters, points us toward advanced clustering tools to reduce the expected number of latent classes and thus make the model usable for large networks.
A semiparametric extension of the stochastic block model for longitudinal networks
To model recurrent interaction events in continuous time, an extension of the
stochastic block model is proposed where every individual belongs to a latent
group and interactions between two individuals follow a conditional
inhomogeneous Poisson process with intensity driven by the individuals' latent
groups. The model is shown to be identifiable and its estimation is based on a
semiparametric variational expectation-maximization algorithm. Two versions of
the method are developed, using either a nonparametric histogram approach (with
an adaptive choice of the partition size) or kernel intensity estimators. The
number of latent groups can be selected by an integrated classification
likelihood criterion. Finally, we demonstrate the performance of our procedure
on synthetic experiments, analyse two datasets to illustrate the utility of our
approach and comment on competing methods.
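As a rough illustration of the histogram approach mentioned in this abstract, the sketch below estimates a piecewise-constant intensity from pooled event times. It is a simplification under stated assumptions: group memberships are taken as known (hard assignments rather than variational weights), the number of bins is fixed rather than chosen adaptively, and the function name `histogram_intensity` and its parameters are hypothetical.

```python
import numpy as np

def histogram_intensity(event_times, horizon, n_bins, n_pairs=1):
    """Piecewise-constant intensity estimate on [0, horizon] (illustrative only).

    event_times: pooled event times of all node pairs in one pair of groups.
    n_pairs: number of node pairs pooled, so the result is a per-pair rate.
    """
    edges = np.linspace(0.0, horizon, n_bins + 1)
    counts, _ = np.histogram(event_times, bins=edges)
    width = horizon / n_bins
    # Events per unit time and per node pair in each bin.
    return edges, counts / (width * n_pairs)

edges, intensity = histogram_intensity([0.1, 0.2, 0.6], horizon=1.0, n_bins=2)
print(intensity)  # → [4. 2.]
```

With a single pair, the estimated intensity integrates to the number of observed events, which is a quick sanity check on the normalization.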
The Block Point Process Model for Continuous-Time Event-Based Dynamic Networks
We consider the problem of analyzing timestamped relational events between a
set of entities, such as messages between users of an on-line social network.
Such data are often analyzed using static or discrete-time network models,
which discard a significant amount of information by aggregating events over
time to form network snapshots. In this paper, we introduce a block point
process model (BPPM) for continuous-time event-based dynamic networks. The BPPM
is inspired by the well-known stochastic block model (SBM) for static networks.
We show that networks generated by the BPPM follow an SBM in the limit of a
growing number of nodes. We use this property to develop principled and
efficient local search and variational inference procedures initialized by
regularized spectral clustering. We fit BPPMs with exponential Hawkes processes
to analyze several real network data sets, including a Facebook wall post
network with over 3,500 nodes and 130,000 events.
Comment: To appear at The Web Conference 201
Fast Bayesian clustering and model selection for longitudinal data mixtures
The clustering of longitudinal data from a Bayesian perspective is considered, with particular attention to the selection of the number of components. Instead of using asymptotic criteria (e.g. BIC), we propose to directly maximize an exact quantity based on conjugate prior distributions of the model parameters. The prior parameters are estimated by gradient descent, via automatic differentiation. Using simulated data, we demonstrate that, in terms of accuracy of the obtained clustering, our approach is comparable to two frequentist approaches commonly used in this setting, and that it outperforms them in selecting the actual number of clusters.
Model-based co-clustering for mixed type data
The importance of clustering for creating groups of observations is well known. The emergence of high-dimensional data sets with a huge number of features has led to co-clustering techniques, and several methods have been developed for simultaneously producing groups of observations and features. By grouping the data set into blocks (the crossing of a row-cluster and a column-cluster), these techniques can sometimes better summarize the data set and its inherent structure. The Latent Block Model (LBM) is a well-known method for performing co-clustering. However, contexts with features of different types (here called mixed type data sets) have recently become more common. The LBM is not directly applicable to this kind of data set. Here a natural extension of the usual LBM to the "Multiple Latent Block Model" (MLBM) is proposed in order to handle mixed type data sets. Inference is performed using a Stochastic EM-algorithm that embeds a Gibbs sampler, and allows for missing data situations. A model selection criterion is defined to choose the number of row and column clusters. The method is then applied to both simulated and real data sets.