94 research outputs found
Strategies for online inference of model-based clustering in large and growing networks
In this paper we adapt online estimation strategies to perform model-based
clustering on large networks. Our work focuses on two algorithms, the first
based on the SAEM algorithm, and the second on variational methods. These two
strategies are compared with existing approaches on simulated and real data. We
use the method to decipher the connexion structure of the political websphere
during the US political campaign in 2008. We show that our online EM-based
algorithms offer a good trade-off between precision and speed, when estimating
parameters for mixture distributions in the context of random graphs.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS359 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Network and attribute-based clustering of tennis players and tournaments
This paper addresses several issues relevant to clustering tennis players and
tournaments: (i) it considers players, tournaments and the relation between them;
(ii) the relation is taken into account in the fuzzy clustering model based on the
Partitioning Around Medoids (PAM) algorithm through spatial constraints; (iii) the
attributes of the players and of the tournaments are of different nature, qualitative
and quantitative. The proposal is novel in the methodology used, a spatial fuzzy
clustering model for players and for tournaments (based on related attributes), where
the spatial penalty term in each clustering model depends on the relation between
players and tournaments described in the adjacency matrix. The proposed model is
compared with a bipartite players-tournament complex network model (the Degree-
Corrected Stochastic Blockmodel) that considers only the relation between players
and tournaments, described in the adjacency matrix, to obtain communities on each
side of the bipartite network. An application to data taken from the official ATP
website with regard to the draws of the tournaments, and from the sports statistics
website Wheelo ratings for the performance data of players and tournaments, shows
the performance of the proposed clustering model.
Community detection with node attributes in multilayer networks
Community detection in networks is commonly performed using information about interactions between nodes. Recent advances have been made to incorporate multiple types of interactions, thus generalizing standard methods to multilayer networks. Often, though, one can access additional information regarding individual nodes, attributes, or covariates. A relevant question is thus how to properly incorporate this extra information in such frameworks. Here we develop a method that incorporates both the topology of interactions and node attributes to extract communities in multilayer networks. We propose a principled probabilistic method that does not assume any a priori correlation structure between attributes and communities but rather infers this from data. This leads to an efficient algorithmic implementation that exploits the sparsity of the dataset and can be used to perform several inference tasks; we provide an open-source implementation of the code online. We demonstrate our method on both synthetic and real-world data and compare performance with methods that do not use any attribute information. We find that including node information helps in predicting missing links or attributes. It also leads to more interpretable community structures and allows the quantification of the impact of the node attributes given as input.
Relational learning and fairness
This thesis will focus on relational learning in the modeling of text and user roles in networks, and the relative treatment of individuals as related to algorithmic fairness. With the exponential growth in social network data, the need for models of user interaction data is growing. This work presents a model which agglomerates users into archetypes based on topical modeling of the contents of their interactions. It further proposes models and a fairness metric for the creation of classifiers for individuals which control for the relative treatment of individuals.