156,984 research outputs found
Model-based clustering for populations of networks
Until recently, data on populations of networks was rarely available.
However, with the advancement of automatic monitoring devices and the growing
social and scientific interest in networks, such data has become more widely
available. From sociological experiments involving cognitive social structures
to fMRI scans revealing large-scale brain networks of groups of patients, there
is a growing awareness that we urgently need tools to analyse populations of
networks and particularly to model the variation between networks due to
covariates. We propose a model-based clustering method based on mixtures of
generalized linear (mixed) models that can be employed to describe the joint
distribution of a population of networks in a parsimonious manner and to
identify subpopulations of networks that share certain topological properties
of interest (degree distribution, community structure, effect of covariates on
the presence of an edge, etc.). Maximum likelihood estimation for the proposed
model can be efficiently carried out with an implementation of the EM
algorithm. We assess the performance of this method on simulated data and
conclude with an example application on advice networks in a small business.
Comment: The final (published) version of the article can be downloaded for
free (Open Access) from the editor's website (click on the DOI link below).
Model selection for semi-supervised clustering
Although there is a large and growing literature that tackles the semi-supervised clustering problem (i.e., using some labeled objects or cluster-guiding constraints like "must-link" or "cannot-link"), the evaluation of semi-supervised clustering approaches has rarely been discussed. The application of cross-validation techniques, for example, is far from straightforward in the semi-supervised setting, and the problems associated with evaluation have yet to be addressed. Here we summarize these problems and provide a solution. Furthermore, in order to demonstrate practical applicability of semi-supervised clustering methods, we provide a method for model selection in semi-supervised clustering based on this sound evaluation procedure. Our method allows the user to select, based on the available information (labels or constraints), the most appropriate clustering model (e.g., number of clusters, density parameters) for a given problem.
NSERC (Canada), FAPESP (Brazil), CNPq (Brazil)
Popularity versus Similarity in Growing Networks
Popularity is attractive -- this is the formula underlying preferential
attachment, a popular explanation for the emergence of scaling in growing
networks. If new connections are made preferentially to more popular nodes,
then the resulting distribution of the number of connections that nodes have
follows power laws observed in many real networks. Preferential attachment has
been directly validated for some real networks, including the Internet.
Preferential attachment can also be a consequence of different underlying
processes based on node fitness, ranking, optimization, random walks, or
duplication. Here we show that popularity is just one dimension of
attractiveness. Another dimension is similarity. We develop a framework where
new connections, instead of preferring popular nodes, optimize certain
trade-offs between popularity and similarity. The framework admits a geometric
interpretation, in which popularity preference emerges from local optimization.
As opposed to preferential attachment, the optimization framework accurately
describes large-scale evolution of technological (Internet), social (web of
trust), and biological (E. coli metabolic) networks, predicting the probability
of new links in them with remarkable precision. The developed framework can
thus be used for predicting new links in evolving networks, and provides a
different perspective on preferential attachment as an emergent phenomenon.
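For orientation, the preferential-attachment baseline that this abstract contrasts against can be sketched in a few lines. This is only a minimal illustration of degree-proportional attachment, not the authors' popularity-similarity optimization framework; the function names and the parameters `n_nodes`, `m`, and `seed` are hypothetical choices made for this sketch.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, m=2, seed=0):
    """Grow a graph in which each new node links to m distinct existing
    nodes, chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Assumed starting point: a small complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in `stubs` once per incident edge, so a uniform
    # draw from `stubs` is a degree-proportional draw over nodes.
    stubs = [v for e in edges for v in e]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

def degree_counts(edges):
    """Map each node to its degree."""
    return Counter(v for e in edges for v in e)
```

Calling `degree_counts(preferential_attachment(10000))` and tabulating the values should show a heavy-tailed degree distribution, which is the scaling behaviour the abstract attributes to popularity alone.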
Mixture Models With Grouping Structure: Retail Analytics Applications
Growing competitiveness and increasing availability of data are generating tremendous interest in data-driven analytics across industries. In the retail sector, stores need targeted guidance to improve both the efficiency and effectiveness of individual stores based on their specific location, demographics, and environment. We propose an effective data-driven framework for internal benchmarking that can lead to targeted guidance for individual stores. In particular, we propose an objective method for segmenting stores using a model-based clustering technique that accounts for similarity in store performance dynamics. It relies on effective Finite Mixture of Regression (FMR) techniques for carrying out the model-based clustering with grouping structure ("must-link" constraints) and modeling store performance. We propose two alternate methods for FMR with grouping structure: 1) Competitive Learning (CL) and 2) Expectation Maximization (EM). The CL method can support both linear and non-linear regression methods whereas the more effective proposed EM approach only supports linear regression.
We also propose an optimization framework to derive tailored recommendations for individual stores within store clusters that jointly improves profitability for the store while also improving sales to satisfy franchiser requirements. We validate the methods using synthetic experiments as well as a real-world automotive dealership network study for a leading global automotive manufacturer.
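The idea of EM for a finite mixture of linear regressions with grouping structure can be illustrated roughly as follows. This is a sketch under stated assumptions, not the authors' algorithm: it assumes Gaussian noise, treats "must-link" as "all observations of one group (for example, one store) share a single component", and uses hypothetical names (`em_fmr_grouped`, `groups`, `k`, `n_iter`). It requires NumPy.

```python
import numpy as np

def em_fmr_grouped(X, y, groups, k=2, n_iter=100, seed=0):
    """EM for a k-component mixture of linear regressions in which all
    observations of a group are constrained to share one component."""
    rng = np.random.default_rng(seed)
    ids, g = np.unique(groups, return_inverse=True)
    g = g.ravel()
    n, d = X.shape
    G = len(ids)
    # Assumed initialisation: random soft group-level responsibilities.
    R = rng.dirichlet(np.ones(k), size=G)
    beta = np.zeros((k, d))
    sigma2 = np.ones(k)
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # M-step: weighted least squares per component; every
        # observation inherits the responsibility of its group.
        W = R[g]
        for c in range(k):
            w = W[:, c]
            Xw = X * w[:, None]
            beta[c] = np.linalg.solve(Xw.T @ X + 1e-8 * np.eye(d), Xw.T @ y)
            resid = y - X @ beta[c]
            sigma2[c] = max((w @ resid**2) / max(w.sum(), 1e-12), 1e-8)
        pi = R.mean(axis=0)
        # E-step: sum Gaussian log-densities within each group, so the
        # responsibility is computed once per group, not per observation.
        resid = y[:, None] - X @ beta.T
        logp = -0.5 * (np.log(2 * np.pi * sigma2) + resid**2 / sigma2)
        L = np.zeros((G, k))
        np.add.at(L, g, logp)
        L += np.log(pi + 1e-300)
        L -= L.max(axis=1, keepdims=True)
        R = np.exp(L)
        R /= R.sum(axis=1, keepdims=True)
    labels = dict(zip(ids, R.argmax(axis=1)))
    return beta, sigma2, pi, labels
```

The returned `labels` dictionary maps each group identifier to the index of its most probable component. Summing log-densities per group in the E-step is what enforces the grouping constraint in this sketch.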